All posts by Lucas Pardue

Go and enhance your calm: demolishing an HTTP/2 interop problem

2025-10-31 Lucas Pardue

Post Syndicated from Lucas Pardue original https://blog.cloudflare.com/go-and-enhance-your-calm/

In September 2025, a thread popped up in our internal engineering chat room asking, “Which part of our stack would be responsible for sending ErrCode=ENHANCE_YOUR_CALM to an HTTP/2 client?” Two internal microservices were experiencing a critical error preventing their communication and the team needed a timely answer.

In this blog post, we describe the background to well-known HTTP/2 attacks that trigger Cloudflare defences, which close connections. We then document an easy-to-make mistake using Go’s standard library that can cause clients to send PING flood attacks and how you can avoid it.

HTTP/2 is powerful – but it can be easy to misuse

HTTP/2 defines a binary wire format for encoding HTTP semantics. Request and response messages are encoded as a series of HEADERS and DATA frames, each associated with a logical stream, sent over a TCP connection using TLS. There are also control frames that relate to the management of streams or the connection as a whole. For example, SETTINGS frames advertise properties of an endpoint, WINDOW_UPDATE frames provide flow control credit to a peer so that it can send data, RST_STREAM can be used to cancel or reject a request or response, while GOAWAY can be used to signal graceful or immediate connection closure.

HTTP/2 provides many powerful features that have legitimate uses. However, with great power comes responsibility and opportunity for accidental or intentional misuse. The specification details a number of denial-of-service considerations. Implementations are advised to harden themselves: “An endpoint that doesn’t monitor use of these features exposes itself to a risk of denial of service. Implementations SHOULD track the use of these features and set limits on their use.”

Cloudflare implements many different HTTP/2 defenses, developed over years in order to protect our systems and our customers. Some notable examples include mitigations added in 2019 to address “Netflix vulnerabilities” and in 2023 to mitigate Rapid Reset and similar style attacks.

When Cloudflare detects that HTTP/2 client behaviour is likely malicious, we close the connection using the GOAWAY frame and include the error code ENHANCE_YOUR_CALM.

One of the well-known and common attacks is CVE-2019-9512, aka PING flood: “The attacker sends continual pings to an HTTP/2 peer, causing the peer to build an internal queue of responses. Depending on how efficiently this data is queued, this can consume excess CPU, memory, or both.” Sending a PING frame causes the peer to respond with a PING acknowledgement (indicated by an ACK flag). This allows for checking the liveness of the HTTP connection, along with measuring the layer 7 round-trip time – both useful things. The requirement to acknowledge a PING, however, provides the potential attack vector since it generates work for the peer.

A client that PINGs the Cloudflare edge too frequently will trigger our CVE-2019-9512 mitigations, causing us to close the connection. Shortly after we launched support for gRPC in 2020, we encountered interoperability issues with some gRPC clients that sent many PINGs as part of a performance optimization for window tuning. We also discovered that the Rust Hyper crate had a feature called Adaptive Window that emulated the design and triggered a similar problem until Hyper made a fix.

Solving a microservice miscommunication mystery

When that thread popped up asking which part of our stack was responsible for sending the ENHANCE_YOUR_CALM error code, it was regarding a client communicating over HTTP/2 between two internal microservices.

We suspected that this was an HTTP/2 mitigation issue and confirmed it was a PING flood mitigation in our logs. But taking a step back, you may wonder why two internal microservices are communicating over the Cloudflare edge at all, and therefore hitting our mitigations. In this case, communicating over the edge provides us with several advantages:

We get to dogfood our edge infrastructure and discover issues like this!
We can use Cloudflare Access for authentication. This allows our microservices to be accessed securely by both other services (using service tokens) and engineers (which is invaluable for debugging).
Internal services that are written with Cloudflare Workers can easily communicate with services that are accessible at the edge.

The question remained: Why was this client behaving this way? We traded some ideas as we attempted to get to the bottom of the issue.

The client had a configuration that would indicate that it didn’t need to PING very frequently:

t2.PingTimeout = 2 * time.Second
t2.ReadIdleTimeout = 5 * time.Second

However, in situations like this it is generally a good idea to establish ground truth about what is really happening “on the wire.” For instance, grabbing a packet capture that can be dissected and explored in Wireshark can provide unequivocal evidence of precisely what was sent over the network. The next best option is detailed/trace logging at the sender or receiver, although sometimes logging can be misleading, so caveat emptor.

In our particular case, it was simpler to use logging with GODEBUG=http2debug=2. We built a simplified minimal reproduction of the client that triggered the error, helping to eliminate other potential variables. We did some group log analysis, combined with diving into some of the Go standard library code to understand what it was really doing. Issac Asimov is commonly credited with the quote “The most exciting phrase to hear in science, the one that heralds new discoveries, is not ‘Eureka!’ but ‘That’s funny…'” and sure enough, within the hour someone declared–

the funny part I see is this:

2025/09/02 17:33:18 http2: Framer 0x14000624540: wrote RST_STREAM stream=9 len=4 ErrCode=CANCEL
2025/09/02 17:33:18 http2: Framer 0x14000624540: wrote PING len=8 ping="j\xe7\xd6R\xdaw\xf8+"

every ping seems to be preceded by a RST_STREAM

Observant readers will recall the earlier mention of Rapid Reset. However, our logs clearly indicated ENHANCE_YOUR_CALM being triggered due to the PING flood. A bit of searching landed us on this mailing list thread and the comment “Sending a PING frame along with an RST_STREAM allows a client to distinguish between an unresponsive server and a slow response.” That seemed quite relevant. We also found a change that was committed related to this topic. This partly answered why there were so many PINGs, but it also raised a new question: Why so many stream resets?

So we went back to the logs and built up a little more context about the interaction:

2025/09/02 17:33:18 http2: Transport received DATA flags=END_STREAM stream=47 len=0 data=""
2025/09/02 17:33:18 http2: Framer 0x14000624540: wrote RST_STREAM stream=47 len=4 ErrCode=CANCEL
2025/09/02 17:33:18 http2: Framer 0x14000624540: wrote PING len=8 ping="\x97W\x02\xfa>\xa8\xabi"

The interesting thing here is that the server had sent a DATA frame with the END_STREAM flag set. Per the HTTP/2 stream state machine, the stream should have transitioned to closed when a frame with END_STREAM was processed. The client doesn’t need to do anything in this state – sending a RST_STREAM is entirely unnecessary.

A little more digging and noodling and an engineer proclaimed:

I noticed that the reset+ping only happens when you call resp.Body.Close()

I believe Go’s HTTP library doesn’t actually read the response body automatically, but keeps the stream open for you to use until you call resp.Body.Close(), which you can do at any point you like.

The hilarious thing in our example was that there wasn’t actually any HTTP body to read. From the earlier example: received DATA flags=END_STREAM stream=47 len=0 data="".

Science and engineering are at times weird and counterintuitive. We decided to tweak our client to read the (absent) body via io.Copy(io.Discard, resp.Body) before closing it.

Sure enough, this immediately stopped the client sending both a useless RST_STREAM and, by association, a PING frame.

Mystery solved?

To prove we had fixed the root cause, the production client was updated with a similar fix. A few hours later, all the ENHANCE_YOUR_CALM closures were eliminated.

Reading bodies in Go can be unintuitive

It’s worth noting that in some situations, ensuring the response body is always read can sometimes be unintuitive in Go. For example, at first glance it appears that the response body will always be read in the following example:

resp, err := http.DefaultClient.Do(req)
if err != nil {
	return err
}
defer resp.Body.Close()

if err := json.NewDecoder(resp.Body).Decode(&respBody); err != nil {
	return err
}

However, json.Decoder stops reading as soon as it finds a complete JSON document or errors. If the response body contains multiple JSON documents or invalid JSON, then the entire response body may still not be read.

Therefore, in our clients, we’ve started replacing defer response.Body.Close() with the following pattern to ensure that response bodies are always fully read:

resp, err := http.DefaultClient.Do(req)
if err != nil {
	return err
}
defer func() {
	io.Copy(io.Discard, resp.Body)
	resp.Body.Close()
}()

if err := json.NewDecoder(resp.Body).Decode(&respBody); err != nil {
	return err
}

Actions to take if you encounter ENHANCE_YOUR_CALM

HTTP/2 is a protocol with several features. Many implementations have implemented hardening to protect themselves from misuse of features, which can trigger a connection to be closed. The recommended error code for closing connections in such conditions is ENHANCE_YOUR_CALM. There are numerous HTTP/2 implementations and APIs, which may drive the use of HTTP/2 features in unexpected ways that could appear like attacks.

If you have an HTTP/2 client that encounters closures with ENHANCE_YOUR_CALM, we recommend that you try to establish ground truth with packet captures (including TLS decryption keys via mechanisms like SSLKEYLOGFILE) and/or detailed trace logging. Look for patterns of frequent or repeated frames that might be similar to malicious traffic. Adjusting your client may help avoid it getting misclassified as an attacker.

If you use Go, we recommend always reading HTTP/2 response bodies (even if empty) in order to avoid sending unnecessary RST_STREAM and PING frames. This is especially important if you use a single connection for multiple requests, which can cause a high frequency of these frames.

This was also a great reminder of the advantages of dogfooding our own products within our internal services. When we run into issues like this one, our learnings can benefit our customers with similar setups.

Open sourcing h3i: a command line tool and library for low-level HTTP/3 testing and debugging

2024-12-30 Lucas Pardue

Post Syndicated from Lucas Pardue original https://blog.cloudflare.com/h3i/

Have you ever built a piece of IKEA furniture, or put together a LEGO set, by following the instructions closely and only at the end realized at some point you didn’t quite follow them correctly? The final result might be close to what was intended, but there’s a nagging thought that maybe, just maybe, it’s not as rock steady or functional as it could have been.

Internet protocol specifications are instructions designed for engineers to build things. Protocol designers take great care to ensure the documents they produce are clear. The standardization process gathers consensus and review from experts in the field, to further ensure document quality. Any reasonably skilled engineer should be able to take a specification and produce a performant, reliable, and secure implementation. The Internet is central to everyone’s lives, and we depend on these implementations. Any deviations from the specification can put us at risk. For example, mishandling of malformed requests can allow attacks such as request smuggling.

h3i is a binary command line tool and Rust library designed for low-level testing and debugging of HTTP/3, which runs over QUIC. h3i is free and open source as part of Cloudflare’s quiche project. In this post we’ll explain the motivation behind developing h3i, how we use it to help develop robust and safe standards-compliant software and production systems, and how you can similarly use it to test your own software or services. If you just want to jump into how to use h3i, go to the h3i command line tool section.

A recap of QUIC and HTTP/3

QUIC is a secure-by-default transport protocol that provides performance advantages compared to TCP and TLS via a more efficient handshake, along with stream multiplexing that provides head-of-line blocking avoidance. HTTP/3 is an application protocol that maps HTTP semantics to QUIC, such as defining how HTTP requests and responses are assigned to individual QUIC streams.

Cloudflare has supported QUIC on our global network in some shape or form since 2018. We started while the Internet Engineering Task Force (IETF) was earnestly standardizing the protocol, working through early iterations and using interoperability testing and experience to help provide feedback for the standards process. We launched support for QUIC version 1 and HTTP/3 as soon as RFC 9000 (and its accompanying specifications) were published in 2021.

We work on the Protocols team, who own the ingress proxy into the Cloudflare network. This is essentially Cloudflare’s “front door” — HTTP requests that come to Cloudflare from the Internet pass through us first. The majority of requests are passed onwards to things like rulesets, workers, caches, or a customer origin. However, you might be surprised that many requests don’t ever make it that far because they are, in some way, invalid or malformed. Servers listening on the Internet have to be robust to traffic that is not RFC compliant, whether caused by accident or malicious intent.

The Protocols team actively participates in IETF standardization work and has also helped build and maintain other Cloudflare services that leverage quiche for QUIC and HTTP/3, from the proxies that help iCloud Private Relay via MASQUE proxying, to replacing WARP’s use of Wireguard with MASQUE, and beyond.

Throughout all of these different use cases, it is important for us to extensively test all aspects of the protocols. A deep dive into protocol details is a blog post (or three) in its own right. So let’s take a thin slice across HTTP to help illustrate the concepts.

HTTP Semantics are common to all versions of HTTP — the overall architecture, terminology, and protocol aspects such as request and response messages, methods, status codes, header and trailer fields, message content, and much more. Each individual HTTP version defines how semantics are transformed into a “wire format” for exchange over the Internet. You can read more about HTTP/1.1 and HTTP/2 in some of our previous blog posts.

With HTTP/3, HTTP request and response messages are split into a series of binary frames. HEADERS frames carry a representation of HTTP metadata (method, path, status code, field lines). The payload of the frame is the encoded QPACK compression output. DATA frames carry HTTP content (aka “message body”). In order to exchange these frames, HTTP/3 relies on QUIC streams. These provide an ordered and reliable byte stream and each have an identifier (ID) that is unique within the scope of a connection. There are four different stream types, denominated by the two least significant bits of the ID.

As a simple example, assuming a QUIC connection has already been established, a client can make a GET request and receive a 200 OK response with an HTML body using the follow sequence:

Client allocates the first available client-initiated bidirectional QUIC stream. (The IDs start at 0, then 4, 8, 12 and so on)
Client sends the request HEADERS frame on the stream and sets the stream’s FIN bit to mark the end of stream.
Server receives the request HEADERS frame and validates it against RFC 9114 rules. If accepted, it processes the request and prepares the response.
Server sends the response HEADERS frame on the same stream.
Server sends the response DATA frame on the same stream and sets the FIN bit.
Client receives the response frames and validates them. If accepted, the content is presented to the user.

At the QUIC layer, stream data is split into STREAM frames, which are sent in QUIC packets over UDP. QUIC deals with any loss detection and recovery, helping to ensure stream data is reliable. The layer cake diagram below provides a handy comparison of how HTTP/1.1, HTTP/2 and HTTP/3 use TCP, UDP and IP.

Background on testing QUIC and HTTP/3 at Cloudflare

The Protocols team has a diverse set of automated test tools that exercise our ingress proxy software in order to ensure it can stand up to the deluge that the Internet can throw at it. Just like a bouncer at a nightclub front door, we need to prevent as much bad traffic as possible before it gets inside and potentially causes damage.

HTTP/2 and HTTP/3 share several concepts. When we started developing early HTTP/3 support, we’d already learned a lot from production experience with HTTP/2. While HTTP/2 addressed many issues with HTTP/1.1 (especially problems like request smuggling, caused by its ASCII-based message delineation), HTTP/2 also added complexity and new avenues for attack. Security is an ongoing process, and the Protocols team continually hardens our software and systems to threats. For example, mitigating the range of denial-of-service attacks identified by Netflix in 2019, or the HTTP/2 Rapid Reset attacks of 2023.

For testing HTTP/2, we rely on the Python Requests library for testing conventional HTTP exchanges. However, that mostly only exercises HEADERS and DATA frames. There are eight other frame types and a plethora of ways that they can interact (hence the new attack vectors mentioned above). In order to get full testing coverage, we have to break down into the lower layer h2 library, which allows exact frame-by-frame control. However, even that is not always enough. Libraries tend to want to follow the RFC rules and prevent their users from doing “the wrong thing”. This is entirely logical for most purposes. For our needs though, we need to take off the safety guards just like any potential attackers might do. We have a few cases where the best way to exercise certain traffic patterns is to handcraft HTTP/2 frames in a hex editor, store that as binary, and replay it with a tool such as OpenSSL s_client.

We knew we’d need similar testing approaches for HTTP/3. However, when we started in 2018, there weren’t many other suitable client implementations. The rate of iteration on the specifications also meant it was hard to always keep in sync. So we built tests on quiche, using a mix of our quiche-client and http3_test. Over time, the python library aioquic has matured, and we have used it to add a range of lower-layer tests that break or bend HTTP/3 rules, in order to prove our proxies are robust.

Finally, we would be remiss not to mention that all the tests in our ingress proxy are in addition to the suite of over 500 integration tests that run on the quiche project itself.

Making HTTP/3 testing more accessible and maintainable with h3i

While we are happy with the coverage of our current tests, the smorgasbord of test tools makes it hard to know what to reach for when adding new tests. For example, we’ve had cases where aioquic’s safety guards prevent us from doing something, and it has needed a patch or workaround. This sort of thing requires a time investment just to debug/develop the tests.

We believe it shouldn’t take a protocol or code expert to develop what are often very simple to describe tests. While it is important to provide guide rails for the majority of conventional use cases, it is also important to provide accessible methods for taking them off.

Let’s consider a simple example. In HTTP/3 there is something called the control stream. It’s used to exchange frames such as SETTINGS, which affect the HTTP/3 connection. RFC 9114 Section 6.2.1 states:

Each side MUST initiate a single control stream at the beginning of the connection and send its SETTINGS frame as the first frame on this stream. If the first frame of the control stream is any other frame type, this MUST be treated as a connection error of type H3_MISSING_SETTINGS. Only one control stream per peer is permitted; receipt of a second stream claiming to be a control stream MUST be treated as a connection error of type H3_STREAM_CREATION_ERROR. The sender MUST NOT close the control stream, and the receiver MUST NOT request that the sender close the control stream. If either control stream is closed at any point, this MUST be treated as a connection error of type H3_CLOSED_CRITICAL_STREAM. Connection errors are described in Section 8.

There are many tests we can conjure up just from that paragraph:

Send a non-SETTINGS frame as the first frame on the control stream.
Open two control streams.
Open a control stream and then close it with a FIN bit.
Open a control stream and then reset it with a RESET_STREAM QUIC frame.
Wait for the peer to open a control stream and then ask for it to be reset with a STOP_SENDING QUIC frame.

All of the above actions should cause a remote peer that has implemented the RFC properly to close the connection. Therefore, it is not in the interest of the local client or server applications to ever do these actions.

Many QUIC and HTTP/3 implementations are developed as libraries that are integrated into client or server applications. There may be an extensive set of unit or integration tests of the library checking RFC rules. However, it is also important to run the same tests on the integrated assembly of library and application, since it’s all too common that an unhandled/mishandled library error can cascade to cause issues in upper layers. For instance, the HTTP/2 Rapid Reset attacks affected Cloudflare due to their impact on how one service spoke to another.

We’ve developed h3i, a command line tool and library, to make testing more accessible and maintainable for all. We started with a client that can exercise servers, since that’s what our focus has been. Future developments could support the opposite, a server that behaves in unusual ways in order to exercise clients.

Note: h3i is not intended to be a production client! Its flexibility may cause issues that are not observed in other production-oriented clients. It is also not intended to be used for any type of performance testing and measurement.

The h3i command line tool

The primary purpose of the h3i command line tool is quick low-level debugging and exploratory testing. Rather than worrying about writing code or a test script, users can quickly run an ad-hoc client test against a target, guided by interactive prompts.

In the simplest case, you can think of h3i a bit like curl but with access to some extra HTTP/3 parameters. In the example below, we issue a request to https://cloudflare-quic.com/ and receive a response.

Walking through a simple GET with h3i step-by-step:

Grab a copy of the h3i binary either by running cargo install h3i or cloning the quiche source repo at https://github.com/cloudflare/quiche/. Both methods assume you have some familiarity with Rust and Cargo. See the cargo documentation for more information.
1. cargo install will place the binary on your path, so you can then just run it by executing h3i.
2. If running from source, navigate to the quiche/h3i directory and then use cargo run.
Run the binary and provide the name and port of the target server. If the port is omitted, the default value 443 is assumed. E.g, cargo run cloudflare-quic.com
h3i then enters the action prompting phase. A series of one or more HTTP/3 actions can be queued up, such as sending frames, opening or terminating streams, or waiting on data from the server. The full set of options is documented in the readme.
1. The prompting interface adapts to keyboard inputs and supports tab completion.
2. In the example above, the headers action is selected, which walks through populating the fields in a HEADERS frame. It includes mandatory fields from RFC 9114 for convenience. If a test requires omitting these, the headers_no_pseudo can be used instead.
The commit prompt choice finalizes the action list and moves to the connection phase. h3i initiates a QUIC connection to the server identified in step 2. Once connected, actions are executed in order.
By default, h3i reports some limited information about the frames the server sent. To get more detailed information, the RUST_LOG environment can be set with either debug or trace levels.

Instant record and replay, powered by qlog

It can be fun to play around with the h3i command line tool to see how different servers respond to different combinations or sequences of actions. Occasionally, you’ll find a certain set that you want to run over and over again, or share with a friend or colleague. Having to manually enter the prompts repeatedly, or share screenshots of the h3i input can turn tedious. Fortunately, h3i records all the actions in a log file by default — the file path is printed immediately after h3i starts. The format of this file is based on qlog, an in-progress standard in development at the IETF for network protocol logging. It’s a perfect fit for our low-level needs.

Here’s an example h3i qlog file:

{"qlog_version":"0.3","qlog_format":"JSON-SEQ","title":"h3i","description":"h3i","trace":{"vantage_point":{"type":"client"},"title":"h3i","description":"h3i","configuration":{"time_offset":0.0}}}
{
  "time": 0.172783,
  "name": "http:frame_created",
  "data": {
    "stream_id": 0,
    "frame": {
      "frame_type": "headers",
      "headers": [
        {
          "name": ":method",
          "value": "GET"
        },
        {
          "name": ":authority",
          "value": "cloudflare-quic.com"
        },
        {
          "name": ":path",
          "value": "/"
        },
        {
          "name": ":scheme",
          "value": "https"
        },
        {
          "name": "user-agent",
          "value": "h3i"
        }
      ]
    }
  },
  "fin_stream": true
}

h3i logs can be replayed using the --qlog-input option. You can change the target server host and port, and keep all the same actions. However, most servers will validate the :authority pseudo-header or Host header contained in a HEADERS frame. The –replay-host-override option allows changing these fields without needing to modify the file by hand.

And yes, qlog files are human-readable text in the JSON-SEQ format. So you can also just write these by hand in the first place if you like! However, if you’re going to start writing things, maybe Rust is your preferred option…

Using the h3i library to send a malformed request with Rust

In our previous example, we just sent a valid request so there wasn’t anything interesting to observe. Where h3i really shines is in generating traffic that isn’t RFC compliant, such as malformed HTTP messages, invalid frame sequences, or other actions on streams. This helps determine if a server is acting robustly and defensively.

Let’s explore this more with an example of HTTP content-length mismatch. RFC 9114 section 4.1.2 specifies:

A request or response that is defined as having content when it contains a Content-Length header field (Section 8.6 of [HTTP]) is malformed if the value of the Content-Length header field does not equal the sum of the DATA frame lengths received. A response that is defined as never having content, even when a Content-Length is present, can have a non-zero Content-Length header field even though no content is included in DATA frames.

Intermediaries that process HTTP requests or responses (i.e., any intermediary not acting as a tunnel) MUST NOT forward a malformed request or response. Malformed requests or responses that are detected MUST be treated as a stream error of type H3_MESSAGE_ERROR.

For malformed requests, a server MAY send an HTTP response indicating the error prior to closing or resetting the stream.

There are good reasons that the RFC is so strict about handling mismatched content lengths. They can be a vector for desynchronization attacks (similar to request smuggling), especially when a proxy is converting inbound HTTP/3 to outbound HTTP/1.1.

We’ve provided an example of how to use the h3i Rust library to write a tailor-made test client that sends a mismatched content length request. It sends a Content-Length header of 5, but its body payload is “test”, which is only 4 bytes. It then waits for the server to respond, after which it explicitly closes the connection by sending a QUIC CONNECTION_CLOSE frame.

When running low-level tests, it can be interesting to also take a packet capture (pcap) and observe what is happening on the wire. Since QUIC is an encrypted transport, we’ll need to use the SSLKEYLOG environment variable to capture the session keys so that tools like Wireshark can decrypt and dissect.

To follow along at home, clone a copy of the quiche repository, start a packet capture on the appropriate network interface and then run:

cd quiche/h3i
SSLKEYLOGFILE="h3i-example.keys" cargo run --example content_length_mismatch

In our decrypted capture, we see the expected sequence of handshake, request, response, and then closure.

Surveying the example code

The example is a simple binary app with a main() entry point. Let’s survey the key elements.

First, we set up an h3i configuration to a target server:

let config = Config::new()
        .with_host_port("cloudflare-quic.com".to_string())
        .with_idle_timeout(2000)
        .build()
        .unwrap();

The idle timeout is a QUIC concept which tells each endpoint when it should close the connection if the connection has been idle. This prevents endpoints from spinning idly if the peer hasn’t closed the connection. h3i’s default is 30 seconds, which can be too long for tests, so we set ours to 2 seconds here.

Next, we define a set of request headers and encode them with QPACK compression, ready to put in a HEADERS frame. Note that h3i does provide a send_headers_frame helper method which does this for you, but the example does it manually for clarity:

let headers = vec![
        Header::new(b":method", b"POST"),
        Header::new(b":scheme", b"https"),
        Header::new(b":authority", b"cloudflare-quic.com"),
        Header::new(b":path", b"/"),
        // We say that we're going to send a body with 5 bytes...
        Header::new(b"content-length", b"5"),
    ];

    let header_block = encode_header_block(&headers).unwrap();

Then, we define the set of h3i actions that we want to execute in order: send HEADERS, send a too-short DATA frame, wait for the server’s HEADERS, then close the connection.

let actions = vec![
        Action::SendHeadersFrame {
            stream_id: STREAM_ID,
            fin_stream: false,
            headers,
            frame: Frame::Headers { header_block },
        },
        Action::SendFrame {
            stream_id: STREAM_ID,
            fin_stream: true,
            frame: Frame::Data {
                // ...but, in actuality, we only send 4 bytes. This should yield a
                // 400 Bad Request response from an RFC-compliant
                // server: https://datatracker.ietf.org/doc/html/rfc9114#section-4.1.2-3
                payload: b"test".to_vec(),
            },
        },
        Action::Wait {
            wait_type: WaitType::StreamEvent(StreamEvent {
                stream_id: STREAM_ID,
                event_type: StreamEventType::Headers,
            }),
        },
        Action::ConnectionClose {
            error: quiche::ConnectionError {
                is_app: true,
                error_code: quiche::h3::WireErrorCode::NoError as u64,
                reason: vec![],
            },
        },
    ];

Finally, we’ll set things in motion with connect(), which sets up the QUIC connection, executes the actions list and collects the summary.

let summary =
        sync_client::connect(config, &actions).expect("connection failed");

    println!(
        "=== received connection summary! ===\n\n{}",
        serde_json::to_string_pretty(&summary).unwrap_or_else(|e| e.to_string())
    );

ConnectionSummary provides data about the connection, including the frames h3i received, details about why the connection closed, and connection statistics. The example prints the summary out. However, you can programmatically check it. We do this to write our own internal automation tests.

If you’re running the example, it should print something like the following:

=== received connection summary! ===

{
  "stream_map": {
    "0": [
      {
        "UNKNOWN": {
          "raw_type": 2471591231244749708,
          "payload": ""
        }
      },
      {
        "UNKNOWN": {
          "raw_type": 2031803309763646295,
          "payload": "4752454153452069732074686520776f7264"
        }
      },
      {
        "enriched_headers": {
          "header_block_len": 75,
          "headers": [
            {
              "name": ":status",
              "value": "400"
            },
            {
              "name": "server",
              "value": "cloudflare"
            },
            {
              "name": "date",
              "value": "Sat, 07 Dec 2024 00:34:12 GMT"
            },
            {
              "name": "content-type",
              "value": "text/html"
            },
            {
              "name": "content-length",
              "value": "155"
            },
            {
              "name": "cf-ray",
              "value": "8ee06dbe2923fa17-ORD"
            }
          ]
        }
      },
      {
        "DATA": {
          "payload_len": 104
        }
      },
      {
        "DATA": {
          "payload_len": 51
        }
      }
    ]
  },
  "stats": {
    "recv": 10,
    "sent": 5,
    "lost": 0,
    "retrans": 0,
    "sent_bytes": 1712,
    "recv_bytes": 4178,
    "lost_bytes": 0,
    "stream_retrans_bytes": 0,
    "paths_count": 1,
    "reset_stream_count_local": 0,
    "stopped_stream_count_local": 0,
    "reset_stream_count_remote": 0,
    "stopped_stream_count_remote": 0,
    "path_challenge_rx_count": 0
  },
  "path_stats": [
    {
      "local_addr": "0.0.0.0:64418",
      "peer_addr": "104.18.29.7:443",
      "active": true,
      "recv": 10,
      "sent": 5,
      "lost": 0,
      "retrans": 0,
      "rtt": 0.008140072,
      "min_rtt": 0.004645536,
      "rttvar": 0.004238173,
      "cwnd": 13500,
      "sent_bytes": 1712,
      "recv_bytes": 4178,
      "lost_bytes": 0,
      "stream_retrans_bytes": 0,
      "pmtu": 1350,
      "delivery_rate": 247720
    }
  ],
  "error": {
    "local_error": {
      "is_app": true,
      "error_code": 256,
      "reason": ""
    },
    "timed_out": false
  }
}

Let’s walk through the output. Up first is the StreamMap, which is a record of all frames received on each stream. We can see that we received 5 frames on stream 0: 2 UNKNOWNs, one EnrichedHeaders frame, and two DATA frames.

The UNKNOWN frames are extension frames that are unknown to h3i; the server under test is sending what are known as GREASE frames to help exercise the protocol and ensure clients are not erroring when they receive something unexpected per RFC 9114 requirements.

The EnrichedHeaders frame is essentially an HTTP/3 HEADERS frame, but with some small helpers, like one to get the response status code. The server under test sent a 400 as expected.

The DATA frames carry response body bytes. In this case, the body is the HTML required to render the Cloudflare Bad Request page (you can peek at the HTML yourself in Wireshark). We chose to omit the raw bytes from the ConnectionSummary since they may not be representable safely as text. A future improvement could be to encode the bytes in base64 or hex, in order to support tests that need to check response content.

h3i for test automation

We believe h3i is a great library for building automated tests on. You can take the above example and modify it to fit within various types of (continuous) integration tests.

We outlined earlier how the Protocols team HTTP/3 testing has organically grown to use three different frameworks. Even within those, we still didn’t have much flexibility and ease of use. Over the last year we’ve been building h3i itself and reimplementing our suite of ingress proxy test cases using the Rust library. This has helped us improve test coverage with a range of new tests not previously possible. It also surprisingly identified some problems with the old tests, particularly for some edge cases where it wasn’t clear how the old test code implementation was running under the hood.

Bake offs, interop, and wider testing of HTTP

RFC 1025 was published in 1987. Authored by Jon Postel, it discusses bake offs:

In the early days of the development of TCP and IP, when there were very few implementations and the specifications were still evolving, the only way to determine if an implementation was “correct” was to test it against other implementations and argue that the results showed your own implementation to have done the right thing. These tests and discussions could, in those early days, as likely change the specification as change the implementation.

There were a few times when this testing was focused, bringing together all known implementations and running through a set of tests in hopes of demonstrating the N squared connectivity and correct implementation of the various tricky cases. These events were called “Bake Offs”.

While nearly 4 decades old, the concept of exercising Internet protocol implementations and seeing how they compare to the specification still holds true. The QUIC WG made heavy use of interoperability testing through its standardization process. We started off sitting in a room and running tests manually by hand (or with some help from scripts). Then Marten Seemann developed the QUIC Interop Runner, which runs regular automated testing and collects and renders all the results. This has proven to be incredibly useful.

The state of HTTP/3 interoperability testing is not quite as mature. Although there are tools such as Kazu Yamamoto’s excellent h3spec (in Haskell) for testing conformance, there isn’t a similar continuous integration process of collection and rendering of results. While h3i shares similarities with h3spec, we felt it important to focus on the framework capabilities rather than creating a corpus of tests and assertions. Cloudflare is a big fan of Rust and as several teams move to Rust-based proxies, having a consistent ecosystem provides advantages (such as developer velocity).

We certainly feel there is a great opportunity for continued collaboration and cross-pollination between projects in the QUIC and HTTP space. For example, h3i might provide a suitable basis to build another tool (or set of scripts) to run bake offs or interop tests. Perhaps it even makes sense to have a common collection of test cases owned by the community, that can be specialized to the most appropriate or preferred tooling. This topic was recently presented at the HTTP Workshop 2024 by Mohammed Al-Sahaf, and it excites us to see new potential directions of testing improvements.

When using any tools or methods for protocol testing, we encourage responsible handling of security-related matters. If you believe you may have identified a vulnerability in an IETF Internet protocol itself, please follow the IETF’s reporting guidance. If you believe you may have discovered an implementation vulnerability in a product, open source project, or service using QUIC or HTTP, then you should report these directly to the responsible party. Implementers or operators often provide their own publicly-available guidance and contact details to send reports. For example, the Cloudflare quiche security policy is available in the Security tab of the GitHub repository.

Summary and outlook

Cloudflare takes testing very seriously. While h3i has a limited feature set as a test HTTP/3 client, we believe it provides a strong framework that can be extended to a wider range of different cases and different protocols. For example, we’d like to add support for low-level HTTP/2.

We’ve designed h3i to integrate into a wide range of testing methodologies, from manual ad-hoc testing, to native Rust tests, to conformance testbenches built with scripting languages. We’ve had great success migrating our existing zoo of test tools to a single one that is more accessible and easier to maintain.

Now that you’ve read about h3i’s capabilities, it’s left as an exercise to the reader to go back to the example of HTTP/3 control streams and consider how you could write tests to exercise a server.

We encourage the community to experiment with h3i and provide feedback, and propose ideas or contributions to the GitHub repository as issues or Pull Requests.

HTTP/2 Rapid Reset: deconstructing the record-breaking attack

2023-10-10 Lucas Pardue

Post Syndicated from Lucas Pardue original http://blog.cloudflare.com/technical-breakdown-http2-rapid-reset-ddos-attack/

HTTP/2 Rapid Reset: deconstructing the record-breaking attack

Starting on Aug 25, 2023, we started to notice some unusually big HTTP attacks hitting many of our customers. These attacks were detected and mitigated by our automated DDoS system. It was not long however, before they started to reach record breaking sizes — and eventually peaked just above 201 million requests per second. This was nearly 3x bigger than our previous biggest attack on record.

Concerning is the fact that the attacker was able to generate such an attack with a botnet of merely 20,000 machines. There are botnets today that are made up of hundreds of thousands or millions of machines. Given that the entire web typically sees only between 1–3 billion requests per second, it's not inconceivable that using this method could focus an entire web’s worth of requests on a small number of targets.

Detecting and Mitigating

This was a novel attack vector at an unprecedented scale, but Cloudflare's existing protections were largely able to absorb the brunt of the attacks. While initially we saw some impact to customer traffic — affecting roughly 1% of requests during the initial wave of attacks — today we’ve been able to refine our mitigation methods to stop the attack for any Cloudflare customer without it impacting our systems.

We noticed these attacks at the same time two other major industry players — Google and AWS — were seeing the same. We worked to harden Cloudflare’s systems to ensure that, today, all our customers are protected from this new DDoS attack method without any customer impact. We’ve also participated with Google and AWS in a coordinated disclosure of the attack to impacted vendors and critical infrastructure providers.

This attack was made possible by abusing some features of the HTTP/2 protocol and server implementation details (see CVE-2023-44487 for details). Because the attack abuses an underlying weakness in the HTTP/2 protocol, we believe any vendor that has implemented HTTP/2 will be subject to the attack. This included every modern web server. We, along with Google and AWS, have disclosed the attack method to web server vendors who we expect will implement patches. In the meantime, the best defense is using a DDoS mitigation service like Cloudflare’s in front of any web-facing web or API server.

This post dives into the details of the HTTP/2 protocol, the feature that attackers exploited to generate these massive attacks, and the mitigation strategies we took to ensure all our customers are protected. Our hope is that by publishing these details other impacted web servers and services will have the information they need to implement mitigation strategies. And, moreover, the HTTP/2 protocol standards team, as well as teams working on future web standards, can better design them to prevent such attacks.

RST attack details

HTTP is the application protocol that powers the Web. HTTP Semantics are common to all versions of HTTP — the overall architecture, terminology, and protocol aspects such as request and response messages, methods, status codes, header and trailer fields, message content, and much more. Each individual HTTP version defines how semantics are transformed into a "wire format" for exchange over the Internet. For example, a client has to serialize a request message into binary data and send it, then the server parses that back into a message it can process.

HTTP/1.1 uses a textual form of serialization. Request and response messages are exchanged as a stream of ASCII characters, sent over a reliable transport layer like TCP, using the following format (where CRLF means carriage-return and linefeed):

 HTTP-message   = start-line CRLF
                   *( field-line CRLF )
                   CRLF
                   [ message-body ]

For example, a very simple GET request for https://blog.cloudflare.com/ would look like this on the wire:

GET / HTTP/1.1 CRLFHost: blog.cloudflare.comCRLF

And the response would look like:

HTTP/1.1 200 OK CRLFServer: cloudflareCRLFContent-Length: 100CRLFtext/html; charset=UTF-8CRLF<100 bytes of data>

This format frames messages on the wire, meaning that it is possible to use a single TCP connection to exchange multiple requests and responses. However, the format requires that each message is sent whole. Furthermore, in order to correctly correlate requests with responses, strict ordering is required; meaning that messages are exchanged serially and can not be multiplexed. Two GET requests, for https://blog.cloudflare.com/ and https://blog.cloudflare.com/page/2/, would be:

GET / HTTP/1.1 CRLFHost: blog.cloudflare.comCRLFGET /page/2 HTTP/1.1 CRLFHost: blog.cloudflare.comCRLF

With the responses:

HTTP/1.1 200 OK CRLFServer: cloudflareCRLFContent-Length: 100CRLFtext/html; charset=UTF-8CRLF<100 bytes of data>HTTP/1.1 200 OK CRLFServer: cloudflareCRLFContent-Length: 100CRLFtext/html; charset=UTF-8CRLF<100 bytes of data>

Web pages require more complicated HTTP interactions than these examples. When visiting the Cloudflare blog, your browser will load multiple scripts, styles and media assets. If you visit the front page using HTTP/1.1 and decide quickly to navigate to page 2, your browser can pick from two options. Either wait for all of the queued up responses for the page that you no longer want before page 2 can even start, or cancel in-flight requests by closing the TCP connection and opening a new connection. Neither of these is very practical. Browsers tend to work around these limitations by managing a pool of TCP connections (up to 6 per host) and implementing complex request dispatch logic over the pool.

HTTP/2 addresses many of the issues with HTTP/1.1. Each HTTP message is serialized into a set of HTTP/2 frames that have type, length, flags, stream identifier (ID) and payload. The stream ID makes it clear which bytes on the wire apply to which message, allowing safe multiplexing and concurrency. Streams are bidirectional. Clients send frames and servers reply with frames using the same ID.

In HTTP/2 our GET request for https://blog.cloudflare.com would be exchanged across stream ID 1, with the client sending one HEADERS frame, and the server responding with one HEADERS frame, followed by one or more DATA frames. Client requests always use odd-numbered stream IDs, so subsequent requests would use stream ID 3, 5, and so on. Responses can be served in any order, and frames from different streams can be interleaved.

Stream multiplexing and concurrency are powerful features of HTTP/2. They enable more efficient usage of a single TCP connection. HTTP/2 optimizes resources fetching especially when coupled with prioritization. On the flip side, making it easy for clients to launch large amounts of parallel work can increase the peak demand for server resources when compared to HTTP/1.1. This is an obvious vector for denial-of-service.

In order to provide some guardrails, HTTP/2 provides a notion of maximum active concurrent streams. The SETTINGS_MAX_CONCURRENT_STREAMS parameter allows a server to advertise its limit of concurrency. For example, if the server states a limit of 100, then only 100 requests can be active at any time. If a client attempts to open a stream above this limit, it must be rejected by the server using a RST_STREAM frame. Stream rejection does not affect the other in-flight streams on the connection.

The true story is a little more complicated. Streams have a lifecycle. Below is a diagram of the HTTP/2 stream state machine. Client and server manage their own views of the state of a stream. HEADERS, DATA and RST_STREAM frames trigger transitions when they are sent or received. Although the views of the stream state are independent, they are synchronized.

HEADERS and DATA frames include an END_STREAM flag, that when set to the value 1 (true), can trigger a state transition.

Let's work through this with an example of a GET request that has no message content. The client sends the request as a HEADERS frame with the END_STREAM flag set to 1. The client first transitions the stream from idle to open state, then immediately transitions into half-closed state. The client half-closed state means that it can no longer send HEADERS or DATA, only WINDOW_UPDATE, PRIORITY or RST_STREAM frames. It can receive any frame however.

Once the server receives and parses the HEADERS frame, it transitions the stream state from idle to open and then half-closed, so it matches the client. The server half-closed state means it can send any frame but receive only WINDOW_UPDATE, PRIORITY or RST_STREAM frames.

The response to the GET contains message content, so the server sends HEADERS with END_STREAM flag set to 0, then DATA with END_STREAM flag set to 1. The DATA frame triggers the transition of the stream from half-closed to closed on the server. When the client receives it, it also transitions to closed. Once a stream is closed, no frames can be sent or received.

Applying this lifecycle back into the context of concurrency, HTTP/2 states:

Streams that are in the "open" state or in either of the "half-closed" states count toward the maximum number of streams that an endpoint is permitted to open. Streams in any of these three states count toward the limit advertised in the SETTINGS_MAX_CONCURRENT_STREAMS setting.

In theory, the concurrency limit is useful. However, there are practical factors that hamper its effectiveness— which we will cover later in the blog.

HTTP/2 request cancellation

Earlier, we talked about client cancellation of in-flight requests. HTTP/2 supports this in a much more efficient way than HTTP/1.1. Rather than needing to tear down the whole connection, a client can send a RST_STREAM frame for a single stream. This instructs the server to stop processing the request and to abort the response, which frees up server resources and avoids wasting bandwidth.

Let's consider our previous example of 3 requests. This time the client cancels the request on stream 1 after all of the HEADERS have been sent. The server parses this RST_STREAM frame before it is ready to serve the response and instead only responds to stream 3 and 5:

Request cancellation is a useful feature. For example, when scrolling a webpage with multiple images, a web browser can cancel images that fall outside the viewport, meaning that images entering it can load faster. HTTP/2 makes this behaviour a lot more efficient compared to HTTP/1.1.

A request stream that is canceled, rapidly transitions through the stream lifecycle. The client's HEADERS with END_STREAM flag set to 1 transitions the state from idle to open to half-closed, then RST_STREAM immediately causes a transition from half-closed to closed.

Recall that only streams that are in the open or half-closed state contribute to the stream concurrency limit. When a client cancels a stream, it instantly gets the ability to open another stream in its place and can send another request immediately. This is the crux of what makes CVE-2023-44487 work.

Rapid resets leading to denial of service

HTTP/2 request cancellation can be abused to rapidly reset an unbounded number of streams. When an HTTP/2 server is able to process client-sent RST_STREAM frames and tear down state quickly enough, such rapid resets do not cause a problem. Where issues start to crop up is when there is any kind of delay or lag in tidying up. The client can churn through so many requests that a backlog of work accumulates, resulting in excess consumption of resources on the server.

A common HTTP deployment architecture is to run an HTTP/2 proxy or load-balancer in front of other components. When a client request arrives it is quickly dispatched and the actual work is done as an asynchronous activity somewhere else. This allows the proxy to handle client traffic very efficiently. However, this separation of concerns can make it hard for the proxy to tidy up the in-process jobs. Therefore, these deployments are more likely to encounter issues from rapid resets.

When Cloudflare's reverse proxies process incoming HTTP/2 client traffic, they copy the data from the connection’s socket into a buffer and process that buffered data in order. As each request is read (HEADERS and DATA frames) it is dispatched to an upstream service. When RST_STREAM frames are read, the local state for the request is torn down and the upstream is notified that the request has been canceled. Rinse and repeat until the entire buffer is consumed. However this logic can be abused: when a malicious client started sending an enormous chain of requests and resets at the start of a connection, our servers would eagerly read them all and create stress on the upstream servers to the point of being unable to process any new incoming request.

Something that is important to highlight is that stream concurrency on its own cannot mitigate rapid reset. The client can churn requests to create high request rates no matter the server's chosen value of SETTINGS_MAX_CONCURRENT_STREAMS.

Rapid Reset dissected

Here's an example of rapid reset reproduced using a proof-of-concept client attempting to make a total of 1000 requests. I've used an off-the-shelf server without any mitigations; listening on port 443 in a test environment. The traffic is dissected using Wireshark and filtered to show only HTTP/2 traffic for clarity. Download the pcap to follow along.

It's a bit difficult to see, because there are a lot of frames. We can get a quick summary via Wireshark's Statistics > HTTP2 tool:

The first frame in this trace, in packet 14, is the server's SETTINGS frame, which advertises a maximum stream concurrency of 100. In packet 15, the client sends a few control frames and then starts making requests that are rapidly reset. The first HEADERS frame is 26 bytes long, all subsequent HEADERS are only 9 bytes. This size difference is due to a compression technology called HPACK. In total, packet 15 contains 525 requests, going up to stream 1051.

Interestingly, the RST_STREAM for stream 1051 doesn't fit in packet 15, so in packet 16 we see the server respond with a 404 response. Then in packet 17 the client does send the RST_STREAM, before moving on to sending the remaining 475 requests.

Note that although the server advertised 100 concurrent streams, both packets sent by the client sent a lot more HEADERS frames than that. The client did not have to wait for any return traffic from the server, it was only limited by the size of the packets it could send. No server RST_STREAM frames are seen in this trace, indicating that the server did not observe a concurrent stream violation.

Impact on customers

As mentioned above, as requests are canceled, upstream services are notified and can abort requests before wasting too many resources on it. This was the case with this attack, where most malicious requests were never forwarded to the origin servers. However, the sheer size of these attacks did cause some impact.

First, as the rate of incoming requests reached peaks never seen before, we had reports of increased levels of 502 errors seen by clients. This happened on our most impacted data centers as they were struggling to process all the requests. While our network is meant to deal with large attacks, this particular vulnerability exposed a weakness in our infrastructure. Let's dig a little deeper into the details, focusing on how incoming requests are handled when they hit one of our data centers:

We can see that our infrastructure is composed of a chain of different proxy servers with different responsibilities. In particular, when a client connects to Cloudflare to send HTTPS traffic, it first hits our TLS decryption proxy: it decrypts TLS traffic, processes HTTP 1, 2 or 3 traffic, then forwards it to our "business logic" proxy. This one is responsible for loading all the settings for each customer, then routing the requests correctly to other upstream services — and more importantly in our case, it is also responsible for security features. This is where L7 attack mitigation is processed.

The problem with this attack vector is that it manages to send a lot of requests very quickly in every single connection. Each of them had to be forwarded to the business logic proxy before we had a chance to block it. As the request throughput became higher than our proxy capacity, the pipe connecting these two services reached its saturation level in some of our servers.

When this happens, the TLS proxy cannot connect anymore to its upstream proxy, this is why some clients saw a bare "502 Bad Gateway" error during the most serious attacks. It is important to note that, as of today, the logs used to create HTTP analytics are also emitted by our business logic proxy. The consequence of that is that these errors are not visible in the Cloudflare dashboard. Our internal dashboards show that about 1% of requests were impacted during the initial wave of attacks (before we implemented mitigations), with peaks at around 12% for a few seconds during the most serious one on August 29th. The following graph shows the ratio of these errors over a two hours while this was happening:

We worked to reduce this number dramatically in the following days, as detailed later on in this post. Both thanks to changes in our stack and to our mitigation that reduce the size of these attacks considerably, this number is today is effectively zero:

499 errors and the challenges for HTTP/2 stream concurrency

Another symptom reported by some customers is an increase in 499 errors. The reason for this is a bit different and is related to the maximum stream concurrency in a HTTP/2 connection detailed earlier in this post.

HTTP/2 settings are exchanged at the start of a connection using SETTINGS frames. In the absence of receiving an explicit parameter, default values apply. Once a client establishes an HTTP/2 connection, it can wait for a server's SETTINGS (slow) or it can assume the default values and start making requests (fast). For SETTINGS_MAX_CONCURRENT_STREAMS, the default is effectively unlimited (stream IDs use a 31-bit number space, and requests use odd numbers, so the actual limit is 1073741824). The specification recommends that a server offer no fewer than 100 streams. Clients are generally biased towards speed, so don't tend to wait for server settings, which creates a bit of a race condition. Clients are taking a gamble on what limit the server might pick; if they pick wrong the request will be rejected and will have to be retried. Gambling on 1073741824 streams is a bit silly. Instead, a lot of clients decide to limit themselves to issuing 100 concurrent streams, with the hope that servers followed the specification recommendation. Where servers pick something below 100, this client gamble fails and streams are reset.

There are many reasons a server might reset a stream beyond concurrency limit overstepping. HTTP/2 is strict and requires a stream to be closed when there are parsing or logic errors. In 2019, Cloudflare developed several mitigations in response to HTTP/2 DoS vulnerabilities. Several of those vulnerabilities were caused by a client misbehaving, leading the server to reset a stream. A very effective strategy to clamp down on such clients is to count the number of server resets during a connection, and when that exceeds some threshold value, close the connection with a GOAWAY frame. Legitimate clients might make one or two mistakes in a connection and that is acceptable. A client that makes too many mistakes is probably either broken or malicious and closing the connection addresses both cases.

While responding to DoS attacks enabled by CVE-2023-44487, Cloudflare reduced maximum stream concurrency to 64. Before making this change, we were unaware that clients don't wait for SETTINGS and instead assume a concurrency of 100. Some web pages, such as an image gallery, do indeed cause a browser to send 100 requests immediately at the start of a connection. Unfortunately, the 36 streams above our limit all needed to be reset, which triggered our counting mitigations. This meant that we closed connections on legitimate clients, leading to a complete page load failure. As soon as we realized this interoperability issue, we changed the maximum stream concurrency to 100.

Actions from the Cloudflare side

In 2019 several DoS vulnerabilities were uncovered related to implementations of HTTP/2. Cloudflare developed and deployed a series of detections and mitigations in response. CVE-2023-44487 is a different manifestation of HTTP/2 vulnerability. However, to mitigate it we were able to extend the existing protections to monitor client-sent RST_STREAM frames and close connections when they are being used for abuse. Legitimate client uses for RST_STREAM are unaffected.

In addition to a direct fix, we have implemented several improvements to the server's HTTP/2 frame processing and request dispatch code. Furthermore, the business logic server has received improvements to queuing and scheduling that reduce unnecessary work and improve cancellation responsiveness. Together these lessen the impact of various potential abuse patterns as well as giving more room to the server to process requests before saturating.

Mitigate attacks earlier

Cloudflare already had systems in place to efficiently mitigate very large attacks with less expensive methods. One of them is named "IP Jail". For hyper volumetric attacks, this system collects the client IPs participating in the attack and stops them from connecting to the attacked property, either at the IP level, or in our TLS proxy. This system however needs a few seconds to be fully effective; during these precious seconds, the origins are already protected but our infrastructure still needs to absorb all HTTP requests. As this new botnet has effectively no ramp-up period, we need to be able to neutralize attacks before they can become a problem.

To achieve this we expanded the IP Jail system to protect our entire infrastructure: once an IP is "jailed", not only it is blocked from connecting to the attacked property, we also forbid the corresponding IPs from using HTTP/2 to any other domain on Cloudflare for some time. As such protocol abuses are not possible using HTTP/1.x, this limits the attacker's ability to run large attacks, while any legitimate client sharing the same IP would only see a very small performance decrease during that time. IP based mitigations are a very blunt tool — this is why we have to be extremely careful when using them at that scale and seek to avoid false positives as much as possible. Moreover, the lifespan of a given IP in a botnet is usually short so any long term mitigation is likely to do more harm than good. The following graph shows the churn of IPs in the attacks we witnessed:

As we can see, many new IPs spotted on a given day disappear very quickly afterwards.

As all these actions happen in our TLS proxy at the beginning of our HTTPS pipeline, this saves considerable resources compared to our regular L7 mitigation system. This allowed us to weather these attacks much more smoothly and now the number of random 502 errors caused by these botnets is down to zero.

Observability improvements

Another front on which we are making change is observability. Returning errors to clients without being visible in customer analytics is unsatisfactory. Fortunately, a project has been underway to overhaul these systems since long before the recent attacks. It will eventually allow each service within our infrastructure to log its own data, instead of relying on our business logic proxy to consolidate and emit log data. This incident underscored the importance of this work, and we are redoubling our efforts.

We are also working on better connection-level logging, allowing us to spot such protocol abuses much more quickly to improve our DDoS mitigation capabilities.

Conclusion

While this was the latest record-breaking attack, we know it won’t be the last. As attacks continue to become more sophisticated, Cloudflare works relentlessly to proactively identify new threats — deploying countermeasures to our global network so that our millions of customers are immediately and automatically protected.

Cloudflare has provided free, unmetered and unlimited DDoS protection to all of our customers since 2017. In addition, we offer a range of additional security features to suit the needs of organizations of all sizes. Contact us if you’re unsure whether you’re protected or want to understand how you can be.

HTTP/2 Rapid Reset：記録的勢いの攻撃を無効化

2023-10-10 Lucas Pardue

Post Syndicated from Lucas Pardue original http://blog.cloudflare.com/ja-jp/technical-breakdown-http2-rapid-reset-ddos-attack-ja-jp/

2023年8月25日以降、当社では、多くのお客様を襲った異常に大規模なHTTP攻撃を目撃し始めました。これらの攻撃は当社の自動DDoSシステムによって検知され、軽減されました。しかし、これらの攻撃が記録的な規模に達するまで、それほど時間はかかりませんでした。その規模は、過去に記録された最大の攻撃の約3倍にも達したのです。

攻撃を受けているか、追加の防御が必要ですか？支援を受けるには、こちらをクリックしてください。

懸念となるのは、攻撃者がわずか2万台のボットネットでこの攻撃を実行できたという事実です。今日、数万台から数百万台のマシンで構成されるボットネットが存在しています。Web上では全体として通常1秒間に10億から30億のリクエストしかないことを考えると、この方法を使えば、Webのリクエスト全体を少数のターゲットに集中させることができます。

検出と軽減

これは前例のない規模の斬新な攻撃ベクトルでしたが、Cloudflareの既存の保護システムは攻撃の矛先をほぼ吸収することができました。当初はお客様のトラフィックに若干の影響が見られたものの（攻撃の初期波ではリクエストのおよそ1％に影響）、現在では緩和方法を改良し、当社のシステムに影響を与えることなく、Cloudflareのすべてのお客様に対する攻撃を阻止することができるようになりました。

当社は、業界の最大手であるGoogleとAWSの2社と同時に、この攻撃に気づきました。当社はCloudflareのシステムを強化し、今日ではすべてのお客様がこの新しいDDoS攻撃手法から保護され、お客様への影響がないことを確認しました。当社はまた、グーグルやAWSとともに、影響を受けるベンダーや重要インフラストラクチャプロバイダーへの攻撃に関する協調的な情報開示に参加しました。

この攻撃は、HTTP/2プロトコルのいくつかの機能とサーバー実装の詳細を悪用することで行われました（詳細は、CVE-2023-44487をご覧ください）。この攻撃はHTTP/2プロトコルにおける根本的な弱点を悪用しているため、HTTP/2を実装しているすべてのベンダーがこの攻撃の対象になると考えられます。これには、すべての最新のWebサーバーも含まれます。当社は、GoogleとAWSとともに、Webサーバーベンダーがパッチを実装できるよう、攻撃方法を開示しました。一方で、Webに面したWebやAPIサーバーの前段階に設置されるCloudflareのようなDDoS軽減サービスを利用するのが最善の防御策とります。

この投稿では、HTTP/2プロトコルの詳細、攻撃者がこれらの大規模な攻撃を発生させるために悪用した機能、およびすべてのお客様が保護されていることを保証するために当社が講じた緩和策について詳細を掘り下げて紹介します。これらの詳細を公表することで、影響を受ける他のWebサーバーやサービスが緩和策を実施するために必要な情報を得られることを期待しています。そしてさらに、HTTP/2プロトコル規格チームや、将来のWeb規格に取り組むチームには、こうした攻撃を防ぐためHTTP/2プロトコルの設計改善に役立てていただければと思っています。

RST攻撃の詳細

HTTPは、Webを稼働するにあたって用いられるアプリケーションプロトコルです。HTTPセマンティクスとは、リクエストとレスポンスメッセージ、メソッド、ステータスコード、ヘッダーフィールドとトレーラフィールド、メッセージコンテンツなど、全体的なアーキテクチャ、用語、プロトコルの側面に関し、あらゆるバージョンに共通しています。個々のHTTPバージョンでは、セマンティクスをインターネット上でやりとりするための「ワイヤーフォーマット」に変換する方法を定義しています。例えば、クライアントはリクエストメッセージをバイナリデータにシリアライズして送信し、サーバーがこれを解析して処理可能なメッセージに戻します。

HTTP/1.1は、テキスト形式のシリアライズを使用します。リクエストメッセージとレスポンスメッセージはASCII文字のストリームとしてやりとりされ、TCPのような信頼性の高いトランスポートレイヤーを介して、以下のフォーマットで送信されます（「CRLF」はキャリッジリターンとラインフィードを意味します）：

 HTTP-message   = start-line CRLF
                   *( field-line CRLF )
                   CRLF
                   [ message-body ]

例えば、ワイヤ上の非常に簡単なGETリクエストは、https://blog.cloudflare.com/となります：

GET / HTTP/1.1 CRLFHost: blog.cloudflare.comCRLFCRLF

そして、応答は次のようなものになります：

HTTP/1.1 200 OK CRLFServer: cloudflareCRLFContent-Length: 100CRLFtext/html; charset=UTF-8CRLFCRLF<100 bytes of data>

このフォーマットは、ワイヤ上でメッセージをフレーム化します。つまり、1つのTCP接続を使って複数のリクエストとレスポンスをやり取りすることが可能になります。しかし、このフォーマットでは、各メッセージがまとめて送信される必要があります。さらに、リクエストと応答を正しく関連付けるために、厳密な順序が要求されます。つまり、メッセージは順序だてて交換され、多重化することはできません。https://blog.cloudflare.com/、https://blog.cloudflare.com/page/2/の2つのGETリクエストは、次のようになります：

GET / HTTP/1.1 CRLFHost: blog.cloudflare.comCRLFCRLFGET /page/2/ HTTP/1.1 CRLFHost: blog.cloudflare.comCRLFCRLF

レスポンスは、次のようになります：

HTTP/1.1 200 OK CRLFServer: cloudflareCRLFContent-Length: 100CRLFtext/html; charset=UTF-8CRLFCRLF<100 bytes of data>CRLFHTTP/1.1 200 OK CRLFServer: cloudflareCRLFContent-Length: 100CRLFtext/html; charset=UTF-8CRLFCRLF<100 bytes of data>

Webページでは、これらの例よりも複雑なHTTPインタラクションが必要となります。Cloudflareブログにアクセスすると、ブラウザは複数のスクリプト、スタイル、メディアアセットを読み込みます。HTTP/1.1を使ってトップページを訪れ、すぐに2ページ目に移動した場合、ブラウザは2つの選択肢から選ぶことになります。ページ2が始まる前に、ページに対するキューに入れられたものでもう必要のないものすべての応答を待つか、TCP接続を閉じて新しい接続を開くことで、実行中のリクエストをキャンセルすることのいずれかとなります。どちらも、あまり現実的ではありません。ブラウザは、TCP接続のプール（ホストあたり最大6つ）を管理し、プール上で複雑なリクエストディスパッチロジックを実装することによって、これらの制限を回避する傾向があります。

HTTP/2は、HTTP/1.1の多くの問題に対処しています。各HTTPメッセージは、型、長さ、フラグ、ストリーム識別子（ID）と悪意のあるペイロードを持つHTTP/2フレームのセットにシリアライズされます。ストリームIDは、ワイヤ上のどのバイトがどのメッセージに適用されるかを明確にし、安全な多重化と同時実行を可能にします。ストリームは、双方向となります。クライアントはフレームを送信し、サーバーは同じIDを使ったフレームを返信します。

HTTP/2では、https://blog.cloudflare.comへの当社のGETリクエストはストリームID 1でやり取りされ、クライアントは1つのHEADERSフレームを送信し、サーバーは1つのHEADERSフレームと、それに続く1つ以上のDATAフレームで応答します。クライアントのリクエストは常に奇数番号のストリームIDを使用するので、後続のリクエストはストリームID 3、5、…を使用することになります。レスポンスはどのような順番でも提供することができ、異なるストリームからのフレームをインターリーブすることもできます。

ストリーム多重化と同時実行は、HTTP/2の強力な機能です。これらは、単一のTCP接続をより効率的に使用することを可能にします。HTTP/2は、特に優先順位付けと組み合わせると、リソースの取得を最適化します。反面、クライアントが大量の並列作業を簡単に開始できるようにすることは、HTTP/1.1と比くらべサーバーリソースに対するピーク需要を増加させる可能性があります。これは明らかに、サービス拒否のベクトルです。

複数の防護策を提供するため、HTTP/2は最大アクティブ同時ストリームの概念を活用します。SETTINGS_MAX_CONCURRENT_STREAMSパラメータにより、サーバーは同時処理数の上限をアドバタイズできます。例えば、サーバーが上限を100とした場合、常時アクティブにできるのは100リクエストだけになります。クライアントがこの制限を超えてストリームを開こうとした場合、RST_STREAMフレームを使用してサーバーに拒否される必要があります。ストリーム拒否は、接続中の他のストリームには影響しません。

本当のところはもう少し複雑になります。ストリームには、ライフサイクルがあります。下図はHTTP/2ストリームのステートマシンの図です。クライアントとサーバーはストリームの状態をそれぞれ管理します。HEADERS、DATA、RST_STREAMフレームが送受信されると遷移が発生します。ストリームの状態のビューは独立していますが、同期しています。

HEADERSとDATAフレームはEND_STREAMフラグを含み、このフラグが値1（true）にセットされると、ステート遷移のトリガーとなります。

メッセージコンテンツを持たないGETリクエストの例で説明します。クライアントはまずストリームをアイドル状態からオープン状態に、続いて即座にハーフクローズ状態に遷移させます。クライアントのハーフクローズ状態は、もはやHEADERSやDATAを送信できないことを意味し、WINDOW_UPDATE、PRIORITY、RST_STREAMフレームのみを送信できます。ただし、任意のフレームを受信することができます。

サーバーがHEADERSフレームを受信して解析すると、ストリームの状態をアイドル状態からオープン状態、そしてハーフクローズ状態に遷移させ、クライアントと一致させます。サーバーがハーフクローズ状態であれば、どんなフレームでも送信できますが、WINDOW_UPDATE、PRIORITY、またはRST_STREAMフレームしか受信できないことを意味します。

そのため、サーバーはEND_STREAMフラグを0に設定したHEADERSを送信し、次にEND_STREAMフラグを1に設定したDATAを送信します。DATAフレームは、サーバーでハーフクローズドからクローズドへのストリームの遷移をトリガーします。クライアントがこのフレームを受信すると、ストリームもクローズドに遷移します。ストリームがクローズされると、フレームの送受信はできなくなります。

このライフサイクルを今カレンシーの文脈に当てはめ直し、HTTP/2は次のように記述します：

「オープン」状態にあるストリーム、または「ハーフクローズド」状態のいずれかにあるストリームは、エンドポイントが開くことを許可されるストリームの最大数にカウントされます。これら3つの状態のいずれかにあるストリームは、SETTINGS_MAX_CONCURRENT_STREAMS設定でアドバタイズされる制限にカウントされます。

理論的には、コンカレンシーの制限は、有用です。しかし、その効果を妨げる現実的な要因があります。詳細は、このブログの後半で開設します。

HTTP/2リクエストの取り消し

前段で、クライアントからの実行中のリクエストのキャンセルについて説明しました。HTTP/2は、HTTP/1.1よりもはるかに効率的な方法でこれをサポートしています。接続全体を切断するのではなく、クライアントは1つのストリームに対してRST_STREAMフレームを送信することができます。サーバーにリクエストの処理を中止し、レスポンスを中止するよう指示するものです。これによって、Free 、サーバーのリソースを節約し、帯域幅の浪費を避けることができます。

先ほどの3つのリクエストの例を考えてみます。このときクライアントは、すべてのHEADERSが送信された後に、ストリーム1のリクエストをキャンセルします。サーバーは、応答を提供する準備ができる前にこのRST_STREAMフレームを解析し、代わりにストリーム3と5にのみ応答します：

リクエストのキャンセルは、便利な機能です。たとえば、複数の画像を含むウェブページをスクロールするとき、Webブラウザはビューポートの外にある画像をキャンセルすることができ、ビューポートに入る画像をより速く読み込むことができます。HTTP/2は、HTTP/1.1に比べてこの動作をより効率的にしています。

キャンセルされたリクエストストリームは、ストリームのライフサイクルを急速に遷移していきます。END_STREAMフラグが1に設定されたクライアントのHEADERSは、状態をアイドルからオープン、ハーフクローズへと遷移させ、RST_STREAMは直ちにハーフクローズからクローズへと遷移させます。

ストリームの同時実行数制限に寄与するのは、オープン状態またはハーフクローズ状態にあるストリームだけであることを思い出してください。クライアントがストリームをキャンセルすると、そのクライアントは即座に別のストリームをオープンできるようになり、すぐに別のリクエストを送信できるようになります。これがCVE-2023-44487を機能させる要諦なのです。

サービス拒否につながるRapid Reset

HTTP/2リクエストのキャンセルは、制限のない数のストリームを急速にリセットするために悪用される可能性があります。HTTP/2サーバーがクライアントから送信されたRST_STREAMフレームを処理し、十分に迅速に状態を取りやめることができる場合、こうした迅速なリセットは問題を引き起こしません。問題が発生し始めるのは、片付ける際に何らかの遅延やタイムラグがある場合です。クライアントは非常に多くのリクエストを処理するため、作業のバックログが蓄積され、サーバーのリソースを過剰に消費することになります。

一般的なHTTPデプロイメントアーキテクチャは、HTTP/2プロキシやロードバランサーを他のコンポーネントの前で実行することになっています。クライアントのリクエストが到着すると、それはすぐにディスパッチされ、実際の作業は非同期アクティビティとして別の場所で行われます。これにより、プロキシはクライアントのトラフィックを非常に効率的に処理することができます。しかし、このような懸念の層別は、プロキシが処理中のジョブを片付けることを難しくします。そのため、これらのデプロイでは、急速なリセットによる問題が発生しやすくなります。

Cloudflareのリバースプロキシは、HTTP/2クライアントのトラフィックを処理する際、接続ソケットからデータをバッファにコピーし、バッファリングされたデータを順番に処理していきます。各リクエストが読み込まれると（HEADERSとDATAフレーム）、アップストリームサービスにディスパッチされます。RST_STREAMフレームが読み込まれると、リクエストのローカル状態が破棄され、リクエストがキャンセルされたことがアップストリームに通知されます。バッファ全体が消費されるまで、これが繰り返されます。しかしながら、このロジックは悪用される可能性があります。悪意のあるクライアントが膨大なリクエストの連鎖を送信し始め、接続の開始時にリセットされると、当社のサーバーはそれらすべてを熱心に読み込み、新しい着信リクエストを処理できなくなるほどのストレスをアップストリームサーバーにもたらすでしょう。

強調すべき重要な点は、ストリームの同時実行性だけでは急激なリセットを緩和できないということです。サーバーがSETTINGS_MAX_CONCURRENT_STREAMSの値を選んだとしても、クライアントは高いリクエストレートを生成するためにリクエストを繰り返すことができます。

Rapid Resetの全貌

以下、合計1000リクエストを試みる概念実証クライアントを使用して再現された高速リセットの例を示します。軽減策は一切設けず、市販品のサーバーを用いたテスト環境で、443番ポートを用いています。トラフィックはWiresharkを使って分解され、わかりやすくするためにHTTP/2トラフィックだけを表示するようにフィルタリングされています。進めるには、pcapをダウンロードしてください。

コマ数が多いので、ちょっと見づらいかもしれません。WiresharkのStatistics> HTTP2ツールで簡単な要約をまとめています：

このトレースの最初のフレームであるパケット14はサーバーのSETTINGSフレームであり、最大ストリーム同時実行数100をアドバタイズしています。パケット15では、クライアントはいくつかの制御フレームを送信し、その後、急速にリセットするリクエストを開始します。最初のHEADERSフレームは26バイト長ですが、それ以降のHEADERSはすべて9バイトです。このサイズの違いは、HPACKと呼ばれる圧縮技術によるものです。パケット15は合計で525のリクエストを含み、ストリーム1051まで増やされます。

興味深いことに、ストリーム1051のRST_STREAMはパケット15に適合しないため、パケット16ではサーバーが404応答しているのがわかります。その後、残りの475リクエストの送信に移る前に、パケット17でクライアントがRST_STREAMを送信しています。

サーバーは100の同時ストリームをアドバタイズしていますが、クライアントが送信したパケットはいずれも、それよりも多くのHEADERSフレームを送信しています。クライアントはサーバーからの折り返しのトラフィックを待つ必要がなく、送信できるパケットのサイズによってのみ制限されています。このトレースにはサーバーのRST_STREAMフレームは見られず、サーバーは同時ストリーム違反を観測していないことを示しています。

顧客への影響

上述したように、リクエストがキャンセルされると、アップストリームサービスは通知を受け、多くのリソースを浪費する前にリクエストを中止することができます。今回の攻撃では、ほとんどの悪意あるリクエストが配信元サーバーに転送されることはありませんでした。しかし、これらの攻撃の規模が大きいため、何らかの影響が引き起こされます。

まず、リクエストの着信率がこれまでにないピークに達したため、クライアントが目にする502エラーのレベルが上昇したという報告がありました。これは、最も影響を受けたデータセンターで発生しており、すべてのリクエストを処理するのに難儀しました。当社のネットワークは大規模な攻撃にも対応できるようになっているものの、今回の脆弱性は当社のインフラストラクチャの弱点を露呈するものでした。データセンターのひとつに届いたリクエストがどのように処理されるかを中心に、詳細をもう少し掘り下げてみましょう：

Cloudflareのインフラストラクチャは、役割の異なるプロキシサーバのチェーンで構成されていることがわかります。特に、クライアントがHTTPSトラフィックを送信するためにCloudflareに接続すると、まずTLS復号化プロキシに当たります。このプロキシはTLSトラフィックを復号化し、HTTP 1、2、または3トラフィックを処理した後、「ビジネスロジック」プロキシに転送します。このプロキシは、各顧客のすべての設定をロードし、リクエストを他のアップストリームサービスに正しくルーティングする役割を担っており、さらに当社の場合ではセキュリティ機能も担っています。このプロキシで、L7攻撃緩和の処理が行われます。

この攻撃ベクトルでの問題点は、すべての接続で非常に多くのリクエストを素早く送信することになります。当社がブロックするチャンスを得る前に、そのひとつひとつがビジネスロジックプロキシに転送されなければならなりませんでした。リクエストのスループットがプロキシのキャパシティを上回るようになると、この2つのサービスをつなぐパイプは、いくつかのサーバーで飽和レベルに達しました。

これが起こると、TLSプロキシはアップストリームプロキシに接続できなくなり、最も深刻な攻撃時に「502 Bad Gateway」エラーが表示されるクライアントがあるのは、これが理由です。重要なのは、現在ではHTTP分析の作成に使用されるログは、ビジネスロジックプロキシからも出力されることになります。その結果、これらのエラーはCloudflareのダッシュボードには表示されません。当社内部のダッシュボードによると、（緩和策を実施する前の）最初の攻撃波では、リクエストの約1％が影響を受け、8月29日の最も深刻な攻撃では数秒間で約12％のピークが見られました。次のグラフは、この現象が起きていた2時間にわたるエラーの割合を示したものです：

当社では、この記事の後半で詳述するとおり、その後の数日間でこの数を劇的に減らすことに努めました。当社によるスタックの変更と軽減策により、こうした攻撃の規模が大幅に縮小されたおかげで、この数は今日では事実上ゼロになっています：

499エラーとHTTP/2ストリーム同時実行の課題

一部の顧客から報告されたもう一つの症状に、499エラーの増加がありました。この理由は少し違っており、この投稿で前述したHTTP/2接続の最大ストリームの同時実行数に関連しています。

HTTP/2の設定は、SETTINGSフレームを使用して接続の開始時に交換されます。明示的なパラメータを受け取らない場合、デフォルト値が適用されます。クライアントがHTTP/2接続を確立すると、サーバーの設定を待つ（遅い）か、デフォルト値を想定してリクエストを開始（速い）することになります。SETTINGS_MAX_CONCURRENT_STREAMSでは、デフォルトでは事実上無制限となります（ストリームIDは31ビットの数値空間を使用し、リクエストは奇数を使用するため、実際の制限は1073741824となります）。仕様では、サーバーが提供するストリーム数は100を下回らないようにすることを推奨しています。クライアントは一般的にスピードを重視するため、サーバーの設定を待つ傾向がなく、ちょっとした競合状態が発生します。つまり、クライアントは、サーバーがどのリミットを選択するかという賭けに出ているのです。もし間違ったリミットを選択すれば、リクエストは拒否され、再試行しなければならなくなります。1073741824ストリームに賭けるギャンブルは、賢明ではありません。その代わり、多くのクライアントは、サーバーが仕様の推奨に従うことを期待し、同時ストリーム発行数を100に制限することにしています。サーバーが100以下のものを選んだ場合、このクライアントのギャンブルは失敗し、ストリームはリセットされます。

サーバーによるストリームのリセットには、同時実行数の上限を超えた場合など、たくさんの理由があります。HTTP/2は厳格であり、構文解析やロジックエラーが発生した場合はストリームを閉じる必要があります。2019年、CloudflareはHTTP/2のDoS脆弱性に対応して複数の緩和策を開発しました。これらの脆弱性のいくつかは、クライアントが誤動作を起こし、サーバーがストリームをリセットすることによって引き起こされていました。そのようなクライアントを取り締まるための非常に効果的な戦略は、接続中のサーバーリセットの回数をカウントし、それがある閾値を超えたらGOAWAYフレームで接続を閉じることになります。正当なクライアントは、接続中に1つか2つのミスをするかもしれないものの、それは許容範囲内となります。あまりにも多くのミスをするクライアントは、おそらく壊れているか悪意のあるクライアントであり、接続を閉じることで両方のケースに対処できます。

CVE-2023-44487によるDoS攻撃に対応している間、Cloudflareはストリームの最大同時実行数を64に減らしました。この変更を行う前、当社はクライアントがSETTINGSを待たず、代わりに100の同時実行を想定していることを知りませんでした。画像ギャラリーのような一部のWebページでは、接続開始時にブラウザがすぐに100リクエストを送信することがあります。残念ながら、制限を超えた36のストリームはすべてリセットする必要があり、これがカウント緩和のトリガーとなりました。つまり、正当なクライアントの接続を閉じてしまい、ページのロードが完全に失敗してしまったのです。この相互運用性の問題に気づいてすぐに、ストリームの最大同時接続数を100に変更しました。

Cloudflare側での対応

2019年、HTTP/2の実装に関連する複数のDoS脆弱性が発覚しました。Cloudflareはこれを受けて一連の検出と緩和策を開発し、デプロイしました。CVE-2023-44487は、HTTP/2の脆弱性の異なる症状です。しかし、この脆弱性を緩和するために、クライアントから送信されるRST_STREAMフレームを監視し、不正に使用されている場合は接続を閉じるよう、既存の保護を拡張することができました。RST_STREAMの正当なクライアント利用への影響はありませんでした。

直接的な修正に加え、サーバーのHTTP/2フレーム処理とリクエストディスパッチコードにいくつかの改善を実装しました。さらに、ビジネスロジックサーバーではキューイングとスケジューリングを改善し、不要な作業が減り、キャンセルの応答性が向上しました。これらを組み合わせることで、様々な悪用パターンの可能性の影響を軽減し、サーバーに飽和する前にリクエストを処理するための余裕を与えることができました。

攻撃の早期軽減

Cloudflareはすでに、より安価な方法で非常に大規模な攻撃を効率的に軽減するシステムを導入していました。その一つが、「IP Jail」というものです。ハイパー帯域幅消費型攻撃の場合、このシステムは攻撃に参加しているクライアントIPを収集し、攻撃されたプロパティへの接続をIPレベルまたは当社のTLSプロキシで阻止します。この貴重な数秒の間にオリジンはすでに保護されているものの、当社のインフラストラクチャはまだすべてのHTTPリクエストを吸収する必要があります。この新しいボットネットには事実上立ち上がり期間がないため、問題になる前に攻撃を無力化する必要があります。

これを実現するため、当社はIP Jailシステムを拡張してインフラストラクチャ全体を保護しました。IPが「ジェイル」（投獄）されると、攻撃されたプロパティへの接続がブロックされるだけでなく、対応するIPがCloudflare上の他のドメインに対してHTTP/2を使用することも、しばらくの間禁止されます。このようなプロトコルは、HTTP/1.xでの悪用は不可能です。このため、攻撃者による大規模な攻撃の実行は制限されるものの、同じIPを共有する正当なクライアントであればその間のパフォーマンスの低下はごくわずかなものとなります。IPベースの攻撃軽減策は、非常に鈍感なツールです。このため、このような規模で使用する場合、細心の注意を払い、誤検知をできるだけ避けるようにしなければなりません。さらに、ボットネット内の特定のIPの寿命は通常短いため、長期的な緩和策は良いことよりも悪いことの方が多い可能性が高くなります。以下のグラフは、我々が目撃した攻撃におけるIPの入れ替わりを示したものです：

このように、ある日に発見された多くの新規IPは、その後すぐに消えてしまいます。

これらの動作は、すべてHTTPSパイプラインの最初にあるTLSプロキシで行われるため、通常のL7攻撃低減システムと比較してかなりのリソースを節約できます。これにより、これらの攻撃をはるかにスムーズに切り抜けることができるようになり、現在ではこれらのボットネットによって引き起こされるランダムな502エラーの数は、ゼロになりました。

可観測性の向上

当社が変革しようとしているもうひとつの地平に、観測可能性があります。顧客分析で捉えられることなく、クライアントにエラーを返してしまうのは、不満につながります。幸いなことに、今回の攻撃のはるか以前から、これらのシステムをオーバーホールするプロジェクトが進行中でした。最終的には、ビジネスロジックプロキシがログ・データを統合して出力する代わりに、インフラ内の各サービスが独自にデータをログできるようになります。今回の事件は、この取り組みの重要性を浮き彫りにしました。

また、接続レベルのロギングの改善にも取り組んでおり、このようなプロトコルの乱用をより迅速に発見し、DDoS軽減能力を向上させることができます。

まとめ

今回の攻撃は記録的な規模であったものの、これが最後ではないことは明白です。攻撃がますます巧妙化する中、Cloudflareでは新たな脅威を能動的に特定し、当社のグローバル・ネットワークに対策をデプロイすることで、数百万人の顧客が即座に自動的に保護されるようたゆまぬ努力を続けています。

Cloudflareは、2017年以来すべてのお客様に無料、従量制、無制限のDDoS攻撃対策を提供してきました。さらに、あらゆる規模の組織のニーズに合わせて、さまざまな追加のセキュリティ機能を提供しています。保護されているかどうかわからない場合、または保護方法をお知りになりたい場合、当社にお問い合わせください。

HTTP/2 Rapid Reset : anatomie de l'attaque record

2023-10-10 Lucas Pardue

Post Syndicated from Lucas Pardue original http://blog.cloudflare.com/fr-fr/technical-breakdown-http2-rapid-reset-ddos-attack-fr-fr/

À compter du 25 août 2023, nous avons commencé à observer des attaques HTTP inhabituellement volumineuses frappant bon nombre de nos clients. Ces attaques ont été détectées et atténuées par notre système anti-DDoS automatisé. Il n’a pas fallu longtemps pour que ces attaques atteignent des tailles record, pour finir par culminer à un peu plus de 201 millions de requêtes par seconde, soit un chiffre près de trois fois supérieur à la précédente attaque la plus volumineuse que nous ayons enregistrée.

Vous êtes victime d’une attaque ou avez besoin d’une protection supplémentaire ? Cliquez ici pour demander de l’aide.

Le fait que l’acteur malveillant soit parvenu à générer une attaque d’une telle ampleur à l’aide d’un botnet de tout juste 20 000 machines s’avère préoccupant. Certains botnets actuels se composent de centaines de milliers ou de millions de machines. Comme qu’Internet dans son ensemble ne reçoit habituellement qu’entre 1 et 3 milliards de requêtes chaque seconde, il n’est pas inconcevable que l’utilisation de cette méthode puisse concentrer l’intégralité du nombre de requêtes du réseau sur un petit nombre de cibles.

Détection et atténuation

Il s’agissait d’un nouveau vecteur d’attaque évoluant à une échelle sans précédent, mais les protections Cloudflare existantes ont largement pu absorber le plus gros de ces attaques. Si nous avons constaté au départ un certain impact sur le trafic client (environ 1 % des requêtes ont été touchées pendant la vague d’attaques initiale), nous avons ensuite pu perfectionner nos méthodes d’atténuation afin de bloquer l’attaque pour n’importe quel client Cloudflare sans affecter nos systèmes.

Nous avons remarqué ces attaques en même temps que deux autres acteurs majeurs du secteur : Google et AWS. Nous nous sommes attelés au renforcement des systèmes de Cloudflare afin de nous assurer qu’aujourd’hui tous nos clients sont protégés contre cette nouvelle méthode d’attaque DDoS sans impact sur ces derniers. Nous avons également participé, avec Google et AWS, à une révélation coordonnée de l’attaque aux prestataires affectés et aux fournisseurs d’infrastructure essentielle.

Cette attaque a été rendue possible par l’abus de certaines fonctionnalités du protocole HTTP/2 et des détails de mise en œuvre des serveurs (voir la CVE-2023-44487 pour plus d’informations). Comme l’attaque tire parti d’une faiblesse sous-jacente du protocole HTTP/2, nous pensons que tous les fournisseurs qui ont déployé le HTTP/2 subiront l’attaque. Ce constat comprend tous les serveurs web modernes. Aux côtés de Google et d’AWS, nous avons divulgué la méthode d’attaque aux fournisseurs de serveurs web qui, nous l’espérons, déploieront les correctifs. Entre temps, la meilleure défense consiste à utiliser un service d’atténuation des attaques DDoS tel que Cloudflare en amont de chaque réseau en contact avec Internet ou de chaque serveur d’API.

Cet article s’intéressera en profondeur aux détails du protocole HTTP/2, la fonctionnalité exploitée par les acteurs malveillants pour générer ces attaques d’envergure, ainsi qu’aux stratégies d’atténuation que nous avons appliquées pour nous assurer que tous nos clients sont protégés. Nous espérons qu’en publiant ces détails d’autres serveurs web et services affectés disposeront des informations dont ils ont besoin pour mettre en œuvre ces stratégies d’atténuation. En outre, l’équipe chargée des normes du protocole HTTP/2, de même que les équipes travaillant sur les futures normes web, pourront mieux concevoir ces dernières afin d’empêcher de telles attaques.

Détails de l’attaque RST

Le protocole d’application HTTP sous-tend Internet. La norme HTTP Semantics est commune à toutes les versions de HTTP : l’architecture générale, la terminologie et les aspects de protocole, comme les messages de requête et de réponse, les méthodes, les codes d’état, les champs d’en-tête et de trailer, le contenu des messages et bien d’autres. Chaque version individuelle de HTTP définit la manière dont la sémantique est transformée au « format conversation » (wire) pour l’échange sur Internet. Un client doit, par exemple, sérialiser un message de requête en données binaires avant de l’envoyer. Le serveur l’analyse ensuite et le retransforme en message qu’il peut traiter.

Le protocole HTTP/1.1 utilise une forme textuelle de sérialisation. Les messages de requête et de réponse sont échangés sous la forme d’un flux de caractères ASCII, envoyé via une couche de transport fiable, comme le TCP, selon le format suivant (dans lequel CRLF signifie retour chariot et saut de ligne) :

 HTTP-message   = start-line CRLF
                   *( field-line CRLF )
                   CRLF
                   [ message-body ]

Une requête GET très simple pour https://blog.cloudflare.com/ ressemblerait, par exemple, à ceci sur la conversation :

GET / HTTP/1.1 CRLFHost: blog.cloudflare.comCRLFCRLF

Et la réponse ressemblerait à ce qui suit :

HTTP/1.1 200 OK CRLFServer: cloudflareCRLFContent-Length: 100CRLFtext/html; charset=UTF-8CRLFCRLF<100 bytes of data>

Ce format encapsule les messages sur la conversation, pour indiquer qu’il est possible d’utiliser une unique connexion TCP pour échanger plusieurs requêtes et réponses. Le format nécessite toutefois que chaque message soit envoyé en entier. En outre, afin de faire entrer correctement en corrélation les requêtes avec les réponses, un ordre strict se révèle nécessaire. Les messages peuvent donc être échangés de manière sérielle et ne peuvent pas être multiplexés. Deux requêtes GET, pour https://blog.cloudflare.com/ et https://blog.cloudflare.com/page/2/,, se présenteraient ainsi sous la forme suivante :

GET / HTTP/1.1 CRLFHost: blog.cloudflare.comCRLFCRLFGET /page/2/ HTTP/1.1 CRLFHost: blog.cloudflare.comCRLFCRLF

Et les réponses :

Les pages web nécessitent davantage d’interactions HTTP compliquées que ces exemples. Lorsque vous visitez le blog de Cloudflare, votre navigateur charge plusieurs scripts, styles et ressources multimédias. Si vous accédez à la page d’accueil à l’aide du protocole HTTP/1.1 et que vous décidez rapidement de vous rendre sur la page 2, votre navigateur a le choix entre deux options. Soit attendre l’ensemble des réponses en attente pour la page que vous ne souhaitez plus consulter avant de démarrer la page 2, soit annuler les requêtes en transit en mettant fin à la connexion TCP et en établissant une nouvelle connexion. Aucune de ces options ne s’avère particulièrement pratique. Les navigateurs ont tendance à contourner ces limitations en gérant un pool de connexions TCP (jusqu’à 6 par hôte) et en mettant en œuvre une logique complexe de répartition des requêtes au sein du pool.

Le protocole HTTP/2 répond à bon nombre des problèmes du HTTP/1.1. Chaque message HTTP est sérialisé sous la forme d’un ensemble de trames HTTP/2 disposant d’un type, d’une longueur, de marqueurs, d’un identifiant (ID) de flux et d’un contenu. L’ID de flux indique clairement quels octets sur la conversation s’appliquent à un message donné, afin de permettre le multiplexage et la concurrence en toute sécurité. Les flux sont bidirectionnels. Les clients envoient des trames et les serveurs répondent par des trames utilisant le même ID.

En HTTP/2, notre requête GET pour https://blog.cloudflare.com serait échangée sur l’ID de flux 1, le client envoyant une trame HEADERS et le serveur répondant par une trame HEADERS, suivies par une ou plusieurs trames DATA. Comme les requêtes du client utilisent toujours des ID de flux impairs, les requêtes suivantes utiliseront donc les ID de flux 3, 5 et ainsi de suite. Les réponses peuvent être transmises dans n’importe quel ordre et les trames provenant de flux différents peuvent être entrelacées.

Le multiplexage et la concurrence des flux constituent de puissantes fonctionnalités du protocole HTTP/2. Elles permettent l’utilisation plus efficace d’une unique connexion TCP. Le HTTP/2 optimise la récupération de ressources, notamment lorsqu’elle est associée à la priorisation. En réciproque, le fait de faciliter le lancement de vastes quantités de tâches parallèles aux clients peut accroître le pic de demande de ressources serveur par rapport au HTTP/1.1. Il s’agit là d’un vecteur évident de déni de service.

Afin de proposer quelques garde-fous, le HTTP/2 avance la notion de maximum de flux concurrents actifs. Le paramètre SETTINGS_MAX_CONCURRENT_STREAMS permet à un serveur d’annoncer sa limite de concurrence. Par exemple, si le serveur annonce une limite de 100, seules 100 requêtes pourront être actives à un moment donné. Si un client tente d’ouvrir un flux au-delà de cette limite, ce dernier devra être rejeté par le serveur à l’aide d’une trame RST_STREAM. Le rejet d’un flux n’affecte pas les autres flux en transit sur la connexion.

La réalité de l’affaire est un peu plus compliquée. Les flux présentent un cycle de vie. Vous trouverez ci-dessous un schéma de l’état d’un flux HTTP/2. Le client et le serveur gèrent leurs propres vues de l’état d’un flux. L’envoi ou la réception de trames HEADERS, DATA et RST_STREAM déclenchent les transitions. Les vues de l’état d’un flux sont indépendantes, mais restent synchronisées.

Les trames HEADERS et DATA intègrent un marqueur END_STREAM qui, lorsqu’il est défini sur la valeur 1 (true), peut déclencher une transition d’état.

Examinons ceci plus en détail avec un exemple de requête GET sans contenu de message. Le client envoie la requête sous la forme d’une trame HEADERS comportant le marqueur END_STREAM défini sur 1. Il déclenche en premier lieu la transition de l’état « idle » (à l’arrêt) à « open » (ouvert), avant de déclencher immédiatement une transition vers l’état « half-closed » (mi-fermé). L’état « half-closed » du client indique qu’il ne peut plus envoyer de trames HEADERS ou DATA, mais uniquement des trames WINDOW_UPDATE, PRIORITY ou RST_STREAM. Il peut toutefois recevoir n’importe quelle trame.

Une fois que le serveur reçoit et analyse la trame HEADERS, il fait passer l’état du flux d’« idle » à « open », puis à « half-closed », afin de correspondre à celui du client. L’état « half-closed » du serveur indique qu’il peut envoyer n’importe quelle trame, mais qu’il ne peut recevoir que des trames WINDOW_UPDATE, PRIORITY ou RST_STREAM.

La réponse à la requête GET contient un contenu de message, aussi le serveur envoie-t-il une trame HEADERS comportant le marqueur END_STREAM défini sur 0, puis une trame DATA comportant le marqueur END_STREAM défini sur 1. La trame DATA déclenche la transition du flux de half-closed à closed (fermé) sur le serveur. Lorsque le client la reçoit, il lance également sa transition vers l’état « closed ». Une fois un flux fermé, plus aucune trame ne peut être envoyée ou reçue.

En appliquant ce cycle de vie dans le contexte de la concurrence, le protocole HTTP/2 précise :

Les flux à l’état « open » ou dans l’un des deux états « half-closed » comptent dans le nombre maximum de flux qu’un point de terminaison est autorisé à ouvrir. Les flux dans l’un de ces trois états comptent à l’égard de la limite annoncée dans le paramètre SETTINGS_MAX_CONCURRENT_STREAMS.

En théorie, la limite de concurrence est utile. Certains facteurs pratiques entravent toutefois son efficacité, que nous aborderons plus tard dans cet article.

Annulation de requête HTTP/2

Un peu plus tôt, nous avons évoqué l’annulation de requêtes en transit par le client. Le protocole HTTP/2 prend cette fonctionnalité en charge de manière plus efficace que le HTTP/1.1. Plutôt que de devoir abandonner la connexion dans son ensemble, un client peut désormais envoyer une trame RST_STREAM pour un seul flux. Cette dernière demande au serveur de mettre fin au traitement de la requête et d’abandonner la réponse. Cette opération libère des ressources serveur et permet d’éviter de gaspiller de la bande passante.

Reprenons notre exemple précédent, avec les trois requêtes. Cette fois, le client annule la requête sur le flux 1 après l’envoi de toutes les trames HEADERS. Le serveur analyse la trame RST_STREAM avant d’être prêt à diffuser la réponse et, à la place, ne répond qu’aux flux 3 et 5 :

L’annulation de requête constitue une fonctionnalité bien utile. Lorsque vous parcourez une page web comportant plusieurs images, par exemple, un navigateur web peut annuler les images qui ne sont pas affichées dès l’ouverture. Les images qui lui parviennent peuvent donc être chargées plus rapidement. Le protocole HTTP/2 rend ce comportement bien plus efficace par rapport au HTTP/1.1.

Un flux de requête annulé passe rapidement par tous les états du cycle de vie d’un flux. La trame HEADERS envoyée par le client, comportant le marqueur END_STREAM défini sur 1, passe de l’état « idle » à « open », puis à « half-closed », avant que la trame RST_STREAM ne déclenche immédiatement sa transition de l’état « half-closed » à « closed ».

Souvenez-vous que seuls les flux à l’état « open » ou « half-closed » sont comptabilisés dans la limite de concurrence du flux. Lorsqu’un client annule un flux, il regagne instantanément la capacité d’ouvrir un autre flux à la place et peut immédiatement envoyer une nouvelle requête. C’est là le cœur du fonctionnement de la vulnérabilité CVE-2023-44487.

Des réinitialisations rapides conduisant à un déni de service

Le processus d’annulation de requête du protocole HTTP/2 peut être utilisé de manière abusive en réinitialisant rapidement un nombre illimité de flux. Lorsqu’un serveur HTTP/2 peut traiter des trames client-sent RST_STREAM et leur faire changer d’état suffisamment rapidement, ces réinitialisations rapides ne posent pas de problème. Les soucis commencent lorsqu’une quelconque forme de retard ou de latence apparaît lors du nettoyage. Le client peut avoir à traiter un nombre de requêtes si important que les tâches s’accumulent, en entraînant une consommation excessive de ressources sur le serveur.

Une architecture de déploiement HTTP courante consiste à exécuter un proxy HTTP/2 ou un équilibreur de charge en amont des autres composants. Lorsqu’une requête client arrive, elle est rapidement retransmise et la tâche réelle est effectuée sous forme d’activité asynchrone à un autre endroit. Cette opération permet au proxy de traiter le trafic client très efficacement. Toutefois, cette séparation des préoccupations peut compliquer la phase de nettoyage des tâches en cours pour le proxy. Ce type de déploiement est donc plus susceptible de rencontrer des problèmes en cas de réinitialisations rapides.

Lorsque les proxys inverses de Cloudflare traitent du trafic client entrant HTTP/2, ils copient les données du socket de la connexion au sein d’un tampon et traitent ces données en tampon dans l’ordre. Chaque requête est lue (trames HEADERS et DATA) et transmise à un service en amont. Lorsque les trames RST_STREAM sont lues, l’état local de la requête est abandonné et l’amont est notifié de l’annulation de la requête. Les proxys répètent ensuite le processus jusqu’à ce que toutes les données en tampon aient été traitées. Cette logique peut toutefois être utilisée de manière abusive : si un client malveillant commence à envoyer une énorme chaîne de requêtes, qu’il réinitialise au début d’une connexion, nos serveurs s’empresseront de toutes les lire. Cette situation engendrera alors une pression sur les serveurs en amont, au point qu’ils se retrouveront incapables de traiter les nouvelles requêtes entrantes.

Un point important à souligner est que la concurrence de flux ne peut pas, par elle-même, atténuer les réinitialisations rapides. Le client peut créer des requêtes afin de produire des taux de requêtes élevés, peu importe la valeur choisie par le serveur pour le paramètre SETTINGS_MAX_CONCURRENT_STREAMS.

Anatomie d’une réinitialisation rapide

Voici un exemple de réinitialisation rapide (Rapid Reset) reproduite à l’aide d’un client de démonstration de faisabilité tentant d’envoyer un total de 1 000 requêtes. J’ai utilisé un serveur du commerce ne disposant d’aucune mesure d’atténuation et écoutant le port 443 au sein d’un environnement de test. Le trafic est disséqué à l’aide de Wireshark et filtré pour ne montrer que le trafic HTTP/2, pour plus de clarté. Téléchargez la pcap pour suivre la démonstration.

Il est un peu difficile à analyser, en raison du grand nombre de trames. Nous pouvons en obtenir un résumé rapide à l’aide de l’outil HTTP/2 de Wireshark, disponible sous « Statistics » (Statistiques) :

La première trame de cette trace, dans le paquet 14, est la trame SETTINGS du serveur, qui annonce un nombre maximum de flux concurrents de 100. Dans le paquet 15, le client envoie quelques trames de contrôle, puis commence à envoyer des requêtes, rapidement réinitialisées. La première trame HEADERS fait 26 octets de long, tandis que toutes les trames HEADERS suivantes ne mesurent que 9 octets. Cette différence de taille est due à une technologie de compression nommée HPACK. Au total, le paquet 15 contient 525 requests, remontant le long du flux 1051.

Curieusement, la trame RST_STREAM du flux 1051 ne rentre pas dans le paquet 15. Nous voyons donc, dans le paquet 16, le serveur répondre par une erreur 404. Le client envoie ensuite la trame RST_STREAM dans le paquet 17, avant de passer à l’envoi des 475 requêtes suivantes.

Veuillez noter que bien que le serveur ait annoncé 100 flux concurrents, les deux paquets envoyés par le client comportaient bien plus de trames HEADERS. Le client n’a pas attendu le trafic de retour du serveur, il n’était limité que par la taille des paquets qu’il pouvait envoyer. Aucune trame RST_STREAM du serveur n’apparaît dans cette trace, un constat qui indique que le serveur n’a pas observé de violation du nombre de flux concurrents.

Impact sur les clients

Comme mentionné plus haut, lorsque les requêtes sont annulées, les services en amont sont notifiés et peuvent abandonner ces dernières avant de gaspiller trop de ressources sur leur traitement. C’est ce qui s’est passé dans cette attaque, au cours de laquelle les requêtes malveillantes n’ont jamais été retransmises aux serveurs d’origine. Toutefois, l’ampleur de ces attaques a engendré des effets.

Tout d’abord, lorsque le taux de requêtes entrantes a atteint des pics jamais encore observés jusqu’ici, nous avons reçu des signalements de niveaux élevés d’erreurs 502 observées par les clients. C’est ce qui s’est produit dans nos datacenters les plus impactés, car ils avaient du mal à traiter toutes les requêtes. Notre réseau est conçu pour faire face aux attaques d’envergure, mais cette vulnérabilité a révélé une faiblesse au sein de notre infrastructure. Intéressons-nous de plus près aux détails, en nous concentrant sur la manière dont les requêtes entrantes sont traitées lorsqu’elles arrivent dans l’un de nos datacenters :

Nous pouvons voir que notre infrastructure se compose d’une chaîne de différents serveurs de proxy aux responsabilités différentes. Plus particulièrement, lorsqu’un client se connecte à Cloudflare pour envoyer du trafic HTTPS, ce dernier passe en premier par notre proxy de déchiffrement TLS, qui déchiffre le trafic TLS et traite le trafic HTTP 1, 2 ou 3, avant de le transmettre à notre proxy de « logique métier ». Ce dernier est responsable du chargement de l’ensemble des paramètres pour chaque client, puis du routage correct des requêtes vers les autres services d’amont. Plus important encore dans le cas qui nous intéresse, il est également responsable des fonctionnalités de sécurité. C’est là que l’atténuation des attaques sur la couche 7 est mise en œuvre.

Le problème avec ce vecteur d’attaque réside dans le fait qu’il parvient à envoyer un grand nombre de requêtes de manière très rapide, sur chaque connexion. Chacune d’elles devait être retransmise au proxy de logique métier avant que nous n’ayons l’occasion de la bloquer. Lorsque le volume de requêtes s’est révélé supérieur à la capacité de notre proxy, le pipeline reliant ces deux services a atteint son niveau de saturation dans certains de nos serveurs.

Quand cette situation se produit, le proxy TLS ne peut plus se connecter à son proxy d’amont. C’est pourquoi certains de nos clients ont vu s’afficher une erreur « 502 Bad Gateway » lors des attaques les plus graves. Il est important de noter qu’à la date d’aujourd’hui, les journaux utilisés pour produire les analyses HTTP sont également émis par notre proxy de logique métier. En conséquence, ces erreurs ne sont pas visibles au sein du tableau de bord Cloudflare. Nos tableaux de bord internes révèlent qu’environ 1 % des requêtes ont été affectées lors de la vague d’attaques initiale (avant la mise en œuvre des mesures d’atténuation), avec un pic se situant autour de 12 % pendant quelques secondes lors de l’attaque la plus massive, le 29 août. Le graphique suivant montre la proportion de ces erreurs sur une période de deux heures au cours de l’attaque :

Nous nous sommes efforcés de réduire ce nombre de manière considérable les jours suivants, comme nous le détaillons plus loin dans cet article. Ce nombre aujourd’hui est effectivement de zéro, à la fois grâce aux modifications apportées à notre pile et à nos mesures d’atténuation, qui ont drastiquement réduit la taille de ces attaques.

Erreurs 499 et les défis liés à la concurrence des flux HTTP/2

Un autre symptôme signalé par certains clients réside dans l’augmentation des erreurs 499. La raison est ici quelque peu différente et se trouve liée à la concurrence de flux maximale au sein d’une connexion HTTP/2, comme détaillée précédemment dans l’article.

Les paramètres HTTP/2 sont échangés au début d’une connexion à l’aide de trames SETTINGS. En l’absence de réception d’un paramètre explicite, ce sont les valeurs par défaut qui s’appliquent. Lorsqu’un client établit une connexion HTTP/2, il peut soit attendre la trame SETTINGS d’un serveur (lent), soit présupposer les valeurs par défaut et commencer à envoyer des requêtes (rapide). Pour le paramètre SETTINGS_MAX_CONCURRENT_STREAMS, la valeur par défaut est, dans les faits, illimitée (les ID de flux s’appuient sur un espace mathématique de 31 bits et les requêtes utilisent les nombres impairs. la limite réelle est donc établie à 1 073 741 824). La spécification recommande qu’un serveur ne propose pas moins de 100 flux concurrents. Les clients sont généralement axés sur la vitesse. Ils n’ont donc pas tendance à attendre les paramètres du serveur et ce fait entraîne en quelque sorte une situation de compétition. Ils « parient » sur la limite que le serveur pourrait avoir choisie. S’ils se trompent, la requête sera rejetée et devra être renvoyée. Le fait de parier sur un ensemble numérique de 1 073 741 824 nombres s’avère pour le moins absurde. Pour contrebalancer cette situation, de nombreux clients décident de se limiter à l’émission de 100 flux concurrents, dans l’espoir que les serveurs suivent la recommandation de la spécification. Si les serveurs ont sélectionné une valeur inférieure à 100, le pari du client échoue et les flux sont réinitialisés.

Un serveur pourrait réinitialiser un flux pour de nombreuses raisons en dehors d’un dépassement de la limite de concurrence. Le HTTP/2 est strict et nécessite qu’un flux soit fermé (closed) en cas d’erreurs d’interprétation ou d’erreurs logiques. En 2019, Cloudflare a développé plusieurs mesures d’atténuation en réponse aux vulnérabilités DoS du protocole HTTP/2. Plusieurs de ces vulnérabilités résultaient d’un mauvais comportement de la part du client, qui poussait le serveur à réinitialiser un flux. Une stratégie très efficace pour freiner ces clients consiste à compter le nombre de réinitialisations du serveur au cours d’une connexion puis, lorsque ce chiffre dépasse un certain seuil, de mettre un terme à cette dernière à l’aide d’une trame GOAWAY. Les clients peuvent commettre une ou deux erreurs au cours d’une connexion et il s’agit là d’un constat acceptable. Un client qui commet trop d’erreurs est probablement soit défectueux, soit malveillant, et le fait de mettre fin à la connexion répond aux deux cas.

En réponse aux attaques DoS permises par la vulnérabilité CVE-2023-44487, Cloudflare a réduit la concurrence de flux maximale à 64. Avant d’effectuer cette modification, nous n’avions pas conscience que les clients n’attendaient pas la trame SETTINGS et supposaient à la place que la concurrence était fixée à 100. Certaines pages web, comme les galeries d’images, entraînent effectivement l’envoi immédiat de 100 requêtes par le navigateur au début d’une connexion. Malheureusement, les 36 flux au-delà de notre limite devaient être réinitialisés et cette opération déclenchait les compteurs de nos mesures d’atténuation. Nous interrompions donc des connexions sur des clients légitimes, avec pour résultat un échec total du chargement des pages. Dès que nous avons constaté ce problème d’interopérabilité, nous avons de nouveau fixé la concurrence de flux maximale à 100.

Actions côté Cloudflare

En 2019, nous avons découvert plusieurs vulnérabilités DoS liées à l’implémentation du protocole HTTP/2. Cloudflare a développé et déployé une série de mesures de détection et d’atténuation en réponse. La vulnérabilité CVE-2023-44487 est une différente manifestation de la vulnérabilité HTTP/2. Toutefois, pour l’atténuer, nous avons pu étendre les protections existantes afin de surveiller les trames RST_STREAM envoyées par les clients et de mettre fin aux connexions lorsque ces dernières étaient utilisées à des fins abusives. Les scénarios d’utilisation légitimes des trames RST_STREAM par les clients n’ont pas été affectés.

En plus d’un correctif direct, nous avons mis en œuvre plusieurs améliorations du serveur concernant le traitement des trames HTTP/2 et du code de répartition des requêtes. Le serveur de logique métier a, en outre, fait l’objet de perfectionnements au niveau de la mise en file d’attente et de la planification. Ces derniers réduisent le travail inutile et améliorent la réponse aux annulations. Ensemble, ces mesures diminuent l’impact des divers schémas d’abus potentiels, tout en accordant plus d’espace au serveur pour traiter les requêtes avant d’atteindre la saturation.

Atténuer les attaques à un moment plus précoce

Cloudflare dispose déjà de systèmes en place permettant d’atténuer efficacement les attaques de très grande ampleur à l’aide de méthodes moins coûteuses. L’une d’elles se nomme « IP Jail » (Prison IP). En cas d’attaques hypervolumétriques, ce système collecte les adresses IP des clients participant à l’attaque et les empêche de se connecter à la propriété attaquée, que ce soit au niveau de l’adresse IP ou de notre proxy TLS. Ce système demande toutefois quelques secondes pour être pleinement efficace. Au cours de ces précieuses secondes, les serveurs d’origine sont déjà protégés, mais notre infrastructure doit encore absorber l’ensemble des requêtes HTTP. Comme ce nouveau botnet ne dispose dans les faits d’aucune période de démarrage, nous devons pouvoir neutraliser ces attaques avant qu’elles ne deviennent un problème.

Pour y parvenir, nous avons étendu le système IP Jail afin qu’il protège l’intégralité de notre infrastructure. Une fois une adresse IP « en prison », nous l’empêchons non seulement de se connecter à la propriété attaquée, mais interdisons également aux adresses IP correspondants d’utiliser le HTTP/2 pour se connecter à un autre domaine sur Cloudflare pendant quelque temps. Comme de tels abus du protocole ne sont pas possibles à l’aide du HTTP/1.x, l’acteur malveillant se trouve sévèrement limité dans sa capacité à conduire des attaques d’envergure, tandis qu’un client légitime partageant la même adresse IP ne constaterait d’une très légère diminution des performances pendant ce temps. Les mesures d’atténuation basées sur l’IP constituent un outil pour le moins brutal. C’est pourquoi nous devons faire preuve d’une extrême prudence lorsque nous les utilisons à grande échelle et chercher à éviter les faux positifs autant que possible. De même, comme la durée de vie d’une adresse IP donnée au sein d’un botnet est généralement courte, l’atténuation à long terme risque davantage de nuire que d’aider. Le graphique suivant montre l’évolution du nombre d’adresses IP lors de l’attaque dont nous avons été témoins :

Il apparaît très clairement que les nouvelles adresses IP repérées lors d’une journée donnée disparaissent très rapidement après l’attaque.

Le fait que ces actions se déroulent dans notre proxy TLS, à l’entrée de notre pipeline HTTPS, permet d’économiser des ressources considérables par rapport à notre système d’atténuation de couche 7 habituel. Cette situation nous a permis de supporter ces attaques d’autant plus facilement et, aujourd’hui, le nombre d’erreurs 502 aléatoires dues à ces botnets a été réduit à zéro.

Améliorations en matière d’observabilité

L’un des autres fronts sur lequel nous avons apporté des modifications est celui de l’observabilité. Le fait de renvoyer des erreurs aux clients sans que ces dernières soient visibles dans les outils d’analyse des clients se révèle pour le moins insatisfaisant. Fort heureusement, nous avons lancé un projet visant à réorganiser ces systèmes bien avant les attaques récentes. Il permettra à terme à chaque service compris au sein de notre infrastructure de journaliser ses propres données, au lieu de s’en remettre à notre proxy de logique métier pour consolider et émettre les données de journalisation. Cet incident a fait ressortir l’importance de ce travail, dans le cadre duquel nous redoublons d’efforts.

Nous travaillons également à une meilleure journalisation au niveau de la connexion, afin de nous permettre de repérer ce type d’abus de protocole bien plus rapidement, afin d’améliorer nos capacités d’atténuation des attaques DDoS.

Conclusion

Si l’attaque à laquelle cet article est consacré constituait sans conteste la dernière attaque record à ce jour, nous savons que ce ne sera pas la dernière. Alors que les attaques gagnent chaque jour en sophistication, Cloudflare travaille avec acharnement aux moyens d’identifier les nouvelles menaces de manière proactive, en déployant des contremesures sur notre réseau mondial afin de protéger nos millions de clients, immédiatement et automatiquement.

Cloudflare fournit une protection contre les attaques DDoS gratuite, totalement illimitée et sans surcoût lié à l’utilisation à l’ensemble de ses clients, et ce depuis 2017. Nous proposons en outre une gamme de fonctionnalités de sécurité supplémentaires afin de répondre aux besoins des entreprises de toutes les tailles. Contactez-nous si vous n’êtes pas sûr de savoir si vous êtes protégé ou si vous souhaitez comprendre comment vous pourriez l’être.

HTTP/2 Rapid Reset: cómo desarmamos el ataque sin precedentes

2023-10-10 Lucas Pardue

Post Syndicated from Lucas Pardue original http://blog.cloudflare.com/es-es/technical-breakdown-http2-rapid-reset-ddos-attack-es-es/

El 25 de agosto de 2023, empezamos a observar algunos ataques de inundación HTTP inusualmente voluminosos que afectaban a muchos de nuestros clientes. Nuestro sistema DDoS automatizado detectó y mitigó estos ataques. Sin embargo, no pasó mucho tiempo antes de que empezaran a alcanzar tamaños sin precedentes, hasta alcanzar finalmente un pico de 201 millones de solicitudes por segundo, casi el triple que el mayor ataque registrado hasta ese momento.

¿Estás siendo blanco de un ataque o necesitas mayor protección? Si necesitas ayuda, haz clic aquí.

Lo más inquietante es que el atacante fuera capaz de generar semejante ataque con una botnet de solo 20 000 máquinas. Hoy en día existen botnets formadas por cientos de miles o millones de máquinas. La web suele recibir solo entre 1000 y 3000 millones de solicitudes por segundo, por eso no parece imposible que utilizando este método se pudiera concentrar el volumen de solicitudes de toda una web en un pequeño número de objetivos.

Detección y mitigación

Se trataba de un vector de ataque novedoso a una escala sin precedentes, pero las soluciones de protección de Cloudflare pudieron mitigar en gran medida los efectos más graves de los ataques. Si bien al principio observamos cierto impacto en el tráfico de los clientes, que afectó aproximadamente al 1 % de las solicitudes durante la oleada inicial de ataques, hoy hemos podido perfeccionar nuestros métodos de mitigación para detener el ataque de cualquier cliente de Cloudflare sin que nuestros sistemas se vean afectados.

Nos dimos cuenta de estos ataques mientras otros dos grandes empresas del sector, Google y AWS, observaban lo mismo. Trabajamos para consolidar los sistemas de Cloudflare y garantizar que, a día de hoy, todos nuestros clientes estén protegidos de este nuevo método de ataque DDoS sin que afecte a ningún cliente. También hemos participado con Google y AWS en una divulgación coordinada del ataque a los proveedores afectados y a los proveedores de infraestructuras críticas.

Este ataque fue posible mediante el abuso de algunas funciones del protocolo HTTP/2 y de detalles de implementación del servidor (para más información, véase CVE-2023-44487). Dado que el ataque aprovecha una deficiencia subyacente en el protocolo HTTP/2, creemos que cualquier proveedor que haya implementado este protocolo será objeto del ataque. Esto incluye todos los servidores web modernos. Nosotros, junto con Google y AWS, hemos revelado el método de ataque a los proveedores de servidores web, que esperamos implementen revisiones. Mientras tanto, la mejor protección es utilizar un servicio de mitigación de DDoS como el de Cloudflare frente a cualquier servidor web o API pública.

Este artículo analiza los detalles del protocolo HTTP/2, la función que explotaron los atacantes para generar estos ataques masivos, y las estrategias de mitigación que adoptamos para garantizar la protección de todos nuestros clientes. Nuestra esperanza es que, al publicar estos detalles, otros servidores y servicios web afectados dispongan de la información que necesitan para aplicar estrategias de mitigación. Y, además, el equipo de estándares del protocolo HTTP/2, así como los equipos que trabajen en futuros estándares web, puedan diseñarlos mejor para evitar este tipo de incidentes.

Detalles del ataque RST

HTTP es el protocolo de aplicación que permite la transferencia de información en la web. La semántica HTTP es común a todas las versiones de HTTP. La arquitectura general, la terminología y los aspectos del protocolo, como los mensajes de solicitud y respuesta, los métodos, los códigos de estado, los campos de encabezado y finalizador, el contenido de los mensajes y mucho más. Cada versión individual de HTTP define cómo se transforma la semántica en un “formato para transmisión” (wire format) para el intercambio a través de Internet. Por ejemplo, un cliente tiene que serializar un mensaje de solicitud en datos binarios y enviarlo, y luego el servidor lo vuelve a analizar en un mensaje que pueda procesar.

HTTP/1.1 utiliza una forma textual de serialización. Los mensajes de solicitud y respuesta se intercambian como una secuencia de caracteres ASCII, enviados a través de una capa de transporte fiable como TCP y utiliza el siguiente formato (donde CRLF significa “retorno de carro” y “salto de línea”):

 HTTP-message   = start-line CRLF
                   *( field-line CRLF )
                   CRLF
                   [ message-body ]

Por ejemplo, una solicitud GET muy sencilla para https://blog.cloudflare.com/ tendría este aspecto en el formato para transmisión:

GET / HTTP/1.1 CRLFHost: blog.cloudflare.comCRLFCRLF

Y la respuesta sería la siguiente:

HTTP/1.1 200 OK CRLFServer: cloudflareCRLFContent-Length: 100CRLFtext/html; charset=UTF-8CRLFCRLF<100 bytes of data>

Este formato encapsula los mensajes en la transmisión, lo que significa que es posible utilizar una única conexión TCP para intercambiar varias solicitudes y respuestas. Sin embargo, el formato requiere que cada mensaje se envíe entero. Además, para correlacionar correctamente las solicitudes con las respuestas, se requiere un orden estricto, lo que significa que los mensajes se intercambian en serie y no se pueden multiplexar. Dos solicitudes GET, para https://blog.cloudflare.com/ y https://blog.cloudflare.com/page/2/,, serían:

GET / HTTP/1.1 CRLFHost: blog.cloudflare.comCRLFCRLFGET /page/2/ HTTP/1.1 CRLFHost: blog.cloudflare.comCRLFCRLF

Con las respuestas:

Las páginas web requieren interacciones HTTP más complicadas que estos ejemplos. Cuando visites el blog de Cloudflare, tu navegador cargará numerosos scripts, estilos y activos multimedia. Si visitas la página principal utilizando HTTP/1.1 y decides rápidamente ir a la página 2, tu navegador puede elegir entre dos opciones. O bien esperar todas las respuestas en cola para la página que ya no quieres antes de que la página 2 pueda siquiera comenzar, o bien cancelar las solicitudes abiertas cerrando la conexión TCP y abriendo una nueva conexión. Ninguna de estas opciones es muy práctica. Los navegadores tienden a sortear estas limitaciones gestionando un conjunto de conexiones TCP (hasta 6 por host) e implementando una compleja lógica de envío de solicitudes sobre el conjunto.

HTTP/2 aborda muchos de los problemas de HTTP/1.1. Cada mensaje HTTP se serializa en un conjunto de tramas HTTP/2 que tienen tipo, longitud, etiquetas, identificador (Id.) de secuencia y carga malintencionada. El identificador de secuencia deja claro qué bytes de la transmisión corresponden a cada mensaje, lo que permite una multiplexación y concurrencia seguras. Las secuencias son bidireccionales. Los clientes envían tramas y los servidores responden con tramas que utilizan el mismo Id.

En HTTP/2, nuestra solicitud GET de https://blog.cloudflare.comse intercambiaría a través del identificador de secuencia 1. El cliente enviaría una trama HEADERS, y el servidor respondería con una trama HEADERS, seguida de una o más tramas DATA. Las solicitudes del cliente siempre utilizan identificadores de secuencia impares, por lo que las solicitudes posteriores utilizarían identificadores de secuencia 3, 5, etc. Las respuestas se pueden servir en cualquier orden, y se pueden intercalar tramas de distintas secuencias.

La multiplexación de secuencias y la concurrencia son potentes funciones de HTTP/2. Permiten un uso más eficiente de una única conexión TCP. HTTP/2 optimiza la obtención de recursos, especialmente cuando se combina con la priorización. Por otro lado, facilitar a los clientes el lanzamiento de grandes cantidades de trabajo paralelo puede aumentar el pico de demanda de recursos del servidor en comparación con HTTP/1.1. Este es un vector obvio de denegación de servicio.

Para ofrecer protección, HTTP/2 proporciona una noción de secuencias concurrentes activas máximas. El parámetro SETTINGS_MAX_CONCURRENT_STREAMS permite a un servidor anunciar su límite de concurrencia. Por ejemplo, si el servidor declara un límite de 100, entonces solo pueden estar activas 100 solicitudes en cualquier momento. Si un cliente intenta abrir una secuencia por encima de este límite, el servidor la rechazará mediante una trama RST_STREAM. El rechazo de la secuencia no afecta a las demás secuencias en curso en la conexión.

La realidad es un poco más complicada. Las secuencias tienen un ciclo de vida. A continuación se muestra un diagrama de la máquina de estado de la secuencia HTTP/2. El cliente y el servidor gestionan sus propias vistas del estado de una secuencia. Las tramas HEADERS, DATA y RST_STREAM activan transiciones cuando se envían o reciben. Aunque las vistas del estado de la secuencia son independientes, están sincronizadas.

Las tramas HEADERS y DATA incluyen una etiqueta END_STREAM, que cuando se establece en el valor 1 (verdadero), puede activar una transición de estado.

Examinemos esto con un ejemplo de una solicitud GET que no tiene contenido de mensaje. El cliente envía la solicitud como una trama HEADERS con la etiqueta END_STREAM establecida en 1. El cliente primero pasa la secuencia del estado inactivo al estado abierto, y luego pasa inmediatamente al estado semicerrado. El estado semicerrado del cliente significa que ya no puede enviar tramas HEADERS ni DATA, solo tramas WINDOW_UPDATE, PRIORITY o RST_STREAM. Sin embargo, puede recibir cualquier trama.

Una vez que el servidor recibe y analiza la trama HEADERS, cambia el estado de la secuencia de inactivo a abierto y luego a semicerrado, para que coincida con el cliente. El estado semicerrado del servidor significa que puede enviar cualquier trama, pero solo recibir tramas WINDOW_UPDATE, PRIORITY o RST_STREAM.

La respuesta a la solicitud GET incluye el contenido del mensaje, por lo que el servidor envía una trama HEADERS con la etiqueta END_STREAM establecida en 0, y luego una trama DATA con la etiqueta END_STREAM establecida en 1. La trama DATA marca la transición de la secuencia de semicerrado a cerrado en el servidor. Cuando el cliente la recibe, también pasa a cerrado. Una vez se cierra una secuencia, no se pueden enviar ni recibir tramas.

Aplicando de nuevo este ciclo de vida al contexto de la concurrencia, HTTP/2 establece:

Las secuencias que están en estado “abierto” o en cualquiera de los estados “semicerrado” cuentan para el número máximo de secuencias que un punto final puede abrir. Las secuencias en cualquiera de estos tres estados cuentan para el límite anunciado en el ajuste SETTINGS_MAX_CONCURRENT_STREAMS.

En teoría, el límite de concurrencia es útil. Sin embargo, hay factores prácticos que dificultan su eficacia, de los que hablaremos más adelante en el blog.

Anulación de solicitudes HTTP/2

Antes hemos hablado de la anulación por parte del cliente de solicitudes en curso. El protocolo HTTP/2 admite esta función de una forma mucho más eficaz que HTTP/1.1. En lugar de tener que interrumpir toda la conexión, un cliente puede enviar una trama RST_STREAM para una única secuencia. Esta ventaja indica al servidor que deje de procesar la solicitud y anule la respuesta, lo que libera recursos del servidor y evita malgastar ancho de banda.

Consideremos nuestro ejemplo anterior de tres solicitudes. Esta vez el cliente anula la solicitud en la secuencia 1 después de que se hayan enviado todas las tramas HEADERS. El servidor analiza esta trama RST_STREAM antes de estar preparado para servir la respuesta y, en su lugar, solo responde a las secuencias 3 y 5:

La anulación de solicitudes es una función útil. Por ejemplo, al desplazarse por una página web con varias imágenes, un navegador web puede cancelar las imágenes que quedan fuera de la ventanilla, lo que significa que las imágenes que entran en ella pueden cargarse más rápido. El protocolo HTTP/2 hace que este comportamiento sea mucho más eficiente en comparación con HTTP/1.1.

Una secuencia de solicitud que se anula, pasa rápidamente por el ciclo de vida de la secuencia. La trama HEADERS del cliente con la etiqueta END_STREAM establecida en 1 pasa del estado de inactivo a abierto a semicerrado, luego RST_STREAM marca inmediatamente una transición de semicerrado a cerrado.

Recuerda que solo las secuencias que están en estado abierto o semicerrado contribuyen al límite de concurrencia de la secuencia. Cuando un cliente anula una secuencia, obtiene instantáneamente la capacidad de abrir otra en su lugar, y puede enviar otra solicitud inmediatamente. Este es el quid de la cuestión que permite el funcionamiento de CVE-2023-44487.

Restablecimientos rápidos que conducen a la denegación de servicio

Se puede abusar de la anulación de solicitudes HTTP/2 para restablecer rápidamente un número ilimitado de secuencias. Cuando un servidor HTTP/2 es capaz de procesar las tramas RST_STREAM enviadas por el cliente y cambiar el estado con suficiente rapidez, estos restablecimientos rápidos no causan ningún problema. Los problemas empiezan a surgir cuando se produce algún tipo de retraso o demora en la limpieza. El cliente puede hacer tantas solicitudes que se acumule trabajo atrasado, lo que se traduce en un consumo excesivo de recursos en el servidor.

Una arquitectura común de implementación HTTP consiste en ejecutar un proxy HTTP/2 o un equilibrador de carga delante de otros componentes. Cuando llega una solicitud de un cliente, se envía rápidamente y el trabajo real se realiza como una actividad asíncrona en otro lugar. Esta operación permite al proxy gestionar el tráfico de clientes de forma muy eficiente. Sin embargo, esta separación de preocupaciones puede dificultar que el proxy ordene los trabajos en proceso. Por lo tanto, estas implementaciones son más propensas a tropezar con problemas derivados de los restablecimientos rápidos.

Cuando los proxies inversos de Cloudflare procesan el tráfico entrante de clientes HTTP/2, copian los datos del socket de la conexión en un búfer y procesan esos datos almacenados en búfer en orden. A medida que se lee cada solicitud (tramas HEADERS y DATA), se envía a un servicio ascendente. Cuando se leen tramas RST_STREAM, se elimina el estado local de la solicitud y se notifica al servicio ascendente que la solicitud se ha anulado. Todo el proceso se repite hasta que se consuma todo el búfer. Sin embargo, se puede abusar de esta lógica. Si un cliente malintencionado empieza a enviar una enorme cadena de solicitudes y restablecimientos al inicio de una conexión, nuestros servidores las leerán todas con impaciencia y añadirán tensión a los servidores ascendentes hasta el punto de ser incapaces de procesar ninguna nueva solicitud entrante.

Algo que es importante destacar es que la concurrencia de secuencias por sí sola no puede mitigar el restablecimiento rápido. El cliente puede editar las solicitudes para crear altas tasas de solicitudes, independientemente del valor SETTINGS_MAX_CONCURRENT_STREAMS elegido por el servidor.

Análisis exhaustivo de Rapid Reset

A continuación, mostramos un ejemplo de restablecimiento rápido reproducido utilizando un cliente de prueba de concepto que intenta enviar un total de 1000 solicitudes. He utilizado un servidor estándar sin ningún tipo de medida de mitigación, que escucha en el puerto 443 en un entorno de prueba. El tráfico se examina utilizando Wireshark y se filtra para que solo muestre el tráfico HTTP/2 para mayor claridad. Descarga la interfaz pcap para ver explicación.

Es un poco difícil de ver, porque hay muchas tramas. Podemos ver un resumen rápido con la herramienta Estadísticas > HTTP2 de Wireshark:

La primera trama de este rastreo, en el paquete 14, es la trama SETTINGS del servidor, que anuncia una concurrencia de secuencia máxima de 100. En el paquete 15, el cliente envía unas cuantas tramas de control y luego empieza a hacer solicitudes que se restablecen rápidamente. La primera trama HEADERS tiene 26 bytes de longitud, todas las tramas HEADERS posteriores tienen solo 9 bytes. Esta diferencia de tamaño se debe a una tecnología de compresión llamada HPACK. En total, el paquete 15 contiene 525 solicitudes, que llegan hasta la secuencia 1051.

Curiosamente, la trama RST_STREAM para la secuencia 1051 no cabe en el paquete 15, por lo que en el paquete 16 vemos que el servidor envía una respuesta 404. A continuación, en el paquete 17, el cliente sí envía la trama RST_STREAM, antes de pasar a enviar las 475 solicitudes restantes.

Observa que, aunque el servidor anunciaba 100 secuencias simultáneas, los dos paquetes enviados por el cliente enviaban muchas más tramas HEADERS. El cliente no tenía que esperar ningún tráfico de retorno del servidor, solo estaba limitado por el tamaño de los paquetes que podía enviar. No se ven tramas RST_STREAM del servidor en este rastreo, lo que indica que el servidor no observó una violación de secuencia concurrente.

Impacto en los clientes

Como se ha mencionado anteriormente, a medida que se anulan las solicitudes, los servicios ascendentes reciben una notificación y pueden cancelar las solicitudes antes de gastar demasiados recursos en ello. Así fue este ataque, en el que la mayoría de las solicitudes maliciosas nunca se reenviaron a los servidores de origen. Sin embargo, el gran tamaño de estos ataques causó cierto impacto.

En primer lugar, cuando la tasa de solicitudes entrantes alcanzó picos nunca vistos, recibimos informes de un aumento de los niveles de errores 502 que recibían los clientes. Esto ocurrió en nuestros centros de datos más afectados, mientras trabajaban para procesar todas las solicitudes. Aunque nuestra red está diseñada para hacer frente a grandes ataques, esta vulnerabilidad concreta expuso un punto débil de nuestra infraestructura. Profundicemos un poco más en los detalles, centrándonos en cómo se gestionan las solicitudes entrantes cuando llegan a uno de nuestros centros de datos:

Podemos ver que nuestra infraestructura está compuesta por una cadena de diferentes servidores proxy con distintas responsabilidades. En concreto, cuando un cliente se conecta a Cloudflare para enviar tráfico HTTPS, primero llega a nuestro proxy de descifrado TLS: descifra el tráfico TLS, procesa el tráfico HTTP 1, 2 o 3, y luego lo reenvía a nuestro proxy de “lógica empresarial”. Este se encarga de cargar todas las configuraciones para cada cliente, luego enruta las solicitudes correctamente a otros servicios ascendentes, y lo que es más importante en nuestro caso, también se encarga de las funciones de seguridad. Aquí es donde se procesa la mitigación del ataque a la capa 7.

El problema de este vector de ataque es que consigue enviar muchas solicitudes muy rápido en cada conexión. Cada una de ellas tenía que reenviarse al proxy de lógica empresarial antes de que tuviéramos la oportunidad de bloquearla. A medida que el procesamiento de solicitudes superaba la capacidad de nuestro proxy, la canalización que conecta estos dos servicios alcanzó su nivel de saturación en algunos de nuestros servidores.

Cuando esto ocurre, el proxy TLS ya no se puede conectar a su proxy ascendente, por eso algunos clientes recibieron un error básico “502 Bad Gateway” durante los ataques más graves. Es importante señalar que, a partir de hoy, los registros utilizados para crear análisis HTTP también se emiten por nuestro proxy de lógica empresarial. La consecuencia de ello es que estos errores no son visibles en el panel de control de Cloudflare. Nuestros paneles de control internos muestran que alrededor del 1 % de las solicitudes se vieron afectadas durante la oleada inicial de ataques (antes de que implementáramos las medidas de mitigación). En ese momento, se alcanzaron picos de alrededor del 12 % durante unos segundos en el ataque más grave, ocurrido el 29 de agosto. El siguiente gráfico muestra la relación de estos errores durante dos horas mientras ocurría el ataque:

Trabajamos para reducir este número drásticamente en los días siguientes, como se detalla más adelante en este artículo. Gracias a los cambios en nuestra pila y a nuestras medidas de mitigación, que reducen considerablemente el tamaño de estos ataques, este número es, en la práctica, nulo hoy día.

Errores 499 y los desafíos para la concurrencia de secuencias HTTP/2

Otro síntoma notificado por algunos clientes es un aumento de los errores 499. La razón de este inconveniente es un poco diferente y está relacionada con la máxima concurrencia de secuencias en una conexión HTTP/2 detallada anteriormente en este artículo.

Los parámetros de HTTP/2 se intercambian al inicio de una conexión mediante tramas SETTINGS. Si no se recibe un parámetro explícito, se aplican los valores por defecto. Una vez que un cliente establece una conexión HTTP/2, puede esperar las tramas SETTINGS de un servidor (lento) o puede asumir los valores por defecto y empezar a hacer solicitudes (rápido). Para el ajuste SETTINGS_MAX_CONCURRENT_STREAMS, el valor por defecto es ilimitado en la práctica (los identificadores de secuencia utilizan un espacio numérico de 31 bits, y las solicitudes utilizan números impares, por lo que el límite real es 1073741824). La especificación recomienda que un servidor no ofrezca menos de 100 secuencias. Los clientes se suelen inclinar por la velocidad, por lo que no suelen esperar a la configuración del servidor, lo que crea una especie de condición de anticipación. Los clientes arriesgan en cuanto al límite que puede elegir el servidor. Si se equivocan, la solicitud será rechazada y habrá que volver a intentar. El riesgo en las secuencias 1073741824 es ridículo. En su lugar, muchos clientes deciden limitarse a emitir 100 secuencias simultáneas, con la esperanza de que los servidores sigan la recomendación de la especificación. Cuando los servidores eligen algo por debajo de 100, esta apuesta del cliente falla y las secuencias se reinician.

Hay muchas razones por las que un servidor puede restablecer una secuencia más allá de la superación del límite de concurrencia. HTTP/2 es estricto y exige que se cierre una secuencia cuando se produzcan errores de análisis sintáctico o lógicos. En 2019, Cloudflare desarrolló varias medidas de mitigación en respuesta a las vulnerabilidades DoS de HTTP/2. Varias de esas vulnerabilidades obedecían a un mal comportamiento del cliente, que llevaba al servidor a reiniciar una secuencia. Una estrategia muy eficaz para restringir a esos clientes consiste en contar el número de restablecimientos del servidor durante una conexión y, cuando supera algún valor umbral, cerrar la conexión con una trama GOAWAY. Los clientes legítimos pueden cometer uno o dos errores en una conexión y eso es aceptable. Un cliente que cometa demasiados errores probablemente tenga un problema o sea malintencionado, por lo que el cierre de la conexión aborda ambos casos.

Al responder a los ataques DoS habilitados por CVE-2023-44487, Cloudflare redujo la concurrencia máxima de secuencias a 64. Antes de realizar este cambio, no éramos conscientes de que los clientes no esperan a la trama SETTINGS y, en su lugar, asumen una concurrencia de 100. En efecto, algunas páginas web, como una galería de imágenes, hacen que un navegador envíe 100 solicitudes inmediatamente al inicio de una conexión. Por desgracia, las 36 secuencias que superaban nuestro límite necesitaban restablecerse, lo que activó nuestras medidas de mitigación de recuento. Esta operación implicaba cerrar las conexiones de los clientes legítimos, lo que provocaba un fallo total en la carga de la página. En cuanto nos dimos cuenta de este problema de interoperabilidad, cambiamos la concurrencia máxima de secuencias a 100.

Respuesta de Cloudflare

En 2019 se revelaron varias vulnerabilidades DoS relacionadas con implementaciones de HTTP/2. Cloudflare desarrolló e implementó una serie de detecciones y medidas de mitigación en respuesta. CVE-2023-44487 es una manifestación diferente de la vulnerabilidad HTTP/2. Sin embargo, para mitigarla pudimos ampliar las protecciones existentes para supervisar las tramas RST_STREAM enviadas por el cliente y cerrar las conexiones cuando se utilizan con fines abusivos. Los usos legítimos de RST_STREAM por parte del cliente no se ven afectados.

Además de una solución directa, hemos implementado varias mejoras en el código de procesamiento de tramas y envío de solicitudes HTTP/2 del servidor. Además, hemos mejorado las colas y la programación del servidor de lógica empresarial para reducir el trabajo innecesario y mejorar la capacidad de respuesta de la anulación. En conjunto, estos avances disminuyen el impacto de varios patrones potenciales de abuso, además de dar más espacio al servidor para procesar las solicitudes antes de que pueda saturarse.

Mitigación previa de los ataques

Cloudflare ya disponía de sistemas para mitigar eficazmente los ataques muy grandes con métodos menos costosos. Uno de ellos se llama “IP Jail”. En los ataques hipervolumétricos, este sistema recoge las direcciones IP de los clientes que participan en el ataque e impide que se conecten a la propiedad que es objeto de ataque, bien a nivel de IP, bien en nuestro proxy TLS. Sin embargo, este sistema necesita unos segundos para ser plenamente eficaz. Durante estos preciados segundos, los servidores de origen ya están protegidos, pero nuestra infraestructura aún tiene que aceptar todas las solicitudes HTTP. Como esta nueva botnet no tiene un periodo de inicialización en la práctica, necesitamos poder neutralizar los ataques antes de que se conviertan en un problema.

Para conseguirlo, hemos ampliado el sistema de IP Jail para proteger toda nuestra infraestructura. Una vez que se “bloquea” una dirección IP, no solo se bloquea su conexión a la propiedad que está siendo blanco de ataque, sino que también prohibimos que las direcciones IP correspondientes utilicen HTTP/2 a cualquier otro dominio en Cloudflare durante un tiempo. Como tales abusos de protocolo no son posibles utilizando HTTP/1.x, este enfoque limita la capacidad del atacante para ejecutar grandes ataques, mientras que cualquier cliente legítimo que comparta la misma dirección IP solo percibirá un leve impacto en rendimiento durante ese tiempo. Las medidas de mitigación basadas en la IP son una herramienta deficiente, por eso debemos ser muy prudentes al utilizarlas a esa escala, y tratar de evitar los falsos positivos en la medida de lo posible. Además, la vida útil de una IP determinada en una botnet suele ser corta, por lo que cualquier medida de mitigación a largo plazo probablemente hará más mal que bien. El siguiente gráfico muestra la rotación de direcciones IP en los ataques que presenciamos:

Como podemos observar, muchas direcciones IP nuevas detectadas en un día determinado desaparecen muy rápido después.

Como todas estas acciones ocurren en nuestro proxy TLS al principio de nuestra canalización HTTPS, se ahorran considerables recursos en comparación con nuestro sistema de mitigación de capa 7 habitual. Esto nos ha permitido aguantar estos ataques mucho mejor y ahora el número de errores 502 aleatorios causados por estas botnets se ha reducido a cero.

Mejoras en la observabilidad

Otro frente en el que estamos implementando cambios es la observabilidad. La devolución de errores a los clientes sin que sean visibles en los análisis de los clientes es insatisfactorio. Afortunadamente, hay un proyecto en marcha para revisar estos sistemas desde mucho antes de estos ataques. Con el tiempo, permitirá que cada servicio de nuestra infraestructura registre sus propios datos, en lugar de depender de nuestro proxy de lógica empresarial para consolidar y emitir datos de registro. Este incidente subrayó la importancia de este trabajo, y estamos redoblando nuestros esfuerzos.

También estamos trabajando en mejorar el registro a nivel de conexión para que nos permita detectar estos abusos de protocolo mucho más rápido y así mejorar nuestras capacidades de mitigación de ataques DDoS.

Conclusión

Aunque este ha sido el último ataque que ha batido récords, sabemos que no será el último. Conforme los ataques se vuelven más sofisticados, Cloudflare trabaja sin descanso para identificar proactivamente nuevas amenazas, implementando contramedidas en nuestra red global para que nuestros millones de clientes estén protegidos de forma inmediata y automática.

Cloudflare ofrece protección DDoS gratuita e ilimitada a todos nuestros clientes desde 2017. Además, ofrecemos una serie de funciones de seguridad adicionales que se adaptan a las necesidades de organizaciones de todos los tamaños. Ponte en contacto con nosotros si no estás seguro de estar protegido o quieres saber cómo puedes estarlo].

Alle Einzelheiten über HTTP/2 Rapid Reset

2023-10-10 Lucas Pardue

Post Syndicated from Lucas Pardue original http://blog.cloudflare.com/de-de/technical-breakdown-http2-rapid-reset-ddos-attack-de-de/

Am 25. August 2023 begannen wir, ungewöhnlich große HTTP-Angriffe auf viele unserer Kunden zu bemerken. Diese Angriffe wurden von unserem automatischen DDoS-System erkannt und abgewehrt. Es dauerte jedoch nicht lange, bis sie rekordverdächtige Ausmaße annahmen und schließlich einen Spitzenwert von knapp über 201 Millionen Anfragen pro Sekunde erreichten. Damit waren sie fast dreimal so groß wie der bis zu diesem Zeitpunkt größte Angriff, den wir jemals verzeichnet hatten.

Sie werden angegriffen oder benötigen zusätzlichen Schutz? Klicken Sie hier, um Hilfe zu erhalten.

Besorgniserregend ist die Tatsache, dass der Angreifer in der Lage war, einen solchen Angriff mit einem Botnetz von lediglich 20.000 Rechnern durchzuführen. Es gibt heute Botnetze, die aus Hunderttausenden oder Millionen von Rechnern bestehen. Bedenkt man, dass das gesamte Web in der Regel nur zwischen 1 bis 3 Milliarden Anfragen pro Sekunde verzeichnet, ist es nicht unvorstellbar, dass sich mit dieser Methode quasi die Anzahl aller Anfragen im Internet auf eine kleine Reihe von Zielen konzentrieren ließe.

Erkennen und Abwehren

Dies war ein neuartiger Angriffsvektor in einem noch nie dagewesenen Ausmaß, aber die bestehenden Schutzmechanismen von Cloudflare konnten die Wucht der Angriffe weitgehend bewältigen. Zunächst sahen wir einige Auswirkungen auf den Traffic unserer Kunden – etwa 1 % der Anfragen waren während der ersten Angriffswelle betroffen –, doch heute konnten wir unsere Abwehrmethoden so verfeinern, dass der Angriff für jeden Cloudflare-Kunden gestoppt werden konnte, ohne dass er unsere Systeme beeinträchtigte.

Wir haben diese Angriffe zur gleichen Zeit bemerkt, als zwei andere große Branchenakteure – Google und AWS – dasselbe erlebten. Wir haben daran gearbeitet, die Systeme von Cloudflare zu verstärken, um sicherzustellen, dass alle unsere Kunden heute vor dieser neuen DDoS-Angriffsmethode geschützt sind, ohne dass es Auswirkungen auf die Kunden gibt. Außerdem haben wir gemeinsam mit Google und AWS an einer koordinierten Offenlegung des Angriffs gegenüber den betroffenen Anbietern und Betreibern kritischer Infrastrukturen mitgewirkt.

Dieser Angriff wurde durch den Missbrauch einiger Funktionen des HTTP/2-Protokolls und von Details der Server-Implementierung ermöglicht (siehe CVE-2023-44487 für Details). Da der Angriff eine zugrundeliegende Sicherheitslücke im HTTP/2-Protokoll ausnutzt, glauben wir, dass jeder Anbieter, der HTTP/2 implementiert hat, dem Angriff ausgesetzt ist. Dazu gehört jeder moderne Webserver. Gemeinsam mit Google und AWS haben wir die Angriffsmethode den Anbietern von Webservern offengelegt, von denen wir erwarten, dass sie Patches implementieren werden. Die beste Schutzmaßnahme ist einstweilen die Verwendung eines DDoS-Abwehrdienstes wie Cloudflare vor jedem Web- oder API-Server, der mit dem Internet verbunden ist.

Dieser Beitrag befasst sich mit allen Einzelheiten zum HTTP/2-Protokoll, der Funktion, die Angreifer ausnutzen, um diese massiven Angriffe zu generieren, und den Abwehrstrategien, die wir ergriffen haben, um sicherzustellen, dass alle unsere Kunden geschützt sind. Wir hoffen, dass durch die Bekanntgabe dieser Details andere betroffene Webserver und -dienste die Informationen erhalten, die sie benötigen, um Abwehrstrategien zum Schutz vor dieser Sicherheitslücke zu implementieren. Darüber hinaus können das Team für die HTTP/2-Protokollstandards sowie die Teams, die an künftigen Webstandards arbeiten, diese besser gestalten, um solche Angriffe zu verhindern.

Nähere Einzelheiten zum RST-Angriff

HTTP ist das Anwendungsprotokoll, auf dem das Web basiert. HTTP Semantics ist allen Versionen von HTTP gemeinsam – die Gesamtarchitektur, Terminologie und Protokollaspekte wie Anfrage- und Antwortnachrichten, Methoden, Statuscodes, Header- und Trailer-Felder, Nachrichteninhalte und vieles mehr. Jede einzelne HTTP-Version definiert, wie die Semantik in ein „Austauschformat“ („wire format“) für den Austausch über das Internet umgewandelt wird. So muss ein Client beispielsweise eine Anfragenachricht in binäre Daten serialisieren und senden, die dann vom Server wieder in eine verarbeitbare Nachricht umgewandelt werden.

HTTP/1.1 verwendet eine textuelle Form der Serialisierung. Anfrage- und Antwortnachrichten werden als Strom von ASCII-Zeichen ausgetauscht, die über eine zuverlässige Transportebene wie TCP unter Verwendung des folgenden Formats (wobei CRLF für Carriage-Return und Linefeed steht) gesendet werden:

 HTTP-message   = start-line CRLF
                   *( field-line CRLF )
                   CRLF
                   [ message-body ]

Eine sehr einfache GET-Anfrage für https://blog.cloudflare.com/ würde beim Austausch zum Beispiel so aussehen:

GET / HTTP/1.1 CRLFHost: blog.cloudflare.comCRLFCRLF

Und die Antwort würde so aussehen:

HTTP/1.1 200 OK CRLFServer: cloudflareCRLFContent-Length: 100CRLFtext/html; charset=UTF-8CRLFCRLF<100 bytes of data>

Dieses Format rahmt (Frame)Nachrichten bei der Übertragung ein, was bedeutet, dass es möglich ist, eine einzige TCP-Verbindung für den Austausch mehrerer Anfragen und Antworten zu verwenden. Das Format erfordert jedoch, dass jede Nachricht als Ganzes gesendet wird. Außerdem ist für die korrekte Zuordnung von Anfragen und Antworten eine strikte Reihenfolge erforderlich, d. h. die Nachrichten werden seriell ausgetauscht und können nicht gemultiplext werden. Zwei GET-Anfragen für https://blog.cloudflare.com/ und https://blog.cloudflare.com/page/2/, wären:

GET / HTTP/1.1 CRLFHost: blog.cloudflare.comCRLFCRLFGET /page/2/ HTTP/1.1 CRLFHost: blog.cloudflare.comCRLFCRLF

Mit den Antworten:

Webseiten erfordern kompliziertere HTTP-Interaktionen als diese Beispiele. Wenn Sie den Cloudflare-Blog besuchen, lädt Ihr Browser mehrere Skripte, Stile und Medieninhalte. Wenn Sie den Cloudflare-Blog besuchen, lädt Ihr Browser mehrere Skripte, Stile und Medieninhalte. Entweder muss man alle in der Warteschlange befindlichen Antworten für die nicht mehr gewünschte Seite abwarten, bevor Seite 2 überhaupt starten kann, oder man bricht laufende Anfragen ab, indem man die TCP-Verbindung schließt und eine neue Verbindung öffnet. Beides ist nicht gerade zweckmäßig. Browser neigen dazu, diese Beschränkungen zu umgehen, indem sie einen Pool von TCP-Verbindungen (bis zu 6 pro Host) verwalten und eine komplexe Logik für den Versand von Anfragen über den Pool implementieren.

HTTP/2 behebt viele der Probleme mit HTTP/1.1. Jede HTTP-Nachricht wird in einen Satz von HTTP/2-Frames serialisiert, die Typ, Länge, Flags, Stream Identifier (ID) und Payload haben. Die Stream-ID macht deutlich, welche Bytes bei der Übertragung zu welcher Nachricht gehören, was sicheres Multiplexing und Gleichzeitigkeit ermöglicht. Streams sind bidirektional. Clients senden Frames und Server antworten mit Frames, die dieselbe ID verwenden.

In HTTP/2 würde unsere GET-Anfrage für https://blog.cloudflare.com über Stream-ID 1 ausgetauscht, wobei der Client einen HEADERS-Frame sendet und der Server mit einem HEADERS-Frame antwortet, gefolgt von einem oder mehreren DATA-Frames. Client-Anfragen verwenden immer ungerade Stream-IDs, sodass nachfolgende Anfragen die Stream-ID 3, 5 usw. verwenden würden. Die Antworten können in beliebiger Reihenfolge gesendet werden, und Frames aus verschiedenen Streams können ineinander verschachtelt werden.

Stream-Multiplexing und Gleichzeitigkeit sind leistungsstarke Funktionen von HTTP/2. Sie ermöglichen eine effizientere Nutzung einer einzigen TCP-Verbindung. HTTP/2 optimiert den Abruf von Ressourcen, insbesondere in Verbindung mit der Priorisierung. Andererseits kann die Erleichterung für Clients, große Mengen an paralleler Arbeit zu starten, den Spitzenbedarf an Serverressourcen im Vergleich zu HTTP/1.1 erhöhen. Dies ist ein naheliegender Vektor für Denial-of-Service.

Um einige Leitlinien bereitzustellen, bietet HTTP/2 einen Begriff für maximal aktive gleichzeitige Streams. Der Parameter SETTINGS_MAX_CONCURRENT_STREAMS ermöglicht es einem Server, sein Limit für die Gleichzeitigkeit bekannt zu geben. Wenn der Server beispielsweise ein Limit von 100 angibt, können zu jedem Zeitpunkt nur 100 Anfragen aktiv sein. Versucht ein Client, einen Stream oberhalb dieser Grenze zu öffnen, muss er vom Server mit einem RST_STREAM-Frame abgelehnt werden. Die Ablehnung eines Streams hat keine Auswirkungen auf die anderen Streams, die sich in der Verbindung befinden.

In Wahrheit ist die Sache ein wenig komplizierter. Streams haben einen Lebenszyklus. Unten sehen Sie ein Diagramm des HTTP/2-Stream-Zustandsrechner. Client und Server verwalten ihre eigenen Ansichten über den Zustand bzw. Status eines Streams. HEADERS-, DATA- und RST_STREAM-Frames lösen Übergänge aus, wenn sie gesendet oder empfangen werden. Obwohl die Ansichten des Stream-Zustands unabhängig sind, werden sie synchronisiert.

HEADERS- und DATA-Frames enthalten ein END_STREAM-Flag, das, wenn es auf den Wert 1 (true) gesetzt ist, einen Übergang des Zustands auslösen kann.

Lassen Sie uns dies anhand eines Beispiels für eine GET-Anfrage ohne Nachrichteninhalt durchgehen. Der Client sendet die Anfrage als HEADERS-Frame, wobei das END_STREAM-Flag auf 1 gesetzt ist. Der Client überführt den Stream zunächst vom Zustand „idle“ in den Zustand „open“ und geht dann sofort in den Zustand „half-closed“ über. Hat der Client den Zustand „half-closed“, bedeutet dies, dass er keine HEADERS oder DATA mehr senden kann, sondern nur noch WINDOW_UPDATE-, PRIORITY- oder RST_STREAM-Frames. Er kann jedoch jeden Frame empfangen.

Sobald der Server den HEADERS-Frame empfängt und analysiert, ändert er den Stream-Zustand von „idle“ zu „open“ und dann zu „half-closed“, damit er mit dem Client übereinstimmt. Der Zustand „half-closed“ bedeutet, dass der Server jeden Frame senden kann, aber nur WINDOW_UPDATE-, PRIORITY- oder RST_STREAM-Frames empfangen kann.

Die Antwort auf die GET-Anfrage enthält Nachrichteninhalte, daher sendet der Server HEADERS mit dem END_STREAM-Flag auf 0, dann DATA mit dem END_STREAM-Flag auf 1. Der DATA-Frame löst auf dem Server den Übergang des Streams von „half-closed“ auf „closed“ aus. Wenn der Client ihn empfängt, geht er ebenfalls in den Zustand „closed“ über. Sobald ein Stream geschlossen ist, können keine Frames mehr gesendet oder empfangen werden.

Wenn man diesen Lebenszyklus zurück in den Kontext der Gleichzeitigkeit überträgt, stellt HTTP/2 fest:

Streams, die sich im Zustand „open“ oder im Zustand „half-closed“ befinden, werden auf die maximale Anzahl von Streams angerechnet, die ein Endpunkt öffnen darf. Streams, die sich in einem dieser drei Zustände befinden, werden auf das in der Einstellung SETTINGS_MAX_CONCURRENT_STREAMS angegebene Limit angerechnet.

Theoretisch ist das Limit für die Gleichzeitigkeit nützlich. Es gibt jedoch praktische Faktoren, die seine Wirksamkeit beeinträchtigen, worauf wir später in diesem Blog eingehen werden.

Widerruf einer HTTP/2-Anfrage

Vorhin haben wir über den Widerruf von gerade in Bearbeitung befindlichen Client-Anfragen gesprochen. HTTP/2 unterstützt dies auf wesentlich effizientere Weise als HTTP/1.1. Anstatt die gesamte Verbindung zu unterbrechen, kann ein Client einen RST_STREAM-Frame für einen einzelnen Stream senden. Dadurch wird der Server angewiesen, die Bearbeitung der Anfrage zu beenden und die Antwort abzubrechen, wodurch Serverressourcen frei werden und keine Bandbreite verschwendet wird.

Betrachten wir unser vorheriges Beispiel mit 3 Anfragen. Dieses Mal widerruft der Client die Anfrage auf Stream 1, nachdem alle HEADERS gesendet wurden. Der Server analysiert diesen RST_STREAM-Frame, bevor er bereit ist, die Antwort zu übermitteln, und antwortet stattdessen nur auf Stream 3 und 5:

Der Widerruf von Anfragen ist eine nützliche Funktion. Beim Scrollen einer Webseite mit mehreren Bildern kann ein Webbrowser beispielsweise Bilder, die außerhalb des Sichtfensters liegen, löschen, sodass Bilder, die in das Sichtfenster gelangen, schneller geladen werden können. HTTP/2 macht dieses Verhalten im Vergleich zu HTTP/1.1 wesentlich effizienter.

Ein widerrufener Anfrage-Stream durchläuft den Lebenszyklus des Streams sehr schnell. Die HEADERS des Clients mit dem auf 1 gesetzten END_STREAM-Flag wechseln den Zustand von „idle“ zu „open“ zu „half-closed“, dann bewirkt RST_STREAM sofort einen Übergang von „half-closed“ zu „closed“.

Erinnern Sie sich, dass nur Streams, die sich im „open“ oder „half-closed“ Zustand befinden, auf das Limit für die Gleichzeitigkeit von Streams angerechnet werden. Wenn ein Client einen Stream abbricht, erhält er sofort die Möglichkeit, an dessen Stelle einen anderen Stream zu öffnen und kann sofort eine weitere Anfrage senden. Genau darum funktioniert CVE-2023-44487.

Schnelles Reset führt zu Denial of Service

Der Widerruf von HTTP/2-Anfragen kann dazu missbraucht werden, eine unbegrenzte Anzahl von Streams schnell zurückzusetzen. Wenn ein HTTP/2-Server in der Lage ist, vom Client gesendete RST_STREAM-Frames zu verarbeiten und den Zustand schnell genug abzubauen, stellen solche schnellen Resets kein Problem dar. Problematisch wird es dann, wenn es bei den Aufräumarbeiten zu Verzögerungen oder Resets kommt.Der Client kann so viele Anfragen stellen, dass sich ein Rückstau bildet, der zu einem übermäßigen Ressourcenverbrauch auf dem Server führt.

Eine gängige HTTP-Bereitstellungsarchitektur besteht darin, einen HTTP/2-Proxy oder Load-Balancer vor anderen Komponenten zu betreiben. Wenn eine Client-Anfrage eintrifft, wird sie schnell abgewickelt und die eigentliche Arbeit wird als asynchrone Aktivität an anderer Stelle erledigt. So kann der Proxy den Client-Traffic sehr effizient verarbeiten. Diese Trennung kann es dem Proxy jedoch erschweren, die in Bearbeitung befindlichen Aufträge aufzuräumen. Daher ist es bei diesen Bereitstellungen wahrscheinlicher, dass es zu Problemen durch schnelle Resets kommt.

Wenn die Reverse-Proxies von Cloudflare eingehenden HTTP/2-Client-Traffic verarbeiten, kopieren sie die Daten aus dem Socket der Verbindung in einen Puffer und verarbeiten diese gepufferten Daten der Reihe nach. Beim Lesen jeder Anfrage (HEADERS- und DATA-Frames) wird diese an einen Upstream-Service weitergeleitet. Wenn RST_STREAM-Frames gelesen werden, wird der lokale Zustand für die Anfrage abgebaut und der vorgelagerte Dienst wird benachrichtigt, dass die Anfrage abgebrochen wurde. Dieser Vorgang wird so lange wiederholt, bis der gesamte Puffer verbraucht ist. Diese Logik kann jedoch missbraucht werden: Wenn ein böswilliger Client eine enorme Kette von Anfragen und Resets zu Beginn einer Verbindung sendet, würden unsere Server sie alle eifrig lesen und die vorgelagerten Server so stark belasten, dass sie keine neuen eingehenden Anfragen mehr verarbeiten können.

Es ist wichtig hervorzuheben, dass die Gleichzeitigkeit von Streams allein das schnelle Reset nicht abwehren kann. Der Client kann Anfragen abwälzen, um hohe Anfrageraten zu erzeugen, unabhängig von dem vom Server gewählten Wert von SETTINGS_MAX_CONCURRENT_STREAMS.

Rapid Reset genau analysiert

Hier ein Beispiel für das Reset anhand eines Proof-of-Concept-Clients, der versucht, insgesamt 1000 Anfragen zu stellen. Ich habe einen handelsüblichen Server ohne jegliche Abwehrmechanismen verwendet, der in einer Testumgebung auf Port 443 lauscht. Der Traffic wurde mit Wireshark analysiert und gefiltert, um nur HTTP/2-Traffic zu zeigen. Jetzt pcap herunterladen, um dem Vorgang zu folgen.

Das ist ein bisschen schwierig zu sehen, weil es viele Bilder gibt. Mit dem Wireshark-Tool Statistik > HTTP2 erhalten wir einen schnellen Überblick:

Der erste Frame in dieser Aufzeichnung, in Paket 14, ist der SETTINGS-Frame des Servers, der eine maximale Anzahl an gleichzeitigen Stream von 100 angibt. In Paket 15 sendet der Client einige Kontrollframes und beginnt dann mit Anfragen, die schnell zurückgesetzt werden. Der erste HEADERS-Frame ist 26 Byte lang, alle folgenden HEADERS sind nur 9 Byte lang. Dieser Größenunterschied ist auf eine Komprimierungstechnologie namens HPACK. zurückzuführen. Insgesamt enthält Paket 15 dabei 525 Anfragen, die bis zum Stream 1051 reichen.

Interessanterweise passt der RST_STREAM für Stream 1051 nicht in Paket 15, so dass der Server in Paket 16 mit einer 404-Antwort antwortet. In Paket 17 sendet der Client dann das RST_STREAM, bevor er mit dem Senden der restlichen 475 Anfragen fortfährt.

Beachten Sie, dass der Server zwar 100 gleichzeitige Streams ankündigte, die beiden vom Client gesendeten Pakete jedoch viel mehr HEADERS-Frames als diese Zahl enthielten. Der Client musste nicht auf den Antwort-Traffic des Servers warten, er war lediglich durch die Größe der Pakete begrenzt, die er senden konnte. In dieser Aufzeichnung sind keine RST_STREAM-Frames des Servers zu sehen, was darauf hindeutet, dass der Server keinen Verstoß gegen das Limit der Gleichzeitigkeit von Streams festgestellt hat.

Auswirkungen auf Kunden

Wie bereits erwähnt, werden vorgelagerte Dienste benachrichtigt, wenn Anfragen abgebrochen werden, und können diese abbrechen, bevor sie zu viele Ressourcen dafür verschwenden. Dies war bei diesem Angriff der Fall, bei dem die meisten bösartigen Anfragen nie an die Ursprungsserver weitergeleitet wurden. Die schiere Größe dieser Angriffe hatte jedoch einige Auswirkungen.

Erstens erreichten die eingehenden Anfragen höhere Spitzenwerte als jemals zuvor, und die Clients meldeten vermehrt 502-Fehler. Dies geschah in unseren am stärksten betroffenen Rechenzentren, da sie Mühe hatten, alle Anfragen zu verarbeiten. Unser Netzwerk ist zwar für große Angriffe ausgelegt, aber diese spezielle Sicherheitslücke deckte eine Schwäche in unserer Infrastruktur auf. Werfen wir einen genaueren Blick auf die Details, wobei wir uns darauf konzentrieren, wie eingehende Anfragen verarbeitet werden, wenn sie eines unserer Rechenzentren erreichen:

Wir sehen, dass unsere Infrastruktur aus einer Kette verschiedener Proxy-Server mit unterschiedlichen Zuständigkeiten besteht. Wenn sich ein Client mit Cloudflare verbindet, um HTTPS-Traffic zu senden, trifft er zunächst auf unseren TLS-Entschlüsselungs-Proxy: Er entschlüsselt den TLS-Traffic, verarbeitet den HTTP-1-, -2- oder -3-Traffic und leitet ihn dann an unseren Proxy für die „Geschäftslogik“ weiter. Dieser ist dafür zuständig, alle Einstellungen für jeden Kunden zu laden und dann die Anfragen korrekt an andere vorgelagerte Dienste weiterzuleiten – und, was in unserem Fall noch wichtiger ist, er ist auch für die Sicherheitsfunktionen zuständig. Hier wird die L7-Angriffsabwehr abgewickelt.

Das Problem bei diesem Angriffsvektor ist, dass er bei jeder einzelnen Verbindung sehr schnell sehr viele Anfragen senden kann. Jede dieser Anfragen musste an den Proxy für die Geschäftslogik weitergeleitet werden, bevor wir die Möglichkeit hatten, sie zu blockieren. Da der Anfragedurchsatz unsere Proxy-Kapazität überstieg, erreichte die Verbindungsleitung zwischen diesen beiden Diensten bei einigen unserer Server ihre Belastungsgrenze.

Wenn dies geschieht, kann der TLS-Proxy keine Verbindung mehr zu seinem vorgeschalteten Proxy herstellen, weshalb einige Clients bei den schwerwiegendsten Angriffen eine einfache „502 Bad Gateway“-Fehlermeldung erhielten. Es ist wichtig zu beachten, dass die Protokolle, die zur Erstellung von HTTP-Analysen verwendet werden, ab sofort auch von unserem Proxy für die Geschäftslogik ausgegeben werden. Dies hat zur Folge, dass diese Fehler im Cloudflare-Dashboard nicht sichtbar sind. Unsere internen Dashboards zeigen, dass etwa 1 % der Anfragen während der ersten Angriffswelle (bevor wir Abwehrmaßnahmen ergriffen) betroffen waren, mit Spitzenwerten von etwa 12 % für einige Sekunden während der schwerwiegendsten Angriffswelle am 29. August. Das folgende Diagramm zeigt das Verhältnis dieser Fehler über einen Zeitraum von zwei Stunden, in dem dies geschah:

In den darauffolgenden Tagen haben wir hart daran gearbeitet, diese Zahl drastisch zu reduzieren, wie in diesem Beitrag näher erläutert wird. Dank der Änderungen in unserem Stack und unserer Abwehreinrichtungen, die die Größe dieser Angriffe erheblich reduzieren, liegt diese Zahl heute praktisch bei null:

499-Fehler und die Herausforderungen für die Gleichzeitigkeit von HTTP/2-Streams

Ein weiteres Phänomen, von dem einige Kunden berichten, ist die Zunahme von 499.Fehlermeldungen. Die Ursache hierfür ist etwas anders und hängt mit der maximalen Anzahl gleichzeitiger Streams in einer HTTP/2-Verbindung zusammen, die weiter oben in diesem Beitrag beschrieben wurde.

HTTP/2-Einstellungen werden zu Beginn einer Verbindung über SETTINGS-Frames ausgetauscht. Wird kein expliziter Parameter angegeben, gelten die Standardwerte. Sobald ein Client eine HTTP/2-Verbindung aufgebaut hat, kann er auf die SETTINGS eines Servers warten (langsam) oder er kann die Standardwerte annehmen und mit den Anfragen beginnen (schnell). Für SETTINGS_MAX_CONCURRENT_STREAMS ist der Standardwert praktisch unbegrenzt (Stream-IDs verwenden einen 31-Bit-Zahlenraum, und Anfragen verwenden ungerade Zahlen, sodass das tatsächliche Limit bei 1073741824 liegt). In der Spezifikation wird empfohlen, dass ein Server nicht weniger als 100 Streams anbietet. Clients sind in der Regel auf Schnelligkeit bedacht und warten daher nicht auf Servereinstellungen, was zu einer Art Wettlauf führt. Clients wetten darauf, welchen Grenzwert der Server auswählt; wenn sie sich irren, wird die Anfrage abgelehnt und muss erneut gestellt werden. Auf 1073741824 zu wetten ist ein bisschen albern. Stattdessen beschließen viele Clients, sich auf die Ausgabe von 100 gleichzeitigen Streams zu beschränken, in der Hoffnung, dass die Server die empfohlene Spezifikation befolgen. Wenn die Server einen Wert unter 100 wählen, schlägt dieses Client-Ratespiel fehl und die Streams werden zurückgesetzt.

Es gibt viele Gründe, warum ein Server einen Stream bei Überschreitung des Limits für die Anzahl gleichzeitiger Streams zurücksetzen kann. HTTP/2 ist streng und verlangt, dass ein Stream geschlossen wird, wenn Parsing- oder Logikfehler auftreten. 2019 entwickelte Cloudflare mehrere Abwehrmaßnahmen als Reaktion auf HTTP/2 DoS-Schwachstellen. Mehrere dieser Schwachstellen wurden durch das Fehlverhalten eines Clients verursacht, was den Server dazu veranlasste, einen Stream zurückzusetzen. Eine sehr effektive Strategie, um solche Clients einzudämmen, besteht darin, die Anzahl der Server-Resets während einer Verbindung zu zählen und, wenn diese einen bestimmten Schwellenwert überschreitet, die Verbindung mit einem GOAWAY-Frame zu schließen. Legitime Clients machen vielleicht ein oder zwei Fehler während einer Verbindung, und das ist akzeptabel. Ein Client, der zu viele Fehler macht, ist wahrscheinlich entweder defekt oder böswillig; das Schließen der Verbindung ist in beiden Fällen zielführend.

Als Reaktion auf DoS-Angriffe, die durch CVE-2023-44487, ermöglicht wurden, hat Cloudflare die Anzahl der gleichzeitig zugelassenen Streams auf 64 reduziert. Vor dieser Änderung war uns nicht bewusst, dass Clients nicht auf SETTINGS warten und stattdessen für die maximale Anzahl gleichzeitiger Streams 100 annehmen. Einige Webseiten, wie z. B. eine Bildergalerie, veranlassen einen Browser in der Tat dazu, gleich zu Beginn einer Verbindung 100 Anfragen zu senden. Leider mussten die 36 Streams, die über unserem Limit lagen, alle zurückgesetzt werden, was unsere durch Zählung aktivierten Abwehrmaßnahmen auslöste. Das bedeutete, dass wir die Verbindungen von legitimen Clients beendeten, was zu einem kompletten Seitenladefehler führte. Als wir dieses Interoperabilitätsproblem erkannten, änderten wir die maximale Anzahl der gleichzeitig zugelassenen Streams auf 100.

Diese Schritte hat Cloudflare gesetzt

2019 wurden mehrere DoS-Schwachstellen im Zusammenhang mit Implementierungen von HTTP/2 aufgedeckt. Cloudflare hat daraufhin eine Reihe von Erkennungs- und Abwehrmaßnahmen entwickelt und implementiert. CVE-2023-44487 ist eine andere Ausprägung der HTTP/2-Schwachstelle. Um sie abzuwehren, konnten wir jedoch die bestehenden Schutzmaßnahmen erweitern, um vom Client gesendete RST_STREAM-Frames zu überwachen und Verbindungen zu schließen, wenn sie missbräuchlich verwendet werden. Legitime Client-Nutzungen für RST_STREAM sind davon nicht betroffen.

Neben einer direkten Korrektur haben wir mehrere Verbesserungen an der HTTP/2-Frame-Verarbeitung des Servers und am Code für die Anfragenabwicklung vorgenommen. Darüber hinaus wurden die Warteschlangen und die Scheduling-Funktion des Servers für die Geschäftslogik verbessert, um unnötige Arbeit zu vermeiden und die Reaktionsfähigkeit bei Widerrufen zu verbessern. Dadurch werden die Auswirkungen verschiedener potenzieller Missbrauchsmuster verringert und der Server erhält mehr Raum, um Anfragen zu bearbeiten, bevor er ausgelastet ist.

Angriffe einfacher abwehren

Cloudflare verfügte bereits über Systeme, um sehr große Angriffe mit weniger kostspieligen Methoden effizient abwehren zu können. Eines dieser Systeme heißt „IP Jail“ (weil IPs hier quasi „gefangen genommen“ werden). Bei hypervolumetrischen Angriffen sammelt dieses System die am Angriff beteiligten Client-IPs und verhindert, dass sie sich mit dem angegriffenen Objekt verbinden, entweder auf IP-Ebene oder in unserem TLS-Proxy. Dieses System benötigt jedoch einige Sekunden, um seine volle Wirkung zu entfalten; während dieser kostbaren Sekunden sind die Ursprünge bereits geschützt, aber unsere Infrastruktur muss immer noch alle HTTP-Anfragen aufnehmen. Da dieses neue Botnetz praktisch keine Anlaufzeit hat, müssen wir Angriffe neutralisieren können, bevor sie zu einem Problem werden können.

Zu diesem Zweck haben wir das IP-Jail-System erweitert, um unsere gesamte Infrastruktur zu schützen: Sobald eine IP darin „gefangen“ ist, kann sie sich nicht nur nicht mehr mit der angegriffenen Domain verbinden, sondern wir verbieten den entsprechenden IPs für einige Zeit auch die Nutzung von HTTP/2 für jede andere auf Cloudflare gehostete Domain. Da derartige Protokollmissbräuche mit HTTP/1.x nicht möglich sind, schränkt dies die Möglichkeiten des Angreifers ein, groß angelegte Angriffe auszuführen, während ein legitimer Client, der dieselbe IP-Adresse nutzt, in dieser Zeit nur einen sehr geringen Performance-Verlust erleiden würde. IP-basierte Abwehrmaßnahmen sind ein sehr hartes Mittel – deshalb müssen wir bei ihrem Einsatz in diesem Ausmaß äußerst vorsichtig sein und versuchen, Fehlalarme so weit wie möglich zu vermeiden. Außerdem ist die Lebensdauer einer bestimmten IP in einem Botnetz in der Regel kurz, so dass jede langfristige Abwehrmaßnahme wahrscheinlich mehr schadet als nützt. Die folgende Grafik zeigt den Wechsel der IPs bei den von uns beobachteten Angriffen:

Wir sehen: Viele neue IPs, die an einem bestimmten Tag entdeckt werden, verschwinden sehr schnell wieder.

Da alle diese Aktionen in unserem TLS-Proxy am Anfang unserer HTTPS-Pipeline stattfinden, spart dies im Vergleich zu unserem regulären L7-Abwehrsystem erhebliche Ressourcen. Dadurch konnten wir diese Angriffe viel besser abwehren, und die Zahl der zufälligen 502-Fehler, die von diesen Botnetzen verursacht werden, ist jetzt auf null gesunken.

Verbesserungen der Beobachtbarkeit

Auch im Bereich der Beobachtbarkeit nehmen wir Veränderungen vor. Es ist nicht zufriedenstellend, wenn Clients Fehler erhalten, ohne dass diese in der Kundenanalyse sichtbar sind. Glücklicherweise wurde bereits lange vor den jüngsten Angriffen ein Projekt zur Überarbeitung dieser Systeme eingeleitet. Damit kann jeder Dienst innerhalb unserer Infrastruktur seine eigenen Daten protokollieren, anstatt sich auf unseren Proxy für die Geschäftslogik zu verlassen, der die Protokolldaten konsolidiert und ausgibt. Dieser Vorfall hat gezeigt, wie wichtig diese Arbeit ist, und wir intensivieren unsere Bemühungen.

Wir arbeiten auch an einer besseren Protokollierung auf Verbindungsebene, damit wir solche Protokollmissbräuche viel schneller erkennen und unsere Fähigkeiten zur DDoS-Abwehr verbessern können.

Fazit

Auch wenn dies der jüngste rekordverdächtige Angriff war, wissen wir, dass es nicht der letzte sein wird. Die Angriffe werden immer raffinierter. Darum arbeiten wir bei Cloudflare unermüdlich daran, neue Bedrohungen proaktiv zu identifizieren und Gegenmaßnahmen in unserem globalen Netzwerk zu implementieren, damit unsere Millionen von Kunden sofort und automatisch geschützt sind.

Seit 2017 bietet Cloudflare allen Kunden kostenlosen und zeitlich unbefristeten DDoS-Schutz ohne Volumensbegrenzung. Darüber hinaus bieten wir eine Reihe zusätzlicher Sicherheitsfunktionen, die den Bedürfnissen von Unternehmen jeder Größe entsprechen. Kontaktieren Sie uns, wenn Sie sich nicht sicher sind, ob Sie geschützt sind, oder wenn Sie wissen möchten, wie Sie sich schützen können.

HTTP/2 Rapid Reset：解構破紀錄的攻擊

2023-10-10 Lucas Pardue

Post Syndicated from Lucas Pardue original http://blog.cloudflare.com/zh-tw/technical-breakdown-http2-rapid-reset-ddos-attack-zh-tw/

自 2023 年 8 月 25 日起，我們開始注意到很多客戶遭受到一些異常大型的 HTTP 攻擊。我們的自動化 DDoS 系統偵測到這些攻擊並加以緩解。但是，沒過多久，它們就開始達到破紀錄的規模，峰值最終剛好超過每秒 2.01 億次請求。這幾乎是之前記錄在案的最大規模攻擊的 3 倍。

正在遭受攻擊或需要額外保護？請按一下這裡以取得幫助。

而更加深入後發現，攻擊者能夠利用僅由 20,000 台機器組成的 Botnet 發起此類攻擊，而如今的 Botnet 規模可達數十萬或數百萬台機器。整個 web 網路通常每秒處理 10-30 億個請求，因此使用此方法可以將整個 web 網路的請求數量等級集中在少數目標上，而其實是可以達成的。

偵測和緩解

這是一種規模空前的新型攻擊手段，但 Cloudflare 的現有保護主要能夠吸收攻擊的壓力。我們一開始就注意到一些客戶流量影響 (在第一波攻擊期間約影響 1% 的請求)；如今，我們仍在持續改善緩解方法，以阻止對任何 Cloudflare 客戶發動的攻擊，而且不會影響我們的系統。

我們注意到這些攻擊，同時也有其他兩家主流的業內廠商（Google 和 AWS）發現相同的狀況。我們竭力強化 Cloudflare 的系統，以確保現今所有客戶都能免於此新型 DDoS 攻擊方法的侵害，而且沒有任何客戶受到影響。我們還與 Googel 和 AWS 合作，協調披露受影響廠商和關鍵基礎架構提供者遭受的攻擊。

此攻擊是透過濫用 HTTP/2 通訊協定部分功能和伺服器實作詳細資料才得以發動（請參閱 CVE-2023-44487 瞭解詳細資料）。因為攻擊會濫用 HTTP/2 通訊協定中的潛在弱點，所以我們認為任何實作 HTTP/2 的廠商都會遭受攻擊。這包括每部現代 Web 伺服器。我們與 Google 和 AWS 已經將攻擊方法披露給預期將實作修補程式的 Web 伺服器廠商。在這段期間，最佳防禦就是針對任何面向 Web 的網頁或 API 伺服器使用 DDoS 緩解服務（例如 Cloudflare）。

本文將深入探討 HTTP/2 通訊協定、攻擊者用於發動這些大規模攻擊的功能，以及我們為確保所有客戶均受到保護而採取的緩解策略。我們希望在發布這些詳細資料後，其他受影響的 Web 伺服器和服務即可取得實施緩解策略所需的資訊。此外，HTTP/2 通訊協定標準團隊和未來 Web 標準制定團隊，都能進一步設計出預防此類攻擊的功能。

RST 攻擊詳細資料

HTTP 是支援 Web 的應用程式通訊協定。HTTP 語意為所有版本的 HTTP 所共用；整體架構、術語及通訊協定方面，例如請求和回應訊息、方法、狀態碼、標頭和後端項目欄位、訊息內容等等。每個 HTTP 版本將定義如何將語意轉化為「有線格式」以透過網際網路交換。例如，客戶必須將請求訊息序列化為二進位資料並進行傳送，接著伺服器會將其剖析回可處理的訊息。

HTTP/1.1 採用文字形式的序列化。請求和回應訊息交換為 ASCII 字元串流，透過 TCP 等可靠的傳輸層傳送，並使用下列格式（其中的 CRLF 表示斷行和換行）：

 HTTP-message   = start-line CRLF
                   *( field-line CRLF )
                   CRLF
                   [ message-body ]

例如，對 https://blog.cloudflare.com/ 非常簡單的 GET 請求在網路上看起來像這樣：

GET / HTTP/1.1 CRLFHost: blog.cloudflare.comCRLFCRLF

回應如下所示：

HTTP/1.1 200 OK CRLFServer: cloudflareCRLFContent-Length: 100CRLFtext/html; charset=UTF-8CRLFCRLF<100 bytes of data>

此格式框住網路上的訊息，表示可使用單一 TCP 連線來交換多個請求和回應。但是，該格式要求完整傳送每則訊息。此外，為使請求與回應正確關聯，需要嚴格排序；表示訊息會依次交換且無法多工處理。以下是 https://blog.cloudflare.com/ 和 https://blog.cloudflare.com/page/2/ 的兩個 GET 請求：

GET / HTTP/1.1 CRLFHost: blog.cloudflare.comCRLFCRLFGET /page/2/ HTTP/1.1 CRLFHost: blog.cloudflare.comCRLFCRLF

回應如下：

比起這些範例，網頁需要更複雜的 HTTP 互動。造訪 Cloudflare 部落格時，您的瀏覽器會載入多個指令碼、樣式及媒體資產。如果您使用 HTTP/1.1 造訪首頁並決定瀏覽至第 2 頁，則瀏覽器可從兩個選項中任選。在第 2 頁還無法開始前，將您不再需要的頁面等待所有排入佇列的回應，或者關閉 TCP 連線並開啟新連線以取消處理中的請求。這些都不是非常實用。瀏覽器通常會管理 TCP 連線集區（每個主機最多 6 個），然後在集區上實作複雜的請求分派邏輯，藉此解決這些限制。

HTTP/2 解決了許多 HTTP/1.1 的問題。每個 HTTP 訊息會序列化為一組具有類型、長度、標誌、串流識別碼 (ID) 及負載的 HTTP/2 框架。串流 ID 明確說明了網路上的哪個位元組適用於哪個訊息，並允許安全的多工處理和並行處理。串流具有雙向特性。用戶端會使用相同 ID 來傳送框架和具有框架的伺服器回應。

在 HTTP/2 中，https://blog.cloudflare.com 的 GET 請求會跨 Stream ID 1 交換，其中用戶端傳送一個 HEADERS 框架，伺服器則回應一個 HEADERS 框架，接著是一個或多個 DATA 框架。用戶端請求一律使用奇數的 Stream ID，因此後續請求會使用 Stream ID 3、5 等等。回應可以是任何順序，而且不同串流的框架可以交錯。

Stream 多工處理和並行處理是 HTTP/2 的強大功能。它們可更有效地使用單一 TCP 連線。HTTP/2 將資源最佳化，特別是搭配優先順序時擷取的資源。另一方面，與 HTTP/1.1 相比，讓用戶端輕鬆啟動大量平行工作，可能會增加伺服器資源的尖峰需求。這是阻斷服務的明顯手段。

為建立一些防護機制，HTTP/2 提供最大作用中並行串流的概念。SETTINGS_MAX_CONCURRENT_STREAMS 參數允許伺服器宣告其並行處理限制。例如，如果伺服器指出 100 的限制，則任何時間都只會有 100 個作用中請求。如果用戶端嘗試開啟超過此限制的串流，則須由使用 RST_STREAM 框架的伺服器拒絕。串流拒絕不會影響連線上的其他處理中串流。

實際情況會更複雜一點。串流具有生命週期。以下是 HTTP/2 串流狀態機器的圖表。用戶端和伺服器會管理自己的串流狀態檢視。HEADERS、DATA 及 RST_STREAM 框架會在傳送或接收時觸發轉換。雖然串流狀態可獨立檢視，但已完成同步。

HEADERS 和 DATA 框架包括 END_STREAM 標誌，在設定為數值 1 (true) 時，可觸發狀態轉換。

讓我們利用無訊息內容的 GET 請求範例來逐步解決此狀況。用戶端會在具有 END_STREAM 標誌的 HEADERS 框架設定為 1 時傳送請求。用戶端首先會將串流狀態從閒置轉換為開啟狀態，然後立即轉換為半關閉狀態。用戶端半關閉狀態表示無法再傳送 HEADERS 或 DATA，只會傳送 WINDOW_UPDATE、PRIORITY 或 RST_STREAM 框架，但是可接收任何框架。

伺服器接收並剖析 HEADERS 框架後，即會將串流狀態從閒置轉換為開啟及半關閉，以便與用戶端相符。伺服器半關閉狀態表示可傳送任何框架，但只會接收 WINDOW_UPDATE、PRIORITY 或 RST_STREAM 框架。

GET 回應包含訊息內容，因此伺服器傳送 HEADERS 時 END_STREAM 標誌設定為 0，傳送 DATA 時 END_STREAM 標誌設定為 1。DATA 框架會在伺服器上觸發從半關閉到關閉的串流轉換。當用戶端接收時，也會轉換為關閉。串流關閉後，無法傳送或接收任何框架。

將此生命週期套用回並行處理背景後，HTTP/2 指出：

處於「開啟」狀態或「半關閉」狀態的串流會計入允許端點開啟的最大串流數。這些狀態中的串流會計入在 SETTINGS_MAX_CONCURRENT_STREAMS 設定中宣告的限制。

理論上，並行處理限制很實用。但還是有阻礙其效率的實際因素，我們稍後將在部落格中介紹。

HTTP/2 請求取消

之前，我們介紹了處理中請求的用戶端取消。與 HTTP/1.1 相比，HTTP/2 支援這項作業的方式更有效率。不需要斷開整個連線，用戶端即可傳送單一串流的 RST_STREAM 框架。這會指示伺服器停止處理請求和中止回應，進而釋放伺服器資源及避免浪費頻寬。

我們來考量一下先前 3 個請求範例。這次用戶端在所有 HEADERS 傳送完畢後取消了串流 1 上的請求。伺服器則會在準備提供回應前剖析此 RST_STREAM 框架，轉而僅回應串流 3 和 5：

請求取消是實用的功能。例如，捲動具有多張影像的網頁時，Web 瀏覽器可以取消檢視區以外的影像，表示影像進入該區後的載入速度更快。比起 HTTP/1.1，HTTP/2 讓這個行為變得更有效率。

取消的請求串流會在整個串流生命週期中轉換。用戶端的 HEADERS 在 END_STREAM 標誌設定為 1 時，狀態會從閒置轉換到開啟和半關閉，然後 RST_STREAM 會立即產生半關閉到關閉的轉換。

回想一下，只有開啟或半開放狀態的串流會造成串流並行處理限制。當用戶端取消串流時，才能在其位置中開啟另一個串流，而且可以立即傳送另一個請求。這是 CVE-2023-44487 得以運作的關鍵。

導致阻斷服務的快速重設

HTTP/2 請求取消可能被濫用來快速重設無數串流。當 HTTP/2 伺服器處理用戶端傳送之 RST_STREAM 框架及斷開狀態的速度夠快時，此類快速重設就不會造成問題。當整理作業有任何種類的延遲或延滯時，問題就會開始突然出現。用戶端可能會處理太多因工作積壓而累積的請求，導致伺服器上的資源過度消耗。

常見的 HTTP 部署架構是在其他元件前執行 HTTP/2 Proxy 或負載平衡器。用戶端請求到達時即會快速分派，而且實際工作在其他位置以非同步活動完成。這可讓 Proxy 非常高效地處理用戶端流量。但是，此關注點分離會讓 Proxy 難以整理進行中的工作。因此，這些部署更有可能遇到快速重設造成的問題。

當 Cloudflare 的反向 Proxy 處理傳入的 HTTP/2 用戶端流量時，它們會將連線通訊端的資料複製到依序緩衝資料的緩衝區和程序。讀取每個請求時（HEADERS 和 DATA 框架），即會分派至上游服務。讀取 RST_STREAM 框架時，即會卸除請求的本機狀態並通知上游已取消該請求。重複該程序直到消耗掉整個緩衝區。但是，此邏輯可能被濫用：當惡意用戶端在連線初期開始傳送大型請求和重設鏈時，我們的伺服器會迫不急待讀取所有內容並在上游伺服器上建立壓力，直到能夠處理任何新的傳入請求為止。

特別強調一點，串流並行處理無法自行緩解快速重設。用戶端可以處理請求以建立高請求率，無論伺服器的 SETTINGS_MAX_CONCURRENT_STREAMS 選定值為何。

Rapid Reset 詳細分析

以下範例為使用嘗試提出共計 1,000 個請求的概念驗證用戶端來重現的 Rapid Reset。我使用了現成伺服器且沒有任何緩解措施；在測試環境中接聽連接埠 443。已使用 Wireshark 詳細分析流量並經過篩選，且為了清楚起見，僅顯示 HTTP/2 流量。下載該 pcap 以遵循。

由於框架太多，有點難看到。我們可以透過 Wireshark 統計資料 > HTTP2 工具取得快速摘要：

在封包 14 中，此追蹤的第一個框架為伺服器的 SETTINGS 框架，其宣告 100 的串流並行處理上限。在封包 15 中，用戶端會傳送一些控制框架並開始提出快速重設的請求。第一個 HEADERS 框架長度為 26 個位元組，所有後續的 HEADERS 則只有 9 個位元組。名稱為 HPACK 的壓縮技術造成了這項大小差異。封包 15 共包含 525 個請求，並往上至串流 1051。

有趣的是，串流 1051 的 RST_STREAM 不適合封包 15，所以我們在封包 16 看到了 404 回應的伺服器回應。接著在封包 17 中，用戶端的確傳送了 RST_STREAM，然後繼續傳送剩餘的 475 個請求。

請注意，雖然伺服器宣告了 100 個並行串流，用戶端傳送的封包還傳送了數量更多的 HEADERS 框架。用戶端不必等待任何從伺服器傳回的流量，唯一限制為可傳送的封包大小。此追蹤沒有任何伺服器 RST_STREAM 框架，表明伺服器並未觀察到並行的串流違規事件。

對客戶的影響

如上所述，取消請求時即會通知上游服務，而且可能在浪費太多資源前中止請求。這是此次攻擊的情況，其中大多數的惡意請求從未轉傳至原始伺服器。但是，這些攻擊的龐大規模並未造成一些影響。

首先，隨著傳入請求率達到前所未有的峰值，我們收到了用戶端所發現 502 錯誤數量漸增的報告。這發生在我們受影響程度最大的資料中心，目前正努力處理所有請求。雖然我們的網路旨在處理大型攻擊，這項特殊漏洞還是暴露了我們基礎架構中的弱點。我們來進一步探討細節，並將重點放在傳入請求到達我們其中一個資料中心時如何受到處理：

我們可以看到基礎架構由責任不同的各種 Proxy 伺服器組成。特別是，當用戶端連線至 Cloudflare 以傳送 HTTPS 流量時，會先到達我們的 TLS 解密 Proxy：可解密 TLS 流量、處理 HTTP 1、2 或 3 流量，然後將其轉傳至「商務邏輯」Proxy。這會負責為每個客戶載入所有設定，然後將請求正確路由至其他上游服務；在我們案例中更重要的是，這也會負責安全功能。此時會處理 L7 攻擊緩解。

此攻擊手段的問題是，它有辦法非常快速地在每一次連線中傳送大量請求。每個請求都必須轉傳至商務邏輯 Proxy，我們才有機會加以封鎖。隨著請求輸送量變得比我們的 Proxy 容量還要高，連線這兩項服務的管道在部分伺服器中都達到了其飽和程度。

此情況發生時，TLS Proxy 即無法再連線至它的上游 Proxy，因此在大多數嚴重攻擊期間，有些用戶端會看到裸機的「502 錯誤的閘道」錯誤。請務必注意，截至今日為止，我們的商務邏輯 Proxy 也會發出用於建立 HTTP 分析的記錄。後果就是 Cloudflare 儀表板鐘看不到這些錯誤。我們的內部儀表板顯示第一波攻擊期間約有 1% 請求受到影響，並在 8 月 29 日最嚴重的一次期間，達到幾秒約 12% 的峰值。下列圖表顯示此情況發生期間的兩小時內，這些錯誤的比率：

如本文稍後詳述的內容，我們在接下來幾天內努力將這個數字大幅減少。得益於我們堆疊中的變更及緩解措施，顯著減少了這些攻擊的規模，這個數字在今日實際上是零。

499 錯誤及 HTTP/2 串流並行處理的挑戰

部分客戶回報的另一個症狀為 499 錯誤有所增加。此情況的原因有點不同，而且與本文稍早詳述之 HTTP/2 連線中的串流並行處理上限相關。

HTTP/2 設定在連線開始時使用 SETTINGS 框架交換。在沒有接收明確參數的情況下，則適用於預設值。在用戶端建立 HTTP/2 連線後，即可等待伺服器的 SETTINGS（緩慢），或者假設預設值並開始提出請求（快速）。對於 SETTINGS_MAX_CONCURRENT_STREAMS，預設值其實沒有限制（串流 ID 使用 31 位元數的空格且請求使用奇數，因此實際限制為 1,073,741,824 次）。規格建議伺服器提供的串流不少於 100 個。用戶端一般偏向速度，所以請勿等待伺服器設定，其中會產生一些競爭條件。用戶端會在伺服器可能挑選的限制上賭一把；如果它們選錯，請求即會遭到拒絕且必須重試一次。要賭 1,073,741,824 個串流的話，有點愚蠢。取而代之的是，很多用戶端決定限制自身發出 100 個並行串流，並希望伺服器遵循規格建議。如果伺服器挑選的數字為 100 以下，表示用戶端賭輸，然後會重設串流。

伺服器可能將串流重設為超出並行處理限制的原因有很多。HTTP/2 很嚴格且在出現剖析或邏輯錯誤時要求串流關閉。在 2019 年，Cloudflare 制度了多項緩解措施以回應 HTTP/2 DoS 漏洞。那些漏洞有很多是因用戶端行為不當造成，並導致伺服器重設串流。若要壓制此類用戶端，計算連線期間的伺服器重設數目是非常有效的策略，還可以在超出部分閾值時，關閉具有 GOAWAY 框架的連線。合法用戶端可能會出現一或兩個連線錯誤，此為合理情況。如果用戶端出現太多錯誤，則可能處於損壞或惡意狀態，關閉連線可解決這兩種情況。

在回應 CVE-2023-44487 啟用的 DoS 攻擊時，Cloudflare 將串流並行處理上限減少至 64。進行此變更前，我們並不清楚用戶端不會等待 SETTINGS，而是假設並行處理為 100。有些網頁（例如影像庫）的確會讓瀏覽器在連線初期立即傳送 100 個請求。遺憾的是，36 個超出我們限制的串流全都需要重設，因而觸發我們的計算緩解措施。這表示我們關閉了合法用戶端上的連線，導致完整頁面載入失敗。在我們意識到此互通性問題時，我們就立即將串流並行處理上限變更 100。

Cloudflare 端採取的行動

在 2019 年，我們發現多項 DoS 漏洞與 HTTP/2 實作相關。Cloudflare 在回應中發展並部署了一系列偵測和緩解措施。CVE-2023-44487 是 HTTP/2 漏洞的不同表現。但是，為緩解此問題，我們能夠擴展現有保護以監控用戶端傳送的 RST_STREAM 框架，並且在連線受到濫用時將其關閉。RST_STREAM 在合法用戶端使用上不會受到影響。

除了直接修復之外，我們對伺服器的 HTTP/2 框架處理和請求分派代碼實作了多項改善。此外，商務邏輯伺服器收到了佇列和排程改善，可減少不不要的工作並改善取消的回應能力。這些可共同降低各種潛在濫用模式的影響，並且在飽和前為伺服器提供更多處理請求的空間。

提早緩解攻擊

Cloudflare 已配備可有效緩解非常大型攻擊的系統，而且是更經濟實惠的方法。其中一項稱為「IP Jail」。對於巨流量攻擊，此系統會收集參與攻擊的用戶端 IP，並阻止它們連線至受到攻擊的財產，無論是 IP 層級或位於我們的 TLS Proxy 中。但是，此系統需要幾秒才能充分發揮效果；在珍貴的幾秒之間，原始伺服器已受到保護，但我們的基礎架構還是需要吸收所有 HTTP 請求。由於這種新的殭屍網路實際上沒有啟動期間，因此我們能夠抵禦攻擊，以免造成問題。

為實現此目標，我們擴展了 IP Jail 系統以保護整個基礎架構：只要 IP「受到監禁」，不僅無法連線至受到攻擊的財產，我們還會在一段期間內禁止相應 IP 使用 HTTP/2 連線至 Cloudflare 上的其他網域。由於使用 HTTP/1.x 時無法濫用此類通訊協定，這限制了攻擊者執行大型攻擊的能力，而任何共用相同 IP 的合法用戶端在此期間內只會看到非常小幅度的效能降低。基於 IP 的緩解措施是非常遲鈍的工具；這就是為什麼我們在大規模採用並儘量設法避免誤判時必須極度謹慎。此外，殭屍網路中特定 IP 的生命週期通常很短，所以任何長期緩解措施有可能弊大於利。下圖顯示我們所見證攻擊中 IP 流失的情況：

如我們所見，很多在特定日期發現的新 IP 隨後都消失得非常快。

由於所有動作都發生在 HTTPS 管道開始處的 TLS Proxy 中，因此相較於常規的第 7 層緩解系統，這節省了大量資源。我們還可更加順暢地處理這些攻擊，目前由這些殭屍網路造成的隨機 502 錯誤數目已降至零。

可觀察性改善

我們做出變更的另一方面是可觀察性。向用戶端傳回錯誤卻未顯示於客戶分析中，令人很不滿意。幸運的是，早在近期的攻擊之前，這些系統的檢修專案已在進行中。最終讓我們基礎架構中的每項服務記錄自己的資料，而不是依賴商務邏輯 Proxy 整合並發出記錄資料。此事件強調了這項工作的重要性，而且我們正在加倍投入心力。

我們也在發展更好的連線層級記錄功能，以便更快速地找出此類通訊協定濫用，進而改善我們的 DDoS 緩解能力。

結論

雖然這是最新一次破紀錄的攻擊，但我們知道不只如此。隨著攻擊變得越來越複雜，Cloudflare 只有堅持不懈地主動識別新的威脅，同時為全球網路部署對策，才能讓我們數百萬名客戶獲得即時和自動的保護。

Cloudflare 自 2017 年起向我們所有客戶提供了免費、非計量且無限制的 DDoS 保護。此外，我們還有一系列額外的安全功能，符合所有規模的組織需求。如果您不確定自己是否受到保護或想瞭解如何受到保護，請聯絡我們。

HTTP/2 Rapid Reset: 기록적인 공격의 분석

2023-10-10 Lucas Pardue

Post Syndicated from Lucas Pardue original http://blog.cloudflare.com/ko-kr/technical-breakdown-http2-rapid-reset-ddos-attack-ko-kr/

Cloudfare에서는 2023년 8월 25일부터 다수의 고객을 향한 일반적이지 않은 일부 대규모 HTTP 공격을 발견했습니다. 이 공격은 우리의 자동 DDos 시스템에서 탐지하여 완화되었습니다. 하지만 얼마 지나지 않아 기록적인 규모의 공격이 시작되어, 나중에 최고조에 이르러서는 초당 2억 1백만 요청이 넘었습니다. 이는 우리 기록상 가장 대규모 공격이었던 이전의 공격의 거의 3배에 달하는 크기입니다.

공격을 받고 있거나 추가 보호가 필요하신가요? 여기를 클릭하여 도움을 받으세요.

우려되는 부분은 공격자가 머신 20,000개로 이루어진 봇넷만으로 그러한 공격을 퍼부을 수 있었다는 사실입니다. 오늘날의 봇넷은 수십만 혹은 수백만 개의 머신으로 이루어져 있습니다. 웹 전체에서 일반적으로 초당 10억~30억 개의 요청이 목격된다는 점을 생각하면, 이 방법을 사용했을 때 웹 전체 요청에 달하는 규모를 소수의 대상에 집중시킬 수 있다는 가능성도 완전히 배제할 수는 없습니다.

감지 및 완화

이는 전례 없는 규모의 새로운 공격 벡터였으나, Cloudflare는 기존 보호 기능을 통해 치명적인 공격을 대부분 흡수할 수 있었습니다. 처음에 목격된 충격은 초기 공격 웨이브 동안 고객 트래픽 요청의 약 1%에 영향을 주었으나, 현재는 완화 방법을 개선하여 시스템에 영향을 주지 않고 Cloudflare 고객을 향한 공격을 차단할 수 있습니다.

우리는 업계의 다른 주요 대기업인 Google과 AWS에서도 같은 시기에 이러한 공격이 있었음을 알게 되었습니다. 이에 따라 지금은 우리의 모든 고객을 이 새로운 DDoS 공격 방법으로부터 어떤 영향도 받지 않도록 보호하기 위하여 Cloudflare의 시스템을 강화했습니다. 또한 Google 및 AWS와 협력하여 영향을 받은 업체와 주요 인프라 제공 업체에 해당 공격을 알렸습니다.

이 공격은 HTTP/2 프로토콜과 서버 구현 세부 사항의 일부 주요 기능을 악용했기에 가능했습니다(자세한 내용은 CVE-2023-44487 참조). HTTP/2 프로토콜의 숨어 있는 약점을 파고들었던 공격이었기 때문에, HTTP/2을 구현한 업체라면 해당 공격의 대상이 될 것이라 판단하고 있습니다. 여기에는 요즘의 모든 웹 서버가 포함됩니다. 우리는 Google 및 AWS와 더불어 패치를 구현할 것이라 예상하고 있는 웹 서버 업체를 향한 해당 공격 방법을 공개해왔습니다. 그 동안 최고의 방어는 웹을 대면하는 웹과 API 서버의 프론트에 Cloudflare와 같이 DDoS 완화 서비스를 적용하는 것입니다.

이 포스팅에서는 HTTP/2 프로토콜의 세부 사항, 즉 공격자가 대규모 공격을 만들어내는 데 악용한 주요 기능과 모든 고객을 보호하기 위해 Couldfare가 채택한 완화 전략을 상세히 살펴봅니다. 우리의 희망은 이러한 세부 사항을 공개해 영향을 받는 다른 웹 서버와 서비스에서 완화 전략 구현에 필요한 정보를 갖추는 것입니다. 또한 이에 그치지 않고, HTTP/2 프로토콜 표준 팀과 미래의 웹 표준을 수립하는 팀에서 더 나은 설계를 내놓아 이와 같은 공격을 예방하는 것입니다.

RST 공격 세부 내용

HTTP는 웹을 구동하는 애플리케이션 프로토콜입니다. HTTP Semantics는 HTTP의 모든 버전, 즉 전반적 아키텍처, 용어, 프로토콜 측면(예: 요청 및 응답 메시지, 메서드, 상태 코드, 헤더 및 트레일러 필드, 메시지 내용 등)에서 공통입니다. 각 HTTP 버전은 인터넷에서 상호작용을 위해 “와이어 포맷”으로 시맨틱을 변환하는 방법을 정의합니다. 예를 들어 클라이언트가 요청 메시지를 바이너리 데이터로 직렬화한 후 전송하면, 서버는 이를 다시 처리할 수 있는 메시지로 다시 구문 분석합니다.

HTTP/1.1은 직렬화된 텍스트 형식을 사용합니다. 요청과 응답 메시지는 ASCII 문자의 스트림으로 교환되고, TCP처럼 안정적인 전송 계층을 통해 다음 형식(CRLF는 캐리지 리턴 및 줄바꿈을 의미)으로 전송됩니다.

 HTTP-message   = start-line CRLF
                   *( field-line CRLF )
                   CRLF
                   [ message-body ]

예를 들어, https://blog.cloudflare.com/ 에 대한 매우 간단한 GET 요청은 와이어에서 다음과 같이 표시됩니다:

GET / HTTP/1.1 CRLFHost: blog.cloudflare.comCRLFCRLF

그리고 응답은 다음과 같습니다:

HTTP/1.1 200 OK CRLFServer: cloudflareCRLFContent-Length: 100CRLFtext/html; charset=UTF-8CRLFCRLF<100 bytes of data>

이 형식은 와이어에서 메시지의 형식을 정의하는데, 이는 단일 TCP 연결을 사용하여 다수의 요청과 응답을 주고받을 수 있다는 의미입니다. 그러나 이 형식은 각 메시지를 전체로 보내야 합니다. 추가로 요청과 응답을 정확하게 상호 연결하려면 순서를 엄격하게 지켜야 합니다. 즉, 메시지는 직렬로 교환되고, 다중화될 수 없습니다. https://blog.cloudflare.com/과 https://blog.cloudflare.com/page/2/에 대한 두 개의 GET 요청은 다음과 같이 표시됩니다:

GET / HTTP/1.1 CRLFHost: blog.cloudflare.comCRLFCRLFGET /page/2/ HTTP/1.1 CRLFHost: blog.cloudflare.comCRLFCRLF

응답은 다음과 같습니다.

웹 페이지에는 예시보다 더 복잡한 HTTP 상호작용이 필요합니다. Cloudflare 블로그를 방문할 때 브라우저는 여러 스크립트, 스타일, 미디어 에셋을 로드합니다. HTTP/1.1로 해당 페이지를 방문하고, 빠르게 다음 페이지로 넘어가고자 할 때, 브라우저는 2가지 선택지 중 하나를 고를 수 있습니다. 페이지가 시작하기도 전에 이제는 원하지 않는 다음 페이지에 대해 대기 중인 모든 응답을 기다리거나, TCP 연결을 닫고 새로운 연결을 열어 전송 중인 요청을 취소할 수 있습니다. 2가지 선택지 모두 그다지 실용적이지는 않습니다. 브라우저는 TCP 연결 풀(호스트당 최대 6개)을 관리하고 풀에서 복잡한 요청 전송 로직을 구현하며 이러한 제한을 준수합니다.

HTTP/2에서는 HTTP/1.1에서 발생한 문제 대부분이 해결됩니다. 각 HTTP 메시지는 유형, 길이, 플래그, 스트림 식별자(ID), 페이로드가 있는 HTTP/2 프레임 조합으로 직렬화됩니다. 스트림 ID는 와이어의 어떤 바이트가 어떤 메시지에 적용되는지 분명히 하는 동시에 안전하게 다중화하고 동시성을 부여합니다. 스트림은 양방향입니다. 클라이언트가 프레임을 전송하면, 서버는 같은 ID를 사용하는 프레임으로 응답합니다.

HTTP/2에서 https://blog.cloudflare.com에 대한 GET 요청은 스트림 ID 1을 통해 교환됩니다. 클라이언트가 1개의 HEADERS 프레임을 보내면 서버는 1개의 HEADERS 프레임과 뒤이어 1개 이상의 DATA 프레임으로 응답합니다. 클라이언트가 요청하는 스트림 ID는 항상 홀수이므로, 후속 요청은 스트림 ID 3, 5 등이 됩니다. 응답은 순서와 관계없이 전달되고, 다른 스트림의 프레임이 인터리빙될 수 있습니다.

스트림 다중화와 동시성은 HTTP/2의 강력한 주요 특징입니다. 두 이점은 단일 TCP 연결의 더욱 더 효율적인 사용을 지원합니다. HTTP/2는 우선순위 지정과 짝을 이룰 때 특히나 리소스 가져오기를 최적화합니다. 반대로 생각해보면, 클라이언트의 대규모 병렬 작업을 수월하게 시행할 수 있게 만들어 줌으로써 HTTP/1.1과 비교했을 때, 서버 리소스에 대한 피크 수요가 높아집니다. 이는 서비스 거부에 대한 분명한 벡터입니다.

HTTP/2는 약간의 방어 수단을 제공하기 위해 최대 활성 동시 스트림의 개념을 제시합니다. SETTINGS_MAX_CONCURRENT_STREAMS 매개변수는 서버가 자신의 동시성 제한을 알릴 수 있도록 합니다. 예를 들어, 서버에 100개로 제한을 둔다면, 어느 시점에서도 100개의 요청만이 활성화될 수 있습니다. 클라이언트가 제한을 초과하는 스트림을 열고자 한다면, 이 시도는 RST_STREAM 을 사용하는 서버에서 거부되어야 합니다. 스트림 거부는 연결 상의 이동 중인 다른 스트림에 영향을 주지 않습니다.

실제 이야기는 조금 더 복잡합니다. 스트림에는 수명 주기가 있습니다. 아래 HTTP/2 스트림 상태인 기기의 도표가 나와있습니다. 클라이언트와 서버는 스트림 상태인 자신의 뷰를 관리합니다. HEADERS, DATA, RST_STREAM 프레임은 전송되거나 수신될 때, 전환을 유발합니다. 스트림 상태의 뷰는 독립적이기는 하지만, 동기화됩니다.

HEADERS와 DATA 프레임에는 값이 1(true)로 설정되었을 때, 상태 전환을 일으킬 수 있는 END_STREAM이 포함되어 있습니다.

메시지 내용이 없는 GET 요청의 예시를 자세히 살펴보겠습니다. 클라이언트는 END_STREAM 플래그 조합을 1로 설정한 HEADERS 프레임으로 요청을 전송합니다. 클라이언트는 먼저 스트림 상태를 유휴에서 열림 상태로 전환한 다음, 곧바로 반 닫힘 상태로 전환합니다. 클라이언트의 반 닫힘 상태란 더 이상 HEADERS 또는 DATA를 보낼 수 없고, WINDOW_UPDATE, PRIORITY 또는 RST_STREAM 프레임만을 보낼 수 있는 상태를 말합니다. 하지만 클라이언트가 받는 프레임의 종류에는 제한이 없습니다.

서버가 HEADERS 프레임을 받아 구문 분석하고 나면 스트림 상태를 유휴에서 열림, 그리고 반 닫힘 상태로 전환해 클라이언트와 상태를 일치시킵니다. 서버가 반 닫힘 상태라는 의미는 어떤 프레임이든지 보낼 수 있지만, WINDOW_UPDATE, PRIORITY, RST_STREAM만 받을 수 있는 상태를 말합니다.

GET에 대한 응답에는 메시지 내용이 포함되므로 서버에서는 0으로 설정된 END_STREAM 플래그가 포함된 HEADERS를 보내고 난 다음, 1로 설정한 END_STREAM 플래그가 포함된 HEADERS를 보냅니다. DATA 프레임은 서버에서 스트림이 반 닫힘 상태에서 닫힘 상태로 전환되게 합니다. 클라이언트가 DATA 프레임을 받으면, 클라이언트 또한 닫힘 상태로 전환됩니다. 스트림이 닫히고 나면 어떤 프레임도 보내거나 받을 수 없습니다.

수명 주기를 동시성 컨텍스트에 다시 적용하면 HTTP/2는 다음과 같이 나타납니다:

“열림” 상태이거나 “반 닫힘” 상태인 스트림은 엔드포인트가 열 수 있도록 허용된 최대 스트림 수에 포함됩니다. 이 세 가지 상태 중 하나에 있는 스트림은 SETTINGS_MAX_CONCURRENT_STREAMS 설정에서 통지된 한도에 포함됩니다.

이론상, 동시성 제한은 유용합니다. 그렇지만, 그 효과를 방해하는 실용적인 요소가 있으며, 이는 블로그 뒷 부분에서 다룰 예정입니다.

HTTP/2 요청 취소

앞서 전송 중인 요청의 클라이언트 취소에 관해 설명했습니다. HTTP/2에서는 이를 HTTP/1.1보다 더 효율적인 방식으로 지원합니다. 클라이언트는 전체 연결을 끊지 않고도 단일 스트림에 대한 RST_STREAM 프레임을 보낼 수 있습니다. 이로 인해 서버는 요청 처리를 멈추고, 응답을 중단함으로써, 서버 리소스를 확보하고 대역폭 낭비를 방지할 수 있습니다.

앞선 예시인 3개의 요청을 살펴보겠습니다. 이번에는 클라이언트가 모든 HEADERS를 보내고 난 후에 스트림 1의 요청을 취소하는 경우를 가정해 보겠습니다. 서버는 응답을 보낼 준비를 하기 전에 이 RST_STREAM 프레임을 구문 분석하고 그 대신 스트림 3과 5에 대해서만 응답합니다.

요청 취소는 유용한 기능입니다. 예를 들어, 여러 이미지가 있는 웹 페이지를 스크롤할 때, 웹 브라우저는 뷰포트에서 벗어난 이미지를 취소할 수 있는데, 다시 말해 뷰포트로 들어오는 이미지는 더 빨리 로드된다는 의미입니다. HTTP/1.1과 비교했을 때 HTTP/2가 이러한 행위를 훨씬 더 효율적으로 수행합니다.

취소된 요청 스트림은 스트림 수명주기를 통해 빠르게 전환됩니다. END_STREAM 플래그가 1로 설정된 HEADERS는 클라이언트를 유휴 상태에서 열림, 반 닫힘 상태로 전환시키고, 이어서 RST_STREAM이 곧바로 반 닫힘 상태에서 닫힘 상태로 전환시킵니다.

열림 또는 반 닫힘 상태의 스트림만이 스트림 동시성 제한에 영향을 미친다는 점을 상기해 보세요. 클라이언트가 스트림을 취소하면 즉시 그 자리에 다른 스트림을 열 수 있고 바로 다른 요청을 보낼 수도 있습니다. 이는 CVE-2023-44487을 작동시키는 핵심입니다.

서비스 거부로 이어지는 Rapid Reset

HTTP/2 요청 취소는 무한한 스트림을 빠르게 초기화하는 데 악용될 수 있습니다. HTTP/2 서버가 클라이언트가 보낸 RST_STREAM 프레임을 처리할 수 있고, 충분히 빠르게 상태를 분해할 수 있다면, Rapid Reset은 문제가 되지 않습니다. 정리 과정에서 지연이나 처짐이 발생할 때 문제가 발생하기 시작합니다. 이때 클라이언트는 작업 백로그를 누적시키는 많은 요청으로 인해 이탈될 수 있으며, 이는 서버 리소스의 과도한 소모로 이어집니다.

일반적인 HTTP 배포 아키텍처는 다른 구성 요소보다 앞서서 HTTP/2 프록시 또는 부하 분산 장치를 실행할 수 있습니다. 클라이언트 요청이 도착하면 빠르게 전송되고, 실제 작업은 다른 곳에서 비동기 활동으로 이루어집니다. 이로 인해 프록시는 클라이언트 트래픽을 매우 효율적으로 처리할 수 있습니다. 그러나 이러한 분리로 인해 프록시가 처리 중인 작업은 정리하는 것이 어려워질 수 있습니다. 따라서 이러한 배포는 Rapid Reset로 인한 문제를 만날 가능성이 더 높습니다.

Cloudflare의 역프록시가 수신되는 HTTP/2 클라이언트 트래픽을 처리할 때, 연결의 소켓에서 버퍼로 데이터를 복사한 다음 버퍼에 있는 데이터를 순서대로 처리합니다. 각 요청을 읽으면(HEADER와 DATA 프레임) 이는 업스트림 서비스로 전송됩니다. RST_STREAM 프레임을 읽으면, 해당 요청에 대한 로컬 상태는 해체되고, 업스트림에서는 요청이 취소되었다는 알림을 받습니다. 버퍼를 비우고 버퍼 전체가 사용될 때까지 이를 반복합니다. 하지만 이 로직은 악용될 수 있습니다. 그 이유는 악의적 클라이언트가 연결을 시작할 때 엄청난 양의 요청과 초기화 체인을 보내기 시작하면, 우리의 서버는 의욕이 넘쳐서 체인을 모두 읽고, 업스트림 서버에 새롭게 수신되는 요청을 처리할 수 없는 시점까지 압력을 만들어내기 때문입니다.

짚고 넘어가야 하는 중요한 점은 그 자체의 스트림 동시성이 Rapid Reset를 완화시켜줄 수 없다는 것입니다. 클라이언트는 요청이 서버의 SETTINGS_MAX_CONCURRENT_STREAMS 설정 값과 관계 없이 높은 요청 속도를 만들어낼 수 있습니다.

Rapid Reset 분석

여기 총 1,000건의 요청을 생산하려는 개념 증명 클라이언트로 재생산된 Rapid Reset의 예가 있습니다. 여기서는 어떤 완화도 없는 상용 서버를 사용했으며, 테스트 환경에서는 443 포트가 수신 대기 중입니다. 트래픽은 Wireshark로 분석되며, 명확성을 위해 HTTP/2 트래픽만을 표시하도록 필터링됩니다. pcap을 다운로드하여 다음을 확인하세요.

프레임이 많으므로 트래픽을 확인하는 데 조금 어려울 수 있습니다. Wireshark 통계 > HTTP2 툴을 통한 빠른 요악을 받아볼 수 있습니다.

패킷 14에서 이 트레이스의 첫 번째 프레임은 서버의 SETTINGS 프레임으로, 최대 100개의 스트림 동시성을 알립니다. 패킷 15에서 클라이언트는 몇몇 제어 프레임을 보낸 다음 Rapid Reset이 되는 요청을 시작합니다. 첫 번째 HEADERS 프레임은 26바이트이고, 모든 후속 HEADERS는 9바이트에 불과합니다. 이 크기 차이는 HPACK이라는 압축 기술 덕분입니다. 패킷 15에는 총 525개의 요청이 포함되어 있으며, 스트림 1051까지 올라갑니다.

흥미롭게도 스트림 1051에 대한 RST_STREAM은 패킷 15에 맞지 않아 패킷 16에서 서버가 404로 응답하는 것을 볼 수 있습니다. 그 후에 클라이언트는 패킷 17에서 나머지 475개의 요청을 전송하기 전에 RST_STREAM을 전송합니다.

서버에서는 동시 스트림이 100개라고 알렸으나, 클라이언트에서 보낸 2개의 패킷 모두 그보다 훨씬 더 많은 HEADERS 프레임을 보냈습니다. 클라이언트는 서버로부터의 응답 트래픽을 기다릴 필요가 없었고, 보낼 수 있는 패킷의 크기에 따른 제한만을 받았습니다. 이 트레이스에서 서버의 RST_STREAM 프레임이 전혀 보이지 않는 것은 서버가 동시 스트림 위반을 관찰하지 못했음을 말해줍니다.

고객에게 미치는 영향

위에서 언급한 것과 같이 요청이 취소되면 업스트림 서비스는 알림을 전달받고, 너무 많은 리소스를 낭비하기 전에 요청을 중단할 수 있습니다. 이번 공격 케이스에서도 마찬가지였고, 대부분의 악의적 요청은 원본 서버로 전혀 포워딩되지 않았습니다. 그렇지만, 그 공격의 규모만으로도 일부 영향이 있었습니다.

첫째, 들어오는 요청의 속도가 이전에는 볼 수 없었던 최고치에 도달함에 따라, 고객의 502 오류 보고 횟수가 늘었다는 보고가 있었습니다. 이는 가장 지대한 영향을 받은 데이터 센터에서 모든 요청을 처리하기 위해 분투하는 과정에서 발생했습니다. 우리의 네트워크는 대규모 공격에 대응할 수 있도록 설계되었음에도, 이 특정한 취약점으로 인해 인프라의 약점이 노출되었습니다. 이제 더 세부 사항으로 들어가서, 데이터센터 중 한 곳에 공격이 발생했을 때, 들어오는 요청이 어떻게 처리되는지에 초점을 맞추어 보겠습니다.

Cloudflare 인프라는 각각 다른 책임을 맡은 여러 프록시 서버 체인으로 구성되어 있음을 확인할 수 있습니다. 특히 클라이언트가 HTTPS 트래픽을 전송하기 위해 Cloudflare에 연결하면 먼저 TLS 복호화 프록시에 연결됩니다. 프록시는 TLS 트래픽을 복호화하고 HTTP 1, 2 또는 3 트래픽을 처리한 다음, 이를 “비즈니스 로직” 프록시로 포워딩합니다. 이 프록시는 각 고객에 대한 모든 설정을 로드하고, 요청을 다른 업스트림 서비스로 정확하게 라우팅하는데, 이 케이스에서는 그보다 더 중요한 보안 기능도 담당합니다. L7 공격 완화도 이 프록시에서 처리됩니다.

이번 공격 벡터의 문제점은 모든 연결에서 매우 빠르게 다량의 요청을 보낼 수 있다는 사실입니다. 우리가 공격 벡터를 차단할 기회를 갖기도 전에 각 요청이 비즈니스 로직 프록시로 포워딩되어야 했습니다. 요청 처리량이 프록시 용량을 앞지르면서 이 두 서비스를 연결하는 채널이 우리의 일부 서버에서는 포화 상태에 도달했습니다.

이렇게 되면 TLS 프록시가 업스트림 프록시에 더 이상 연결될 수 없으며, 이것이 몇몇 클라이언트에서 가장 심각한 공격 중 소량의 “502 Bad Gateway” 오류가 발생했던 원인입니다. 주목해야 할 점은 오늘 현재 HTTP 분석을 생성하기 위해 사용하는 로그 또한 비즈니스 로직 프록시에서 전송된다는 사실입니다. 그 결과, 이러한 오류는 Cloudflare 대시보드에서 보이지 않습니다. 내부 대시보드에서는 초기 공격 웨이브(완화 조치를 구현하기 전) 중 약 1%의 요청이 영향을 받았으며, 가장 심각한 공격이 발생했던 8월 29일에는 수 초 동안 약 12%를 기록하며, 최고치를 달성했습니다. 다음 그래프에는 공격이 발생한 2시간 동안 오류의 비율이 나와있습니다.

이 글 아래에서 설명될 내용과 같이 그다음 날에는 이 숫자를 극적으로 낮추는 작업이 진행되었습니다. 스택의 변화와 이러한 공격 규모를 상당히 감소시킨 완화 조치 덕분에 현재 이 수치는 사실상 0입니다.

499 오류 및 HTTP/2 스트림 동시성의 문제

일부 고객이 알려온 또 다른 증상은 499 오류의 증가입니다. 그 이유는 조금 다른데, 이는 앞서 설명한 HTTP/2 연결의 최대 스트림 동시성 수와 관련이 있습니다.

HTTP/2 설정은 SETTINGS 프레임을 사용해 연결이 시작될 때 교환됩니다. 명시적 매개변수를 받지 않은 경우에는 기본값이 적용됩니다. 클라이언트가 HTTP/2 연결을 설정하고 나면, 서버의 SETTINGS(느림)를 기다리거나, 기본값을 가정하고 요청(빠름)을 시작할 수 있습니다. SETTINGS_MAX_CONCURRENT_STREAMS의 기본값은 사실상 무제한입니다(스트림 ID는 31비트 숫자 공간을 사용, 요청은 홀수를 사용하므로 실제 제한은 1073741824). 사양에서는 서버에서 100개 이상의 스트림을 제공하기를 권장합니다. 클라이언트는 일반적으로 속도에 편향되어 있으므로 서버 설정을 기다리지 않는 경향이 있으며, 이로 인해 약간의 경쟁 조건이 만들어집니다. 클라이언트는 서버에서 어떤 제한을 선택할지 도박을 하는 것과 같습니다. 잘못 선택한다면, 요청은 거부되고, 다시 시도해야 하기 때문입니다. 1073741824 스트림에 대한 도박은 다소 어리석은 일입니다. 그 대신 다수의 클라이언트는 서버가 사양 권장 사항을 따르기를 바라면서, 100개의 동시 스트림으로 제한할 것을 결정합니다. 서버가 100개 미만을 선택한다면, 클라이언트의 도박은 실패하고 스트림은 초기화됩니다.

서버가 동시성 제한을 초과한 스트림을 초기화하는 데는 여러가지 이유가 있습니다. HTTP/2는 프로토콜이 엄격하고, 구문 분석이 있거나 로직 오류가 있을 때 스트림을 닫아야 합니다. 2019년, Cloudflare에서는 HTTP/2 DoS 취약점에 대응해 몇 가지 완화 조치를 개발했습니다. 몇몇 취약점은 클라이언트의 올바르지 않은 행동, 서버에서 스트림을 초기화하도록 이끄는 클라이언트가 그 원인이었습니다. 이러한 클라이언트를 단속하는 매우 효과적인 방법은 연결 중 서버 초기화 횟수를 세고, 몇몇 임계값을 넘을 때 GOAWAY 프레임으로 연결을 끊는 것입니다. 합법적인 클라이언트라면 연결 시 한두 가지 정도의 실수는 할 수 있습니다. 클라이언트에서 너무 많은 실수를 범한다면, 문제가 있거나 악의를 가진 경우이며, 연결을 끊으면 두 가지 케이스가 모두 해결됩니다.

Cloudflare에서는 CVE-2023-44487으로 활성화된 DoS 공격에 대응하는 동안, 최대 스트림 동시성을 64개로 줄였습니다. 이러한 변화를 주기 전에는 클라이언트에서 SETTINGS를 기다리지 않고 100개의 동시성을 가정한다는 사실을 알지 못했습니다. 이미지 갤러리와 같은 일부 웹 페이지는 연결 시작 시에 곧바로 브라우저가 100개의 요청을 보내도록 하기도 합니다. 하지만, 우리의 제한을 초과한 36개의 스트림은 모두 초기화되어야 했으며, 이는 카운팅 완화 조치의 시행 조건으로 작용했습니다. 이는 합법적인 클라이언트와의 연결을 끊게 만들어 페이지 가져오기의 완전한 실패로 이어졌습니다. 따라서 이 상호운용성 문제를 인지하자마자 최대 스트림 동시성 수를 100개로 변경했습니다.

Cloudflare 측의 조치

2019년, HTTP/2 구현과 관련한 몇 가지 DoS 취약점이 발견되었습니다. Cloudflare에서는 이에 대한 대응으로 일련의 감지 및 완화 조치를 개발하고 배포했습니다. CVE-2023-44487은 HTTP/2 취약점의 다른 표현입니다. 하지만, 이를 완화하기 위해 기존의 보호 기능을 확장하여 클라이언트가 전송한 RST_STREAM 프레임을 모니터링하고, 악용되는 경우 연결을 닫을 수 있었습니다. 합법적인 클라이언트의 RST_STREAM 사용은 영향을 받지 않습니다.

우리는 직접적인 수정에 그치치 않고, 서버의 HTTP/2 프레임 처리 및 요청 전송 코드에도 몇 가지 개선점을 구현했습니다. 추가로, 비즈니스 로직 서버의 대기열 및 스케줄링 또한 개선해서, 불필요한 작업을 줄이고 취소 응답성을 높일 수 있었습니다. 이러한 개선 모두에 힘입어 다양한 잠재적 남용 패턴의 영향이 줄어들고 서버가 포화 상태에 이르기 전에 요청을 처리할 수 있는 공간이 더 확보됩니다.

공격의 조기 완화

Cloudflare에서는 더욱 저렴한 방식으로 대규모 공격을 효율적으로 완화할 수 있는 시스템을 이미 갖추고 있습니다. 그중 하나가 “IP Jail”입니다. IP Jail은 대규모 볼류메트릭 공격 시 공격에 참여하는 클라이언트 IP를 수집하고, 해당 IP 레벨이나 TLS 프록시에서 공격받은 위치와의 연결을 끊습니다. 하지만 이 시스템이 온전히 효과를 발휘하기 위해서는 몇 초가 필요합니다. 이 귀중한 시간에 원본은 보호받지만, 인프라는 여전히 모든 HTTP 요청을 받아들여야 합니다. 이 새로운 봇넷은 사실상 램프업 기간이 없으므로 우리는 문제가 되기 전에 공격을 무력화할 수 있는 능력을 갖추어야 합니다.

우리는 이를 구현하기 위해 전체 인프라를 보호하는 IP Jail 시스템을 확장했습니다. 일단 IP가 “구속되면”, 공격받은 위치로의 연결이 차단될 뿐만 아니라, 해당 IP는 일정 시간 Cloudflare의 다른 도메인에 HTTP/2를 사용하는 것도 금지됩니다. 이러한 프로토콜의 악용은 HTTP/1.x로는 불가능하므로, 공격자는 대규모 공격을 퍼부을 수 있는 능력이 제한되는 반면, 동일한 IP를 공유하는 합법적인 클라이언트는 해당 시간 아주 미세한 성능 저하만을 겪습니다. IP 기반의 완화 조치는 매우 무딘 도구이며, 그렇기 때문에 해당 규모로 사용할 때는 극도로 신중을 기해야 하며, 가능한 한 긍정 오류를 피해야 합니다. 또한 봇넷에서 특정 IP의 수명은 일반적으로 짧으므로, 장기간의 완화 조치는 장점보다 단점이 더 많을 수도 있습니다. 다음 그래프에는 우리가 목격한 공격 중 급격한 변화가 나와있습니다.

그래프에서 확인할 수 있듯이, 다수의 새로운 IP가 특정한 날 이후 매우 빠르게 사라집니다.

이 모든 조치는 HTTPS 파이프라인의 시작 부분에 있는 TLS 프록시에서 이루어지므로 우리의 일반 L7 완화 시스템과 비교했을 때 상당한 리소스를 절약할 수 있습니다. 이로 인해 훨씬 더 원활하게 공격에 대응할 수 있었고, 이제 봇넷으로 인한 무작위 502 오류 수는 0개가 되었습니다.

관찰 가능성 개선

Cloudflare에서 변화를 만들어 가고 있는 영역은 관찰 가능성입니다. 고객 분석에서 보이지 않고, 고객에게 오류를 돌려주는 상황은 만족할 수 없는 부분입니다. 다행스럽게도 최근 공격이 발생하기 훨씬 이전부터 해당 시스템을 정비하는 프로젝트가 진행 중이었습니다. 프로젝트의 궁극적인 방향은 비즈니스 로직 프록시에 의존하여 로그 데이터를 통합하고 방출하는 것이 아닌, 인프라 내의 각 서비스가 자체 데이터를 로깅할 수 있는 것입니다. 이번 사고로 인해 해당 작업의 중요성이 대두되었기에, 노력을 배가하고 있습니다.

또한 이러한 프로토콜 악용을 훨씬 더 빠르게 발견해 DDoS 완화 기능을 개선할 수 있는 연결 수준의 로깅 개선 업무도 작업 중입니다.

결론

이 사고는 가장 최근의 기록적인 공격이었지만, 이번이 마지막은 아닐 것이라는 사실은 잘 알고 있습니다. 공격이 점점 더 정교해짐에 따라 Cloudflare에서는 새로운 위협을 적극적으로 식별해서 수백만 고객이 즉각적이고 자동적으로 보호받을 수 있도록 전역 네트워크에 대응책을 배포하는 등 끊임없는 노력을 기울이고 있습니다.

Cloudflare는 2017년부터 고객에게 무료로 무제한 DDoS 방어를 제공해 왔습니다. 이와 더불어, 모든 규모의 조직 니즈에 맞는 광범위한 부가 보안 기능 또한 제공합니다. 보호받고 있는지 확신할 수 없거나 어떻게 보호받을 수 있는지 알고 싶다면 이곳으로 문의하세요.

HTTP/2 Rapid Reset：解构这场破纪录的攻击

2023-10-10 Lucas Pardue

Post Syndicated from Lucas Pardue original http://blog.cloudflare.com/zh-cn/technical-breakdown-http2-rapid-reset-ddos-attack-zh-cn/

从 2023 年 8 月 25 日开始，我们开始注意到一些异常大量的 HTTP 攻击袭击了我们的许多客户。我们的自动化 DDoS 系统检测到并缓解了这些攻击。然而，没过多久，它们就开始达到破纪录的规模 – 最终达到了每秒 2.01 亿次请求的峰值。此数量几乎是我们以前最大攻击记录数量的 3 倍。

受到攻击或需要额外保护？单击此处获取帮助。

而更令人担忧的是，攻击者能够利用一个只有 20,000 台机器的僵尸网络发起这样的攻击。而如今有的僵尸网络由数十万或数百万台机器组成。整个 web 网络通常每秒处理10-30 亿个请求，因此使用此方法可以将整个 web 网络的请求数量等级集中在少数目标上，而这并非不可想象。

检测和缓解

这是一种规模空前的新型攻击手段，Cloudflare 现有的保护措施在很大程度上能够抵御这种攻击的冲击。虽然最初我们看到了对客户流量的一些影响（在第一波攻击期间影响了大约1% 的请求），但今天我们已经能够改进我们的缓解方法，以阻止任何针对Cloudflare 客户的攻击，并保证自身的系统正常运行。

我们注意到这些攻击的同时，谷歌和 AWS 这两大行业巨头也发现了同样的情况。我们努力加固 Cloudflare 的系统，以确保目前我们所有的客户都能免受这种新的 DDoS 攻击方法的影响，而不会对客户造成任何影响。我们还与谷歌和 AWS 共同参与了向受影响的供应商和关键基础设施提供商披露攻击事件的协调工作。

这种攻击是通过滥用 HTTP/2 协议的某些功能和服务器实施详细信息实现的（详情请参见 CVE-2023-44487）。由于该攻击滥用了 HTTP/2 协议中的一个潜在弱点，我们认为实施了 HTTP/2 的任何供应商都会受到攻击。这包括所有现代网络服务器。我们已经与谷歌和 AWS 一起向网络服务器供应商披露了攻击方法，我们希望他们能够实施补丁。与此同时，最好的防御方法是在任何面向网络的 Web 服务器或 API 服务器前面使用诸如 Cloudflare 之类的 DDoS 缓解服务。

这篇文章深入探讨了 HTTP/2 协议的详细信息、攻击者利用来实施这些大规模攻击的功能，以及我们为确保所有客户受到保护而采取的缓解策略。我们希望通过公布这些详细信息，其他受影响的 Web 服务器和服务能够获得实施缓解策略所需的信息。此外，HTTP/2 协议标准团队以及开发未来 Web 标准的团队可以更好地设计这些标准，以防止此类攻击。

RST 攻击详细信息

HTTP 是为 Web 提供支持的应用协议。HTTP 语义对于所有版本的 HTTP 都是通用的 — 整体架构、术语和协议方面，例如请求和响应消息、方法、状态代码、标头和尾部字段、消息内容等等。每个单独的 HTTP 版本都定义了如何将语义转换为“有线格式”以通过 Internet 进行交换。例如，客户端必须将请求消息序列化为二进制数据并发送，然后服务器将其解析回它可以处理的消息。

HTTP/1.1 使用文本形式的序列化。请求和响应信息以 ASCII 字符流的形式进行交换，通过可靠的传输层（如 TCP）发送，使用以下格式（其中 CRLF 表示回车和换行）：

 HTTP-message   = start-line CRLF
                   *( field-line CRLF )
                   CRLF
                   [ message-body ]

例如，对于 https://blog.cloudflare.com/ 的一个非常简单的 GET 请求在线路上将如下所示：

GET / HTTP/1.1 CRLFHost: blog.cloudflare.comCRLFCRLF

响应将如下所示：

HTTP/1.1 200 OK CRLFServer: cloudflareCRLFContent-Length: 100CRLFtext/html; charset=UTF-8CRLFCRLF<100 bytes of data>

这种格式在线路上构造消息，这意味着可以使用单个 TCP 连接来交换多个请求和响应。但是，该格式要求每条消息都完整发送。此外，为了正确地将请求与响应关联起来，需要严格的排序；这意味着消息是串行交换的并且不能多路复用。https://blog.cloudflare.com/ 和 https://blog.cloudflare.com/page/2/ 的两个 GET 请求将是：

GET / HTTP/1.1 CRLFHost: blog.cloudflare.comCRLFCRLFGET /page/2/ HTTP/1.1 CRLFHost: blog.cloudflare.comCRLFCRLF

With the responses:

Web 页面需要比这些示例更复杂的 HTTP 交互。访问 Cloudflare 博客时，您的浏览器将加载多个脚本、样式和媒体资产。如果您使用 HTTP/1.1 访问首页，然后很快决定导航到第 2 页，您的浏览器可以从两个选项中进行选择。要么在第 2 页开始之前等待对您不再需要的页面的所有已排入队列的响应，要么通过关闭 TCP 连接并打开一个新连接来取消进行中的请求。这两种方法都不太实用。浏览器往往通过管理 TCP 连接池（每台主机最多 6 个连接）并在池上实现复杂的请求分派逻辑来绕过这些限制。

HTTP/2 解决了 HTTP/1.1 的许多问题。每个 HTTP 消息都被序列化为一组 HTTP/2 帧，这些帧具有类型、长度、标志、流标识符 (ID) 和有效负载。流 ID 清楚地表明线路上的哪些字节适用于哪个消息，从而允许安全的多路复用和并发。流是双向的。客户端发送帧，服务器使用相同的 ID 回复帧。

在 HTTP/2 中，我们对 https://blog.cloudflare.com 的 GET 请求将通过流 ID 1 交换，客户端发送一个 HEADERS 帧，服务器使用一个 HEADERS 帧进行响应，后跟一个或多个 DATA 帧。客户端请求始终使用奇数流 ID，因此后续请求将使用流 ID 3、5 等。可以以任何顺序提供响应，并且来自不同流的帧可以交织。

流多路复用和并发是 HTTP/2 的强大功能。它们可以更有效地使用单个 TCP 连接。HTTP/2 优化了资源获取，尤其是在与优先排序相结合时。另一方面，与 HTTP/1.1 相比，让客户端更容易启动大量并行工作会增加对服务器资源的峰值需求。这显然是拒绝服务的一个载体。

为了提供一些防护措施，HTTP/2 提供了最大活动并发流的概念。SETTINGS_MAX_CONCURRENT_STREAMS 参数允许服务器公布其并发限制。例如，如果服务器声明限制为 100，那么任何时候都最多只能有 100 个请求处于活动状态。如果客户端试图打开超过此限制的流，服务器一定会使用 RST_STREAM 帧拒绝它。流拒绝不会影响连接上的其他正在进行中的流。

真实情况要复杂一些。流有生命周期。下面是 HTTP/2 流状态机的示意图。客户端和服务器管理各自的流状态视图。当发送或接收 HEADERS、DATA 和 RST_STREAM 帧时，它们会触发转换。虽然流状态的视图是独立的，但它们是同步的。

HEADERS 和 DATA 帧包含一个 END_STREAM 标志，当设置为 1 (true) 时，可触发状态转换。

让我们通过一个没有消息内容的 GET 请求示例来解决这个问题。客户端以 HEADERS 帧的形式发送请求，并将 END_STREAM 标志设置为 1。客户端首先将流从空闲状态转换为打开状态，然后立即转换为半关闭状态。客户端半关闭状态意味着它不能再发送HEADERS或DATA，只能发送 WINDOW_UPDATE、PRIORITY 或 RST_STREAM 帧。然而，它可以接收任何帧。

一旦服务器接收并解析了 HEADERS 帧，它就会将流状态从空闲转变为打开，然后半关闭，因此它与客户端匹配。服务器半关闭状态意味着它可以发送任何帧，但只能接收 WINDOW_UPDATE、PRIORITY 或 RST_STREAM 帧。

对 GET 的响应包含消息内容，因此服务器发送 END_STREAM 标志设置为 0 的 HEADERS，然后发送 END_STREAM 标志设置为 1 的 DATA。DATA 帧触发服务器上流从半关闭到关闭的转换。当客户端收到它时，它也会转换为关闭状态。一旦流关闭，就无法发送或接收任何帧。

将此生命周期应用回并发上下文中，HTTP/2 指出：

处于“打开”状态或任一种“半关闭”状态的流计入允许端点打开的最大流数量。处于这三种状态中任何一种状态的流都将计入在 SETTINGS_MAX_CONCURRENT_STREAMS 设置中公布的限制。

理论上，并发限制是有用的。不过，也有一些实际因素会影响它的效果，我们将在这篇博文的后面部分讲述。

HTTP/2 请求取消

在前文中，我们谈到了客户端取消正在进行的请求的问题。与 HTTP/1.1 相比，HTTP/2 支持这种方式的效率要高得多。客户端无需中断整个连接，只需针对单个流发送一个 RST_STREAM 帧。这将指示服务器停止处理该请求并中止响应，从而释放服务器资源并避免浪费带宽。

让我们来看看前面 3 个请求的例子。这一次，客户端在发送完所有 HEADERS 帧后，取消了针对流 1 的请求。服务器在准备好提供响应之前，会解析此 RST_STREAM 帧，并改为只响应流 3 和流 5：

取消请求是一个非常有用的功能。例如，当滚动包含多个图像的网页时，网络浏览器可以取消落在视口之外的图像，这意味着进入视口的图像可以更快地加载。与 HTTP/1.1 相比，HTTP/2 使这种行为更加高效。

被取消的请求流会快速过渡整个流生命周期。 END_STREAM 标志设置为 1 的客户端 HEADERS 状态从空闲状态转换为打开状态再到半关闭状态，然后 RST_STREAM 立即导致从半关闭状态转换为关闭状态。

回想一下，只有处于打开或半关闭状态的流才会影响流并发限制。当客户端取消流时，它立即能够在其位置打开另一个流，并可以立即发送另一个请求。这就是 CVE-2023-44487 能够发挥作用的关键所在。

快速重置导致拒绝服务

HTTP/2 请求取消可能被滥用来快速重置无限数量的流。当 HTTP/2 服务器能够足够快地处理客户端发送的 RST_STREAM 帧并拆除状态时，这种快速重置不会导致问题。当整理工作出现任何延误或滞后时，问题就会开始出现。客户端可能会处理大量请求，从而导致工作积压，从而导致服务器上资源的过度消耗。

常见的 HTTP 部署架构会在其他组件前面运行 HTTP/2 代理或负载平衡器。当客户端请求到达时，它会被快速分派，而实际工作则作为异步活动在其他地方完成。这样，代理就能非常高效地处理客户端流量。然而，这种关注点的分离会使代理难以整理正在处理的作业。因此，这些部署更有可能遇到快速重置造成的问题。

Cloudflare 的反向代理处理传入的 HTTP/2 客户端流量时，会将数据从连接的套接字复制到缓冲区，并按顺序处理缓冲的数据。在读取每个请求（HEADERS 和 DATA 帧）时，它会被分派到上游服务。读取 RST_STREAM 帧时，请求的本地状态会被删除，并通知上游请求已被取消。如此循环往复，直到整个缓冲区中的数据处理完毕。然而，这种逻辑可能会被滥用：当恶意客户端开始发送大量请求链，并在连接开始时重置时，我们的服务器就会迫不及待地读取所有请求，给上游服务器造成压力，以至于无法处理任何新的传入请求。

需要强调的是，流并发本身并不能缓解快速重置。无论服务器选择的 SETTINGS_MAX_CONCURRENT_STREAMS 值是多少，客户端可以不断发送请求以创建高请求率。

快速重置剖析

以下是使用概念验证客户端尝试发出总共 1000 个请求的快速重置示例。我使用了现成的服务器，没有任何缓解措施；在测试环境中侦听端口 443。为了清楚起见，使用 Wireshark 剖析流量并进行过滤以仅显示 HTTP/2 流量。下载 pcap 以进行后续操作。

要看清楚有点难，因为有很多帧。我们可以通过 Wireshark 的“统计 > HTTP2”工具快速获得摘要：

在此跟踪中，数据包 14 中的第一个帧是服务器的 SETTINGS 帧，它标明最大流并发为 100。在数据包 15 中，客户端发送了几个控制帧，然后开始发出会快速重置的请求。第一个 HEADERS 帧长 26 字节，而随后的所有 HEADERS 帧都只有 9 字节。这种大小差异是由于一种名为 HPACK 的压缩技术造成。数据包 15 总共包含 525 个请求，最高可达 1051 个流。

有趣的是，流 1051 的 RST_STREAM 帧并未包含在数据包 15 中，因此在数据包 16 中，我们看到服务器传回 404 响应。然后，在数据包 17 中，客户端发送 RST_STREAM，然后继续发送余下的 475 个请求。

请注意，虽然服务器声明有 100 个并发流，但客户端发送的两个数据包的 HEADERS 帧数远超此值。客户端无需等待服务器的任何返回流量，它只受限于它可以发送的数据包大小。在此跟踪中未发现服务器 RST_STREAM 帧，表明服务器未发现并发流违规。

对客户的影响

如上所述，当请求被取消时，上游服务会收到通知，并中止请求以免浪费过多资源。这次攻击就是这种情况，大多数恶意请求从未被转发到源服务器。然而，这些攻击的规模很大，确实造成了一些影响。

首先，当传入请求的速度达到前所未有的峰值时，我们收到了客户端发现 502 错误增多的报告。这种情况发生在我们受影响最大的数据中心，彼时它们正在努力处理所有请求。虽然我们的网络可以应对大规模攻击，但这一特殊漏洞暴露了我们基础设施中的薄弱环节。让我们深入探讨一下详细信息，重点关注当传入的请求到达我们的数据中心之一时如何处理：

我们可以看到我们的基础设施由一系列具有不同职责的不同代理服务器组成。特别是，当客户端连接到 Cloudflare 发送 HTTPS 流量时，它首先会命中我们的 TLS 解密代理：它解密 TLS 流量，处理 HTTP 1、2 或 3 流量，然后将其转发到我们的“业务逻辑”代理。它负责加载每个客户的所有设置，然后将请求正确路由到其他上游服务 – 更重要的是，在我们的例子中，它还负责安全功能。这是处理 L7 攻击缓解的地方。

这种攻击手段的问题在于，它能在每个连接中快速发送大量请求。每个请求都必须转发到业务逻辑代理，我们才有机会阻止它。当请求吞吐量超过我们代理的处理能力时，连接这两项服务的管道在我们的一些服务器中达到了饱和水平。

当发生这种情况时，TLS 代理就无法再连接到其上游代理，因此在最严重的攻击中，一些客户端会看到“502 Bad Gateway”错误。值得注意的是，到目前为止，用于创建 HTTP 分析的日志也是由我们的业务逻辑代理发布的。这样做的后果是，在 Cloudflare 仪表板中看不到这些错误。我们的内部仪表板显示，在最初的攻击浪潮中（在我们实施缓解措施之前），约有 1% 的请求受到影响，在 8 月 29 日最严重的一次攻击中，有几秒钟的峰值达到了约 12%。下图显示了出现这种情况时的两小时内这些错误的比率：

在接下来的几天里，我们努力大幅减少了这一数字，详见本帖下文。由于我们的堆栈发生了变化，而且我们的缓解措施大大降低了这些攻击的规模，如今这一数字实际上为零。

499 错误和 HTTP/2 流并发挑战

一些客户报告的另一个症状是 499 错误增加。造成这种情况的原因有些不同，与本贴前面详述的 HTTP/2 连接中的最大流并发相关。

HTTP/2 设置在连接开始时使用 SETTINGS 帧进行交换。如果没有收到明确的参数，则会使用默认值。客户端建立 HTTP/2 连接后，可以等待服务器的 SETTINGS（慢），也可以使用默认值开始发出请求（快）。对于 SETTINGS_MAX_CONCURRENT_STREAMS，默认值实际上是无限的（流 ID 使用 31 位数字空间，请求使用奇数，因此实际限制是 1073741824）。规范建议服务器提供不少于 100 个数据流。客户端通常偏向于速度，因此不会等待服务器设置，这就造成了一些竞争情况。客户端会赌服务器可能选择的限制；如果客户端赌错了，请求将被拒绝，并且客户端必须重试。在 1073741824 个流上赌有点傻。取而代之，许多客户端决定将自己限制在发布 100 个并发流，希望服务器遵循规范建议。如果服务器选择低于 100 的数值，则客户端赌错了，流将被重置。

服务器重置超出并发限制的流的原因有很多。HTTP/2 是严格的，在出现解析或逻辑错误时会要求关闭流。在 2019 年，Cloudflare 针对 HTTP/2 DoS 漏洞开发了多个缓解措施。其中几个漏洞是由客户端行为不当导致服务器重置流造成的。遏制此类客户端的一个非常有效的策略是对在连接期间服务器重置的次数进行计数，当次数超过某个阈值时，就使用 GOAWAY 帧关闭连接。合法客户可能会在一次连接中犯一两个错误，这是可以接受的。如果客户端犯错误的次数过多，它很可能是损坏的客户端或恶意客户端，关闭连接可以解决这两种情况。

在应对由 CVE-2023-44487 引发的 DoS 攻击时，Cloudflare 将最大流并发降至 64。在进行此更改之前，我们并不知道客户端不会等待 SETTINGS，而是假设并发为 100。某些 Web 页面（如图片库）确实会导致浏览器在连接开始时立即发送 100 个请求。不幸的是，超过限制的 36 个流都需要重置，这触发了我们的计数缓解措施。这意味着我们关闭了合法客户端的连接，导致页面加载完全失败。我们意识到这个互操作性问题后，立即将最大流并发更改为 100。

Cloudflare 方面的行动

在 2019 年，发现了几个与 HTTP/2 实现相关的 DoS 漏洞。作为回应，Cloudflare 开发并部署了一系列检测和缓解措施。CVE-2023-44487 是 HTTP/2 漏洞的另一种表现形式。不过，为了缓解它，我们能够扩展现有的保护，以监控客户端发送的 RST_STREAM 帧，并在它们被用于滥用时关闭连接。客户端对 RST_STREAM 的合法使用不受影响。

除了直接修复外，我们还对服务器的 HTTP/2 帧处理和请求分派代码进行了多项改进。此外，业务逻辑服务器还改进了队列和分派，减少了不必要的工作，提高了对取消操作的响应速度。这些措施多管齐下，减轻了各种潜在滥用模式的影响，并为服务器在饱和前处理请求提供了更多空间。

尽早缓解攻击

Cloudflare 已经部署了一套系统，可以通过成本较低的方法有效缓解超大型攻击。其中一个系统名为 IP Jail。对于超容量攻击，该系统会收集参与攻击的客户端 IP，并阻止它们连接到受攻击的财产（无论是在 IP 级别还是在我们的 TLS 代理中）。然而，该系统需要几秒钟才能完全生效；在这宝贵的几秒钟内，源头已经受到保护，但我们的基础设施仍然需要吸收所有 HTTP 请求。由于这种新的僵尸网络实际上没有启动期，因此我们需要能够在攻击成为问题之前将其消灭。

为此，我们扩展了 IP Jail 系统，以保护我们的整个基础设施：一旦一个 IP 被“监禁”，它不仅会被阻止连接到受攻击的资产，我们还会禁止相应的 IP 在一段时间内使用 HTTP/2 连接到 Cloudflare 上的任何其他域。因此，无法通过使用 HTTP/1.x 来滥用协议。这就限制了攻击者实施大规模攻击的能力，而共用同一 IP 的任何合法客户端在此期间只会看到非常小的性能下降。基于 IP 的缓解措施是一种非常笨拙的工具 – 这就是为什么我们在这种规模下使用它们时必须非常小心，并尽可能避免误报。此外，僵尸网络中给定 IP 的寿命通常很短，因此任何长期缓解措施都可能弊大于利。下图显示了我们目睹的攻击中 IP 的变化情况：

我们可以看到，在某一天发现的许多新 IP 后来很快就消失了。

由于所有这些操作都在 HTTPS 管道开始时在我们的 TLS 代理中发生，因此与常规的 L7 缓解系统相比，可以节省大量资源。这使我们能够更顺利地应对这些攻击，现在这些僵尸网络造成的随机 502 错误数量已降至零。

攻击可观测性改进

我们正在改变的另一个方面是可观察性。将错误返回到客户端但在客户分析中不可见这种情况令人不满。幸运的是，早在最近的袭击发生之前，就有一个项目正在对这些系统进行全面检查。它最终将允许我们基础设施中的每个服务记录自己的数据，而不是依赖我们的业务逻辑代理来整合和发布日志数据。这次事件凸显了这项工作的重要性，我们将加倍努力。

我们也在努力改进连接层面的日志记录，使我们能够更快地发现此类协议滥用，从而提高我们的 DDoS 缓解能力。

总结

虽然这是最近一次破纪录的攻击，但我们知道这不会是最后一次。随着攻击的不断复杂化，Cloudflare 坚持不懈地努力，积极主动地识别新的威胁，并在我们的全球网络中部署应对措施，使我们的数百万客户能够立即自动地受到保护。

自从 2017 年以来，Cloudflare 一直为我们的所有客户提供免费、不计量且无限制的 DDoS 防护。此外，我们还提供一系列附加安全功能，以满足各种规模组织的需求。如果您不确定自己是否受到保护，或想了解如何才能受到保护，请联系我们。

Introducing HTTP/3 Prioritization

2023-06-20 Lucas Pardue

Post Syndicated from Lucas Pardue original http://blog.cloudflare.com/better-http-3-prioritization-for-a-faster-web/

Introducing HTTP/3 Prioritization

Today, Cloudflare is very excited to announce full support for HTTP/3 Extensible Priorities, a new standard that speeds the loading of webpages by up to 37%. Cloudflare worked closely with standards builders to help form the specification for HTTP/3 priorities and is excited to help push the web forward. HTTP/3 Extensible Priorities is available on all plans on Cloudflare. For paid users, there is an enhanced version available that improves performance even more.

Web pages are made up of many objects that must be downloaded before they can be processed and presented to the user. Not all objects have equal importance for web performance. The role of HTTP prioritization is to load the right bytes at the most opportune time, to achieve the best results. Prioritization is most important when there are multiple objects all competing for the same constrained resource. In HTTP/3, this resource is the QUIC connection. In most cases, bandwidth is the bottleneck from server to client. Picking what objects to dedicate bandwidth to, or share bandwidth amongst, is a critical foundation to web performance. When it goes askew, the other optimizations we build on top can suffer.

Today, we're announcing support for prioritization in HTTP/3, using the full capabilities of the HTTP Extensible Priorities (RFC 9218) standard, augmented with Cloudflare's knowledge and experience of enhanced HTTP/2 prioritization. This change is compatible with all mainstream web browsers and can improve key metrics such as Largest Contentful Paint (LCP) by up to 37% in our test. Furthermore, site owners can apply server-side overrides, using Cloudflare Workers or directly from an origin, to customize behavior for their specific needs.

Looking at a real example

The ultimate question when it comes to features like HTTP/3 Priorities is: how well does this work and should I turn it on? The details are interesting and we'll explain all of those shortly but first lets see some demonstrations.

In order to evaluate prioritization for HTTP/3, we have been running many simulations and tests. Each web page is unique. Loading a web page can require many TCP or QUIC connections, each of them idiosyncratic. These all affect how prioritization works and how effective it is.

To evaluate the effectiveness of priorities, we ran a set of tests measuring Largest Contentful Paint (LCP). As an example, we benchmarked blog.cloudflare.com to see how much we could improve performance:

As a film strip, this is what it looks like:

In terms of actual numbers, we see Largest Contentful Paint drop from 2.06 seconds down to 1.29 seconds. Let’s look at why that is. To analyze exactly what’s going on we have to look at a waterfall diagram of how this web page is loading. A waterfall diagram is a way of visualizing how assets are loading. Some may be loaded in parallel whilst some might be loaded sequentially. Without smart prioritization, the waterfall for loading assets for this web page looks as follows:

There are several interesting things going on here so let's break it down. The LCP image at request 21 is for 1937-1.png, weighing 30.4 KB. Although it is the LCP image, the browser requests it as priority u=3,i, which informs the server to put it in the same round-robin bandwidth-sharing bucket with all of the other images. Ahead of the LCP image is index.js, a JavaScript file that is loaded with a "defer" attribute. This JavaScript is non-blocking and shouldn't affect key aspects of page layout.

What appears to be happening is that the browser gives index.js the priority u=3,i=?0, which places it ahead of the images group on the server-side. Therefore, the 217 KB of index.js is sent in preference to the LCP image. Far from ideal. Not only that, once the script is delivered, it needs to be processed and executed. This saturates the CPU and prevents the LCP image from being painted, for about 300 milliseconds, even though it was delivered already.

The waterfall with prioritization looks much better:

We used a server-side override to promote the priority of the LCP image 1937-1.png from u=3,i to u=2,i. This has the effect of making it leapfrog the "defer" JavaScript. We can see at around 1.2 seconds, transmission of index.js is halted while the image is delivered in full. And because it takes another couple of hundred milliseconds to receive the remaining JavaScript, there is no CPU competition for the LCP image paint. These factors combine together to drastically improve LCP times.

How Extensible Priorities actually works

First of all, you don't need to do anything yourselves to make it work. Out of the box, browsers will send Extensible Priorities signals alongside HTTP/3 requests, which we'll feed into our priority scheduling decision making algorithms. We'll then decide the best way to send HTTP/3 response data to ensure speedy page loads.

Extensible Priorities has a similar interaction model to HTTP/2 priorities, client send priorities and servers act on them to schedule response data, we'll explain exactly how that works in a bit.

HTTP/2 priorities used a dependency tree model. While this was very powerful it turned out hard to implement and use. When the IETF came to try and port it to HTTP/3 during the standardization process, we hit major issues. If you are interested in all that background, go and read my blog post describing why we adopted a new approach to HTTP/3 prioritization.

Extensible Priorities is a far simpler scheme. HTTP/2's dependency tree with 255 weights and dependencies (that can be mutual or exclusive) is complex, hard to use as a web developer and could not work for HTTP/3. Extensible Priorities has just two parameters: urgency and incremental, and these are capable of achieving exactly the same web performance goals.

Urgency is an integer value in the range 0-7. It indicates the importance of the requested object, with 0 being most important and 7 being the least. The default is 3. Urgency is comparable to HTTP/2 weights. However, it's simpler to reason about 8 possible urgencies rather than 255 weights. This makes developer's lives easier when trying to pick a value and predicting how it will work in practice.

Incremental is a boolean value. The default is false. A true value indicates the requested object can be processed as parts of it are received and read – commonly referred to as streaming processing. A false value indicates the object must be received in whole before it can be processed.

Let's consider some example web objects to put these parameters into perspective:

An HTML document is the most important piece of a webpage. It can be processed as parts of it arrive. Therefore, urgency=0 and incremental=true is a good choice.
A CSS style is important for page rendering and could block visual completeness. It needs to be processed in whole. Therefore, urgency=1 and incremental=false is suitable, this would mean it doesn't interfere with the HTML.
An image file that is outside the browser viewport is not very important and it can be processed and painted as parts arrive. Therefore, urgency=3 and incremental=true is appropriate to stop it interfering with sending other objects.
An image file that is the "hero image" of the page, making it the Largest Contentful Pain element. An urgency of 1 or 2 will help it avoid being mixed in with other images. The choice of incremental value is a little subjective and either might be appropriate.

When making an HTTP request, clients decide the Extensible Priority value composed of the urgency and incremental parameters. These are sent either as an HTTP header field in the request (meaning inside the HTTP/3 HEADERS frame on a request stream), or separately in an HTTP/3 PRIORITY_UPDATE frame on the control stream. HTTP headers are sent once at the start of a request; a client might change its mind so the PRIORITY_UPDATE frame allows it to reprioritize at any point in time.

For both the header field and PRIORITY_UPDATE, the parameters are exchanged using the Structured Fields Dictionary format (RFC 8941) and serialization rules. In order to save bytes on the wire, the parameters are shortened – urgency to 'u', and incremental to 'i'.

Here's how the HTTP header looks alongside a GET request for important HTML, using HTTP/3 style notation:

HEADERS:
    :method = GET
    :scheme = https
    :authority = example.com
    :path = /index.html
     priority = u=0,i

The PRIORITY_UPDATE frame only carries the serialized Extensible Priority value:

PRIORITY_UPDATE:
    u=0,i

Structured Fields has some other neat tricks. If you want to indicate the use of a default value, then that can be done via omission. Recall that the urgency default is 3, and incremental default is false. A client could send "u=1" alongside our important CSS request (urgency=1, incremental=false). For our lower priority image it could send just "i=?1" (urgency=3, incremental=true). There's even another trick, where boolean true dictionary parameters are sent as just "i". You should expect all of these formats to be used in practice, so it pays to be mindful about their meaning.

Extensible Priority servers need to decide how best to use the available connection bandwidth to schedule the response data bytes. When servers receive priority client signals, they get one form of input into a decision making process. RFC 9218 provides a set of scheduling recommendations that are pretty good at meeting a board set of needs. These can be distilled down to some golden rules.

For starters, the order of requests is crucial. Clients are very careful about asking for things at the moment they want it. Serving things in request order is good. In HTTP/3, because there is no strict ordering of stream arrival, servers can use stream IDs to determine this. Assuming the order of the requests is correct, the next most important thing is urgency ordering. Serving according to urgency values is good.

Be wary of non-incremental requests, as they mean the client needs the object in full before it can be used at all. An incremental request means the client can process things as and when they arrive.

With these rules in mind, the scheduling then becomes broadly: for each urgency level, serve non-incremental requests in whole serially, then serve incremental requests in round robin fashion in parallel. What this achieves is dedicated bandwidth for very important things, and shared bandwidth for less important things that can be processed or rendered progressively.

Let's look at some examples to visualize the different ways the scheduler can work. These are generated by using quiche's qlog support and running it via the qvis analysis tool. These diagrams are similar to a waterfall chart; the y-dimension represents stream IDs (0 at the top, increasing as we move down) and the x-dimension shows reception of stream data.

Example 1: all streams have the same urgency and are non-incremental so get served in serial order of stream ID.

Example 2: the streams have the same urgency and are incremental so get served in round-robin fashion.

Example 3: the streams have all different urgency, with later streams being more important than earlier streams. The data is received serially but in a reverse order compared to example 1.

Beyond the Extensible Priority signals, a server might consider other things when scheduling, such as file size, content encoding, how the application vs content origins are configured etc.. This was true for HTTP/2 priorities but Extensible Priorities introduces a new neat trick, a priority signal can also be sent as a response header to override the client signal.

This works especially well in a proxying scenario where your HTTP/3 terminating proxy is sat in front of some backend such as Workers. The proxy can pass through the request headers to the backend, it can inspect these and if it wants something different, return response headers to the proxy. This allows powerful tuning possibilities and because we operate on a semantic request basis (rather than HTTP/2 priorities dependency basis) we don't have all the complications and dangers. Proxying isn't the only use case. Often, one form of "API" to your local server is via setting response headers e.g., via configuration. Leveraging that approach means we don't have to invent new APIs.

Let's consider an example where server overrides are useful. Imagine we have a webpage with multiple images that are referenced via <img> tags near the top of the HTML. The browser will process these quite early in the page load and want to issue requests. At this point, it might not know enough about the page structure to determine if an image is in the viewport or outside the viewport. It can guess, but that might turn out to be wrong if the page is laid out a certain way. Guessing wrong means that something is misprioritized and might be taking bandwidth away from something that is more important. While it is possible to reprioritize things mid-flight using the PRIORITY_UPDATE frame, this action is "laggy" and by the time the server realizes things, it might be too late to make much difference.

Fear not, the web developer who built the page knows exactly how it is supposed to be laid out and rendered. They can overcome client uncertainty by overriding the Extensible Priority when they serve the response. For instance, if a client guesses wrong and requests the LCP image at a low priority in a shared bandwidth bucket, the image will load slower and web performance metrics will be adversely affected. Here's how it might look and how we can fix it:

Request HEADERS:
    :method = GET
    :scheme = https
    :authority = example.com
    :path = /lcp-image.jpg
     priority = u=3,i

Response HEADERS:
:status = 200
content-length: 10000
content-type: image/jpeg
priority = u=2

Priority response headers are one tool to tweak client behavior and they are complementary to other web performance techniques. Methods like efficiently ordering elements in HTML, using attributes like "async" or "defer", augmenting HTML links with Link headers, or using more descriptive link relationships like “preload” all help to improve a browser's understanding of the resources comprising a page. A website that optimizes these things provides a better chance for the browser to make the best choices for prioritizing requests.

More recently, a new attribute called “fetchpriority” has emerged that allows developers to tune some of the browser behavior, by boosting or dropping the priority of an element relative to other elements of the same type. The attribute can help the browser do two important things for Extensible priorities: first, the browser might send the request earlier or later, helping to satisfy our golden rule #1 – ordering. Second, the browser might pick a different urgency value, helping to satisfy rule #2. However, "fetchpriority" is a nudge mechanism and it doesn't allow for directly setting a desired priority value. The nudge can be a bit opaque. Sometimes the circumstances benefit greatly from just knowing plainly what the values are and what the server will do, and that's where the response header can help.

Conclusions

We’re excited about bringing this new standard into the world. Working with standards bodies has always been an amazing partnership and we’re very pleased with the results. We’ve seen great results with HTTP/3 priorities, reducing Largest Contentful Paint by up to 37% in our test. If you’re interested in turning on HTTP/3 priorities for your domain, just head on over to the Cloudflare dashboard and hit the toggle.

HTTP RFCs have evolved: A Cloudflare view of HTTP usage trends

2022-06-06 Lucas Pardue

Post Syndicated from Lucas Pardue original https://blog.cloudflare.com/cloudflare-view-http3-usage/

HTTP RFCs have evolved: A Cloudflare view of HTTP usage trends

Today, a cluster of Internet standards were published that rationalize and modernize the definition of HTTP – the application protocol that underpins the web. This work includes updates to, and refactoring of, HTTP semantics, HTTP caching, HTTP/1.1, HTTP/2, and the brand-new HTTP/3. Developing these specifications has been no mean feat and today marks the culmination of efforts far and wide, in the Internet Engineering Task Force (IETF) and beyond. We thought it would be interesting to celebrate the occasion by sharing some analysis of Cloudflare’s view of HTTP traffic over the last 12 months.

However, before we get into the traffic data, for quick reference, here are the new RFCs that you should make a note of and start using:

HTTP Semantics – RFC 9110
- HTTP’s overall architecture, common terminology and shared protocol aspects such as request and response messages, methods, status codes, header and trailer fields, message content, representation data, content codings and much more. Obsoletes RFCs 2818, 7231, 7232, 7233, 7235, 7538, 7615, 7694, and portions of 7230.
HTTP Caching – RFC 9111
- HTTP caches and related header fields to control the behavior of response caching. Obsoletes RFC 7234.
HTTP/1.1 – RFC 9112
- A syntax, aka "wire format", of HTTP that uses a text-based format. Typically used over TCP and TLS. Obsolete portions of RFC 7230.
HTTP/2 – RFC 9113
- A syntax of HTTP that uses a binary framing format, which provides streams to support concurrent requests and responses. Message fields can be compressed using HPACK. Typically used over TCP and TLS. Obsoletes RFCs 7540 and 8740.
HTTP/3 – RFC 9114
- A syntax of HTTP that uses a binary framing format optimized for the QUIC transport protocol. Message fields can be compressed using QPACK.
QPACK – RFC 9204
- A variation of HPACK field compression that is optimized for the QUIC transport protocol.

On May 28, 2021, we enabled QUIC version 1 and HTTP/3 for all Cloudflare customers, using the final “h3” identifier that matches RFC 9114. So although today’s publication is an occasion to celebrate, for us nothing much has changed, and it’s business as usual.

Support for HTTP/3 in the stable release channels of major browsers came in November 2020 for Google Chrome and Microsoft Edge and April 2021 for Mozilla Firefox. In Apple Safari, HTTP/3 support currently needs to be enabled in the “Experimental Features” developer menu in production releases.

A browser and web server typically automatically negotiate the highest HTTP version available. Thus, HTTP/3 takes precedence over HTTP/2. We looked back over the last year to understand HTTP/3 usage trends across the Cloudflare network, as well as analyzing HTTP versions used by traffic from leading browser families (Google Chrome, Mozilla Firefox, Microsoft Edge, and Apple Safari), major search engine indexing bots, and bots associated with some popular social media platforms. The graphs below are based on aggregate HTTP(S) traffic seen globally by the Cloudflare network, and include requests for website and application content across the Cloudflare customer base between May 7, 2021, and May 7, 2022. We used Cloudflare bot scores to restrict analysis to “likely human” traffic for the browsers, and to “likely automated” and “automated” for the search and social bots.

Traffic by HTTP version

Overall, HTTP/2 still comprises the majority of the request traffic for Cloudflare customer content, as clearly seen in the graph below. After remaining fairly consistent through 2021, HTTP/2 request volume increased by approximately 20% heading into 2022. HTTP/1.1 request traffic remained fairly flat over the year, aside from a slight drop in early December. And while HTTP/3 traffic initially trailed HTTP/1.1, it surpassed it in early July, growing steadily and roughly doubling in twelve months.

HTTP/3 traffic by browser

Digging into just HTTP/3 traffic, the graph below shows the trend in daily aggregate request volume over the last year for HTTP/3 requests made by the surveyed browser families. Google Chrome (orange line) is far and away the leading browser, with request volume far outpacing the others.

Below, we remove Chrome from the graph to allow us to more clearly see the trending across other browsers. Likely because it is also based on the Chromium engine, the trend for Microsoft Edge closely mirrors Chrome. As noted above, Mozilla Firefox first enabled production support in version 88 in April 2021, making it available by default by the end of May. The increased adoption of that updated version during the following month is clear in the graph as well, as HTTP/3 request volume from Firefox grew rapidly. HTTP/3 traffic from Apple Safari increased gradually through April, suggesting growth in the number of users enabling the experimental feature or running a Technology Preview version of the browser. However, Safari’s HTTP/3 traffic has subsequently dropped over the last couple of months. We are not aware of any specific reasons for this decline, but our most recent observations indicate HTTP/3 traffic is recovering.

Looking at the lines in the graph for Chrome, Edge, and Firefox, a weekly cycle is clearly visible in the graph, suggesting greater usage of these browsers during the work week. This same pattern is absent from Safari usage.

Across the surveyed browsers, Chrome ultimately accounts for approximately 80% of the HTTP/3 requests seen by Cloudflare, as illustrated in the graphs below. Edge is responsible for around another 10%, with Firefox just under 10%, and Safari responsible for the balance.

We also wanted to look at how the mix of HTTP versions has changed over the last year across each of the leading browsers. Although the percentages vary between browsers, it is interesting to note that the trends are very similar across Chrome, Firefox and Edge. (After Firefox turned on default HTTP/3 support in May 2021, of course.) These trends are largely customer-driven – that is, they are likely due to changes in Cloudflare customer configurations.

Most notably we see an increase in HTTP/3 during the last week of September, and a decrease in HTTP/1.1 at the beginning of December. For Safari, the HTTP/1.1 drop in December is also visible, but the HTTP/3 increase in September is not. We expect that over time, once Safari supports HTTP/3 by default that its trends will become more similar to those seen for the other browsers.

Traffic by search indexing bot

Back in 2014, Google announced that it would start to consider HTTPS usage as a ranking signal as it indexed websites. However, it does not appear that Google, or any of the other major search engines, currently consider support for the latest versions of HTTP as a ranking signal. (At least not directly – the performance improvements associated with newer versions of HTTP could theoretically influence rankings.) Given that, we wanted to understand which versions of HTTP the indexing bots themselves were using.

Despite leading the charge around the development of QUIC, and integrating HTTP/3 support into the Chrome browser early on, it appears that on the indexing/crawling side, Google still has quite a long way to go. The graph below shows that requests from GoogleBot are still predominantly being made over HTTP/1.1, although use of HTTP/2 has grown over the last six months, gradually approaching HTTP/1.1 request volume. (A blog post from Google provides some potential insights into this shift.) Unfortunately, the volume of requests from GoogleBot over HTTP/3 has remained extremely limited over the last year.

Microsoft’s BingBot also fails to use HTTP/3 when indexing sites, with near-zero request volume. However, in contrast to GoogleBot, BingBot prefers to use HTTP/2, with a wide margin developing in mid-May 2021 and remaining consistent across the rest of the past year.

Major social media platforms use custom bots to retrieve metadata for shared content, improve language models for speech recognition technology, or otherwise index website content. We also surveyed the HTTP version preferences of the bots deployed by three of the leading social media platforms.

Although Facebook supports HTTP/3 on their main website (and presumably their mobile applications as well), their back-end FacebookBot crawler does not appear to support it. Over the last year, on the order of 60% of the requests from FacebookBot have been over HTTP/1.1, with the balance over HTTP/2. Heading into 2022, it appeared that HTTP/1.1 preference was trending lower, with request volume over the 25-year-old protocol dropping from near 80% to just under 50% during the fourth quarter. However, that trend was abruptly reversed, with HTTP/1.1 growing back to over 70% in early February. The reason for the reversal is unclear.

Similar to FacebookBot, it appears TwitterBot’s use of HTTP/3 is, unfortunately, pretty much non-existent. However, TwitterBot clearly has a strong and consistent preference for HTTP/2, accounting for 75-80% of its requests, with the balance over HTTP/1.1.

In contrast, LinkedInBot has, over the last year, been firmly committed to making requests over HTTP/1.1, aside from the apparently brief anomalous usage of HTTP/2 last June. However, in mid-March, it appeared to tentatively start exploring the use of other HTTP versions, with around 5% of requests now being made over HTTP/2, and around 1% over HTTP/3, as seen in the upper right corner of the graph below.

Conclusion

We’re happy that HTTP/3 has, at long last, been published as RFC 9114. More than that, we’re super pleased to see that regardless of the wait, browsers have steadily been enabling support for the protocol by default. This allows end users to seamlessly gain the advantages of HTTP/3 whenever it is available. On Cloudflare’s global network, we’ve seen continued growth in the share of traffic speaking HTTP/3, demonstrating continued interest from customers in enabling it for their sites and services. In contrast, we are disappointed to see bots from the major search and social platforms continuing to rely on aging versions of HTTP. We’d like to build a better understanding of how these platforms chose particular HTTP versions and welcome collaboration in exploring the advantages that HTTP/3, in particular, could provide.

Current statistics on HTTP/3 and QUIC adoption at a country and autonomous system (ASN) level can be found on Cloudflare Radar.

Running HTTP/3 and QUIC on the edge for everyone has allowed us to monitor a wide range of aspects related to interoperability and performance across the Internet. Stay tuned for future blog posts that explore some of the technical developments we’ve been making.

And this certainly isn’t the end of protocol innovation, as HTTP/3 and QUIC provide many exciting new opportunities. The IETF and wider community are already underway building new capabilities on top, such as MASQUE and WebTransport. Meanwhile, in the last year, the QUIC Working Group has adopted new work such as QUIC version 2, and the Multipath Extension to QUIC.

Unlocking QUIC’s proxying potential with MASQUE

2022-03-20 Lucas Pardue

Post Syndicated from Lucas Pardue original https://blog.cloudflare.com/unlocking-quic-proxying-potential/

Unlocking QUIC’s proxying potential with MASQUE

In the last post, we discussed how HTTP CONNECT can be used to proxy TCP-based applications, including DNS-over-HTTPS and generic HTTPS traffic, between a client and target server. This provides significant benefits for those applications, but it doesn’t lend itself to non-TCP applications. And if you’re wondering whether or not we care about these, the answer is an affirmative yes!

For instance, HTTP/3 is based on QUIC, which runs on top of UDP. What if we wanted to speak HTTP/3 to a target server? That requires two things: (1) the means to encapsulate a UDP payload between client and proxy (which the proxy decapsulates and forward to the target in an actual UDP datagram), and (2) a way to instruct the proxy to open a UDP association to a target so that it knows where to forward the decapsulated payload. In this post, we’ll discuss answers to these two questions, starting with encapsulation.

Encapsulating datagrams

While TCP provides a reliable and ordered byte stream for applications to use, UDP instead provides unreliable messages called datagrams. Datagrams sent or received on a connection are loosely associated, each one is independent from a transport perspective. Applications that are built on top of UDP can leverage the unreliability for good. For example, low-latency media streaming often does so to avoid lost packets getting retransmitted. This makes sense, on a live teleconference it is better to receive the most recent audio or video rather than starting to lag behind while you’re waiting for stale data

QUIC is designed to run on top of an unreliable protocol such as UDP. QUIC provides its own layer of security, packet loss detection, methods of data recovery, and congestion control. If the layer underneath QUIC duplicates those features, they can cause wasted work or worse create destructive interference. For instance, QUIC congestion control defines a number of signals that provide input to sender-side algorithms. If layers underneath QUIC affect its packet flows (loss, timing, pacing, etc), they also affect the algorithm output. Input and output run in a feedback loop, so perturbation of signals can get amplified. All of this can cause congestion control algorithms to be more conservative in the data rates they use.

If we could speak HTTP/3 to a proxy, and leverage a reliable QUIC stream to carry encapsulated datagrams payload, then everything can work. However, the reliable stream interferes with expectations. The most likely outcome being slower end-to-end UDP throughput than we could achieve without tunneling. Stream reliability runs counter to our goals.

Fortunately, QUIC’s unreliable datagram extension adds a new DATAGRAM frame that, as its name plainly says, is unreliable. It has several uses; the one we care about is that it provides a building block for performant UDP tunneling. In particular, this extension has the following properties:

DATAGRAM frames are individual messages, unlike a long QUIC stream.
DATAGRAM frames do not contain a multiplexing identifier, unlike QUIC’s stream IDs.
Like all QUIC frames, DATAGRAM frames must fit completely inside a QUIC packet.
DATAGRAM frames are subject to congestion control, helping senders to avoid overloading the network.
DATAGRAM frames are acknowledged by the receiver but, importantly, if the sender detects a loss, QUIC does not retransmit the lost data.

The Datagram “Unreliable Datagram Extension to QUIC” specification will be published as an RFC soon. Cloudflare’s quiche library has supported it since October 2020.

Now that QUIC has primitives that support sending unreliable messages, we have a standard way to effectively tunnel UDP inside it. QUIC provides the STREAM and DATAGRAM transport primitives that support our proxying goals. Now it is the application layer responsibility to describe how to use them for proxying. Enter MASQUE.

MASQUE: Unlocking QUIC’s potential for proxying

Now that we’ve described how encapsulation works, let’s now turn our attention to the second question listed at the start of this post: How does an application initialize an end-to-end tunnel, informing a proxy server where to send UDP datagrams to, and where to receive them from? This is the focus of the MASQUE Working Group, which was formed in June 2020 and has been designing answers since. Many people across the Internet ecosystem have been contributing to the standardization activity. At Cloudflare, that includes Chris (as co-chair), Lucas (as co-editor of one WG document) and several other colleagues.

MASQUE started solving the UDP tunneling problem with a pair of specifications: a definition for how QUIC datagrams are used with HTTP/3, and a new kind of HTTP request that initiates a UDP socket to a target server. These have built on the concept of extended CONNECT, which was first introduced for HTTP/2 in RFC 8441 and has now been ported to HTTP/3. Extended CONNECT defines the :protocol pseudo-header that can be used by clients to indicate the intention of the request. The initial use case was WebSockets, but we can repurpose it for UDP and it looks like this:

:method = CONNECT
:protocol = connect-udp
:scheme = https
:path = /target.example.com/443/
:authority = proxy.example.com

A client sends an extended CONNECT request to a proxy server, which identifies a target server in the :path. If the proxy succeeds in opening a UDP socket, it responds with a 2xx (Successful) status code. After this, an end-to-end flow of unreliable messages between the client and target is possible; the client and proxy exchange QUIC DATAGRAM frames with an encapsulated payload, and the proxy and target exchange UDP datagrams bearing that payload.

Anatomy of Encapsulation

UDP tunneling has a constraint that TCP tunneling does not – namely, the size of messages and how that relates to path MTU (Maximum Transmission Unit; for more background see our Learning Center article). The path MTU is the maximum size that is allowed on the path between client and server. The actual maximum is the smallest maximum across all elements at every hop and at every layer, from the network up to application. All it takes is for one component with a small MTU to reduce the path MTU entirely. On the Internet, 1,500 bytes is a common practical MTU. When considering tunneling using QUIC, we need to appreciate the anatomy of QUIC packets and frames in order to understand how they add bytes of overheard. This consumes bytes and subtracts from our theoretical maximum.

We’ve been talking in terms of HTTP/3 which normally has its own frames (HEADERS, DATA, etc) that have a common type and length overhead. However, there is no HTTP/3 framing when it comes to DATAGRAM, instead the bytes are placed directly into the QUIC frame. This frame is composed of two fields. The first field is a variable number of bytes, called the Quarter Stream ID field, which is an encoded identifier that supports independent multiplexed DATAGRAM flows. It does so by binding each DATAGRAM to the HTTP request stream ID. In QUIC, stream IDs use two bits to encode four types of stream. Since request streams are always of one type (client-initiated bidirectional, to be exact), we can divide their ID by four to save space on the wire. Hence the name Quarter Stream ID. The second field is payload, which contains the end-to-end message payload. Here’s how it might look on the wire.

If you recall our lesson from the last post, DATAGRAM frames (like all frames) must fit completely inside a QUIC packet. Moreover, since QUIC requires that fragmentation is disabled, QUIC packets must fit completely inside a UDP datagram. This all combines to limit the maximum size of things that we can actually send: the path MTU determines the size of the UDP datagram, then we need to subtract the overheads of the UDP datagram header, QUIC packet header, and QUIC DATAGRAM frame header. For a better understanding of QUIC’s wire image and overheads, see Section 5 of RFC 8999 and Section 12.4 of RFC 9000.

If a sender has a message that is too big to fit inside the tunnel, there are only two options: discard the message or fragment it. Neither of these are good options. Clients create the UDP tunnel and are more likely to accurately calculate the real size of encapsulated UDP datagram payload, thus avoiding the problem. However, a target server is most likely unaware that a client is behind a proxy, so it cannot accommodate the tunneling overhead. It might send a UDP datagram payload that is too big for the proxy to encapsulate. This conundrum is common to all proxy protocols! There’s an art in picking the right MTU size for UDP-based traffic in the face of tunneling overheads. While approaches like path MTU discovery can help, they are not a silver bullet. Choosing conservative maximum sizes can reduce the chances of tunnel-related problems. However, this needs to be weighed against being too restrictive. Given a theoretical path MTU of 1,500, once we consider QUIC encapsulation overheads, tunneled messages with a limit between 1,200 and 1,300 bytes can be effective.This is especially important when we think about tunneling QUIC itself. RFC 9000 Section 8.1 details how clients that initiate new QUIC connections must send UDP datagrams of at least 1,200 bytes. If a proxy can’t support that, then QUIC will not work in a tunnel.

Nested tunneling for Improved Privacy Proxying

MASQUE gives us the application layer building blocks to support efficient tunneling of TCP or UDP traffic. What’s cool about this is that we can combine these blocks into different deployment architectures for different scenarios or different needs.

One example of this case is nested tunneling via multiple proxies, which can minimize the connection metadata available to each individual proxy or server (one example of this type of deployment is described in our recent post on iCloud Private Relay). In this kind of setup, a client might manage at least three logical connections. First, a QUIC connection between Client and Proxy 1. Second, a QUIC connection between Client and Proxy 2, which runs via a CONNECT tunnel in the first connection. Third, an end-to-end byte stream between Client and Server, which runs via a CONNECT tunnel in the second connection. A real TCP connection only exists between Proxy 2 and Server. If additional Client to Server logical connections are needed, they can be created inside the existing pair of QUIC connections.

Towards a full tunnel with IP tunneling

Proxy support for UDP and TCP already unblocks a huge assortment of use cases, including TLS, QUIC, HTTP, DNS, and so on. But it doesn’t help protocols that use different IP protocols, like ICMP or IPsec Encapsulating Security Payload (ESP). Fortunately, the MASQUE Working Group has also been working on IP tunneling. This is a lot more complex than UDP tunneling, so they first spent some time defining a common set of requirements. The group has recently adopted a new specification to support IP proxying over HTTP. This behaves similarly to the other CONNECT designs we’ve discussed but with a few differences. Indeed, IP proxying support using HTTP as a substrate would unlock many applications that existing protocols like IPsec and WireGuard enable.

At this point, it would be reasonable to ask: “A complete HTTP/3 stack is a bit excessive when all I need is a simple end-to-end tunnel, right?” Our answer is, it depends! CONNECT-based IP proxies use TLS and rely on well established PKIs for creating secure channels between endpoints, whereas protocols like WireGuard use a simpler cryptographic protocol for key establishment and defer authentication to the application. WireGuard does not support proxying over TCP but can be adapted to work over TCP transports, if necessary. In contrast, CONNECT-based proxies do support TCP and UDP transports, depending on what version of HTTP is used. Despite these differences, these protocols do share similarities. In particular, the actual framing used by both protocols – be it the TLS record layer or QUIC packet protection for CONNECT-based proxies, or WireGuard encapsulation – are not interoperable but only slightly differ in wire format. Thus, from a performance perspective, there’s not really much difference.

In general, comparing these protocols is like comparing apples and oranges – they’re fit for different purposes, have different implementation requirements, and assume different ecosystem participants and threat models. At the end of the day, CONNECT-based proxies are better suited to an ecosystem and environment that is already heavily invested in TLS and the existing WebPKI, so we expect CONNECT-based solutions for IP tunnels to become the norm in the future. Nevertheless, it’s early days, so be sure to watch this space if you’re interested in learning more!

Looking ahead

The IETF has chartered the MASQUE Working Group to help design an HTTP-based solution for UDP and IP that complements the existing CONNECT method for TCP tunneling. Using HTTP semantics allows us to use features like request methods, response statuses, and header fields to enhance tunnel initialization. For example, allowing for reuse of existing authentication mechanisms or the Proxy-Status field. By using HTTP/3, UDP and IP tunneling can benefit from QUIC’s secure transport native unreliable datagram support, and other features. Through a flexible design, older versions of HTTP can also be supported, which helps widen the potential deployment scenarios. Collectively, this work brings proxy protocols to the masses.

While the design details of MASQUE specifications continue to be iterated upon, so far several implementations have been developed, some of which have been interoperability tested during IETF hackathons. This running code helps inform the continued development of the specifications. Details are likely to continue changing before the end of the process, but we should expect the overarching approach to remain similar. Join us during the MASQUE WG meeting in IETF 113 to learn more!

A Primer on Proxies

2022-03-19 Lucas Pardue

Post Syndicated from Lucas Pardue original https://blog.cloudflare.com/a-primer-on-proxies/

A Primer on Proxies

Traffic proxying, the act of encapsulating one flow of data inside another, is a valuable privacy tool for establishing boundaries on the Internet. Encapsulation has an overhead, Cloudflare and our Internet peers strive to avoid turning it into a performance cost. MASQUE is the latest collaboration effort to design efficient proxy protocols based on IETF standards. We’re already running these at scale in production; see our recent blog post about Cloudflare’s role in iCloud Private Relay for an example.

In this blog post series, we’ll dive into proxy protocols.

To begin, let’s start with a simple question: what is proxying? In this case, we are focused on forward proxying — a client establishes an end-to-end tunnel to a target server via a proxy server. This contrasts with the Cloudflare CDN, which operates as a reverse proxy that terminates client connections and then takes responsibility for actions such as caching, security including WAF, load balancing, etc. With forward proxying, the details about the tunnel, such as how it is established and used, whether or not it provides confidentiality via authenticated encryption, and so on, vary by proxy protocol. Before going into specifics, let’s start with one of the most common tunnels used on the Internet: TCP.

Transport basics: TCP provides a reliable byte stream

The TCP transport protocol is a rich topic. For the purposes of this post, we will focus on one aspect: TCP provides a readable and writable, reliable, and ordered byte stream. Some protocols like HTTP and TLS require a reliable transport underneath them and TCP’s single byte stream is an ideal fit. The application layer reads or writes to this byte stream, but the details about how TCP sends this data “on the wire” are typically abstracted away.

Large application objects are written into a stream, then they are split into many small packets and they are sent in order to the network. At the receiver, packets are read from the network and combined back into an identical stream. Networks are not perfect and packets can be lost or reordered. TCP is clever at dealing with this and not worrying the application with details. It just works. A way to visualize this is to imagine a magic paper shredder that can both shred documents and convert shredded papers back to whole documents. Then imagine you and your friend bought a pair of these and decided that it would be fun to send each other shreds.

The one problem with TCP is that when a lost packet is detected at a receiver, the sender needs to retransmit it. This takes time to happen and can mean that the byte stream reconstruction gets delayed. This is known as TCP head-of-line blocking. Applications regularly use TCP via a socket API that abstracts away protocol details; they often can’t tell if there are delays because the other end is slow at sending or if the network is slowing things down via packet loss.

Proxy Protocols

Proxying TCP is immensely useful for many applications, including, though certainly not limited to HTTPS, SSH, and RDP. In fact, Oblivious DoH, which is a proxy protocol for DNS messages, could very well be implemented using a TCP proxy, though there are reasons why this may not be desirable. Today, there are a number of different options for proxying TCP end-to-end, including:

SOCKS, which runs in cleartext and requires an expensive connection establishment step.
Transparent TCP proxies, commonly referred to as performance enhancing proxies (PEPs), which must be on path and offer no additional transport security, and, definitionally, are limited to TCP protocols.
Layer 4 proxies such as Cloudflare Spectrum, which might rely on side carriage metadata via something like the PROXY protocol.
HTTP CONNECT, which transforms HTTPS connections into opaque byte streams.

While SOCKS and PEPs are viable options for some use cases, when choosing which proxy protocol to build future systems upon, it made most sense to choose a reusable and general-purpose protocol that provides well-defined and standard abstractions. As such, the IETF chose to focus on using HTTP as a substrate via the CONNECT method.

The concept of using HTTP as a substrate for proxying is not new. Indeed, HTTP/1.1 and HTTP/2 have supported proxying TCP-based protocols for a long time. In the following sections of this post, we’ll explain in detail how CONNECT works across different versions of HTTP, including HTTP/1.1, HTTP/2, and the recently standardized HTTP/3.

HTTP/1.1 and CONNECT

In HTTP/1.1, the CONNECT method can be used to establish an end-to-end TCP tunnel to a target server via a proxy server. This is commonly applied to use cases where there is a benefit of protecting the traffic between the client and the proxy, or where the proxy can provide access control at network boundaries. For example, a Web browser can be configured to issue all of its HTTP requests via an HTTP proxy.

A client sends a CONNECT request to the proxy server, which requests that it opens a TCP connection to the target server and desired port. It looks something like this:

CONNECT target.example.com:80 HTTP/1.1
Host: target.example.com

If the proxy succeeds in opening a TCP connection to the target, it responds with a 2xx range status code. If there is some kind of problem, an error status in the 5xx range can be returned. Once a tunnel is established there are two independent TCP connections; one on either side of the proxy. If a flow needs to stop, you can simply terminate them.

HTTP CONNECT proxies forward data between the client and the target server. The TCP packets themselves are not tunneled, only the data on the logical byte stream. Although the proxy is supposed to forward data and not process it, if the data is plaintext there would be nothing to stop it. In practice, CONNECT is often used to create an end-to-end TLS connection where only the client and target server have access to the protected content; the proxy sees only TLS records and can’t read their content because it doesn’t have access to the keys.

Finally, it’s worth noting that after a successful CONNECT request, the HTTP connection (and the TCP connection underpinning it) has been converted into a tunnel. There is no more possibility of issuing other HTTP messages, to the proxy itself, on the connection.

HTTP/2 and CONNECT

HTTP/2 adds logical streams above the TCP layer in order to support concurrent requests and responses on a single connection. Streams are also reliable and ordered byte streams, operating on top of TCP. Returning to our magic shredder analogy: imagine you wanted to send a book. Shredding each page one after another and rebuilding the book one page at a time is slow, but handling multiple pages at the same time might be faster. HTTP/2 streams allow us to do that. But, as we all know, trying to put too much into a shredder can sometimes cause it to jam.

In HTTP/2, each request and response is sent on a different stream. To support this, HTTP/2 defines frames that contain the stream identifier that they are associated with. Requests and responses are composed of HEADERS and DATA frames which contain HTTP header fields and HTTP content, respectively. Frames can be large. When they are sent on the wire they might span multiple TLS records or TCP segments. Side note: the HTTP WG has been working on a new revision of the document that defines HTTP semantics that are common to all HTTP versions. The terms message, header fields, and content all come from this description.

HTTP/2 concurrency allows applications to read and write multiple objects at different rates, which can improve HTTP application performance, such as web browsing. HTTP/1.1 traditionally dealt with this concurrency by opening multiple TCP connections in parallel and striping requests across these connections. In contrast, HTTP/2 multiplexes frames belonging to different streams onto the single byte stream provided by one TCP connection. Reusing a single connection has benefits, but it still leaves HTTP/2 at risk of TCP head-of-line blocking. For more details, refer to Perf Planet blog.

HTTP/2 also supports the CONNECT method. In contrast to HTTP/1.1, CONNECT requests do not take over an entire HTTP/2 connection. Instead, they convert a single stream into an end-to-end tunnel. It looks something like this:

:method = CONNECT
:authority = target.example.com:443

If the proxy succeeds in opening a TCP connection, it responds with a 2xx (Successful) status code. After this, the client sends DATA frames to the proxy, and the content of these frames are put into TCP packets sent to the target. In the return direction, the proxy reads from the TCP byte stream and populates DATA frames. If a tunnel needs to stop, you can simply terminate the stream; there is no need to terminate the HTTP/2 connection.

By using HTTP/2, a client can create multiple CONNECT tunnels in a single connection. This can help reduce resource usage (saving the global count of TCP connections) and allows related tunnels to be logically grouped together, ensuring that they “share fate” when either client or proxy need to gracefully close. On the proxy-to-server side there are still multiple independent TCP connections.

One challenge of multiplexing tunnels on concurrent streams is how to effectively prioritize them. We’ve talked in the past about prioritization for web pages, but the story is a bit different for CONNECT. We’ve been thinking about this and captured considerations in the new Extensible Priorities draft.

QUIC, HTTP/3 and CONNECT

QUIC is a new secure and multiplexed transport protocol from the IETF. QUIC version 1 was published as RFC 9000 in May 2021 and, the next day, we enabled it for all Cloudflare customers.

QUIC is composed of several foundational features. You can think of these like individual puzzle pieces that interlink to form a transport service. This service needs one more piece, an application mapping, to bring it all together.

Similar to HTTP/2, QUIC version 1 provides reliable and ordered streams. But QUIC streams live at the transport layer and they are the only type of QUIC primitive that can carry application data. QUIC has no opinion on how streams get used. Applications that wish to use QUIC must define that themselves.

QUIC streams can be long (up to 2^62 – 1 bytes). Stream data is sent on the wire in the form of STREAM frames. All QUIC frames must fit completely inside a QUIC packet. QUIC packets must fit entirely in a UDP datagram; fragmentation is prohibited. These requirements mean that a long stream is serialized to a series of QUIC packets sized roughly to the path MTU (Maximum Transmission Unit). STREAM frames provide reliability via QUIC loss detection and recovery. Frames are acknowledged by the receiver and if the sender detects a loss (via missing acknowledgments), QUIC will retransmit the lost data. In contrast, TCP retransmits packets. This difference is an important feature of QUIC, letting implementations decide how to repacketize and reschedule lost data.

When multiplexing streams, different packets can contain STREAM frames belonging to different stream identifiers. This creates independence between streams and helps avoid the head-of-line blocking caused by packet loss that we see in TCP. If a UDP packet containing data for one stream is lost, other streams can continue to make progress without being blocked by retransmission of the lost stream.

To use our magic shredder analogy one more time: we’re sending a book again, but this time we parallelise our task by using independent shredders. We need to logically associate them together so that the receiver knows the pages and shreds are all for the same book, but otherwise they can progress with less chance of jamming.

HTTP/3 is an example of an application mapping that describes how streams are used to exchange: HTTP settings, QPACK state, and request and response messages. HTTP/3 still defines its own frames like HEADERS and DATA, but it is overall simpler than HTTP/2 because QUIC deals with the hard stuff. Since HTTP/3 just sees a logical byte stream, its frames can be arbitrarily sized. The QUIC layer handles segmenting HTTP/3 frames over STREAM frames for sending in packets. HTTP/3 also supports the CONNECT method. It functions identically to CONNECT in HTTP/2, each request stream converting to an end-to-end tunnel.

HTTP packetization comparison

We’ve talked about HTTP/1.1, HTTP/2 and HTTP/3. The diagram below is a convenient way to summarize how HTTP requests and responses get serialized for transmission over a secure transport. The main difference is that with TLS, protected records are split across several TCP segments. While with QUIC there is no record layer, each packet has its own protection.

Limitations and looking ahead

HTTP CONNECT is a simple and elegant protocol that has a tremendous number of application use cases, especially for privacy-enhancing technology. In particular, applications can use it to proxy DNS-over-HTTPS similar to what’s been done for Oblivious DoH, or more generic HTTPS traffic (based on HTTP/1.1 or HTTP/2), and many more.

However, what about non-TCP traffic? Recall that HTTP/3 is an application mapping for QUIC, and therefore runs over UDP as well. What if we wanted to proxy QUIC? What if we wanted to proxy entire IP datagrams, similar to VPN technologies like IPsec or WireGuard? This is where MASQUE comes in. In the next post, we’ll discuss how the MASQUE Working Group is standardizing technologies to enable proxying for datagram-based protocols like UDP and IP.

A Last Call for QUIC, a giant leap for the Internet

2020-10-22 Lucas Pardue

Post Syndicated from Lucas Pardue original https://blog.cloudflare.com/last-call-for-quic/

A Last Call for QUIC, a giant leap for the Internet

QUIC is a new Internet transport protocol for secure, reliable and multiplexed communications. HTTP/3 builds on top of QUIC, leveraging the new features to fix performance problems such as Head-of-Line blocking. This enables web pages to load faster, especially over troublesome networks.

QUIC and HTTP/3 are open standards that have been under development in the IETF for almost exactly 4 years. On October 21, 2020, following two rounds of Working Group Last Call, draft 32 of the family of documents that describe QUIC and HTTP/3 were put into IETF Last Call. This is an important milestone for the group. We are now telling the entire IETF community that we think we’re almost done and that we’d welcome their final review.

A Last Call for QUIC, a giant leap for the Internet

Speaking personally, I’ve been involved with QUIC in some shape or form for many years now. Earlier this year I was honoured to be asked to help co-chair the Working Group. I’m pleased to help shepherd the documents through this important phase, and grateful for the efforts of everyone involved in getting us there, especially the editors. I’m also excited about future opportunities to evolve on top of QUIC v1 to help build a better Internet.

There are two aspects to protocol development. One aspect involves writing and iterating upon the documents that describe the protocols themselves. Then, there’s implementing, deploying and testing libraries, clients and/or servers. These aspects operate hand in hand, helping the Working Group move towards satisfying the goals listed in its charter. IETF Last Call marks the point that the group and their responsible Area Director (in this case Magnus Westerlund) believe the job is almost done. Now is the time to solicit feedback from the wider IETF community for review. At the end of the Last Call period, the stakeholders will take stock, address feedback as needed and, fingers crossed, go onto the next step of requesting the documents be published as RFCs on the Standards Track.

Although specification and implementation work hand in hand, they often progress at different rates, and that is totally fine. The QUIC specification has been mature and deployable for a long time now. HTTP/3 has been generally available on the Cloudflare edge since September 2019, and we’ve been delighted to see support roll out in user agents such as Chrome, Firefox, Safari, curl and so on. Although draft 32 is the latest specification, the community has for the time being settled on draft 29 as a solid basis for interoperability. This shouldn’t be surprising, as foundational aspects crystallize the scope of changes between iterations decreases. For the average person in the street, there’s not really much difference between 29 and 32.

So today, if you visit a website with HTTP/3 enabled—such as https://cloudflare-quic.com—you’ll probably see response headers that contain Alt-Svc: h3-29=”… . And in a while, once Last Call completes and the RFCs ship, you’ll start to see websites simply offer Alt-Svc: h3=”… (note, no draft version!).

Need a deep dive?

We’ve collected a bunch of resource links at https://cloudflare-quic.com. If you’re more of an interactive visual learner, you might be pleased to hear that I’ve also been hosting a series on Cloudflare TV called “Levelling up Web Performance with HTTP/3”. There are over 12 hours of content including the basics of QUIC, ways to measure and debug the protocol in action using tools like Wireshark, and several deep dives into specific topics. I’ve also been lucky to have some guest experts join me along the way. The table below gives an overview of the episodes that are available on demand.

Episode	Description
1	Introduction to QUIC.
2	Introduction to HTTP/3.
3	QUIC & HTTP/3 logging and analysis using qlog and qvis. Featuring Robin Marx.
4	QUIC & HTTP/3 packet capture and analysis using Wireshark. Featuring Peter Wu.
5	The roles of Server Push and Prioritization in HTTP/2 and HTTP/3. Featuring Yoav Weiss.
6	"After dinner chat" about curl and QUIC. Featuring Daniel Stenberg.
7	Qlog vs. Wireshark. Featuring Robin Marx and Peter Wu.
8	Understanding protocol performance using WebPageTest. Featuring Pat Meenan and Andy Davies.
9	Handshake deep dive.
10	Getting to grips with quiche, Cloudflare’s QUIC and HTTP/3 library.
11	A review of SIGCOMM’s EPIQ workshop on evolving QUIC.
12	Understanding the role of congestion control in QUIC. Featuring Junho Choi.

Whither QUIC?

So does Last Call mean QUIC is “done”? Not by a long shot. The new protocol is a giant leap for the Internet, because it enables new opportunities and innovation. QUIC v1 is basically the set of documents that have gone into Last Call. We’ll continue to see people gain experience deploying and testing this, and no doubt cool blog posts about tweaking parameters for efficiency and performance are on the radar. But QUIC and HTTP/3 are extensible, so we’ll see people interested in trying new things like multipath, different congestion control approaches, or new ways to carry data unreliably such as the DATAGRAM frame.

We’re also seeing people interested in using QUIC for other use cases. Mapping other application protocols like DNS to QUIC is a rapid way to get its improvements. We’re seeing people that want to use QUIC as a substrate for carrying other transport protocols, hence the formation of the MASQUE Working Group. There’s folks that want to use QUIC and HTTP/3 as a “supercharged WebSocket”, hence the formation of the WebTransport Working Group.

Whatever the future holds for QUIC, we’re just getting started, and I’m excited.