Tag Archives: AI Gateway

Mitigating a token-length side-channel attack in our AI products

Post Syndicated from Celso Martinho original https://blog.cloudflare.com/ai-side-channel-attack-mitigated


Since the discovery of CRIME, BREACH, TIME, LUCKY-13, and similar attacks, length-based side-channel attacks have been considered practical. Even though packets were encrypted, attackers were able to infer information about the underlying plaintext by analyzing metadata like packet length or timing.

Cloudflare was recently contacted by a group of researchers at Ben Gurion University who wrote a paper titled “What Was Your Prompt? A Remote Keylogging Attack on AI Assistants” that describes “a novel side-channel that can be used to read encrypted responses from AI Assistants over the web”.
The Workers AI and AI Gateway team collaborated closely with these security researchers through our Public Bug Bounty program, discovering and fully patching a vulnerability that affects LLM providers. You can read the detailed research paper here.

Since being notified about this vulnerability, we’ve implemented a mitigation to help secure all Workers AI and AI Gateway customers. As far as we could assess, there was no outstanding risk to Workers AI and AI Gateway customers.

How does the side-channel attack work?

In the paper, the authors describe a method in which they intercept the stream of a chat session with an LLM provider, use the network packet headers to infer the length of each token, extract and segment their sequence, and then use their own dedicated LLMs to infer the response.

The two main requirements for a successful attack are an AI chat client running in streaming mode and a malicious actor capable of capturing network traffic between the client and the AI chat service. In streaming mode, the LLM tokens are emitted sequentially, introducing a token-length side-channel. Malicious actors could eavesdrop on packets via public networks or within an ISP.

An example request vulnerable to the side-channel attack looks like this:

curl -X POST \
  https://api.cloudflare.com/client/v4/accounts/<account-id>/ai/run/@cf/meta/llama-2-7b-chat-int8 \
  -H "Authorization: Bearer <Token>" \
  -d '{"stream":true,"prompt":"tell me something about portugal"}'

Let’s use Wireshark to inspect the network packets on the LLM chat session while streaming:

The first packet has a length of 95 and corresponds to the token “Port”, which has a length of four. The second packet has a length of 93 and corresponds to the token “ug”, which has a length of two, and so on. By removing the likely token envelope from the network packet length, it is easy to infer how many tokens were transmitted, as well as their sequence and individual lengths, just by sniffing encrypted network data.
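
To make the packet arithmetic concrete, here is a small, purely illustrative sketch of the inference step (not code from the paper). The envelope size follows from the example above (95 - 4 = 91), and the last two packet lengths are made up.

// Hypothetical sketch: recovering token lengths from observed packet lengths.
// 95 - 91 = 4 ("Port"), 93 - 91 = 2 ("ug"); the remaining values are invented.
const observedPacketLengths = [95, 93, 94, 96]; // from a packet capture of the stream
const envelopeSize = 91; // constant per-message overhead estimated by the attacker

// Each streamed message carries one token, so subtracting the envelope
// yields the sequence of individual token lengths.
const tokenLengths = observedPacketLengths.map((len) => len - envelopeSize);

console.log(tokenLengths); // [4, 2, 3, 5], the signal fed into the attacker's own LLMs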

Since the attacker needs the sequence of individual token lengths, this vulnerability only affects text generation models that use streaming. This means that AI inference providers that use streaming — the most common way of interacting with LLMs — such as Workers AI, are potentially vulnerable.

This method requires that the attacker is on the same network or otherwise in a position to observe the communication traffic, and its accuracy depends on knowing the target LLM’s writing style. In ideal conditions, the researchers claim that their system “can reconstruct 29% of an AI assistant’s responses and successfully infer the topic from 55% of them”. It’s also important to note that, unlike other side-channel attacks, in this case the attacker has no way of evaluating its prediction against the ground truth. That means we are as likely to get a sentence with near-perfect accuracy as we are to get one where the only things that match are conjunctions.

Mitigating LLM side-channel attacks

Since this type of attack relies on the length of tokens being inferred from the packet, it can be just as easily mitigated by obscuring token size. The researchers suggested a few strategies to mitigate these side-channel attacks, the simplest of which is padding the token responses with random-length noise to obscure the length of each token, so that responses cannot be inferred from the packets. While we immediately added the mitigation to our own inference product, Workers AI, we also wanted to help customers secure their LLMs regardless of where they are running them by adding it to our AI Gateway.

As of today, all users of Workers AI and AI Gateway are now automatically protected from this side-channel attack.

What we did

Once we got word of this research work and how exploiting the technique could potentially impact our AI products, we did what we always do in situations like this: we assembled a team of systems engineers, security engineers, and product managers and started discussing risk mitigation strategies and next steps. We also had a call with the researchers, who kindly attended, presented their conclusions, and answered questions from our teams.

Unfortunately, at this point, the research does not include code that we could use to reproduce the claims or verify the effectiveness and accuracy of the described side-channel attack. However, we think that the paper has theoretical merit, that it provides enough detail and explanation, and that the risks are not negligible.

We decided to incorporate the first mitigation suggestion in the paper: including random padding to each message to hide the actual length of tokens in the stream, thereby complicating attempts to infer information based solely on network packet size.

Workers AI, our inference product, is now protected

With our inference-as-a-service product, anyone can use the Workers AI platform and make API calls to our supported AI models. This means that we oversee the inference requests being made to and from the models. As such, we have a responsibility to ensure that the service is secure and protected from potential vulnerabilities. We immediately rolled out a fix once we were notified of the research, and all Workers AI customers are now automatically protected from this side-channel attack. We have not seen any malicious attacks exploiting this vulnerability, other than the ethical testing from the researchers.

Our solution for Workers AI is a variation of the mitigation strategy suggested in the research document. Since we stream JSON objects rather than the raw tokens, instead of padding the tokens with whitespace characters, we added a new property, “p” (for padding) that has a string value of variable random length.

Example streaming response using the SSE syntax:

data: {"response":"portugal","p":"abcdefghijklmnopqrstuvwxyz0123456789a"}
data: {"response":" is","p":"abcdefghij"}
data: {"response":" a","p":"abcdefghijklmnopqrstuvwxyz012"}
data: {"response":" southern","p":"ab"}
data: {"response":" European","p":"abcdefgh"}
data: {"response":" country","p":"abcdefghijklmno"}
data: {"response":" located","p":"abcdefghijklmnopqrstuvwxyz012345678"}

This has the advantage that no modifications are required to the SDK or the client code, the changes are invisible to end users, and no action is required from our customers. By adding a property of random, variable length to the JSON objects, we introduce the same kind of network-level variability, and the attacker essentially loses the required input signal. Customers can continue using Workers AI as usual while benefiting from this protection.
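
As a rough illustration of the idea (a simplified sketch, not our production code), padding each streamed object with random-length noise could look something like this:

// Simplified sketch of the mitigation: append a random-length "p" property to every
// streamed JSON object so that packet sizes no longer track token lengths.
// Illustration only, not Cloudflare's actual implementation.
function padStreamedChunk(token: string): string {
  const padLength = 1 + Math.floor(Math.random() * 40); // random length, e.g. 1 to 40 characters
  const padding = "abcdefghijklmnopqrstuvwxyz0123456789".repeat(2).slice(0, padLength);
  return `data: ${JSON.stringify({ response: token, p: padding })}\n\n`;
}

console.log(padStreamedChunk(" southern"));
// e.g. data: {"response":" southern","p":"abcdefghijklmnop"}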

One step further: AI Gateway protects users of any inference provider

We added protection to our AI inference product, but we also have a product that proxies requests to any provider — AI Gateway. AI Gateway acts as a proxy between a user and supported inference providers, helping developers gain control, performance, and observability over their AI applications. In line with our mission to help build a better Internet, we wanted to quickly roll out a fix that can help all our customers using text generation AIs, regardless of which provider they use or if they have mitigations to prevent this attack. To do this, we implemented a similar solution that pads all streaming responses proxied through AI Gateway with random noise of variable length.

Our AI Gateway customers are now automatically protected against this side-channel attack, even if the upstream inference providers have not yet mitigated the vulnerability. If you are unsure if your inference provider has patched this vulnerability yet, use AI Gateway to proxy your requests and ensure that you are protected.

Conclusion

At Cloudflare, our mission is to help build a better Internet – that means that we care about all citizens of the Internet, regardless of what their tech stack looks like. We are proud to be able to improve the security of our AI products in a way that is transparent and requires no action from our customers.

We are grateful to the researchers who discovered this vulnerability and have been very collaborative in helping us understand the problem space. If you are a security researcher who is interested in helping us make our products more secure, check out our Bug Bounty program at hackerone.com/cloudflare.

Announcing AI Gateway: making AI applications more observable, reliable, and scalable

Post Syndicated from Michelle Chen original http://blog.cloudflare.com/announcing-ai-gateway/

Today, we’re excited to announce our beta of AI Gateway – the portal to making your AI applications more observable, reliable, and scalable.

AI Gateway sits between your application and the AI APIs that your application makes requests to (like OpenAI) – so that we can cache responses, limit and retry requests, and provide analytics to help you monitor and track usage. AI Gateway handles the things that nearly all AI applications need, saving you engineering time, so you can focus on what you're building.

Connecting your app to AI Gateway

It only takes one line of code for developers to get started with Cloudflare’s AI Gateway. All you need to do is replace the URL in your API calls with your unique AI Gateway endpoint. For example, with OpenAI you would define your baseURL as "https://gateway.ai.cloudflare.com/v1/ACCOUNT_TAG/GATEWAY/openai" instead of "https://api.openai.com/v1" – and that’s it. You can keep your tokens in your code environment, and we’ll log the request through AI Gateway before letting it pass through to the final API with your token.

// Configuring the OpenAI SDK to use AI Gateway's dedicated OpenAI endpoint
import OpenAI from "openai";

const openai = new OpenAI({
  apiKey: env.OPENAI_API_KEY,
  baseURL: "https://gateway.ai.cloudflare.com/v1/ACCOUNT_TAG/GATEWAY/openai",
});

We currently support model providers such as OpenAI, Hugging Face, and Replicate, with plans to add more in the future. We support all the various endpoints within these providers, as well as response streaming, so everything should work out of the box once you have the gateway configured. The dedicated endpoints for these providers allow you to connect your apps to AI Gateway by changing one line of code, without touching your original payload structure.
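
For example, response streaming works through the same dedicated endpoint. A minimal sketch, assuming the OpenAI Node SDK (v4) streaming interface and the openai client configured above, might look like this:

// Streaming a chat completion through AI Gateway's dedicated OpenAI endpoint.
// Assumes the `openai` client from the previous snippet; sketch only.
const stream = await openai.chat.completions.create({
  model: "gpt-3.5-turbo",
  messages: [{ role: "user", content: "What is Cloudflare?" }],
  stream: true,
});

for await (const chunk of stream) {
  // Each chunk carries a delta of the generated response.
  console.log(chunk.choices[0]?.delta?.content ?? "");
}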

We also have a universal endpoint that you can use if you’d like more flexibility with your requests. With the universal endpoint, you have the ability to define fallback models and handle request retries. For example, let’s say a request was made to OpenAI GPT-3, but the API was down – with the universal endpoint, you could define Hugging Face GPT-2 as your fallback model, and the gateway can automatically resend that request to Hugging Face. This is really helpful for improving your app’s resiliency when you are seeing unusual errors, getting rate limited, or when one provider’s bill is getting costly and you want to diversify to other models. With the universal endpoint, you’ll just need to tweak your payload to specify the provider and endpoint, so we can properly route requests for you. Check out the example request below and the docs for more details on the universal endpoint schema.

# Using the Universal Endpoint to first try OpenAI, then Hugging Face

curl https://gateway.ai.cloudflare.com/v1/ACCOUNT_TAG/GATEWAY -X POST \
  --header 'Content-Type: application/json' \
  --data '[
  {
    "provider": "openai",
    "endpoint": "chat/completions",
    "headers": { 
      "Authorization": "Bearer $OPENAI_TOKEN",
      "Content-Type": "application/json"
    },
    "query": {
      "model": "gpt-3.5-turbo",
      "stream": true,
      "messages": [
        {
          "role": "user",
          "content": "What is Cloudflare?"
        }
      ]
    }
  },
  {
    "provider": "huggingface",
    "endpoint": "gpt2",
    "headers": { 
      "Authorization": "Bearer $HF_TOKEN",
      "Content-Type": "application/json"
    },
    "query": {
      "inputs": "What is Cloudflare?"
    }
  }
]'

Gaining visibility into your app’s usage

Now that your app is connected to Cloudflare, we can help you gather analytics and give you insight into and control over the traffic passing through your apps. Regardless of what model or infrastructure you use in the backend, we can help you log requests and analyze data like the number of requests, number of users, cost of running the app, duration of requests, etc. Although these seem like basic analytics that model providers should expose, it’s surprisingly difficult to get visibility into these metrics from typical model providers. AI Gateway takes it one step further and lets you aggregate analytics across multiple providers too.

Controlling how your app scales

One of the pain points we often hear about is how expensive it is to build and run AI apps. Each API call can be unpredictably expensive, and costs can rack up quickly, preventing developers from scaling their apps to their full potential. At the speed the industry is moving, you don’t want to be limited by your scale and left behind – and that’s where caching and rate limiting can help. We allow developers to cache their API calls so that new requests can be served from our cache rather than the original API – making it cheaper and faster. Rate limiting can also help control costs by throttling the number of requests and preventing excessive or suspicious activity. Developers have full flexibility to define caching and rate limiting rules, so that your apps can scale at a sustainable pace of your choosing.
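
To illustrate the caching idea itself, here is a conceptual sketch of the pattern; in AI Gateway, caching and rate limiting rules are defined in the dashboard rather than written as code, and the helper below is hypothetical.

// Conceptual sketch: serve repeated, identical requests from a cache instead of
// calling the upstream AI API again. Illustrative only.
const responseCache = new Map<string, string>();

async function cachedCompletion(prompt: string): Promise<string> {
  const key = JSON.stringify({ model: "gpt-3.5-turbo", prompt }); // cache key derived from the request
  const cached = responseCache.get(key);
  if (cached !== undefined) {
    return cached; // cache hit: cheaper and faster, no upstream call
  }
  const fresh = await callUpstreamProvider(prompt); // hypothetical placeholder for the real provider call
  responseCache.set(key, fresh);
  return fresh;
}

// Hypothetical placeholder; in practice this is the provider's API behind the gateway.
async function callUpstreamProvider(prompt: string): Promise<string> {
  return `response for: ${prompt}`;
}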

The Workers AI Platform

AI Gateway pairs perfectly with our new Workers AI and Vectorize products, so you can build full-stack AI applications all within the Workers ecosystem. From deploying applications with Workers, running model inference on the edge with Workers AI, storing vector embeddings on Vectorize, to gaining visibility into your applications with AI Gateway – the Workers platform is your one-stop shop to bring your AI applications to life. To learn how to use AI Gateway with Workers AI or the different providers, check out the docs.

Next up: the enterprise use case

We are shipping v1 of AI Gateway with a few core features, but we have plans to expand the product to cover more advanced use cases as well – usage alerts, jailbreak protection, dynamic model routing with A/B testing, and advanced cache rules. But what we’re really excited about are the other ways you can apply AI Gateway…

In the future, we want to develop AI Gateway into a product that helps organizations monitor and observe how their users or employees are using AI. This way, you can flip a switch and have all requests within your network to providers (like OpenAI) pass through Cloudflare first – so that you can log user requests, apply access policies, enable rate limiting, and apply data loss prevention (DLP) strategies. A powerful example: if an employee accidentally pastes an API key into ChatGPT, AI Gateway can be configured to see the outgoing request and redact the API key or block the request entirely, preventing it from ever reaching OpenAI or any other provider. We can also log and alert on suspicious requests, so that organizations can proactively investigate and control certain types of activity. AI Gateway then becomes a really powerful tool for organizations that might be excited about the efficiency that AI unlocks, but hesitant about trusting AI when data privacy and user error are critical threats. We hope that AI Gateway can alleviate these concerns and make adopting AI tools a lot easier for organizations.

Whether you’re a developer building applications or a company that’s interested in how employees are using AI, our hope is that AI Gateway can help you demystify what’s going on inside your apps – because once you understand how your users are using AI, you can make decisions about how you actually want them to use it. Some of these features are still in development, but we hope this illustrates the power of AI Gateway and our vision for the future.

At Cloudflare, we live and breathe innovation (as you can tell by our Birthday Week announcements!) and the pace of innovation in AI is incredible to witness. We’re thrilled that we can not only help people build and use apps, but actually help accelerate the adoption and development of AI with greater control and visibility. We can’t wait to hear what you build – head to the Cloudflare dashboard to try out AI Gateway and let us know what you think!

A complete suite of Zero Trust security tools to get the most from AI

Post Syndicated from Sam Rhea original http://blog.cloudflare.com/zero-trust-ai-security/

This post is also available in French, Spanish, German.

A collection of tools from Cloudflare One to help your teams use AI services safely

Cloudflare One gives teams of any size the ability to safely use the best tools on the Internet without management headaches or performance challenges. We’re excited to announce Cloudflare One for AI, a new collection of features that help your team build with the latest AI services while still maintaining a Zero Trust security posture.

Large Language Models, Larger Security Challenges

A Large Language Model (LLM), like OpenAI’s GPT or Google’s Bard, consists of a neural network trained against a set of data to predict and generate text based on a prompt. Users can ask questions, solicit feedback, and lean on the service to create output from poetry to Cloudflare Workers applications.

The tools also bear an uncanny resemblance to a real human. As in some real-life personal conversations, oversharing can become a serious problem with these AI services. This risk multiplies due to the types of use cases where LLM models thrive. These tools can help developers solve difficult coding challenges or information workers create succinct reports from a mess of notes. While helpful, every input fed into a prompt becomes a piece of data leaving your organization’s control.

Some responses to tools like ChatGPT have been to try and ban the service outright; either at a corporate level or across an entire nation. We don’t think you should have to do that. Cloudflare One’s goal is to allow you to safely use the tools you need, wherever they live, without compromising performance. These features will feel familiar to any existing use of the Zero Trust products in Cloudflare One, but we’re excited to walk through cases where you can use the tools available right now to allow your team to take advantage of the latest LLM features.

Measure usage

SaaS applications make it easy for any user to sign up and start testing. That convenience also makes these tools a liability for IT budgets and security policies. Teams refer to this problem as “Shadow IT” – the adoption of applications and services outside the approved channels in an organization.

In terms of budget, we have heard from early adopter customers who know that their team members are beginning to experiment with LLMs, but they are not sure how to approach making a commercial licensing decision. What services and features do their users need and how many seats should they purchase?

On the security side, the AIs can be revolutionary for getting work done but terrifying for data control policies. Team members treat these AIs like sounding boards for painful problems. The services invite users to come with their questions or challenges. Sometimes the context inside those prompts can contain sensitive information that should never leave an organization. Even if teams select and approve a single vendor, members of your organization might prefer another AI and continue to use it in their workflow.

Cloudflare One customers on any plan can now review the usage of AIs. Your IT department can deploy Cloudflare Gateway and passively observe how many users are selecting which services as a way to start scoping out enterprise licensing plans.

Administrators can also block the use of these services with a single click, but that is not our goal today. You might want to use this feature if you select ChatGPT as your approved model, and you want to make sure team members don’t continue to use alternatives, but we hope you don’t block all of these services outright. Cloudflare’s priority is to give you the ability to use these tools safely.

Control API access

When our teams began experimenting with OpenAI’s ChatGPT service, we were astonished by what it already knew about Cloudflare. We asked ChatGPT to create applications with Cloudflare Workers or guide us through how to configure a Cloudflare Access policy and, in most cases, the results were accurate and helpful.

In some cases the results missed the mark. The AIs were using outdated information, or we were asking questions about features that had only launched recently. Thankfully, these AIs can learn and we can help. We can train these models with scoped inputs and connect plug-ins to provide our customers with better AI-guided experiences when using Cloudflare services.

We heard from customers who want to do the same thing and, like us, they need to securely share training data and grant plug-in access for an AI service. Cloudflare One’s security suite extends beyond human users and can give teams the ability to securely share Zero Trust access to sensitive data over APIs.

First, teams can create service tokens that external services must present to reach data made available through Cloudflare One. Administrators can provide these tokens to systems making API requests and log every single request. As needed, teams can revoke these tokens with a single click.
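
For example, a system calling an API that sits behind Cloudflare One can present its service token as a pair of request headers. A minimal sketch follows; the URL is a placeholder for your own protected endpoint.

// Hypothetical example: an external service presenting a Cloudflare Access service token
// when requesting training data protected by Cloudflare One. The URL is a placeholder.
const res = await fetch("https://training-data.example.com/api/dataset", {
  headers: {
    "CF-Access-Client-Id": "<service-token-client-id>",
    "CF-Access-Client-Secret": "<service-token-client-secret>",
  },
});

// The request is blocked unless the token satisfies the configured Access policy.
console.log(res.status);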

After creating and issuing service tokens, administrators can create policies to allow specific services access to their training data. These policies will verify the service token and can be extended to verify country, IP address or an mTLS certificate. Policies can also be created to require human users to authenticate with an identity provider and complete an MFA prompt before accessing sensitive training data or services.

When teams are ready to allow an AI service to connect to their infrastructure, they can do so without poking holes in their firewalls by using Cloudflare Tunnel. Cloudflare Tunnel will create an encrypted, outbound-only connection to Cloudflare’s network where every request will be checked against the access rules configured for one or more services protected by Cloudflare One.

Cloudflare’s Zero Trust access control gives you the ability to enforce authentication on each and every request made to the data your organization decides to provide to these tools. That still leaves a gap: the data your team members might overshare on their own.

Restrict data uploads

Administrators can select an AI service, block Shadow IT alternatives, and carefully gate access to their training material, but humans are still involved in these AI experiments. Any one of us can accidentally cause a security incident by oversharing information in the process of using an AI service – even an approved service.

We expect AI playgrounds to continue to evolve to feature more data management capabilities, but we don’t think you should have to wait for that to begin adopting these services as part of your workflow. Cloudflare’s Data Loss Prevention (DLP) service can provide a safeguard to stop oversharing before it becomes an incident for your security team.

First, tell us what data you care about. We provide simple, preconfigured options that give you the ability to check for things that look like social security numbers or credit card numbers. Cloudflare DLP can also scan for patterns based on regular expressions configured by your team.

Once you have defined the data that should never leave your organization, you can build granular rules about how it can and cannot be shared with AI services. Maybe some users are approved to experiment with projects that contain sensitive data, in which case you can build a rule that only allows an Active Directory or Okta group to upload that kind of information while everyone else is blocked.

Control use without a proxy

The tools in today’s blog post focus on features that apply to data-in-motion. We also want to make sure that misconfigurations in the applications themselves don’t lead to security violations. For example, the new plug-in feature in ChatGPT brings the knowledge and workflows of external services into the AI interaction flow. However, that can also lead to the services behind plug-ins having more access than you want them to have.

Cloudflare’s Cloud Access Security Broker (CASB) scans your SaaS applications for potential issues that can occur when users make changes. From alerting you to files that someone accidentally made public on the Internet to checking that your GitHub repositories have the right membership controls, Cloudflare’s CASB removes the manual effort required to check each and every setting for potential issues in your SaaS applications.

We are working on new integrations with popular AI services that check for misconfigurations, available soon. Like most users of these services, we’re still learning where potential accidents can occur, and we are excited to provide administrators who use our CASB with our first wave of controls for AI services.

What’s next?

The usefulness of these tools will only accelerate. The ability of AI services to coach and generate output will continue to make it easier for builders from any background to create the next big thing.

We share a similar goal. The Cloudflare products focused on helping users build applications and services, such as our Workers platform, remove hassles like worrying about where to deploy your application or how to scale your services. Cloudflare solves those headaches so that users can focus on creating. Combined with AI services, we expect to see thousands of new builders launch the next wave of products built on Cloudflare and inspired by AI coaching and generation.

We have already seen dozens of projects flourish that were built on Cloudflare Workers using guidance from tools like ChatGPT. We plan to launch new integrations with these models to make this even more seamless, bringing better Cloudflare-specific guidance to the chat experience.

We also know that the security risk of these tools will grow. We will continue to bring functionality into Cloudflare One that aims to stay one step ahead of the risks as they evolve with these services. Ready to get started? Sign up here to begin using Cloudflare One at no cost for teams of up to 50 users.