Tag Archives: Birthday Week

Workers AI: serverless GPU-powered inference on Cloudflare’s global network

Post Syndicated from Phil Wittig original http://blog.cloudflare.com/workers-ai/

If you're anywhere near the developer community, it's almost impossible to avoid the impact that AI’s recent advancements have had on the ecosystem. Whether you're using AI in your workflow to improve productivity or shipping AI-based features to your users, it’s everywhere. The focus on AI improvements is extraordinary, and we’re super excited about the opportunities that lie ahead, but it's not enough.

Not too long ago, if you wanted to leverage the power of AI, you needed to know the ins and outs of machine learning, and be able to manage the infrastructure to power it.

As a developer platform with over one million active developers, we believe there is so much potential yet to be unlocked, so we’re changing the way AI is delivered to developers. Many of the current solutions, while powerful, are based on closed, proprietary models and don't address the privacy needs that developers and users demand. Meanwhile, the open source scene is exploding with powerful models, but they’re simply not accessible enough to every developer. Imagine being able to run a model, from your code, wherever it’s hosted, and never needing to find GPUs or deal with setting up the infrastructure to support it.

That's why we are excited to launch Workers AI – an AI inference-as-a-service platform that empowers developers to run AI models with just a few lines of code, all powered by our global network of GPUs. It's open and accessible, serverless, privacy-focused, runs near your users, is pay-as-you-go, and it's built from the ground up for a best-in-class developer experience.

Workers AI – making inference just work

We’re launching Workers AI to put AI inference in the hands of every developer, and to actually deliver on that goal, it should just work out of the box. How do we achieve that?

  • At the core of everything, it runs on the right infrastructure – our world-class network of GPUs
  • We provide off-the-shelf models that run seamlessly on our infrastructure
  • Finally, we deliver it to developers in a way that’s delightful. A developer should be able to build their first Workers AI app in minutes, and say “Wow, that’s kinda magical!”.

So what exactly is Workers AI? It’s another building block that we’re adding to our developer platform – one that helps developers run well-known AI models on serverless GPUs, all on Cloudflare’s trusted global network. As one of the latest additions to our developer platform, it works seamlessly with Workers + Pages, but to make it truly accessible, we’ve made it platform-agnostic, so it also works everywhere else, made available via a REST API.

Models you know and love

We’re launching with a curated set of popular open source models that cover a wide range of inference tasks:

  • Text generation (large language model): meta/llama-2-7b-chat-int8
  • Automatic speech recognition (ASR): openai/whisper
  • Translation: meta/m2m100-1.2b
  • Text classification: huggingface/distilbert-sst-2-int8
  • Image classification: microsoft/resnet-50
  • Embeddings: baai/bge-base-en-v1.5

You can browse all available models in your Cloudflare dashboard, and soon you’ll be able to dive into logs and analytics on a per model basis!

This is just the start, and we’ve got big plans. After launch, we’ll continue to expand based on community feedback. Even more exciting – in an effort to take our catalog from zero to sixty, we’re announcing a partnership with Hugging Face, a leading AI community + hub. The partnership is multifaceted, and you can read more about it here, but soon you’ll be able to browse and run a subset of the Hugging Face catalog directly in Workers AI.

Accessible to everyone

Part of the mission of our developer platform is to provide all the building blocks that developers need to build the applications of their dreams. Having access to the right blocks is just one part of it — as a developer your job is to put them together into an application. Our goal is to make that as easy as possible.

To make sure you can use Workers AI easily regardless of your entry point, we provide access in two ways: via Workers or Pages, to make it easy to use within the Cloudflare ecosystem, and via a REST API, if you want to use Workers AI with your current stack.

Here’s a quick curl example that translates some text from English to French:

curl https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/run/@cf/meta/m2m100-1.2b \
  -H "Authorization: Bearer {API_TOKEN}" \
  -d '{ "text": "I'\''ll have an order of the moule frites", "target_lang": "french" }'

And here’s what the response looks like:

{
  "result": {
    "answer": "Je vais commander des moules frites"
  },
  "success": true,
  "errors":[],
  "messages":[]
}

Use it with any stack, anywhere – your favorite Jamstack framework, Python + Django/Flask, Node.js, Ruby on Rails – the possibilities are endless.
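For example, here’s a minimal sketch of calling the same translation model from Node.js 18+ with the built-in fetch. The CF_ACCOUNT_ID and CF_API_TOKEN environment variable names are our own placeholders, not part of the API:

// Minimal sketch: Workers AI REST API from Node.js 18+ (built-in fetch).
// CF_ACCOUNT_ID and CF_API_TOKEN are placeholder names for your own secrets.
const accountId = process.env.CF_ACCOUNT_ID;
const apiToken = process.env.CF_API_TOKEN;

async function translate(text: string, targetLang: string): Promise<unknown> {
  const res = await fetch(
    `https://api.cloudflare.com/client/v4/accounts/${accountId}/ai/run/@cf/meta/m2m100-1.2b`,
    {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${apiToken}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({ text, target_lang: targetLang }),
    }
  );
  return res.json();
}

translate("I'll have an order of the moule frites", 'french').then(console.log);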

Designed for developers

Developer experience is really important to us. In fact, most of this post has been about just that. Making sure it works out of the box. Providing popular models that just work. Being accessible to all developers whether you build and deploy with Cloudflare or elsewhere. But it’s more than that – the experience should be frictionless, zero to production should be fast, and it should feel good along the way.

Let’s walk through another example to show just how easy it is to use! We’ll run Llama 2, a popular large language model open sourced by Meta, in a worker.

We’ll assume you have some of the basics already complete (Cloudflare account, Node, NPM, etc.), but if you don’t, this guide will get you properly set up!

1. Create a Workers project

Create a new project named workers-ai by running:

$ npm create cloudflare@latest

When setting up your workers-ai worker, answer the setup questions as follows:

  • Enter workers-ai for the app name
  • Choose Hello World script for the type of application
  • Select yes to using TypeScript
  • Select yes to using Git
  • Select no to deploying

Lastly, navigate to your new app directory:

cd workers-ai

2. Connect Workers AI to your worker

Create a Workers AI binding, which allows your worker to access the Workers AI service without having to manage an API key yourself.

To bind Workers AI to your worker, add the following to the end of your wrangler.toml file:

[ai]
binding = "AI" #available in your worker via env.AI

You can also bind Workers AI to a Pages Function. For more information, refer to Functions Bindings.

3. Install the Workers AI client library

npm install @cloudflare/ai --save-dev

4. Run an inference task in your worker

Update src/index.ts with the following code:

import { Ai } from '@cloudflare/ai'
export default {
  async fetch(request, env) {
    const ai = new Ai(env.AI);
    const input = { prompt: "What's the origin of the phrase 'Hello, World'" };
    const output = await ai.run('@cf/meta/llama-2-7b-chat-int8', input );
    return new Response(JSON.stringify(output));
  },
};

5. Develop locally with Wrangler

While in your project directory, test Workers AI locally by running:

$ npx wrangler dev --remote

Note – These models currently only run on Cloudflare’s network of GPUs (and not locally), so setting `--remote` above is a must, and you’ll be prompted to log in at this point.

Wrangler will give you a URL (most likely localhost:8787). Visit that URL, and you’ll see a response like this:

{
  "response": "Hello, World is a common phrase used to test the output of a computer program, particularly in the early stages of programming. The phrase \"Hello, World!\" is often the first program that a beginner learns to write, and it is included in many programming language tutorials and textbooks as a way to introduce basic programming concepts. The origin of the phrase \"Hello, World!\" as a programming test is unclear, but it is believed to have originated in the 1970s. One of the earliest known references to the phrase is in a 1976 book called \"The C Programming Language\" by Brian Kernighan and Dennis Ritchie, which is considered one of the most influential books on the development of the C programming language."
}

6. Deploy your worker

Finally, deploy your worker to make your project accessible on the Internet:

$ npx wrangler deploy
# Outputs: https://workers-ai.<YOUR_SUBDOMAIN>.workers.dev

And that’s it. You can literally go from zero to deployed AI in minutes. This is obviously a simple example, but shows how easy it is to run Workers AI from any project.

Privacy by default

When Cloudflare was founded, our value proposition had three pillars: more secure, more reliable, and more performant. Over time, we’ve realized that a better Internet is also a more private Internet, and we want to play a role in building it.

That’s why Workers AI is private by default – we don’t train our models, LLM or otherwise, on your data or conversations, and our models don’t learn from your usage. You can feel confident using Workers AI in both personal and business settings, without having to worry about leaking your data. Other providers only offer this fundamental feature with their enterprise version. With us, it’s built in for everyone.

We’re also excited to support data localization in the future. To make this happen, we have an ambitious GPU rollout plan – we’re launching with seven sites today, roughly 100 by the end of 2023, and nearly everywhere by the end of 2024. Ultimately, this will empower developers to keep delivering killer AI features to their users, while staying compliant with their end users’ data localization requirements.

The power of the platform

Vector database – Vectorize

Workers AI is all about running inference, and making it really easy to do so, but sometimes inference is only part of the equation. Large language models are trained on a fixed set of data, based on a snapshot at a specific point in the past, and have no context on your business or use case. When you submit a prompt, information specific to you can increase the quality of results, making it more useful and relevant. That’s why we’re also launching Vectorize, our vector database that’s designed to work seamlessly with Workers AI. Here’s a quick overview of how you might use Workers AI + Vectorize together.

Example: Use your data (knowledge base) to provide additional context to an LLM when a user is chatting with it (a code sketch of this flow follows the list).

  1. Generate initial embeddings: run your data through Workers AI using an embedding model. The output will be embeddings, which are numerical representations of those words.
  2. Insert those embeddings into Vectorize: this essentially seeds the vector database with your data, so we can later use it to retrieve embeddings that are similar to your users’ query
  3. Generate embedding from user question: when a user submits a question to your AI app, first, take that question, and run it through Workers AI using an embedding model.
  4. Get context from Vectorize: use that embedding to query Vectorize. This should output embeddings that are similar to your user’s question.
  5. Create context aware prompt: Now take the original text associated with those embeddings, and create a new prompt combining the text from the vector search, along with the original question
  6. Run prompt: run this prompt through Workers AI using an LLM model to get your final result
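To make the flow concrete, here’s a hedged sketch of steps 3–6 in a worker. It assumes an [ai] binding (env.AI), a Vectorize index binding we’ve named env.KB_INDEX, and a hypothetical lookupText() helper that maps a vector ID back to its original source text (e.g. from KV or D1) – none of these names come from the steps above:

import { Ai } from '@cloudflare/ai';

// Hypothetical helper: resolves a vector ID to the source text it was built from.
declare function lookupText(id: string): Promise<string>;

export default {
  async fetch(request: Request, env: any): Promise<Response> {
    const ai = new Ai(env.AI);
    const question = new URL(request.url).searchParams.get('q') ?? 'What is Workers AI?';

    // Step 3: embed the user's question with the same model used to seed the index.
    const embedded = await ai.run('@cf/baai/bge-base-en-v1.5', { text: [question] });

    // Step 4: retrieve the most similar knowledge-base entries from Vectorize.
    const result = await env.KB_INDEX.query(embedded.data[0], { topK: 3 });

    // Step 5: build a context-aware prompt from the retrieved source text.
    const snippets = await Promise.all(result.matches.map((m: any) => lookupText(m.vectorId)));
    const prompt = `Context:\n${snippets.join('\n')}\n\nQuestion: ${question}`;

    // Step 6: run the combined prompt through an LLM for the final answer.
    const answer = await ai.run('@cf/meta/llama-2-7b-chat-int8', { prompt });
    return Response.json(answer);
  },
};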

AI Gateway

That covers a more advanced use case. On the flip side, if you are running models elsewhere, but want to get more out of the experience, you can run those APIs through our AI Gateway to get features like caching, rate limiting, analytics, and logging. These features can be used to protect your endpoint, monitor and optimize costs, and also help with data loss prevention. Learn more about AI Gateway here.

Start building today

Try it out for yourself, and let us know what you think. Today we’re launching Workers AI as an open Beta for all Workers plans – free or paid. That said, it’s super early, so…

Warning – It’s an early beta

Usage is not currently recommended for production apps, and limits + access are subject to change.

Limits

We’re initially launching with limits on a per-model basis:

  • @cf/meta/llama-2-7b-chat-int8: 5 reqs/min
  • All other models are between 120-180 reqs/min

Check out our docs for a full overview of our limits.

Pricing

What we released today is just a small preview to give you a taste of what’s coming (we simply couldn’t hold back), but we’re looking forward to putting the full-throttle version of Workers AI in your hands.

We realize that as you approach building something, you want to understand how much it’s going to cost you – especially with AI, where costs can easily get out of hand. So we wanted to share the upcoming pricing of Workers AI with you.

While we won’t be billing on day one, we are announcing what we expect our pricing will look like.

Users will be able to choose from two ways to run Workers AI:

  • Regular Twitch Neurons (RTN) – running wherever there's capacity at $0.01 / 1k neurons
  • Fast Twitch Neurons (FTN) – running at nearest user location at $1.25 / 1k neurons

You may be wondering — what’s a neuron?

Neurons are a way to measure AI output that always scales down to zero (if you get no usage, you will be charged for 0 neurons). To give you a sense of what you can accomplish with a thousand neurons, you can: generate 130 LLM responses, 830 image classifications, or 1,250 embeddings.
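As a rough worked example: 1,000 LLM responses would consume about 1,000 / 130 ≈ 7.7k neurons, which comes to roughly $0.08 at the RTN rate, or about $9.62 at the FTN rate.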

Our goal is to help our customers pay only for what they use, and choose the pricing that best matches their use case, whether it’s price or latency that is top of mind.

What’s on the roadmap?

Workers AI is just getting started, and we want your feedback to help us make it great. That said, there are some exciting things on the roadmap.

More models, please

We're launching with a solid set of models that just work, but will continue to roll out new models based on your feedback. If there’s a particular model you'd love to see on Workers AI, pop into our Discord and let us know!

In addition to that, we're also announcing a partnership with Hugging Face, and soon you'll be able to access and run a subset of the Hugging Face catalog directly from Workers AI.

Analytics + observability

Up to this point, we’ve been hyper-focused on one thing – making it really easy for any developer to run powerful AI models in just a few lines of code. But that’s only one part of the story. Up next, we’ll be working on some analytics and observability capabilities to give you insights into your usage, performance, and spend on a per-model basis, plus the ability to dig into your logs if you want to do some exploring.

A road to global GPU coverage

Our goal is to be the best place to run inference on Region: Earth, so we're adding GPUs to our data centers as fast as we can.

We plan to be in 100 data centers by the end of this year

And nearly everywhere by the end of 2024

We’re really excited to see you build – head over to our docs to get started.

If you need inspiration, want to share something you’re building, or have a question – pop into our Developer Discord.

Writing poems using Llama 2 on Workers AI

Post Syndicated from John Graham-Cumming original http://blog.cloudflare.com/writing-poems-using-llama-2-on-workers-ai/

Matthew and Michelle, co-founders of Cloudflare, published their annual founders’ letter today. The letter ends with a poem written by an AI running using Workers AI on Cloudflare’s global network.

Here’s the code that wrote the poem. It uses Workers AI and the Meta Llama 2 model with 7B parameters and 8-bit integers. Just 14 lines of code running on the Cloudflare global network, and you’ve got your very own AI to chat with.

import { Ai } from "@cloudflare/ai";

export default {
    async fetch(request: Request, env: Env): Promise<Response> {
        const body = await request.json();
        const ai = new Ai(env.AI);
        const response = await ai.run("@cf/meta/llama-2-7b-chat-int8", body);
        return new Response(JSON.stringify(response));
    },
};

export interface Env {
    AI: any;
}

That was deployed on Workers AI and all I had to do was ask for poems. Here’s my terminal output (with just the domain name changed).

% curl -X POST https://example.com/ -d '{"prompt":"Write a poem \
that talks about the connectivity cloud"}' | jq -r .response

Cloud computing provides a connectivity that's unmatched,
A bridge that spans the globe with ease and grace.
It brings us closer, no matter where we are,
And makes the world a smaller, more intimate place.

With just a click, we can connect and share,
Our thoughts and ideas, our hopes and dreams.
The cloud unites us, in a way that's fair,
And makes the world a more harmonious theme.

So let us embrace this connectivity,
And use it to build a better world for all.
For in the cloud, we find a unity,
That can help us reach new heights, and stand tall.

% curl -X POST https://example.com/ -d '{"prompt":"Write a poem \
that talks about the connectivity cloud and contrast it with the  \
captivity cloud"}' | jq -r .response

The connectivity cloud, a wondrous sight to see,
Where data flows and knowledge is set free,
A place where minds meet and ideas take flight,
A world of endless possibility in sight.

But then there's the captivity cloud, a darker place,
Where thoughts are trapped and minds are in disgrace,
A prison of the mind, where freedom's lost,
A world of endless sorrow and cost.

So let us choose the connectivity cloud,
And let our minds be free and allowed,
To soar and glide and reach for the sky,
And never be held back, but always fly.

% curl -X POST https://example.com/ -d '{"prompt":"Write a poem \
that talks about the connectivity cloud and contrast it with the  \
captivity cloud using Elizabethan English"}' | jq -r .response

In the connectivity cloud, I find my peace,
Where thoughts and dreams and hopes do cease to flee.
The captivity cloud doth but darken my day,
With chains of sorrow that will not away.

In the connectivity cloud, I find my home,
Where love and joy and laughter do make their moan.
The captivity cloud doth but lock me in,
With sorrow's shackles that will not give in.

In the connectivity cloud, I find my fate,
Where fortune smiles and misery doth wait.
The captivity cloud doth but bring me woe,
With darkness that will not let me go.

In the connectivity cloud, I find my heart,
Where love and hope and joy do play their part.
The captivity cloud doth but tear it apart,
With sorrow's hand that will not let it start.

% curl -X POST https://example.com/ -d '{"prompt":"Write a poem \
that talks about the connectivity cloud and contrast it with the  \
captivity cloud in the style of Lord Byron"}' | jq -r .response

In the realm of the cloud, where data reigns supreme,
A new divide has emerged, a digital dream,
The connectivity cloud, a place of delight,
Where knowledge and wisdom take flight.

But then there's the captivity cloud, a darker hue,
Where privacy and security are hard to pursue,
A prison of sorts, where freedom's light is lost,
And data's secrets are forever crossed.

Yet still, we cling to the captivity cloud,
For in its depths, we find a measure of proof,
That though the connectivity cloud may be bright,
The captivity cloud holds a certain allure in sight.

So let us embrace both, the light and the dark,
For in their union, we find a digital spark,
That illuminates the path to a brighter day,
Where knowledge and privacy reign in a digital way.

What are you going to build on Workers AI? It’s ready and waiting. We’ll help you go from idea to deployed in minutes.

If you want to know exactly how to deploy something like this read the Workers AI announcement blog.

Announcing AI Gateway: making AI applications more observable, reliable, and scalable

Post Syndicated from Michelle Chen original http://blog.cloudflare.com/announcing-ai-gateway/

Today, we’re excited to announce our beta of AI Gateway – the portal to making your AI applications more observable, reliable, and scalable.

AI Gateway sits between your application and the AI APIs that your application makes requests to (like OpenAI) – so that we can cache responses, limit and retry requests, and provide analytics to help you monitor and track usage. AI Gateway handles the things that nearly all AI applications need, saving you engineering time, so you can focus on what you're building.

Connecting your app to AI Gateway

It only takes one line of code for developers to get started with Cloudflare’s AI Gateway. All you need to do is replace the URL in your API calls with your unique AI Gateway endpoint. For example, with OpenAI you would define your baseURL as "https://gateway.ai.cloudflare.com/v1/ACCOUNT_TAG/GATEWAY/openai" instead of "https://api.openai.com/v1" – and that’s it. You can keep your tokens in your code environment, and we’ll log the request through AI Gateway before letting it pass through to the final API with your token.

// configuring AI gateway with the dedicated OpenAI endpoint

const openai = new OpenAI({
  apiKey: env.OPENAI_API_KEY,
  baseURL: "https://gateway.ai.cloudflare.com/v1/ACCOUNT_TAG/GATEWAY/openai",
});
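Once the client is configured, your requests look exactly as they would against OpenAI directly – only the base URL has changed. As a minimal sketch using the openai Node SDK client configured above:

// The request itself is unchanged; it now simply flows through AI Gateway
// (for logging, caching, etc.) before reaching OpenAI.
const chatCompletion = await openai.chat.completions.create({
  model: 'gpt-3.5-turbo',
  messages: [{ role: 'user', content: 'What is Cloudflare?' }],
});
console.log(chatCompletion.choices[0].message.content);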

We currently support model providers such as OpenAI, Hugging Face, and Replicate with plans to add more in the future. We support all the various endpoints within providers and also response streaming, so everything should work out-of-the-box once you have the gateway configured. The dedicated endpoint for these providers allows you to connect your apps to AI Gateway by changing one line of code, without touching your original payload structure.

We also have a universal endpoint that you can use if you’d like more flexibility with your requests. With the universal endpoint, you have the ability to define fallback models and handle request retries. For example, let’s say a request was made to OpenAI GPT-3, but the API was down – with the universal endpoint, you could define Hugging Face GPT-2 as your fallback model and the gateway can automatically resend that request to Hugging Face. This is really helpful for improving your app’s resiliency when you’re seeing unusual errors, getting rate limited, or when one provider’s bill is getting costly and you want to diversify to other models. With the universal endpoint, you’ll just need to tweak your payload to specify the provider and endpoint, so we can properly route requests for you. Check out the example request below and the docs for more details on the universal endpoint schema.

# Using the Universal Endpoint to first try OpenAI, then Hugging Face

curl https://gateway.ai.cloudflare.com/v1/ACCOUNT_TAG/GATEWAY -X POST \
  --header 'Content-Type: application/json' \
  --data '[
  {
    "provider": "openai",
    "endpoint": "chat/completions",
    "headers": { 
      "Authorization": "Bearer $OPENAI_TOKEN",
      "Content-Type": "application/json"
    },
    "query": {
      "model": "gpt-3.5-turbo",
      "stream": true,
      "messages": [
        {
          "role": "user",
          "content": "What is Cloudflare?"
        }
      ]
    }
  },
  {
    "provider": "huggingface",
    "endpoint": "gpt2",
    "headers": { 
      "Authorization": "Bearer $HF_TOKEN",
      "Content-Type": "application/json"
    },
    "query": {
      "inputs": "What is Cloudflare?"
    }
  }
]'

Gaining visibility into your app’s usage

Now that your app is connected to Cloudflare, we can help you gather analytics and give you insight into, and control over, the traffic passing through your apps. Regardless of what model or infrastructure you use in the backend, we can help you log requests and analyze data like the number of requests, number of users, cost of running the app, duration of requests, etc. Although these seem like basic analytics that model providers should expose, it’s surprisingly difficult to get visibility into these metrics with the typical model providers. AI Gateway takes it one step further and lets you aggregate analytics across multiple providers too.

Controlling how your app scales

One of the pain points we often hear about is how expensive it is to build and run AI apps. Each API call can be unpredictably expensive and costs can rack up quickly, preventing developers from scaling their apps to their full potential. At the speed that the industry is moving, you don’t want to be limited by your scale and left behind – and that’s where caching and rate limiting can help. We allow developers to cache their API calls so that new requests can be served from our cache rather than the original API – making it cheaper and faster. Rate limiting can also help control costs by throttling the number of requests and preventing excessive or suspicious activity. Developers have full flexibility to define caching and rate limiting rules, so that apps can scale at a sustainable pace of their choosing.

The Workers AI Platform

AI Gateway pairs perfectly with our new Workers AI and Vectorize products, so you can build full-stack AI applications all within the Workers ecosystem. From deploying applications with Workers, running model inference on the edge with Workers AI, storing vector embeddings on Vectorize, to gaining visibility into your applications with AI Gateway – the Workers platform is your one-stop shop to bring your AI applications to life. To learn how to use AI Gateway with Workers AI or the different providers, check out the docs.

Next up: the enterprise use case

We are shipping v1 of AI Gateway with a few core features, but we have plans to expand the product to cover more advanced use cases as well – usage alerts, jailbreak protection, dynamic model routing with A/B testing, and advanced cache rules. But what we’re really excited about are the other ways you can apply AI Gateway…

In the future, we want to develop AI Gateway into a product that helps organizations monitor and observe how their users or employees are using AI. This way, you can flip a switch and have all requests within your network to providers (like OpenAI) pass through Cloudflare first – so that you can log user requests, apply access policies, enable rate limiting and data loss prevention (DLP) strategies. A powerful example: if an employee accidentally pastes an API key to ChatGPT, AI Gateway can be configured to see the outgoing request and redact the API key or block the request entirely, preventing it from ever reaching OpenAI or any end providers. We can also log and alert on suspicious requests, so that organizations can proactively investigate and control certain types of activity. AI Gateway then becomes a really powerful tool for organizations that might be excited about the efficiency that AI unlocks, but hesitant about trusting AI when data privacy and user error are really critical threats. We hope that AI Gateway can alleviate these concerns and make adopting AI tools a lot easier for organizations.

Whether you’re a developer building applications or a company who’s interested in how employees are using AI, our hope is that AI Gateway can help you demystify what’s going on inside your apps – because once you understand how your users are using AI, you can make decisions on how you actually want them to use it. Some of these features are still in development, but we hope this illustrates the power of AI Gateway and our vision for the future.

At Cloudflare, we live and breathe innovation (as you can tell by our Birthday Week announcements!) and the pace of innovation in AI is incredible to witness. We’re thrilled that we can not only help people build and use apps, but actually help accelerate the adoption and development of AI with greater control and visibility. We can’t wait to hear what you build – head to the Cloudflare dashboard to try out AI Gateway and let us know what you think!

Partnering with Hugging Face to make deploying AI easier and more affordable than ever 🤗

Post Syndicated from Rita Kozlov original http://blog.cloudflare.com/partnering-with-hugging-face-deploying-ai-easier-affordable/

Today, we’re excited to announce that we are partnering with Hugging Face to make AI models more accessible and affordable than ever before to developers.

There are three things we look forward to making available to developers over the coming months:

  1. Bringing serverless GPU models to Hugging Face – no more wrangling infrastructure or paying for unused capacity. Just pick your model, and go;
  2. Bringing popular Hugging Face optimized models to Cloudflare’s model catalog;
  3. Introducing Cloudflare integrations as a part of Hugging Face’s Inference solutions.

Hosting over 500,000 models and serving over one million model downloads a day, Hugging Face is the go-to place for developers to add AI to their applications.

Meanwhile, over the past six years at Cloudflare, our goal has been to make it as easy as possible for developers to bring their ideas and applications to life on our developer platform.

As AI has become a critical part of every application, this partnership has felt like a natural match to put tools in the hands of developers to make deploying AI easy and affordable.

“Hugging Face and Cloudflare both share a deep focus on making the latest AI innovations as accessible and affordable as possible for developers. We’re excited to offer serverless GPU services in partnership with Cloudflare to help developers scale their AI apps from zero to global, with no need to wrangle infrastructure or predict the future needs of your application — just pick your model and deploy.”
Clem Delangue, CEO of Hugging Face.

We’re excited to share what’s to come, so we wanted to give you a sneak peek into what’s ahead.

Hugging Face models at your fingertips

As a developer, when you have an idea, you want to be able to act on it as quickly as possible. Through our partnership, we’re excited to provide you with familiar models, regardless of where you’re getting started.

If you’re using Cloudflare’s developer platform to build applications, we’re excited to bring Hugging Face models into the flow as a native part of the experience. You will soon be able to deploy Hugging Face models, optimized for performance and speed, right from Cloudflare’s dashboard.

Alternatively, if you’re used to perusing and finding your models on Hugging Face, you will soon be able to deploy them from the Hugging Face UI directly to Workers AI.

Both of our teams are committed to building the best developer experiences possible, so we look forward to continuing to file away any friction that gets in developers’ ways of building the next big AI idea.

Bringing serverless GPU inference to Hugging Face users

Hugging Face offers multiple inference solutions to serve predictions from the 500,000 models hosted on the platform without managing infrastructure, from the free and rate-limited Inference API, to dedicated infrastructure deployments with Inference Endpoints, and even in-browser edge inference with Transformers.js.

We look forward to working closely with the teams at Hugging Face to enable new experiences powered by Cloudflare: from new serverless GPU inference solutions, to new edge use cases – stay tuned!

See you soon!

We couldn’t wait to share the news with our developers about our partnership, and can’t wait to put these experiences in the hands of developers over the coming months.

Vectorize: a vector database for shipping AI-powered applications to production, fast

Post Syndicated from Matt Silverlock original http://blog.cloudflare.com/vectorize-vector-database-open-beta/

Vectorize is our brand-new vector database offering, designed to let you build full-stack, AI-powered applications entirely on Cloudflare’s global network: and you can start building with it right away. Vectorize is in open beta, and is available to any developer using Cloudflare Workers.

You can use Vectorize with Workers AI to power semantic search, classification, recommendation and anomaly detection use-cases directly with Workers, improve the accuracy and context of answers from LLMs (Large Language Models), and/or bring-your-own embeddings from popular platforms, including OpenAI and Cohere.

Visit Vectorize’s developer documentation to get started, or read on if you want to better understand what vector databases do and how Vectorize is different.

Why do I need a vector database?

Machine learning models can’t remember anything: only what they were trained on.

Vector databases are designed to solve this, by capturing how an ML model represents data — including structured and unstructured text, images and audio — and storing it in a way that allows you to compare against future inputs. This allows us to leverage the power of existing machine-learning models and LLMs (Large Language Models) for content they haven’t been trained on: which, given the tremendous cost of training models, turns out to be extremely powerful.

To better illustrate why a vector database like Vectorize is useful, let’s pretend they don’t exist, and see how painful it is to give context to an ML model or LLM for a semantic search or recommendation task. Our goal is to understand what content is similar to our query and return it: based on our own dataset.

  1. Our user query comes in: they’re searching for “how to write to R2 from Cloudflare Workers”
  2. We load up our entire documentation dataset — a thankfully “small” dataset at about 65,000 sentences, or 2.1 GB — and provide it alongside the query from our user. This allows the model to have the context it needs, based on our data.
  3. We wait.
  4. (A long time)
  5. We get our similarity scores back, with the sentences most similar to the user’s query, and then work to map those back to URLs before we return our search results.

… and then another query comes in, and we have to start this all over again.

In practice, this isn’t really possible: we can’t pass that much context in an API call (prompt) to most machine learning models, and even if we could, it’d take tremendous amounts of memory and time to process our dataset over-and-over again.

With a vector database, we don’t have to repeat step 2: we perform it once, or as our dataset updates, and use our vector database to provide a form of long-term memory for our machine learning model. Our workflow looks a little more like this:

  1. We load up our entire documentation dataset, run it through our model, and store the resulting vector embeddings in our vector database (just once).
  2. For each user query (and only the query), we run it through the same model to retrieve a vector representation.
  3. We query our vector database with that query vector, which returns the vectors closest to our query vector.

If we look at these two flows side by side, we can quickly see how inefficient and impractical it is to use our own dataset with an existing model without a vector database:

Using a vector database to help machine learning models remember.

From this simple example, it’s probably starting to make some sense: but you might also be wondering why you need a vector database instead of just a regular database.

Vectors are the model’s representation of an input: how it maps that input to its internal structure, or “features”. Broadly, the more similar vectors are, the more similar the model believes those inputs to be based on how it extracts features from an input.

This is seemingly easy when we look at example vectors of only a handful of dimensions. But with real-world outputs, searching across 10,000 to 250,000 vectors, each potentially 1,536 dimensions wide, is non-trivial. This is where vector databases come in: to make search work at scale, vector databases use a specific class of algorithm, such as k-nearest neighbors (kNN) or other approximate nearest neighbor (ANN) algorithms to determine vector similarity.
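To make “similarity” concrete, here’s an illustrative sketch (not Vectorize’s internals) of cosine similarity, the distance metric used by the example index later in this post:

// Cosine similarity between two vectors: 1 means identical direction;
// the closer to 1, the more similar the vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// The query vector from the example below scores ~0.9999 against its closest
// product vector, matching the top score Vectorize returns.
console.log(cosineSimilarity([54.8, 5.5, 3.1], [58.8, 6.7, 3.4])); // ≈ 0.9999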

And although vector databases are extremely useful when building AI and machine learning powered applications, they’re not only useful in those use-cases: they can be used for a multitude of classification and anomaly detection tasks. Knowing whether a query input is similar — or potentially dissimilar — from other inputs can power content moderation (does this match known-bad content?) and security alerting (have I seen this before?) tasks as well.

We built Vectorize to be a powerful partner to Workers AI: enabling you to run vector search tasks as close to users as possible, and without having to think about how to scale it for production.

We’re going to take a real world example — building a (product) recommendation engine for an e-commerce store — and simplify a few things.

Our goal is to show a list of “relevant products” on each product listing page: a perfect use-case for vector search. Our input vectors in the example are placeholders, but in a real world application we would generate them based on product descriptions and/or cart data by passing them through a sentence similarity model (such as Workers AI’s text embedding model).

Each vector represents a product across our store, and we associate the URL of the product with it. We could also set the ID of each vector to the product ID: both approaches are valid. Our query — vector search — represents the product description and content for the product the user is currently viewing.

Let’s step through what this looks like in code: this example is pulled straight from our developer documentation:

export interface Env {
	// This makes our vector index methods available on env.TUTORIAL_INDEX.*
	// e.g. env.TUTORIAL_INDEX.insert() or .query()
	TUTORIAL_INDEX: VectorizeIndex;
}

// Sample vectors: 3 dimensions wide.
//
// Vectors from a machine-learning model are typically ~100 to 1536 dimensions
// wide (or wider still).
const sampleVectors: Array<VectorizeVector> = [
	{ id: '1', values: [32.4, 74.1, 3.2], metadata: { url: '/products/sku/13913913' } },
	{ id: '2', values: [15.1, 19.2, 15.8], metadata: { url: '/products/sku/10148191' } },
	{ id: '3', values: [0.16, 1.2, 3.8], metadata: { url: '/products/sku/97913813' } },
	{ id: '4', values: [75.1, 67.1, 29.9], metadata: { url: '/products/sku/418313' } },
	{ id: '5', values: [58.8, 6.7, 3.4], metadata: { url: '/products/sku/55519183' } },
];

export default {
	async fetch(request: Request, env: Env, ctx: ExecutionContext): Promise<Response> {
		if (new URL(request.url).pathname !== '/') {
			return new Response('', { status: 404 });
		}
		// Insert some sample vectors into our index
		// In a real application, these vectors would be the output of a machine learning (ML) model,
		// such as Workers AI, OpenAI, or Cohere.
		let inserted = await env.TUTORIAL_INDEX.insert(sampleVectors);

		// Log the number of IDs we successfully inserted
		console.info(`inserted ${inserted.count} vectors into the index`);

		// In a real application, we would take a user query - e.g. "durable
		// objects" - and transform it into a vector embedding first.
		//
		// In our example, we're going to construct a simple vector that should
		// match vector id #5
		let queryVector: Array<number> = [54.8, 5.5, 3.1];

		// Query our index and return the three (topK = 3) most similar vector
		// IDs with their similarity score.
		//
		// By default, vector values are not returned, as in many cases the
		// vectorId and scores are sufficient to map the vector back to the
		// original content it represents.
		let matches = await env.TUTORIAL_INDEX.query(queryVector, { topK: 3, returnVectors: true });

		// We map over our results to find the most similar vector result.
		//
		// Since our index uses the 'cosine' distance metric, scores range
		// from -1 (least similar) to 1 (identical): the closer a score is
		// to 1, the more similar that vector is to our query vector.
		// let closestScore = 0;
		// let mostSimilarId = '';
		// matches.matches.map((match) => {
		// 	if (match.score > closestScore) {
		// 		closestScore = match.score;
		// 		mostSimilarId = match.vectorId;
		// 	}
		// });

		return Response.json({
			// This will return the closest vectors: we'll see that the vector
			// with id = 5 has the highest score (closest to 1.0) as the
			// distance between it and our query vector is the smallest.
			// Return the full set of matches so we can see the possible scores.
			matches: matches,
		});
	},
};

The code above is intentionally simple, but illustrates vector search at its core: we insert vectors into our database, and query it for vectors with the smallest distance to our query vector.

Here are the results, with the values included, so we visually observe that our query vector [54.8, 5.5, 3.1] is similar to our highest scoring match: [58.799, 6.699, 3.400] returned from our search. This index uses cosine similarity to calculate the distance between vectors, which means that the closer the score to 1, the more similar a match is to our query vector.

{
  "matches": {
    "count": 3,
    "matches": [
      {
        "score": 0.999909,
        "vectorId": "5",
        "vector": {
          "id": "5",
          "values": [
            58.79999923706055,
            6.699999809265137,
            3.4000000953674316
          ],
          "metadata": {
            "url": "/products/sku/55519183"
          }
        }
      },
      {
        "score": 0.789848,
        "vectorId": "4",
        "vector": {
          "id": "4",
          "values": [
            75.0999984741211,
            67.0999984741211,
            29.899999618530273
          ],
          "metadata": {
            "url": "/products/sku/418313"
          }
        }
      },
      {
        "score": 0.611976,
        "vectorId": "2",
        "vector": {
          "id": "2",
          "values": [
            15.100000381469727,
            19.200000762939453,
            15.800000190734863
          ],
          "metadata": {
            "url": "/products/sku/10148191"
          }
        }
      }
    ]
  }
}

In a real application, we could now quickly return product recommendation URLs based on the most similar products, sorting them by their score (highest to lowest), and increasing the topK value if we want to show more. The metadata stored alongside each vector could also embed a path to an R2 object, a UUID for a row in a D1 database, or a key-value pair from Workers KV.

Workers AI + Vectorize: full stack vector search on Cloudflare

In a real application, we need a machine learning model that can both generate vector embeddings from our original dataset (to seed our database) and quickly turn user queries into vector embeddings too. These need to be from the same model, as each model represents features differently.

Here’s a compact example building an entire end-to-end vector search pipeline on Cloudflare:

import { Ai } from '@cloudflare/ai';
export interface Env {
	TEXT_EMBEDDINGS: VectorizeIndex;
	AI: any;
}
interface EmbeddingResponse {
	shape: number[];
	data: number[][];
}

export default {
	async fetch(request: Request, env: Env, ctx: ExecutionContext): Promise<Response> {
		const ai = new Ai(env.AI);
		let path = new URL(request.url).pathname;
		if (path.startsWith('/favicon')) {
			return new Response('', { status: 404 });
		}

		// We only need to generate vector embeddings just the once (or as our
		// data changes), not on every request
		if (path === '/insert') {
			// In a real-world application, we could read in content from R2 or
			// a SQL database (like D1) and pass it to Workers AI
			const stories = ['This is a story about an orange cloud', 'This is a story about a llama', 'This is a story about a hugging emoji'];
			const modelResp: EmbeddingResponse = await ai.run('@cf/baai/bge-base-en-v1.5', {
				text: stories,
			});

			// We need to convert the vector embeddings into a format Vectorize can accept.
			// Each vector needs an id, a value (the vector) and optional metadata.
			// In a real app, our ID would typically be bound to the ID of the source
			// document.
			let vectors: VectorizeVector[] = [];
			let id = 1;
			modelResp.data.forEach((vector) => {
				vectors.push({ id: `${id}`, values: vector });
				id++;
			});

			await env.TEXT_EMBEDDINGS.upsert(vectors);
		}

		// Our query: we expect this to match vector id: 1 in this simple example
		let userQuery = 'orange cloud';
		const queryVector: EmbeddingResponse = await ai.run('@cf/baai/bge-base-en-v1.5', {
			text: [userQuery],
		});

		let matches = await env.TEXT_EMBEDDINGS.query(queryVector.data[0], { topK: 1 });
		return Response.json({
			// We expect vector id: 1 to be our top match with a score of
			// ~0.896888444
			// We are using a cosine distance metric, where the closer to one,
			// the more similar.
			matches: matches,
		});
	},
};

The code above does four things:

  1. It passes the three sentences to Workers AI’s text embedding model (@cf/baai/bge-base-en-v1.5) and retrieves their vector embeddings.
  2. It inserts those vectors into our Vectorize index.
  3. Takes the user query and transforms it into a vector embedding via the same Workers AI model.
  4. Queries our Vectorize index for matches.

This example might look “too” simple, but in a production application, we’d only have to change two things: just insert our vectors once (or periodically via Cron Triggers), and replace our three example sentences with real data stored in R2, a D1 database, or another storage provider.
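As a hedged sketch of that first change, the same insert logic could move into a scheduled handler driven by a Cron Trigger. This assumes the same AI and TEXT_EMBEDDINGS bindings as above, plus a hypothetical loadDocuments() helper that reads source rows from R2 or D1:

import { Ai } from '@cloudflare/ai';

// Hypothetical helper: loads { id, text } rows from R2, D1, or another store.
declare function loadDocuments(env: any): Promise<{ id: string; text: string }[]>;

export default {
  async scheduled(event: any, env: any, ctx: any): Promise<void> {
    const ai = new Ai(env.AI);
    const docs = await loadDocuments(env);

    // Embed every document in one call, then upsert so repeated runs stay idempotent.
    const resp = await ai.run('@cf/baai/bge-base-en-v1.5', { text: docs.map((d) => d.text) });
    await env.TEXT_EMBEDDINGS.upsert(
      resp.data.map((values: number[], i: number) => ({ id: docs[i].id, values }))
    );
  },
};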

In fact, this is incredibly similar to how we run Cursor, the AI assistant that can answer questions about Cloudflare Workers: we migrated Cursor to run on Workers AI and Vectorize. We generate text embeddings from our developer documentation using its built-in text embedding model, insert them into a Vectorize index, and transform user queries on the fly via that same model.

BYO embeddings from your favorite AI API

Vectorize isn’t just limited to Workers AI, though: it’s a fully-fledged, standalone vector database.

If you’re already using OpenAI’s Embedding API, Cohere’s multilingual model, or any other embedding API, then you can easily bring-your-own (BYO) vectors to Vectorize.

It works just the same: generate your embeddings, insert them into Vectorize, and pass your queries through the model before you query your index. Vectorize includes a few shortcuts for some of the most popular embedding models.

# Vectorize has ready-to-go presets that set the dimensions and distance metric for popular embeddings models
$ wrangler vectorize create openai-index-example --preset=openai-text-embedding-ada-002
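Alternatively – to the best of our understanding of the beta CLI – you can set the dimensions and distance metric explicitly to match your embedding model, for example the 768-dimension @cf/baai/bge-base-en-v1.5 model used earlier (the index name here is our own placeholder):

# Create an index sized for a 768-dimension embedding model using cosine distance
$ wrangler vectorize create bge-index-example --dimensions=768 --metric=cosine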

This can be particularly useful if you already have an existing workflow around an existing embeddings API, and/or have validated a specific multimodal or multilingual embeddings model for your use-case.

Making the cost of AI predictable

There’s a tremendous amount of excitement around AI and ML, but there’s also one big concern: that it’s too expensive to experiment with, and hard to predict at scale.

With Vectorize, we wanted to bring a simpler pricing model to vector databases. Have an idea for a proof-of-concept at work? That should fit into our free-tier limits. Scaling up and optimizing your embedding dimensions for performance vs. accuracy? It shouldn’t break the bank.

Importantly, Vectorize aims to be predictable: you don’t need to estimate CPU and memory consumption, which can be hard when you’re just starting out, and made even harder when trying to plan for your peak vs. off-peak hours in production for a brand new use-case. Instead, you’re charged based on the total number of vector dimensions you store, and the number of queries against them each month. It’s our job to take care of scaling up to meet your query patterns.

Here’s the pricing for Vectorize — and if you have a Workers paid plan now, Vectorize is entirely free to use until 2024:

  • Workers Free (coming soon): 30M total queried vector dimensions / month and 5M stored vector dimensions / month included
  • Workers Paid ($5/month): 50M total queried vector dimensions / month and 10M stored vector dimensions / month included
  • Additional cost (both plans): $0.04 / 1M vector dimensions queried or stored

Pricing is based entirely on what you store and query: (total vectors queried + stored) * dimensions_per_vector * price. Query more? Easy to predict. Optimizing for smaller dimensions per vector to improve speed and reduce overall latency? Cost goes down. Have a few indexes for prototyping or experimenting with new use-cases? We don’t charge per-index.

Create as many indexes as you need to prototype new ideas and/or separate production from dev.

As an example: if you load 10,000 Workers AI vectors (384 dimensions each) and make 5,000 queries against your index each day, it’d result in 49 million total vector dimensions queried and still fit into what we include in the Workers Paid plan ($5/month). Better still: we don’t delete your indexes due to inactivity.

Note that while this pricing isn’t final, we expect few changes going forward. We want to avoid the element of surprise: there’s nothing worse than starting to build on a platform and realizing the pricing is untenable after you’ve invested the time writing code, tests and learning the nuances of a technology.

Vectorize!

Every Workers developer on a paid plan can start using Vectorize immediately: the open beta is available right now, and you can visit our developer documentation to get started.

This is also just the beginning of the vector database story for us at Cloudflare. Over the next few weeks and months, we intend to land a new query engine that should further improve query performance, support even larger indexes, introduce sub-index filtering capabilities, increase metadata limits, and add per-index analytics.

If you’re looking for inspiration on what to build, see the semantic search tutorial that combines Workers AI and Vectorize for document search, running entirely on Cloudflare. Or an example of how to combine OpenAI and Vectorize to give an LLM more context and dramatically improve the accuracy of its answers.

And if you have questions for our product & engineering teams about how to use Vectorize, or just want to bounce an idea off other developers building on Workers AI, join the #vectorize and #workers-ai channels on our Developer Discord.

What AI companies are building with Cloudflare

Post Syndicated from Veronica Marin original http://blog.cloudflare.com/ai-companies-building-cloudflare/

What AI applications can you build with Cloudflare? Instead of us telling you, we reached out to a small handful of the numerous AI companies using Cloudflare to learn a bit about what they’re building and how Cloudflare is helping them on their journey.

We heard common themes from these companies about the challenges they face in bringing new products to market in the ever-changing world of AI: training and deploying models, the ethical and moral judgments of AI, gaining the trust of users, and navigating the regulatory landscape. One area that is not a challenge is trusting their AI application infrastructure to Cloudflare.

Azule.ai

Azule, based in Calgary, Canada, was founded to apply the power of AI to streamline and improve ecommerce customer service. It’s an exciting moment: for the first time ever, we can dynamically generate, deploy, and test code to meet specific user needs or integrations. That kind of flexibility is crucial for a tool like Azule, which is designed to meet this demand by offering a platform that can handle complex requirements and provide flexible integration options with other tools.

The AI space is evolving quickly, and that includes AI agent design patterns: frameworks built on top of LLM APIs that are showing immense potential. Azule effectively allows users to create AI agents that interact with their customers on behalf of their business. It's not just about addressing customer service queries anymore – AI agents can perform significant, ongoing tasks across various industries.

Azule is built entirely on Cloudflare, except for API calls to OpenAI.

The application relies on multiple Cloudflare Developer Platform products and services. Durable Objects and WebSockets are used for live chat.

“Durable Objects enabled us to build our MVP faster than we could have on any other platform, thanks to Cloudflare's thoughtful product design.” – Logan Grasby

Other products used by Azule:

  • Queues for data processing.
  • R2 for all data storage, including vector storage. Instead of using a vector database service, Azule relies entirely on Cloudflare's R2 and cache API for on-disk vector search.
  • Workers KV for storing frequently accessed configuration data.
  • D1 was implemented for their user database.
  • Constellation (now Workers AI) for various labeling and summarization tasks.
  • Workers for Platforms allows Azule AI to write and deploy custom features for the users.
  • Pages for hosting our landing page and marketing content.

Other valuable features used include API Shield, Email Workers, the MailChannels integration for email, Logpush, and Outbound Workers, among others!

“I firmly believe that AI agents are at home on the web. Everything Cloudflare builds has web optimization in mind and so it only makes sense to invest in the platform. By building on Cloudflare, we've made significant cost reductions, particularly by moving all our search solutions to R2. For example, many of our users want to store large datasets on Azule and make them searchable through their agents. Our previous search solutions, based on Pinecone and Meilisearch, would have cost thousands of dollars per month to store and search through just one customer's data. With Cloudflare's R2 and cache API, we can now enable our customer's AI agent to comb through large datasets in less than 900ms, at a fraction of the cost.” – Logan Grasby

42able.ai

42able, headquartered in Wales, UK, is at the forefront of AI-driven solutions, dedicated to revolutionizing engagement with business documents. Through cutting-edge technology and innovative strategies, the company seeks to streamline, enhance, and redefine the way businesses interact with their documents.

The modern business landscape is inundated with vast volumes of documents, from contracts and reports to invoices and internal communications. Navigating, understanding, and extracting value from these documents can be time-consuming, error-prone, and often requires significant manual effort.

42able envisions a future where business documents are not just static pieces of information but dynamic assets that businesses can engage with interactively, efficiently, and intelligently.

“Launching an AI product has come with many unique challenges and uncertainties. Users expect AI to be perfect or near-perfect, and are much less forgiving of an AI making an error compared to a human making the same mistake. Decisions about how AI systems should act often involve moral or ethical judgments, which might not be straightforward and can be subject to societal debates. Training and deploying AI models is challenging. Cloudflare's solutions are making it much easier than managing all the individual parts ourselves.” – James Finney

42able chose Cloudflare for its fantastic performance compared to other cloud providers (no cold starts), competitive pricing, ease of use, fantastic local development features, and brilliant support. Their development times have decreased through the use of:

  • Workers for all the APIs and recurring cron scripts.
  • Pages for all application/platform front-end hosting.
  • KV for Angular apps.
  • R2 to store cached personal user data.
  • DNS zone management.
  • DDoS protection.
  • Turnstile.
  • Zero Trust to secure login pages.

They are starting to test Constellation (now Workers AI) to host some of their models, and D1 to support their database needs.

UseChat

UseChat.ai, based in London, UK, supercharges customer support with a ChatGPT-powered chatbot that knows your website and everything on it. With a custom ChatGPT chatbot, customers can get instant answers to the most common questions. When a customer needs more support, UseChat.ai seamlessly hands over from AI to human live chat.

The fully real-time platform was built to take advantage of Workers and Durable Objects from day one. Workers and Durable Objects power the real-time chatbot, integrated with the OpenAI ChatGPT API; Queues manages website content crawling; and KV stores crawled website content.
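
A hypothetical sketch of that crawl pipeline, with binding names (CRAWL_QUEUE, CRAWLED_CONTENT) that are our own placeholders rather than UseChat's, might look like this:

// Hypothetical sketch: a producer Worker enqueues pages to crawl via Queues,
// and a queue consumer fetches each page and stores its content in KV.
interface CrawlJob { url: string }

export default {
  async fetch(request: Request, env: any): Promise<Response> {
    const job: CrawlJob = await request.json();
    await env.CRAWL_QUEUE.send(job); // producer: enqueue the page to crawl
    return new Response('queued');
  },

  async queue(batch: MessageBatch<CrawlJob>, env: any): Promise<void> {
    for (const message of batch.messages) {
      const page = await fetch(message.body.url);
      // consumer: store the crawled page body in Workers KV
      await env.CRAWLED_CONTENT.put(message.body.url, await page.text());
    }
  },
};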

“It wouldn’t have been possible to build and scale our real-time platform as quickly as we did without Workers & Durable Objects. Knowing that a customer can embed our chatbot on their website with millions of visitors, and it will just work, lets me sleep soundly at night.” – Damien Tanner

Eclipse AI

Eclipse’s mission is to revolutionise the way businesses approach customer feedback. Based in Melbourne, Australia, Eclipse empowers users to make data-driven decisions by leveraging AI for comprehensive customer understanding. Whether your goals are to reduce churn, drive growth, or improve your customer experience, Eclipse puts the data at your fingertips and provides actionable insights to drive your business.

Eclipse lets you unify your Voice of Customer channels (e.g. phone, video calls, emails, support tickets, public reviews and surveys); the platform analyses them at scale and utilises Generative AI to provide key actions specific to your business. Focused on democratising data-driven decision-making, Eclipse AI has launched a Freemium model, levelling the playing field for businesses of all sizes to utilise this tech.

“We believe the future of the internet is on the edge and Cloudflare is at the forefront of this revolution with a growing network that covers most major cities around the world. As a startup with limited resources, the Cloudflare developer platform has enabled our dev team to focus on building our product and not be burdened with managing infrastructure. Best of all, it scales automagically with a pay-as-you-go pricing model.” – Saad Irfani

Eclipse AI uses:

  • Cloudflare Workers for the backend API.
  • Cloudflare Pages for the frontend to deliver content across hundreds of cities worldwide.
  • Cloudflare Images to serve cascaded versions of each asset.
  • Cloudflare R2 as the object store.

“As a platform that transcribes video/audio call recordings for VoC analytics, choosing a reliable object-store was an important decision. After the launch of R2 we switched from S3 and noticed a staggering 70% reduction in cost. Overall, we are believers in Cloudflare’s vision and are eagerly awaiting the release of D1 so that our entire stack can be powered by the edge.” – Saad Irfani

Embley

Embley, based in Sierre, Switzerland, is a Marketplace Automation Platform that powers the future of marketplace commerce by enabling businesses to scale better and faster.

The platform combines advanced technologies such as Artificial Intelligence and Process Mining to power fast, end-to-end business process automation with products tailored to marketplace businesses.

Cloudflare powers Embley’s frontend through Cloudflare Pages, which serves what they call the “control center” to users at the edge. The control center is the core of the back-office tools that users rely on to manage their marketplace operations. The backend is powered by Workers, providing a serverless execution environment, connected to the frontend through the Cloudflare API Gateway.

“The primary reasons for choosing Cloudflare are the powerful serverless products that enable us to run an entire tech stack without having to care about infrastructure. Also, the scalability of Cloudflare’s global network is appealing. Finally, security is embedded into Cloudflare through the Zero Trust platform, which enables us to secure both production and lower environments, including secured access to internal systems and apps.” – Laurent Christen

Chainfuse

ChainFuse, based in San Francisco, CA, is a multichannel AI platform that assists organizations in collecting and analyzing user feedback on a large scale. Their AI-powered community tool aids support, community, and product teams in garnering valuable insights, facilitating more informed product decisions.

“We have used Google Cloud and AWS, but our experience with Cloudflare has particularly stood out. Since 2016, we have consistently chosen Cloudflare for our projects due to their excellent product range and reliable performance. Saying "it just works" is an understatement.” – Victor Sanchez

ChainFuse relies on Workers for the core of their backend infrastructure, and on a range of our security solutions to secure their applications and employees. WAF, with its vast adaptability, is a major line of defense, blocking an average of 48% of all incoming traffic and effectively weeding out known malicious traffic. Additionally, ChainFuse employs rate limiting to prevent abuse. API Shield, used in conjunction with WAF, intercepts an average of 1.32% of the incoming traffic that manages to bypass the WAF. The Zero Trust Gateway not only secures their employees but is also integrated into their product to prevent end users from exploiting the platform for malicious purposes.

ai.moda

ai.moda, headquartered in Grand Cayman, Cayman Islands, is building multiple AI tools focused on bridging humans, developers, and machines. They’re currently building several ChatGPT plugins (such as CVEs and S3 storage), YourCrowd (an MTurk-compatible API for humans and bots), and Valkyrie (automated zero-trust hardening for Linux applications and cloud workloads).

Plugins like CVEs by ai.moda bring real-time vulnerability information into ChatGPT.

“By using Workers, we’re able to create SaaS services at a scale and cost that just wouldn’t be possible otherwise. If you want a new ChatGPT plugin, let us know on Friday, and by Monday we can have it developed and shipped to production! The rapid development allowed by Workers is a huge advantage for us.” – David Manouchehri

They chose Cloudflare mainly because of the Workers platform. Being able to rapidly deploy new code around the globe with a single command has greatly simplified their DevOps needs, and they no longer need to worry about whether they have enough resources to scale up.

ai.moda is a heavy user of Cloudflare Workers, Email Workers, Pages, R2, Durable Objects, Constellation (now Workers AI), the Cache API, DMARC Management, Access, WAF, Logpush, DNS, Health Checks, Zaraz, and D1.

Like many of these companies, we find the breadth and versatility of AI technology, and its impact on organizations and people, astonishing, and we can’t wait to see where it takes us next. If you’re inspired by these stories and want to start building, check out the Startup program and our Cloudflare for AI solutions.

If you want to share your story about what you’ve built, reach out to us or join the Developers Discord.

***
Since launching the Launchpad program in 2022, we have showcased a number of exciting startups looking to build the next big application. Whether it’s innovative website design, content delivery, or AI-based features, the internet is waiting for the next big thing.

With that said, we are proud to announce our revamped Built With Workers site, an opportunity to showcase your projects with the developer community. Built With Workers will serve as a public facing repository of full-stack applications running on the Developer Platform to demonstrate how Cloudflare is helping developers build amazing applications.

Whether you're using R2 object storage to store web data, utilizing Workers to manage your application functionality or designing the next big web application UI with Pages, we love seeing what our customers are building!

To have your latest and greatest projects featured on Built with Workers, complete and submit our quick form to share your project or business with us. Share how you’re using Cloudflare products to build the application of your dreams, or help expand developer knowledge within our developer community.

Cloudflare’s 2023 Annual Founders’ Letter

Post Syndicated from Matthew Prince original http://blog.cloudflare.com/cloudflares-annual-founders-letter-2023/

Cloudflare is officially a teenager. We launched on September 27, 2010. Today we celebrate our thirteenth birthday. As is our tradition, we use the week of our birthday to launch products that we think of as our gift back to the Internet. More on some of the incredible announcements in a second, but we wanted to start by talking about something more fundamental: our identity.

Like many kids, it took us a while to fully understand who we are. We chafed at being put in boxes. People would describe Cloudflare as a security company, and we'd say, "That's not all we do." They'd say we were a network, and we'd object that we were so much more. Worst of all, they'd sometimes call us a "CDN," and we'd remind them that caching is a part of any sensibly designed system, but it shouldn't be a feature unto itself. Thank you very much.

And so, yesterday, the day before our thirteenth birthday, we announced to the world finally what we realized we are: a connectivity cloud.

The connectivity cloud

What does that mean? "Connectivity" means we measure ourselves by connecting people and things together. Our job isn't to be the final destination for your data, but to help it move and flow. Any application, any data, anyone, anywhere, anytime — that's the essence of connectivity, and that’s always been the promise of the Internet.

"Cloud" means the batteries are included. It scales with you. It’s programmable. Has consistent security built in. It’s intelligent and learns from your usage and others' and optimizes for outcomes better than you ever could on your own.

Our connectivity cloud is worth contrasting against some other clouds. The so-called hyperscale public clouds are, in many ways, the opposite. They optimize for hoarding your data. Locking it in. Making it difficult to move. They are captivity clouds. And, while they may be great for some things, their full potential will only truly be unlocked for customers when combined with a connectivity cloud that lets you mix and match the best of each of their features.

Enabling the future

That's what we're seeing from the hottest startups these days. Many of the leading AI companies are using Cloudflare's connectivity cloud to move their training data to wherever there's excess GPU capacity. We estimate that across the AI startup ecosystem, Cloudflare is the most commonly used cloud provider. Because, if you're building the future, you know connectivity and the agility of the cloud are key.

We've spent the last year listening to our AI customers and trying to understand what the future of AI will look like and how we can better help them build it. Today, we're releasing a series of products and features borne of those conversations and opening incredible new opportunities.

The biggest opportunity in AI is inference. Inference is what happens when you type a prompt to write a poem about your love of connectivity clouds into ChatGPT and, seconds later, get a coherent response. Or when you run a search for a picture of your passport on your phone, and it immediately pulls it up.

The models that power those modern miracles take significant time to generate — a process called training. Once trained though, they can have new data fed through them over and over to generate valuable new output.

Where inference happens

Before today, those models could run in two places. The first was the end user's device — like in the case of the search for “passport” in the photos on your phone. When that's possible it's great. It's fast. Your private data stays local. And it works even when there's no network access. But it's also challenging. Models are big and the storage on your phone or other local device is limited. Moreover, putting the fastest GPU resources into your phone to process these models makes the phone expensive and burns precious battery.

The alternative has been the centralized public cloud. This is what’s used for a big model like OpenAI’s GPT-4, which runs services like ChatGPT. But that has its own challenges. Today, nearly all the GPU resources for AI are deployed in the US — a fact that rightfully troubles the rest of the world. As AI queries get more personal, sending them all to some centralized cloud is a potential security and data locality disaster waiting to happen. Moreover, it's inherently slow and less efficient and therefore more costly than running the inference locally.

A third place for inference

Running on the device is too small. Running on the centralized public cloud is too far. It’s like the story of “Goldilocks and the Three Bears”: the right answer is somewhere in between. That's why today we're excited to be rolling out modern GPU resources across Cloudflare's global connectivity cloud. The third place for AI inference. Not too small. Not too far. The perfect step in between. By the end of the year, you'll be able to run AI models in more than 100 cities in 40+ countries where Cloudflare operates. By the end of 2024, we plan to have inference-tuned GPUs deployed in nearly every city that makes up Cloudflare's global network and within milliseconds of nearly every device connected to the Internet worldwide.

(A brief shout out for the Cloudflare team members who are, as of this moment, literally dragging suitcases full of NVIDIA GPU cards around the world and installing them in the servers that make up our network worldwide. It takes a lot of atoms to move all the bits that we do, and it takes intrepid people spanning the globe to update our network to facilitate these new capabilities.)

Running AI in a connectivity cloud like Cloudflare gives you the best of both worlds: nearly boundless resources running locally near any device connected to the Internet. And we've made it flexible enough to run whatever models a developer creates, easy to use without needing a dev ops team, and inexpensive to run, since you only pay for the time we're doing inference work for you.

To make this tangible, think about a Cloudflare customer like Garmin. They make devices that need to be smart but also affordable and have the longest possible battery life. As explorers rely on them literally to navigate out of harrowing conditions, tradeoffs aren't an option. That's why, when they heard about Cloudflare Workers AI, they immediately knew it was something they needed to try. Here's what Aaron Dearinger, Edge Architect at Garmin International said to us:

"Garmin is alongside our users for all their runs, workouts, and outdoor adventures, and we want to ensure that our watches and devices' power can last weeks and months instead of hours or days. We're excited for Cloudflare's innovations to ensure AI is performant, fast, close to users, and importantly privacy-first. This holds a lot of potential to us as we aim to bring more advanced fitness, wellness, and adventure insights to our users as they explore."

They are one of a limited set of customers we gave an early sneak peek to, all of whom immediately started brainstorming what they could do next and clamoring for more access. We feel like we’ve seen it and are here to report: the not-so-distant future is super cool.

The spirit of helping build a better Internet

Over the years, we've announced several things on our birthday that have gone on to change the future of the Internet. On our first birthday, we announced an IPv6 gateway that has helped the Internet scale past its early protocol decisions. On our fourth birthday, we announced that we were making encryption free and doubled the size of the encrypted web in a day. On our seventh birthday, we launched the Workers platform that has revolutionized how modern applications are developed. On our eleventh birthday, we announced R2 and further freed data from the captivity clouds.

This year, as we celebrate Cloudflare's thirteenth birthday, we set out to have as dramatic an impact on AI development as we have in other areas we've focused on in the past: providing tools that bring affordable, powerful AI inference to anyone, anywhere on Earth.

Closing with a poem

And so, to commemorate that, let's close with a poem about the connectivity cloud. This one was generated using a tiny bit of code and Meta's Llama 2 LLM running on Cloudflare's Workers AI platform in a data center not far from where we're writing this…

Prompt: “Write a poem that talks about the connectivity cloud and contrast it with the captivity cloud”

The connectivity cloud, a wondrous sight to see,
Where data flows and knowledge is set free,
A place where minds meet and ideas take flight,
A world of endless possibility in sight.

But then there's the captivity cloud, a darker place,
Where thoughts are trapped and minds are in disgrace,
A prison of the mind, where freedom's lost,
A world of endless sorrow and cost.

So let us choose the connectivity cloud,
And let our minds be free and allowed,
To soar and glide and reach for the sky,
And never be held back, but always fly.


The best place on Region: Earth for inference

Post Syndicated from Rita Kozlov original http://blog.cloudflare.com/best-place-region-earth-inference/

Today, Cloudflare’s Workers platform is the place over a million developers come to build sophisticated full-stack applications that previously wouldn’t have been possible.

Of course, Workers didn’t start out that way. It started, on a day like today, as a Birthday Week announcement. It may not have had all the bells and whistles that exist today, but if you got to try Workers when it launched, it conjured this feeling: “this is different, and it’s going to change things”. All of a sudden, going from nothing to a fully scalable, global application took seconds, not hours, days, weeks or even months. It was the beginning of a different way to build applications.

If you’ve played with generative AI over the past few months, you may have had a similar feeling. Surveying a few friends and colleagues, our “aha” moments were all a bit different, but the overarching sentiment across the industry at this moment is unanimous — this is different, and it’s going to change things.

Today, we’re excited to make a series of announcements that we believe will have an impact on the future of computing similar to the one Workers has had. Without burying the lede any further, here they are:

  • Workers AI (formerly known as Constellation), running on NVIDIA GPUs on Cloudflare’s global network, bringing the serverless model to AI — pay only for what you use, spend less time on infrastructure, and more on your application.
  • Vectorize, our vector database, making it easy, fast and affordable to index and store vectors to support use cases that require access not just to running models, but to customized data too.
  • AI Gateway, giving organizations the tools to cache, rate limit and observe their AI deployments regardless of where they’re running.

But that’s not all.

Doing big things is a team sport, and we don’t want to do it alone. Like in so much of what we do, we stand on the shoulders of giants. We’re thrilled to partner with some of the biggest players in the space: NVIDIA, Microsoft, Hugging Face, Databricks, and Meta.

Our announcements today mark just the beginning of Cloudflare’s journey into the AI space, like Workers did six years ago. While we encourage you to dive into each of our announcements (you won’t be disappointed!), we also wanted to take the chance to step back and provide you with a bit of our broader vision for AI, and how these announcements fit into it.

Inference: The future of AI workloads

There are two main processes involved in AI: training and inference.

Training a generative AI model is a long-running (sometimes months-long), compute-intensive process, which results in a model. Training workloads are therefore best suited for running in traditional centralized cloud locations. Given the recent challenges in obtaining long-running access to GPUs, which have pushed companies to go multi-cloud, we’ve talked about the ways in which R2 can provide an essential service by eliminating egress fees on training data, so it can be accessed from any compute cloud. But that’s not what we’re here to talk about today.

While training requires many resources upfront, the much more ubiquitous AI-related compute task is inference. If you’ve recently asked ChatGPT a question, generated an image, or translated some text, then you’ve performed an inference task. Since inference is required upon every single invocation (rather than just once), we expect that inference will become the dominant AI-related workload.

If training is best suited for a centralized cloud, then what is the best place for inference?

The network — “just right” for inference

The defining characteristic of inference is that there’s usually a user waiting on the other end of it. That is, it’s a latency sensitive task.

The best place, you might think, for a latency-sensitive task is on the device. And it might be in some cases, but there are a few problems. First, hardware on devices is not nearly as powerful as in the cloud. Second, on-device inference burns precious battery life.

On the other hand, you have centralized cloud compute. Unlike devices, the hardware running in centralized cloud locations has nothing if not horsepower. The problem, of course, is that it’s hundreds of milliseconds away from the user. And sometimes, they’re even across borders, which presents its own set of challenges.

So devices are not yet powerful enough, and centralized cloud is too far away. This makes the network the Goldilocks of inference. Not too far, with sufficient compute power — just right.

The first inference cloud, running on Region Earth

One lesson we learned building our developer platform is that running applications at network scale not only helps optimize performance and scale (though obviously that’s a nice benefit!), but even more importantly, creates the right level of abstraction for developers to move fast.

Workers AI for serverless inference

Kicking things off with our announcement of Workers AI, we’re bringing the first truly serverless GPU cloud to its perfect match — Region Earth. No machine learning expertise, no rummaging for GPUs. Just pick one of our provided models, and go.

We’ve put a lot of thought into designing Workers AI to make the experience of deploying a model as smooth as possible.
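
For instance, a minimal Worker that runs the Llama 2 chat model from the catalog could look like the sketch below; it follows the launch-era @cloudflare/ai package and an AI binding, so treat the details as illustrative rather than definitive:

import { Ai } from '@cloudflare/ai';

export default {
  async fetch(request: Request, env: any): Promise<Response> {
    const ai = new Ai(env.AI);
    // Run a single prompt against the Llama 2 model in the Workers AI catalog.
    const answer = await ai.run('@cf/meta/llama-2-7b-chat-int8', {
      prompt: 'What is serverless inference?',
    });
    return Response.json(answer);
  },
};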

And if you’re deploying any models in the year 2023, chances are, one of them is an LLM.

Vectorize for… storing vectors!

To build an end-to-end AI-operated chatbot, you also need a way to present the user with a UI, parse the corpus of information you want to pass it (for example, your product catalog), use the model to convert it into embeddings — and store them somewhere. Up until today, we offered the products you needed for the first two, but the latter — storing embeddings — requires a unique solution: a vector database.

Just as when we announced Workers, we soon after announced Workers KV — there’s little you can do with compute, without access to state. The same is true of AI — to build meaningful AI use cases, you need to give AI access to state. This is where a vector database comes into play, and why today we’re also excited to announce Vectorize, our own vector database.
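
As a hedged sketch of how the pieces fit together, a Worker might embed a question with Workers AI and then query Vectorize for the closest stored vectors; the binding names (AI, VECTOR_INDEX) and the model choice are assumptions:

import { Ai } from '@cloudflare/ai';

export default {
  async fetch(request: Request, env: any): Promise<Response> {
    const ai = new Ai(env.AI);
    const question = new URL(request.url).searchParams.get('q') ?? 'return policy';

    // Turn the question into an embedding vector with a text embedding model...
    const embedding = await ai.run('@cf/baai/bge-base-en-v1.5', { text: [question] });

    // ...then ask Vectorize for the closest stored vectors (e.g. product docs).
    const matches = await env.VECTOR_INDEX.query(embedding.data[0], { topK: 3 });
    return Response.json(matches);
  },
};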

AI Gateway for caching, rate limiting and visibility into your AI deployments

At Cloudflare, when we set out to improve something, the first step is always to measure it — if you can’t measure it, how can you improve it? When we heard about customers struggling to rein in AI deployment costs, we thought about how we would approach it — measure it, then improve it.

Our AI Gateway helps you do both!

Real-time observation capabilities empower proactive management, making it easier to monitor, debug, and fine-tune AI deployments. Leveraging it to cache, rate limit, and monitor AI deployments is essential for optimizing performance and managing costs effectively. By caching frequently used AI responses, it reduces latency and bolsters system reliability, while rate limiting ensures efficient resource allocation, mitigating the challenges of spiraling AI costs.
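
In practice, adopting the gateway can be as simple as swapping the base URL of an existing provider call so that requests flow through Cloudflare first; the account ID and gateway name below are placeholders:

// Sketch: route an existing OpenAI chat completion call through AI Gateway
// by pointing the request at the gateway URL instead of api.openai.com.
declare const OPENAI_API_KEY: string; // assumed to be available in scope

const response = await fetch(
  'https://gateway.ai.cloudflare.com/v1/ACCOUNT_ID/my-gateway/openai/chat/completions',
  {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${OPENAI_API_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      model: 'gpt-3.5-turbo',
      messages: [{ role: 'user', content: 'Hello from the gateway!' }],
    }),
  }
);
console.log(await response.json());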

Collaborating with Meta to bring Llama 2 to our global network

Until recently, the only way to have access to an LLM was through calls to proprietary models. Training LLMs is a serious investment — in time, computing, and financial resources — and thus not something that’s accessible to most developers. Meta’s release of Llama 2, an open-source LLM, has presented an exciting shift, allowing developers to run and deploy their own LLMs. Except, of course, for one small detail — you still have to have access to a GPU to do so.

By making Llama 2 available as a part of the Workers AI catalog, we look forward to giving every developer access to an LLM — no configuration required.

Having a running model is, of course, just one component of an AI application.

Leveraging the ONNX runtime to make moving between cloud, edge, and device seamless for developers

While the edge may be the optimal location for solving many of these problems, we do expect that applications will continue to be deployed at other locations along the spectrum of device, edge and centralized cloud.

Take, for example, self-driving cars: when you’re making decisions where every millisecond matters, you need to make them on the device. Inversely, if you’re looking to run hundred-billion-parameter versions of models, the centralized cloud is going to be better suited for your workload.

The question then becomes: how do you navigate between these locations smoothly?

Since our initial release of Constellation (now called Workers AI), one technology we were particularly excited by was the ONNX runtime. The ONNX runtime creates a standardized environment for running models, which makes it possible to run various models across different locations.

We already talked about the edge as a great place for running inference itself, but it’s also great as a routing layer to help guide workloads smoothly across all three locations, based on the use case, and what you’re looking to optimize for — be it latency, accuracy, cost, compliance, or privacy.

Partnering with Hugging Face to provide optimized models at your fingertips

There’s nothing of course that can help developers go faster than meeting them where they are, so we are partnering with Hugging Face to bring serverless inference to available models, right where developers explore them.

Partnering with Databricks to make AI models easier to deploy and manage

Together with Databricks, we will be bringing the power of MLflow to data scientists and engineers. MLflow is an open-source platform for managing the end-to-end machine learning lifecycle, and this partnership will make it easier for users to deploy and manage ML models at scale. With this partnership, developers building on Cloudflare Workers AI will be able to leverage MLflow-compatible models for easy deployment into Cloudflare’s global network. Developers can use MLflow to efficiently package, implement, deploy and track a model directly into Cloudflare’s serverless developer platform.

AI that doesn’t keep your CIO or CFO or General Counsel up at night

Things are moving quickly in AI, and it’s important to give developers the tools they need to get moving, but it’s hard to move fast when there are important considerations to worry about. What about compliance, costs, privacy?

Compliance-friendly AI

Much as most of us would prefer not to think about it, AI and data residency are becoming increasingly regulated by governments. With governments requiring that data be processed locally or that their residents’ data be stored in-country, businesses have to think about where inference workloads run as well. With regard to latency, the network edge provides the ability to go as wide as possible. When it comes to compliance, the power of a network that spans 300 cities, combined with an offering like our Data Localization Suite, gives us the granularity required to keep AI deployments local.

Budget-friendly AI

Talking to many of our friends and colleagues experimenting with AI, one sentiment seems to resonate: AI is expensive. It’s easy to let costs get away from you before even getting anything into production or realizing value from it. Our intent with our AI platform is to make costs affordable, but perhaps more importantly, to only charge you for what you use. Whether you’re using Workers AI directly, or our AI Gateway, we want to provide the visibility and tools necessary to prevent AI spend from running away from you.

Privacy-friendly AI

If you’re putting AI front and center of your customer experiences and business operations, you want to be reassured that any data that runs through it is in safe hands. As has always been the case with Cloudflare, we’re taking a privacy-first approach. We can assure our customers that we will not use any customer data passing through Cloudflare for inference to train large language models.

No, but really — we’re just getting started

We're just getting started with AI, folks, and boy, are we in for a wild ride! As we continue to unlock the benefits of this technology, we can't help but feel a sense of awe and wonder at the endless possibilities that lie ahead. From revolutionizing healthcare to transforming the way we work, AI is poised to change the game in ways we never thought possible. So buckle up, folks, because the future of AI is looking brighter than ever – and we can't wait to see what's next!

This wrap up message may have been generated by AI, but the sentiment is genuine — this is just the beginning, and we can’t wait to see what you build.

Welcome to connectivity cloud: the modern way to connect and protect your clouds, networks, applications and users

Post Syndicated from Jen Taylor original http://blog.cloudflare.com/welcome-to-connectivity-cloud/

The best part of our job is the time we spend talking to Cloudflare customers. We always learn something new and interesting about their IT and security challenges.

In recent years, something about those conversations has changed. More and more, the biggest challenge customers tell us about isn’t something that’s easy to define. And it’s definitely not something you can address with an individual product or feature.

Rather, what we’re hearing from IT and security teams is that they are losing control of their digital environment.

This loss of control comes in a few flavors. They might express hesitance about adopting a new capability they know they need, because of compatibility concerns. Or maybe they’ll talk about how much time and effort it takes to make relatively simple changes, and how those changes take time away from more impactful work. If we had to sum the feeling up, it would be something like, “No matter how large my team or budget, it’s never enough to fully connect and protect the business.”

Does any of this feel familiar? If so, let us tell you that you are far from alone.

Reasons for loss of control

The rate of change in IT and security is accelerating, bringing with it dreaded complexity. IT and security teams are responsible for a wider variety of technological domains than they were in years past. Recent research from Forrester confirms these shifts: of teams responsible for securing in-office, remote, and hybrid workers, 52% only took that on in the past five years. Meanwhile, 46% gained responsibility for managing and securing public cloud applications in that time, and 53% were handed the thorny issue of regulatory compliance.

IT and security teams have been handed a monumental challenge: connect remote teams, on-premises teams and infrastructure, multiple cloud environments, SaaS apps, and more, so they function like a single, secure environment. But doing so is difficult for multiple reasons:

  • In most businesses, proprietary infrastructure, unique compliance needs, and semi-compatible processes and configurations make it hard to connect clouds, SaaS apps, web apps, and on-prem infrastructure. Those domains simply weren’t built to work together easily and securely.
  • Conway’s Law tells us that systems tend to match the communication structure of their organization. And, through no fault of their own, many IT and security teams are quite siloed.

The circumstances are often ripe for IT and security to get bogged down with workarounds and tangled interdependencies.

Luckily, we’ve found a way forward.

Welcome to the connectivity cloud

Frequently, an important part of customer conversations is being able to read between the lines. When customers speak about loss of control, they seem to be quietly wishing for something they think doesn’t exist. What they want is a connective tissue for everything IT and security are responsible for — something that reduces complexity by working with everything in the environment, being available everywhere, and performing whatever security, networking, and development functions are needed.

Further research confirmed our suspicions. Surveys of IT and security leaders indicate that 72% would highly value a secure “any-to-any” cloud platform. And they said they would invest an average of 16% of their entire IT and security budget in such a platform.

That got us to thinking — what exactly would that sort of cloud platform look like? How would it accomplish everything it had to?

We’ve got answers to share: a connectivity cloud.

A connectivity cloud is a new approach for delivering the many services companies need to secure and connect their digital environment. It’s a unified, intelligent platform of programmable, cloud-native services that enables any-to-any connectivity between all networks (enterprise and Internet), cloud environments, applications, and users. It includes a huge array of security, performance, and developer services — not with an eye to replace everything everywhere, but with the ability to fit in wherever needed and consolidate many critical services onto a single platform.

A connectivity cloud is built around four fundamental principles.

  1. Deep integration — Organizations rely on the Internet to connect various elements of their digital environment with their workers, partners and customers. A connectivity cloud is integrated natively with the Internet and with enterprise networks, offering secure, low-latency, infinitely scalable connectivity between every user, application, and infrastructure. It’s as fast and straightforward as the Internet at its best, without the risk or uncertainty.
  2. Programmability — Every enterprise digital environment has proprietary infrastructure, multiple clouds, unique compliance needs, and other highly specific tooling, processes, and configurations. A connectivity cloud’s architecture provides limitless interoperability and customizable networking, letting it adapt to those unique needs while still providing consistent user experiences and efficient management.
  3. Platform intelligence — Organizations need a wide variety of services to connect and secure everything in their digital environment. But integrating everything is onerous, and trying to manage it all causes inefficiency and security gaps. A well-architected connectivity cloud has a wide range of services built in at a foundational level, and analyzes extremely high volumes and varieties of traffic in order to automatically update intelligence models.
  4. Simplicity — Too many IT and security services means too many dashboards, leading to inefficiency, poor visibility, and alert fatigue. While 100% consolidation onto one platform isn’t the answer, a connectivity cloud greatly reduces tool sprawl and dashboard overload by managing much more of the IT environment from a single pane of glass.

With these qualities, the connectivity cloud lets you add new services to your digital environment without losing even more control — and also helps restore control to what you’ve already got.

Cloudflare’s connectivity cloud

We’ll admit we’re predisposed to find those four qualities particularly important. From our earliest days, we’ve built our services based on the principles of integration, programmability, platform intelligence, and simplicity. And now, our overall portfolio is comprehensive enough to help customers achieve these benefits when tackling a huge array of security and IT needs.

Because of this approach, we’re proud to say that Cloudflare is the world’s first connectivity cloud.

But don’t take our word for it. Here are a few examples of customers that have used Cloudflare to help resolve their own crises of control:

Conrad Electronic: Secure access for a global distributed workforce

A connectivity cloud’s programmability, deep network integration, and built-in intelligence make it ideal for delivering secure access to corporate resources.

The electronics retailer Conrad Electronic told us, “Just keeping people online created a series of administrative bottlenecks.” Nearly half of their 2,300 employees need to access corporate applications remotely. Enabling that access was burdensome: they had to deploy and configure VPN clients for each user.

Conrad Electronic now uses Cloudflare’s connectivity cloud to provide secure remote access to hundreds of corporate applications. Their management burden is significantly lower, with their team telling us they now have much more time per month to devote to improving their web operations. What’s more, their security posture is stronger: “We can restrict specific individuals or secure sensitive areas with a mouse click. Not having to maintain 1,000 VPN profiles improves our security and saves us time and money.”

Carrefour: Deliver and manage trusted customer-facing applications

A connectivity cloud’s threat intelligence, network integration, and unified interface also make it excellent at closing security gaps and enabling secure application delivery on a global scale.

The multinational retail and wholesaling company Carrefour has a thriving and rapidly growing ecommerce presence. However, when cyber attacks ramped up, simply growing their security stack didn’t help. As their security team told us, “The interlacing of multiple tools complicated coordination and control of the architecture…additionally, the lack of integration across tools made investigating and resolving security and performance issues a complex and time-consuming effort.”

As part of their broader security transformation, they adopted Cloudflare’s connectivity cloud to prevent web exploits, zero-day threats, and malicious bot attacks. Doing so allowed them to replace five security tools from different vendors. And since then, they’ve reduced their incident resolution time by 75%.

Canva: Build innovative applications

Finally, a connectivity cloud’s programmability and network integration help it power innovative development in almost any context.

The global design juggernaut Canva is one example. Previously, they used a variety of developer platforms to run custom code at the network edge. But they found those services too time-consuming to use, and ran into limitations that held their innovation back.

Cloudflare’s connectivity cloud has become “a critical part of our software.” They use the connectivity cloud’s developer services to build and run custom code, optimize page delivery for SEO, and time-limit content access. Recently, they told us, “Thanks to Cloudflare, we can focus on growing our product and expanding into new markets with confidence, knowing that our platform is fast, reliable, and secure.”

What’s more, their experience has led to them also adopting Cloudflare’s connectivity cloud for secure access and application delivery — a hugely gratifying example of a connectivity cloud operating at full power.

Learn more about the connectivity cloud

Customers are the inspiration for our innovation, and our connectivity cloud vision is no exception. We live to make things easier, faster, more secure and more connected – and it’s amazing to see how the connectivity cloud helps reduce complexity and increase security.

You can learn more about the connectivity cloud here — but we hope that’s just the beginning. Reach out to all of us at Cloudflare to ask questions and make suggestions — we look forward to continuing the conversation and discovering ways we can continue to help customers on their secure, connected journey.

Sippy helps you avoid egress fees while incrementally migrating data from S3 to R2

Post Syndicated from Phillip Jones original http://blog.cloudflare.com/sippy-incremental-migration-s3-r2/

Earlier in 2023, we announced Super Slurper, a data migration tool that makes it easy to copy large amounts of data to R2 from other cloud object storage providers. Since the announcement, developers have used Super Slurper to run thousands of successful migrations to R2!

While Super Slurper is perfect for cases where you want to move all of your data to R2 at once, there are scenarios where you may want to migrate your data incrementally over time. Maybe you want to avoid the one-time upfront AWS data transfer bill? Or perhaps you have legacy data that may never be accessed, and you only want to migrate what’s required?

Today, we’re announcing the open beta of Sippy, an incremental migration service that copies data from S3 (other cloud providers coming soon!) to R2 as it’s requested, without paying unnecessary cloud egress fees typically associated with moving large amounts of data. On top of addressing vendor lock-in, Sippy makes stressful, time-consuming migrations a thing of the past. All you need to do is replace the S3 endpoint in your application or attach your domain to your new R2 bucket and data will start getting copied over.

How does it work?

Sippy is an incremental migration service built directly into your R2 bucket. Migration-specific egress fees are reduced by leveraging requests within the flow of your application, where you’d already be paying egress fees, to simultaneously copy objects to R2. Here is how it works:

  1. When an object is requested via Workers, the S3 API, or a public bucket, it is served from your R2 bucket if it is found there.
  2. If the object is not found in R2, it is simultaneously returned from your S3 bucket and copied to R2. (Note: some large objects may take multiple requests to copy.)
That means after objects are copied, subsequent requests will be served from R2, and you’ll begin saving on egress fees immediately.

Start incrementally migrating data from S3 to R2

Create an R2 bucket

To get started with incremental migration, you’ll first need to create an R2 bucket if you don’t already have one. To create a new R2 bucket from the Cloudflare dashboard:

  1. Log in to the Cloudflare dashboard and select R2.
  2. Select Create bucket.
  3. Give your bucket a name and select Create bucket.
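
If you prefer the command line, Wrangler can create the bucket as well (assuming Wrangler is installed and authenticated):

npx wrangler r2 bucket create my-bucket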

To learn more about other ways to create R2 buckets, refer to the documentation on creating buckets.

Enable Sippy on your R2 bucket

Next, you’ll enable Sippy for the R2 bucket you created. During the beta, you can do this by using the API. Here’s an example of how to enable Sippy for an R2 bucket with cURL:

curl -X PUT https://api.cloudflare.com/client/v4/accounts/{account_id}/r2/buckets/{bucket_name}/sippy \
--header "Authorization: Bearer <API_TOKEN>" \
--data '{"provider": "AWS", "bucket": "<AWS_BUCKET_NAME>", "zone": "<AWS_REGION>","key_id": "<AWS_ACCESS_KEY_ID>", "access_key":"<AWS_SECRET_ACCESS_KEY>", "r2_key_id": "<R2_ACCESS_KEY_ID>", "r2_access_key": "<R2_SECRET_ACCESS_KEY>"}'

For more information on getting started, please refer to the documentation. Once enabled, requests to your bucket will now start copying data over from S3 if it’s not already present in your R2 bucket.

Finish your migration with Super Slurper

You can run your incremental migration for as long as you want, but eventually you may want to complete the migration to R2. To do this, you can pair Sippy with Super Slurper to easily migrate your remaining data that hasn’t been accessed to R2.

What’s next?

We’re excited about the open beta, but it’s only the starting point. Next, we plan on making incremental migration configurable from the Cloudflare dashboard, complete with analytics that show you the progress of your migration and how much you are saving by not paying egress fees for objects that have been copied over so far.

If you are looking to start incrementally migrating your data to R2 and have any questions or feedback on what we should build next, we encourage you to join our Discord community to share!

Traffic anomalies and notifications with Cloudflare Radar

Post Syndicated from David Belson original http://blog.cloudflare.com/traffic-anomalies-notifications-radar/

We launched the Cloudflare Radar Outage Center (CROC) during Birthday Week 2022 as a way of keeping the community up to date on Internet disruptions, including outages and shutdowns, visible in Cloudflare’s traffic data. While some of the entries have their genesis in information from social media posts made by local telecommunications providers or civil society organizations, others are based on an internal traffic anomaly detection and alerting tool. Today, we’re adding this alerting feed to Cloudflare Radar, showing country and network-level traffic anomalies on the CROC as they are detected, as well as making the feed available via API.

Building on this new functionality, as well as the route leaks and route hijacks insights that we recently launched on Cloudflare Radar, we are also launching new Radar notification functionality, enabling you to subscribe to notifications about traffic anomalies, confirmed Internet outages, route leaks, or route hijacks. Using the Cloudflare dashboard’s existing notification functionality, users can set up notifications for one or more countries or autonomous systems, and receive notifications when a relevant event occurs. Notifications may be sent via e-mail or webhooks — the available delivery methods vary according to plan level.

Traffic anomalies

Internet traffic generally follows a fairly regular pattern, with daily peaks and troughs at roughly the same volumes of traffic. However, while weekend traffic patterns may look similar to weekday ones, their traffic volumes are generally different. Similarly, holidays or national events can also cause traffic patterns and volumes to differ significantly from “normal”, as people shift their activities and spend more time offline, or as people turn to online sources for information about, or coverage of, the event. These traffic shifts can be newsworthy, and we have covered some of them in past Cloudflare blog posts (King Charles III coronation, Easter/Passover/Ramadan, Brazilian presidential elections).

However, as you also know from reading our blog posts and following Cloudflare Radar on social media, it is the more drastic drops in traffic that are a cause for concern. Some are the result of infrastructure damage from severe weather or a natural disaster like an earthquake and are effectively unavoidable, but getting timely insights into the impact of these events on Internet connectivity is valuable from a communications perspective. Other traffic drops have occurred when an authoritarian government orders mobile Internet connectivity to be shut down, or shuts down all Internet connectivity nationwide. Timely insights into these types of anomalous traffic drops are often critical from a human rights perspective, as Internet shutdowns are often used as a means of controlling communication with the outside world.

Over the last several months, the Cloudflare Radar team has been using an internal tool to identify traffic anomalies and post alerts for followup to a dedicated chat space. The companion blog post Gone Offline: Detecting Internet Outages goes into deeper technical detail about our traffic analysis and anomaly detection methodologies that power this internal tool.
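
As a toy illustration of the general idea (a deliberately simplified sketch, not Cloudflare's actual methodology, which the companion post describes), one can flag a traffic drop by comparing the current value to the typical level for the same time slot in prior weeks:

// Toy sketch: flag a drop when traffic falls well below the median of the
// same time-of-day slot over previous weeks.
function median(values: number[]): number {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// `history` holds traffic volumes for the same time slot in prior weeks.
function isAnomalousDrop(current: number, history: number[], threshold = 0.5): boolean {
  const expected = median(history);
  return expected > 0 && current < expected * threshold;
}

// Example: traffic at roughly 60% below the typical level for this slot.
console.log(isAnomalousDrop(40, [100, 95, 110, 105])); // true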

Many of these internal traffic anomaly alerts ultimately result in Outage Center entries and Cloudflare Radar social media posts. Today, we’re extending the Cloudflare Radar Outage Center and publishing information about these anomalies as we identify them. The new Traffic anomalies table includes the type of anomaly (location or ASN), the entity where the anomaly was detected (country/region name or autonomous system), the start time, duration, verification status, and an “Actions” link, where the user can view the anomaly on the relevant entity traffic page or subscribe to a notification. (If manual review of a detected anomaly finds that it is present in multiple Cloudflare traffic datasets and/or is visible in third-party datasets, such as Georgia Tech’s IODA platform, we will mark it as verified. Unverified anomalies may be false positives, or related to Netflows collection issues, though we endeavor to minimize both.)

In addition to this new table, we have updated the Cloudflare Radar Outage Center map to highlight where we have detected anomalies, as well as placing them into a broader temporal context in a new timeline immediately below the map. Anomalies are represented as orange circles on the map, and can be hidden with the toggle in the upper right corner. Double-bordered circles represent an aggregation across multiple countries, and zooming in to that area will ultimately show the number of anomalies associated with each country that was included in the aggregation. Hovering over a specific dot in the timeline displays information about the outage or anomaly with which it is associated.

Internet outage information has been available via the Radar API since we launched the Outage Center and API in September 2022, and traffic anomalies are now available through a Radar API endpoint as well. An example traffic anomaly API request and response are shown below.

Example request:

curl --request GET \
  --url https://api.cloudflare.com/client/v4/radar/traffic_anomalies \
  --header 'Content-Type: application/json' \
  --header 'X-Auth-Email: '

Example response:

{
  "result": {
    "trafficAnomalies": [
      {
        "asnDetails": {
          "asn": "189",
          "locations": {
            "code": "US",
            "name": "United States"
          },
          "name": "LUMEN-LEGACY-L3-PARTITION"
        },
        "endDate": "2023-08-03T23:15:00Z",
        "locationDetails": {
          "code": "US",
          "name": "United States"
        },
        "startDate": "2023-08-02T23:15:00Z",
        "status": "UNVERIFIED",
        "type": "LOCATION",
        "uuid": "55a57f33-8bc0-4984-b4df-fdaff72df39d",
        "visibleInDataSources": [
          "string"
        ]
      }
    ]
  },
  "success": true
}

Notifications overview

Timely knowledge about Internet “events”, such as drops in traffic or routing issues, are potentially of interest to multiple audiences. Customer service or help desk agents can use the information to help diagnose customer/user complaints about application performance or availability. Similarly, network administrators can use the information to better understand the state of the Internet outside their network. And civil society organizations can use the information to inform action plans aimed at maintaining communications and protecting human rights in areas of conflict or instability. With the new notifications functionality also being launched today, you can subscribe to be notified about observed traffic anomalies, confirmed Internet outages, route leaks, or route hijacks, at a country or autonomous system level. In the following sections, we discuss how to subscribe to and configure notifications, as well as the information contained within the various types of notifications.

Subscribing to notifications

Note that you need to log in to the Cloudflare dashboard to subscribe to and configure notifications. No purchase of Cloudflare services is necessary — just a verified email address is required to set up an account. While we would have preferred to not require a login, it enables us to take advantage of Cloudflare’s existing notifications engine, allowing us to avoid having to dedicate time and resources to building a separate one just for Radar. If you don’t already have a Cloudflare account, visit https://dash.cloudflare.com/sign-up to create one. Enter your username and a unique strong password, click “Sign Up”, and follow the instructions in the verification email to activate your account. (Once you’ve activated your account, we also suggest activating two-factor authentication (2FA) as an additional security measure.)

Once you have set up and activated your account, you are ready to start creating and configuring notifications. The first step is to look for the Notifications (bullhorn) icon in the Traffic, Routing, and Outage Center sections on Cloudflare Radar; the presence of this icon means that notifications are available for that metric. If you are on a country or ASN-scoped traffic or routing page, the notification subscription will be scoped to that entity.

Traffic anomalies and notifications with Cloudflare Radar
Look for this icon in the Traffic, Routing, and Outage Center sections of Cloudflare Radar to start setting up notifications.
Traffic anomalies and notifications with Cloudflare Radar
In the Outage Center, click the icon in the “Actions” column of an Internet outages table entry to subscribe to notifications for the related location and/or ASN(s). Click the icon alongside the table description to subscribe to notifications for all confirmed Internet outages.
Traffic anomalies and notifications with Cloudflare Radar
In the Outage Center, click the icon in the “Actions” column of a Traffic anomalies table entry to subscribe to notifications for the related entity. Click the icon alongside the table description to subscribe to notifications for all traffic anomalies.
Traffic anomalies and notifications with Cloudflare Radar
On country or ASN traffic pages, click the icon alongside the description of the traffic trends graph to subscribe to notifications for traffic anomalies or Internet outages impacting the selected country or ASN.
Traffic anomalies and notifications with Cloudflare Radar
On country or ASN routing pages, click the icon alongside the description to subscribe to notifications for route leaks or origin hijacks related to the selected country or ASN.
Traffic anomalies and notifications with Cloudflare Radar
Within the Route Leaks or Origin Hijacks tables on the routing pages, click the icon in a table entry to subscribe to notifications for route leaks or origin hijacks for referenced countries and/or ASNs. 

After clicking a notification icon, you’ll be taken to the Cloudflare login screen. Enter your username and password (and 2FA code if required), and once logged in, you’ll see the Add Notification page, pre-filled with the key information passed through from the referring page on Radar, including relevant locations and/or ASNs. (If you are already logged in to Cloudflare, then you’ll be taken directly to the Add Notification page after clicking a notification icon on Radar.) On this page, you can name the notification, add an optional description, and adjust the location and ASN filters as necessary. Enter an email address for notifications to be sent to, or select an established webhook destination (if you have webhooks enabled on your account).

Traffic anomalies and notifications with Cloudflare Radar

Click “Save”, and the notification is added to the Notifications Overview page for the account.

Traffic anomalies and notifications with Cloudflare Radar

You can also create and configure notifications directly within Cloudflare, without starting from a link on a Radar page. To do so, log in to Cloudflare, and choose “Notifications” from the left side navigation bar. That will take you to the Notifications page shown below. Click the “Add” button to add a new notification.

Traffic anomalies and notifications with Cloudflare Radar

On the next page, search for and select “Radar” from the list of Cloudflare products for which notifications are available.

Traffic anomalies and notifications with Cloudflare Radar

On the subsequent “Add Notification” page, you can create and configure a notification from scratch. Event types can be selected in the “Notify me for:” field, and both locations and ASNs can be searched for and selected within the respective “Filtered by (optional)” fields. Note that if no filters are selected, then notifications will be sent for all events of the selected type(s). Add one or more emails to send notifications to, or select a webhook target if available, and click “Save” to add it to the list of notifications configured for your account.

Traffic anomalies and notifications with Cloudflare Radar

It is worth mentioning that advanced users can also create and configure notifications through the Cloudflare API Notification policies endpoint, but we will not review that process within this blog post.
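For those curious, here is a rough sketch of what such an API call might look like in Python. The alert_type and filters values below are assumptions for illustration only; consult the Notification policies API documentation for the exact schema before using this:

import requests

ACCOUNT_ID = "your_account_id"   # hypothetical placeholders
API_TOKEN = "your_api_token"

policy = {
    "name": "Radar traffic anomalies - US",
    "description": "Notify on traffic anomalies observed in the United States",
    "enabled": True,
    "alert_type": "radar_notification",          # assumed value; check the docs
    "mechanisms": {"email": [{"id": "noc@example.com"}]},
    "filters": {"affected_locations": ["US"]},   # assumed filter key
}

resp = requests.post(
    f"https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/alerting/v3/policies",
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    json=policy,
)
print(resp.json())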

Notification messages

Example notification email messages are shown below for the various types of events. Each contains key information like the type of event, affected entities, and start time — additional relevant information is included depending on the event type. Each email includes both plaintext and HTML versions to accommodate multiple types of email clients. (Final production emails may vary slightly from those shown below.)

Traffic anomalies and notifications with Cloudflare Radar
Internet outage notification emails include information about the affected entities, a description of the cause of the outage, start time, scope (if available), and the type of outage (Nationwide, Network, Regional, or Platform), as well as a link to view the outage in a Radar traffic graph.
Traffic anomalies and notifications with Cloudflare Radar
Traffic anomaly notification emails simply include information about the affected entity and a start time, as well as a link to view the anomaly in a Radar traffic graph.
Traffic anomalies and notifications with Cloudflare Radar
BGP hijack notification emails include information about the hijacking and victim ASNs, affected IP address prefixes, the number of BGP messages (announcements) containing the hijacked routes, the number of peers announcing the hijack, detection timing, a confidence level on the event being a true hijack, and relevant tags, as well as a link to view details of the hijack event on Radar.
Traffic anomalies and notifications with Cloudflare Radar
BGP route leak notification emails include information about the AS that the leaked routes were learned from, the AS that leaked the routes, the AS that received and propagated the leaked routes, the number of affected prefixes, the number of affected origin ASes, the number of BGP route collector peers that saw the route leak, and detection timing, as well as a link to view details of the route leak event on Radar.

If you are sending notifications to webhooks, you can integrate those notifications into tools like Slack. For example, by following the directions in Slack’s API documentation, we created a simple integration in just a few minutes that produces messages like the one shown below.

Traffic anomalies and notifications with Cloudflare Radar
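If you want to run your own glue code instead, one way to wire the two together is a small receiver that accepts the Radar webhook and forwards a summary to a Slack incoming webhook. The sketch below treats the notification payload as opaque JSON, since the exact field names are not documented here; inspect a real delivery before relying on specific fields:

import json
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

SLACK_WEBHOOK = "https://hooks.slack.com/services/T000/B000/XXXX"  # your Slack URL

class RadarNotificationHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        # Forward the raw payload; pull out specific fields once you know the schema
        message = {"text": f"Cloudflare Radar notification: {json.dumps(payload)[:500]}"}
        req = urllib.request.Request(
            SLACK_WEBHOOK,
            data=json.dumps(message).encode(),
            headers={"Content-Type": "application/json"},
        )
        urllib.request.urlopen(req)
        self.send_response(200)
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("", 8080), RadarNotificationHandler).serve_forever()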

Conclusion

Cloudflare’s unique perspective on the Internet provides us with near-real-time insight into unexpected drops in traffic, as well as potentially problematic routing events. While we’ve been sharing these insights with you over the past year, you had to visit Cloudflare Radar to figure out if there were any new “events”. With the launch of notifications, we’ll now automatically send you information about the latest events that you are interested in.

We encourage you to visit Cloudflare Radar to familiarize yourself with the information we publish about traffic anomalies, confirmed Internet outages, BGP route leaks, and BGP origin hijacks. Look for the notification icon on the relevant graphs and tables on Radar, and go through the workflow to set up and subscribe to notifications. (And don’t forget to sign up for a Cloudflare account if you don’t have one already.) Please send us feedback about the notifications, as we are constantly working to improve them, and let us know how and where you’ve integrated Radar notifications into your own tools/workflows/organization.

Follow Cloudflare Radar on social media at @CloudflareRadar (Twitter), cloudflare.social/@radar (Mastodon), and radar.cloudflare.com (Bluesky).


Amazon’s $2bn IPv4 tax — and how you can avoid paying it

Post Syndicated from Anie Jacob original http://blog.cloudflare.com/amazon-2bn-ipv4-tax-how-avoid-paying/

Amazon’s $2bn IPv4 tax — and how you can avoid paying it

Amazon’s $2bn IPv4 tax — and how you can avoid paying it

One of the wonderful things about the Internet is that, whether as a consumer or producer, the cost has continued to come down. Back in the day, it used to be that you needed a server room, a whole host of hardware, and an army of folks to help keep everything up and running. The cloud changed that, but even with that shift, services like SSL or unmetered DDoS protection were out of reach for many. We think that the march towards a more accessible Internet — both through ease of use, and reduced cost — is a wonderful thing, and we’re proud to have played a part in making it happen.

Every now and then, however, the march of progress gets interrupted.

On July 28, 2023, Amazon Web Services (AWS) announced that they would begin to charge “per IP per hour for all public IPv4 addresses, whether attached to a service or not”, starting February 1, 2024. This change will add at least $43 extra per year for every IPv4 address Amazon customers use; this may not sound like much, but we’ve seen back-of-the-napkin analysis suggesting it will amount to an approximately $2bn tax on the Internet.

In this blog, we’ll explain a little bit more about the technology involved and, most importantly, give you a step-by-step walkthrough of how Cloudflare can help you eliminate the need to pay Amazon for something they shouldn’t be charging you for in the first place. Better yet, if you’re a Pro or Business subscriber, we want to put $43 in your pocket instead of taking it out. Don’t give Amazon $43 for IPv4; let us give you $43 and throw in IPv4 as well.

How can Cloudflare help?

The only way to avoid Amazon’s IPv4 tax is to transition to IPv6 on AWS. But we recognize that not everyone is ready to make that shift — it can be an expensive and challenging process, and may present problems with hardware compatibility and network performance. We cover the finer details of these challenges below, so keep reading! Cloudflare can help ease this transition: let us deal with communicating with AWS over IPv6. Not only that, you’ll get all the rest of the benefits of using Cloudflare and our global network — including all the performance and security that Cloudflare is known for — and a $43 credit for using us!

IPv6 services like these are something we’ve been offering at Cloudflare for years; in fact, this was first announced during Cloudflare's first birthday week in 2011! We’ve also made the process simple to enable, so you can set it up as soon as today.

Amazon’s $2bn IPv4 tax — and how you can avoid paying it

To set this feature up, you will need to both enable IPv6 Compatibility on Cloudflare and configure your AWS origin to be an IPv6 origin.

To configure this feature simply follow these steps:

1. Log in to your Cloudflare account.

2. Select the appropriate domain.

3. Click the Network app.

Amazon’s $2bn IPv4 tax — and how you can avoid paying it

4. Make sure IPv6 Compatibility is toggled on.

Amazon’s $2bn IPv4 tax — and how you can avoid paying it

To get an IPv6 origin from Amazon you will likely have to follow these steps:

  1. Associate an IPv6 CIDR block with your VPC and subnets
  2. Update your route tables
  3. Update your security group rules
  4. Change your instance type
  5. Assign IPv6 addresses to your instances
  6. (Optional) Configure IPv6 on your instances

(For more information about this migration, check out this link.)
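If you script your AWS changes, several of the steps above map to EC2 API calls. The boto3 sketch below illustrates steps 1, 2, and 5; all resource IDs are hypothetical placeholders, and your VPC layout may call for different resources (for example, an egress-only internet gateway for private subnets):

import boto3

ec2 = boto3.client("ec2")

# Step 1: associate an Amazon-provided IPv6 CIDR block with the VPC
ec2.associate_vpc_cidr_block(
    VpcId="vpc-0123456789abcdef0",
    AmazonProvidedIpv6CidrBlock=True,
)

# Step 2: route all IPv6 traffic through the VPC's internet gateway
ec2.create_route(
    RouteTableId="rtb-0123456789abcdef0",
    DestinationIpv6CidrBlock="::/0",
    GatewayId="igw-0123456789abcdef0",
)

# Step 5: assign an IPv6 address to an instance's network interface
ec2.assign_ipv6_addresses(
    NetworkInterfaceId="eni-0123456789abcdef0",
    Ipv6AddressCount=1,
)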

Once you have your IPv6 origins, you’ll want to update your origins on Cloudflare to use the IPv6 addresses. In the simple example of a single origin at root, this is done by creating a proxied (orange-cloud) AAAA record in your Cloudflare DNS editor:

Amazon’s $2bn IPv4 tax — and how you can avoid paying it
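You can also create the record programmatically. The Python sketch below calls the Cloudflare DNS records API to add a proxied AAAA record; the zone ID, token, hostname, and IPv6 address are placeholders you would replace with your own values:

import requests

ZONE_ID = "your_zone_id"        # placeholders throughout
API_TOKEN = "your_api_token"

resp = requests.post(
    f"https://api.cloudflare.com/client/v4/zones/{ZONE_ID}/dns_records",
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    json={
        "type": "AAAA",
        "name": "example.com",
        "content": "2001:db8::1",  # your AWS origin's IPv6 address
        "proxied": True,           # orange-cloud: Cloudflare sits in front
    },
)
print(resp.json()["success"])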

If you are using Load Balancers, you will want to update the origin(s) there.

Once that’s done, you can remove the A/IPv4 record(s) and traffic will move over to the v6 address. While this process is easy now, we’re working on how we can make moving to IPv6 on Cloudflare even easier.

Once you have these features configured and have traffic running through Cloudflare to your origin for at least 6 months, you will be eligible to have a $43 credit deposited right into your Cloudflare account! You can use this credit for your Pro or Biz subscription or even for Workers and R2 usage. See here for more information on how to opt in to this offer.

This feature gives you the flexibility to manage your IPv6 settings to fit your requirements. By leveraging Cloudflare's robust IPv6 support, you can ensure seamless connectivity for your users while avoiding the additional costs associated with public IPv4 addresses.

What’s wrong with IPv4?

So if Cloudflare has this solution, why should you even move to IPv6? To explain, let's start with the problem with IPv4.

IP addresses are used to identify and reach resources on a network, which could be a private network, like your office's, or a complex public network like the Internet. Examples of IPv4 addresses are 198.51.100.1 and 198.51.100.50, and there are approximately 4.3 billion unique IPv4 addresses like these for websites, servers, and other destinations on the Internet to use for routing.

4.3 billion IPv4 addresses may sound like a lot, but it’s not enough: the IPv4 space has effectively run out. In September 2015, ARIN, one of the regional Internet registries through which people acquire IP addresses, announced that it had no available space left; if you want to buy an IPv4 address, you have to talk to the private companies that are selling them, and they charge a pretty penny. It costs about $40 per IPv4 address today, and buying a group of IPv4 addresses, also known as a prefix, of which the minimum required size is 256 IP addresses (a /24), costs about $10,000.

IP addresses are necessary for having a domain or device on the Internet, but IPv4 addresses have become an increasingly complicated resource to acquire. Therefore, to facilitate the growth of the Internet, more unique addresses needed to be made available without breaking the bank. That’s where IPv6 comes in.

IPv4 vs. IPv6

In 1995, the IETF (Internet Engineering Task Force) published the RFC for IPv6, which proposed to solve the problem of the limited IPv4 space. Instead of 32 bits of addressable space, IPv6 expanded to 128 bits. This means that instead of 4.3 billion addresses, there are approximately 340 undecillion IPv6 addresses available, far more than the estimated number of grains of sand on Earth.
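The arithmetic behind those numbers is easy to check yourself; the snippet below is ours, not part of any spec:

ipv4 = 2**32
ipv6 = 2**128

print(f"{ipv4:,}")    # 4,294,967,296 — about 4.3 billion
print(f"{ipv6:.2e}")  # 3.40e+38 — about 340 undecillion
print(ipv6 // ipv4)   # each IPv4 address could map to ~7.9e+28 IPv6 addresses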

So if this problem is solved, why should you care? Because many networks on the Internet still prefer IPv4, and companies like AWS are starting to charge money for IPv4 usage.

Let's talk about AWS first: AWS today owns one of the largest chunks of the IPv4 space. During the period when IPv4 addresses could be purchased on the private market for a few dollars per address, AWS used its large capital to its advantage and bought up a sizable amount of the space. Today AWS owns 1.7% of the IPv4 address space, which equates to roughly 100 million IPv4 addresses.

Amazon’s $2bn IPv4 tax — and how you can avoid paying it

So you would think that moving to IPv6 is the obvious move; however, for the Internet community it has proven to be quite a challenge.

When IPv6 was published in the 90s, very few networks had devices that supported it. Today, in 2023, that is no longer the case: the share of global networks supporting IPv6 has increased to 46 percent, so the hardware limitations around supporting it are decreasing. Additionally, anti-abuse and security tools initially had no idea how to deal with attacks or traffic that used IPv6 address space, and this remains an issue for some of these tools. In 2014, we made it even easier for origin tools to convert by creating Pseudo IPv4 to help bridge the gap.

Despite all of this, many networks don’t have good support infrastructure for IPv6 networking since most networks were built on IPv4. At Cloudflare, we have built our network to support both protocols, known as “dual-stack”.

For a while, many networks also had markedly worse performance for IPv6 than IPv4. This is no longer true: today we see only a slight degradation in IPv6 performance across the whole Internet compared to IPv4. The reasons for this include legacy hardware, sub-optimal IPv6 connectivity outside our network, and the high cost of deploying IPv6. You can see in the chart below the additional latency of IPv6 traffic on Cloudflare’s network as compared to IPv4 traffic:

Amazon’s $2bn IPv4 tax — and how you can avoid paying it

There were many challenges to adopting IPv6, and for some, these issues with hardware compatibility and network performance are still worries. This is why keeping IPv4 available remains useful while transitioning to IPv6, and it is what makes AWS’ decision to charge for IPv4 so impactful for many websites.

So, don’t pay the AWS tax

At the end of the day the choice is clear: you could pay Amazon more each year to rent their IPv4 addresses than it would cost to buy them, or you could move to Cloudflare and use our free service to help with the transition to IPv6 with little overhead.

Image optimization made simpler and more predictable: we’re merging Cloudflare Images and Image Resizing

Post Syndicated from Deanna Lam original http://blog.cloudflare.com/merging-images-and-image-resizing/

Image optimization made simpler and more predictable: we’re merging Cloudflare Images and Image Resizing

Image optimization made simpler and more predictable: we’re merging Cloudflare Images and Image Resizing

Starting November 15, 2023, we’re merging Cloudflare Images and Image Resizing.

All Image Resizing features will be available as part of the Cloudflare Images product. To let you calculate your monthly costs more accurately and reliably, we’re changing how we bill for resizing images that aren’t stored at Cloudflare. Our new pricing model will cost $0.50 per 1,000 unique transformations.

Existing Image Resizing customers can continue to use the legacy version of Image Resizing. Once the merge is live, you can opt into the new pricing model for more predictable costs.

In this post, we'll cover why we came to this decision, what's changing, and how these changes might impact you.

Simplifying our products

When you build an application with images, you need to think about three separate operations: storage, optimization, and delivery.

In 2019, we launched Image Resizing, which can optimize and transform any publicly available image on the Internet based on a set of parameters. This enables our customers to deliver variants of a single image for each use case without creating and storing additional copies.

For example, an e-commerce platform for furniture retailers might use the same image of a lamp on both the individual product page and the gallery page for all lamps. They can use Image Resizing to optimize the image in its original aspect ratio for a slider view, or manipulate and crop the image for a thumbnail view.

Image optimization made simpler and more predictable: we’re merging Cloudflare Images and Image Resizing

Two years later, we released Images to let developers build an end-to-end image management pipeline. Developers no longer need to use different vendors to handle storage, optimization, and delivery. With Images, customers can store and deliver their images from a single bucket at Cloudflare to streamline their workflow and eliminate egress fees.

Both products have overlapping features to optimize and manipulate images, which can be confusing for customers. Over the years, we've received numerous questions about which product is optimal for which use cases.

To simplify our products, we're merging Cloudflare Images and Image Resizing to let customers store, optimize, and deliver their images all from one product. Customers can continue to optimize their images without using Cloudflare for storage, or they can purchase storage to manage their entire image pipeline through Cloudflare.

Transparent and predictable pricing

Pricing can cause headaches for Image Resizing customers.

We often hear from customers seeking guidance for calculating how much Image Resizing will cost each month. Today, you are billed for Image Resizing by the number of uncached requests to transform an image. However, caching behavior is often unpredictable, and you can't guarantee how long a given image stays cached. This means that you can't reliably predict your costs.

If you make 1M total requests to Image Resizing each month, you won't know whether you'll be billed for 10K or 100K of these requests, because our pricing model relies on cache. Since assets can be evicted from cache for a variety of reasons, bills for Image Resizing are unpredictable month over month. In some cases, the monthly bills are inconsistent even when traffic remains constant. In other cases, the monthly bill is much higher than our customers expected.

With the new Cloudflare Images, you will be billed only once per 30 days for each unique request to transform an image stored outside of Cloudflare, whether or not the transformation is cached. Customers will be billed $0.50 per 1,000 unique transformations per month.

In other words, if you resize one image to 100×100, then our new pricing model guarantees that you will be billed only once per month, whether there are 10K or 100K uncached requests to deliver the image at this size. If you resize 200 images to 100×100, then you will be billed for only 200 unique transformations — one for each image at this size — each month.

This change aligns more closely with how our customers think about their usage, and ensures that our customers can accurately estimate their costs with confidence. You won't need to consider how your cache hit ratio will affect your bill. To estimate your costs, you'll need to know only the number of unique images and the number of different ways that you need to transform those images each month.
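As a worked example, the small helper below captures the new billing rule at the stated rate of $0.50 per 1,000 unique transformations per month; the function is ours, for illustration, and is not part of any Cloudflare SDK:

def monthly_image_cost(unique_images: int, variants_per_image: int,
                       rate_per_1000: float = 0.50) -> float:
    """Cost is driven only by unique (image, transformation) pairs,
    not by request volume or cache behavior."""
    unique_transformations = unique_images * variants_per_image
    return unique_transformations / 1000 * rate_per_1000

print(monthly_image_cost(200, 1))  # 200 images at one size -> $0.10
print(monthly_image_cost(1, 1))    # one image, one size -> $0.0005, however many requests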

Resize without storage with Cloudflare Images

For developers who only want to resize and optimize images, Cloudflare Images now offers a zero-storage plan. This new plan enables you to transform images while keeping your existing storage and delivery solution unchanged (just like the current Image Resizing product does).

Image optimization made simpler and more predictable: we’re merging Cloudflare Images and Image Resizing

If you want to store your images with Cloudflare Images, then you can always upgrade your plan to purchase storage at any time.

Image Resizing is currently available only for accounts with a Pro or higher plan. The merged Cloudflare Images product will be available for all customers, with pricing plans that are tailored to meet specific use cases.

Existing customers can opt into new pricing

The new version of Cloudflare Images is available on November 15, 2023.

If you currently use Image Resizing, you will have the option to migrate to the new Cloudflare Images at no cost, or continue using Image Resizing.

The functionality and usability of the product will remain the same. You will still manage stored images under the Cloudflare Images tab and can enable transformations from the Speed tab.

As we roll this out, we'll continue to make improvements in the Dashboard to bring a more centralized and unified experience to Cloudflare Images.

You can learn more about our current image optimization capabilities in the Developer Docs. If you have feedback or thoughts, we'd love to hear from you on the Cloudflare Developers Discord.

Gone offline: how Cloudflare Radar detects Internet outages

Post Syndicated from Carlos Azevedo original http://blog.cloudflare.com/detecting-internet-outages/

Gone offline: how Cloudflare Radar detects Internet outages

Gone offline: how Cloudflare Radar detects Internet outages

Currently, Cloudflare Radar curates a list of observed Internet disruptions (which may include partial or complete outages) in the Outage Center. These disruptions are recorded whenever we have sufficient context to correlate with an observed drop in traffic, found by checking status updates and related communications from ISPs, or finding news reports related to cable cuts, government orders, power outages, or natural disasters.

However, we observe more disruptions than we currently report in the Outage Center, because there are cases where we can’t find any source of information that provides a likely cause for what we are observing, even though we are still able to validate it with external data sources such as Georgia Tech’s IODA. This curation process involves manual work, and is supported by internal tooling that allows us to analyze traffic volumes and detect anomalies automatically, triggering the workflow to find an associated root cause. While the Cloudflare Radar Outage Center is a valuable resource, its key shortcomings are that we are not reporting all disruptions, and that the current curation process is not as timely as we’d like, because we still need to find the context.

As we announced today in a related blog post, Cloudflare Radar will be publishing anomalous traffic events for countries and Autonomous Systems (ASes). These events are the same ones referenced above that have been triggering our internal workflow to validate and confirm disruptions. (Note that at this time “anomalous traffic events” are associated with drops in traffic, not unexpected traffic spikes.) In addition to adding traffic anomaly information to the Outage Center, we are also launching the ability for users to subscribe to notifications at a location (country) or network (autonomous system) level whenever a new anomaly event is detected, or a new entry is added to the outage table. Please refer to the related blog post for more details on how to subscribe.

Gone offline: how Cloudflare Radar detects Internet outages

The current status of each detected anomaly will be shown in the new “Traffic anomalies” table on the Outage Center page:

  • When an anomaly is automatically detected, its status will initially be ‘Unverified’
  • After attempting to validate ‘Unverified’ entries:
    • We will change the status to ‘Verified’ if we can confirm that the anomaly appears across multiple internal data sources, and possibly external ones as well. If we find associated context for it, we will also create an outage entry.
    • We will change status to ‘False Positive’ if we cannot confirm it across multiple data sources. This will remove it from the “Traffic anomalies” table. (If a notification has been sent, but the anomaly isn’t shown in Radar anymore, it means we flagged it as ‘False Positive’.)
  • We might also manually add an entry with a “Verified” status. This might occur if we observe, and validate, a drop in traffic that is noticeable, but was not large enough for the algorithm to catch it.

A glimpse at what Internet traffic volume looks like

At Cloudflare, we have several internal data sources that can give us insights into what the traffic for a specific entity looks like. We identify the entity based on IP address geolocation in the case of locations, and IP address allocation in the case of ASes, and can analyze traffic from different sources, such as DNS, HTTP, NetFlows, and Network Error Logs (NEL). All the signals used in the figures below come from one of these data sources and in this blog post we will treat this as a univariate time-series problem — in the current algorithm, we use more than one signal just to add redundancy and identify anomalies with a higher level of confidence. In the discussion below, we intentionally select various examples to encompass a broad spectrum of potential Internet traffic volume scenarios.

1. Ideally, the signals would resemble the pattern depicted below for Australia (AU): a stable weekly pattern with a slightly positive trend, meaning that the trend average is moving up over time (we see more traffic over time from users in Australia).

Gone offline: how Cloudflare Radar detects Internet outages

These characteristics can be seen clearly when we perform time-series decomposition, which allows us to break a time-series down into its constituent parts to better understand and analyze its underlying patterns. Decomposing the traffic volume for Australia above, assuming a weekly pattern, with Seasonal-Trend decomposition using LOESS (STL), we get the following:

Gone offline: how Cloudflare Radar detects Internet outages

The weekly pattern we are referring to is represented by the seasonal part of the signal, which is expected because we are interested in eyeball (human) Internet traffic. As observed in the image above, the trend component is expected to move slowly compared with the signal level, and the residual part would ideally resemble white noise, meaning that all existing patterns in the signal are captured by the seasonal and trend components.
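If you want to reproduce this kind of decomposition yourself, the statsmodels library provides an STL implementation. A minimal sketch, assuming hourly traffic counts in a CSV file (the file and column names are hypothetical):

import pandas as pd
from statsmodels.tsa.seasonal import STL

# Hypothetical input: one value per hour, indexed by timestamp
traffic = pd.read_csv("au_traffic.csv", index_col="ts", parse_dates=True)["requests"]

# period=24*7 captures a weekly cycle on hourly data; robust=True
# down-weights outliers so spikes don't distort the trend
result = STL(traffic, period=24 * 7, robust=True).fit()

result.plot()              # seasonal, trend, and residual components
print(result.resid.std())  # residual spread: lower means the model explains more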

2. Below we have the traffic volume for AS15964 (CAMNET-AS), which appears to have more of a daily pattern, as opposed to weekly.

We also observe that there’s a value offset in the signal right after the first four days (blue dashed line), and the red background shows an outage for which we didn’t find any reporting beyond seeing it in our data and that of other Internet data providers — our intention here is to develop an algorithm that will trigger an event when it comes across this or similar patterns.

Gone offline: how Cloudflare Radar detects Internet outages

3. Here we have a similar example for French Guiana (GF). We observe some data offsets (August 9 and 23), a change in amplitude (between August 15 and 23), and another outage for which we do have context, observable in Cloudflare Radar.

Gone offline: how Cloudflare Radar detects Internet outages

4. Another scenario is several scheduled outages for AS203214 (HulumTele), for which we also have context. These anomalies are the easiest to detect, since the traffic goes to values that are unique to outages (they cannot be mistaken for regular traffic), but they pose another challenge: if our plan were simply to check the weekly patterns, then since these government-directed outages happen with the same frequency, at some point the algorithm would see them as expected traffic.

Gone offline: how Cloudflare Radar detects Internet outages

5. This outage in Kenya could be seen as similar to the above: the traffic volume went down to previously unseen values, although not as significantly. We also observe some upward spikes in the data that do not follow any specific pattern — possibly outliers — that we should clean depending on the approach we use to model the time-series.

Gone offline: how Cloudflare Radar detects Internet outages

6. Lastly, here's the data that will be used throughout this post as an example of how we are approaching this problem. For Madagascar (MG), we observe a clear pattern with pronounced weekends (blue background). There’s also a holiday (Assumption of Mary), highlighted with a green background, and an outage, with a red background. In this example, weekends, holidays, and outages all seem to have roughly the same traffic volume. Fortunately, the outage gives itself away: traffic starts to climb as on a normal working day, and then drops suddenly — we will look at it more closely later in this post.

Gone offline: how Cloudflare Radar detects Internet outages

In summary, we have looked at six examples out of the ~700 entities we are currently detecting anomalies for, and we see a wide range of variability. This means that in order to effectively model the time-series we would have to run a lot of preprocessing steps before the modeling itself. These steps include removing outliers, detecting short- and long-term data offsets and readjusting, and detecting changes in variance, mean, or magnitude. Time is also a factor in preprocessing, as we would need to know in advance when to expect events or holidays that will push the traffic down, apply daylight saving time adjustments that will cause a time shift in the data, and be able to apply local time zones for each entity, including dealing with locations that have multiple time zones and AS traffic that is shared across different time zones.

To add to the challenge, some of these steps cannot even be performed in a close-to-real-time fashion (example: we can only say there’s a change in seasonality after some time of observing the new pattern). Considering the challenges mentioned earlier, we have chosen an algorithm that combines basic preprocessing and statistics. This approach aligns with our expectations for the data's characteristics, offers ease of interpretation, allows us to control the false positive rate, and ensures fast execution while reducing the need for many of the preprocessing steps discussed previously.

Above, we noted that we are detecting anomalies for around 700 entities (locations and autonomous systems) at launch. This obviously does not represent the entire universe of countries and networks, and for good reason. As we discuss in this post, we need to see enough traffic from a given entity (have a strong enough signal) to be able to build relevant models and subsequently detect anomalies. For some smaller or sparsely populated countries, the traffic signal simply isn’t strong enough, and for many autonomous systems, we see little-to-no traffic from them, again resulting in a signal too weak to be useful. We are initially focusing on locations where we have a sufficiently strong traffic signal and/or are likely to experience traffic anomalies, as well as major or notable autonomous systems — those that represent a meaningful percentage of a location’s population and/or those that are known to have been impacted by traffic anomalies in the past.

Detecting anomalies

The approach we took to solve this problem involves creating a forecast: a set of data points that correspond to what we expect to see based on historical data. This is explained in the section Creating a forecast. We take this forecast and compare it to what we are actually observing — if what we are observing is significantly different from what we expect, then we call it an anomaly. Since we are interested in traffic drops, an anomaly will always correspond to lower traffic than the forecast / expected traffic. This comparison is elaborated in the section Comparing forecast with actual traffic.

In order to compute the forecast we need to fulfill the following business requirements:

  • We are mainly interested in traffic related to human activity.
  • The sooner we detect an anomaly, the more useful the detection is. This needs to take into account constraints such as data ingestion and data processing times, but once the data is available, we should be able to use the latest data point and detect whether it is an anomaly.
  • A low False Positive (FP) rate is more important than a high True Positive (TP) rate. For an internal tool, this is not necessarily true, but as a publicly visible notification service, we want to limit spurious entries at the cost of not reporting some anomalies.

Selecting which entities to observe

Aside from the examples given above, the quality of the data depends strongly on the volume of data, which means we have different levels of data quality depending on which entity (location / AS) we are considering. As an extreme example, we don’t have enough data from Antarctica to reliably detect outages. The process we used to select which entities are eligible to be observed follows.

For ASes, since we are mainly interested in Internet traffic that represents human activity, we use the estimate of the number of users provided by APNIC. We then compute the total number of users per location by summing the number of users of each AS in that location, and calculate what percentage of users an AS has in that location (this number is also provided by the APNIC table in the ‘% of country’ column). We filter out ASes that have less than 1% of the users in that location. Here’s what the list looks like for Portugal — AS15525 (MEO-EMPRESAS) is excluded because it has less than 1% of the estimated total number of Internet users in Portugal.

Gone offline: how Cloudflare Radar detects Internet outages

At this point we have a subset of ASes and a set of locations (we don’t exclude any location a priori because we want as much coverage as possible), but we still have to narrow the set down based on data quality to be able to reliably detect anomalies automatically. After testing several metrics and visually analyzing the results, we concluded that the best predictor of a stable signal is the volume of data, so we removed the entities that don’t meet a minimum number of daily unique IPs over a two-week period — the threshold is based on visual inspection.

Creating a forecast

In order to detect the anomalies in a timely manner, we decided to go with traffic aggregated every fifteen minutes, and we are forecasting one hour of data (four data points / blocks of fifteen minutes) that are compared with the actual data.

After selecting the entities for which we will detect anomalies, the approach is quite simple:

1. We look at the last 24 hours immediately before the forecast window and use that interval as the reference. The assumption is that the last 24 hours will contain information about the shape of what follows. In the figure below, the last 24 hours (in blue) corresponds to data transitioning from Friday to Saturday. By using the Euclidean distance, we get the six most similar matches to that reference (orange) — four of those six matches correspond to other transitions from Friday to Saturday. It also captures the holiday on Monday (August 14, 2023) to Tuesday, and we also see a match that is the most dissimilar to the reference, a regular working day from Wednesday to Thursday. Capturing one that doesn't represent the reference properly should not be a problem because the forecast is the median of the most similar 24 hours to the reference, and thus the data of that day ends up being discarded.

Gone offline: how Cloudflare Radar detects Internet outages

2. There are two important parameters that we are using for this approach to work:
    • We take into consideration the last 28 days (plus the reference day equals 29). This way we ensure that the weekly seasonality can be seen at least 4 times, we control the risk associated with the trend changing over time, and we set an upper bound to the amount of data we need to process. Looking at the example above, the first day was one with the highest similarity to the reference because it corresponds to the transition from Friday to Saturday.
    • The other parameter is the number of most similar days. We are using six days as a result of empirical knowledge: given the weekly seasonality, when using six days, we expect at most to match four days for the same weekday and then two more that might be completely different. Since we use the median to create the forecast, the majority is still four and thus those extra days end up not being used as reference. Another scenario is in the case of holidays such as the example below:

Gone offline: how Cloudflare Radar detects Internet outages

A holiday in the middle of the week in this case looks like a transition from Friday to Saturday. Since we are using the last 28 days and the holiday starts on a Tuesday we only see three such transitions that are matching (orange) and then another three regular working days because that pattern is not found anywhere else in the time-series and those are the closest matches. This is why we use the lower quartile when computing the median for an even number of values (meaning we round the data down to the lower values) and use the result as the forecast. This also allows us to be more conservative and plays a role in the true positive/false positive tradeoff.
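Putting the steps above together, here is a compact NumPy sketch of one way to implement this forecast. It slides a 24-hour window over the preceding 28 days, ranks windows by Euclidean distance to the reference, and takes a lower-rounded median of the hour that followed each of the six best matches; this is our reading of the approach, and details such as outlier handling and multi-signal redundancy are omitted:

import numpy as np

POINTS_PER_DAY = 96   # 15-minute buckets
HORIZON = 4           # one hour = four 15-minute points
K = 6                 # number of most similar matches

def forecast_next_hour(series: np.ndarray) -> np.ndarray:
    """series: 15-minute traffic covering ~29 days, ending right before
    the forecast window. Returns a forecast for the next four points."""
    reference = series[-POINTS_PER_DAY:]    # the last 24 hours
    history = series[:-POINTS_PER_DAY]      # the preceding 28 days

    matches = []
    for start in range(len(history) - POINTS_PER_DAY - HORIZON + 1):
        window = history[start:start + POINTS_PER_DAY]
        distance = np.linalg.norm(window - reference)
        following_hour = history[start + POINTS_PER_DAY:
                                 start + POINTS_PER_DAY + HORIZON]
        matches.append((distance, following_hour))

    matches.sort(key=lambda m: m[0])
    best = np.stack([hour for _, hour in matches[:K]])

    # With an even number of values, take the lower-middle row: the
    # "lower quartile" rounding described above keeps the forecast conservative
    return np.sort(best, axis=0)[K // 2 - 1]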

Lastly let's look at the outage example:

Gone offline: how Cloudflare Radar detects Internet outages

In this case, the matches are always associated with low traffic, because the last 24 hours (the reference) correspond to a transition from Sunday to Monday, and due to the low traffic, the windows with the lowest Euclidean distance (the most similar 24 hours) are either Saturdays (twice) or Sundays (four times). The forecast is therefore what we would expect to see on a regular Monday, which is why the forecast (red) has an upward trend; but since we had an outage, the actual volume of traffic (black) is considerably lower than the forecast.

This approach works for regular seasonal patterns, as would several other modeling approaches, and it has also been shown to work in the case of holidays and other moving events (such as festivities that don’t happen on the same day every year) without having to actively add that information in. Nevertheless, there are still cases where it will fail, specifically when there’s an offset in the data. This is one of the reasons why we use multiple data sources: to reduce the chances of the algorithm being affected by data artifacts.

Below we have an example of how the algorithm behaves over time.

Comparing forecast with actual traffic

Once we have the forecast and the actual traffic volume, we take the following steps.

We calculate the relative change, which measures how much one value has changed relative to another. Since we are detecting anomalies based on traffic drops, the actual traffic will always be lower than the forecast:

relative change = (actual − forecast) / forecast

After calculating this metric, we apply the following rules:

  • The difference between the actual and the forecast must be at least 10% of the magnitude of the signal, where magnitude is computed as the difference between the 95th and 5th percentiles of the selected data. The idea is to avoid scenarios where traffic is low, particularly during the off-peak hours of the day, and scenarios where small changes in actual traffic correspond to big changes in relative change because the forecast is also low. As an example:
    • a forecast of 100 Gbps compared with an actual value of 80 Gbps gives us a relative change of -0.20 (-20%).
    • a forecast of 20 Mbps compared with an actual value of 10 Mbps gives us a much smaller decrease in total volume than the previous example, but a relative change of -0.50 (-50%).
  • Then we have two rules for detecting considerably low traffic:
    • Sustained anomaly: The relative change is below a given threshold α throughout the forecast window (for all four data points). This allows us to detect weaker anomalies (with smaller relative changes) that are extended over time.

Gone offline: how Cloudflare Radar detects Internet outages
  • Point anomaly: The relative change of the last data point of the forecast window is below a given threshold β (where β < α — these thresholds are negative; as an example, β and α might be -0.6 and -0.4, respectively). In this case we need β < α to avoid triggering anomalies due to the stochastic nature of the data but still be able to detect sudden and short-lived traffic drops.
Gone offline: how Cloudflare Radar detects Internet outages
  • The values of α and β were chosen empirically to maximize detection rate, while keeping the false positive rate at an acceptable level.
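Expressed in code, the detection rules above might look like the following sketch; the thresholds and the window used to estimate the signal's magnitude are illustrative, not the production values:

import numpy as np

ALPHA = -0.4     # sustained-anomaly threshold (example value)
BETA = -0.6      # point-anomaly threshold; beta < alpha
MIN_DROP = 0.10  # required drop: 10% of the signal's magnitude

def detect(actual: np.ndarray, forecast: np.ndarray,
           recent: np.ndarray) -> str:
    """actual/forecast: the four 15-minute points of the forecast window;
    recent: historical data used to estimate the signal's magnitude."""
    magnitude = np.percentile(recent, 95) - np.percentile(recent, 5)
    if np.all(forecast - actual < MIN_DROP * magnitude):
        return "normal"  # the drop is too small relative to the signal

    relative_change = (actual - forecast) / forecast
    if np.all(relative_change < ALPHA):
        return "sustained anomaly"
    if relative_change[-1] < BETA:
        return "point anomaly"
    return "normal"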

Closing an anomaly event

Although the most important message that we want to convey is when an anomaly starts, it is also crucial to detect when the Internet traffic volume goes back to normal for two main reasons:

  • We need the notion of an active anomaly, meaning that we detected an anomaly and it is still ongoing. This allows us to stop considering new data for the reference while the anomaly is active, since including that data would distort the reference and the selection of the most similar sets of 24 hours.
  • Once the traffic goes back to normal, knowing the duration of the anomaly allows us to flag those data points as outliers and replace them, so we don’t end up using them as the reference or as best matches to the reference. Although we are using the median to compute the forecast, which in most cases is enough to overcome the presence of anomalous data, there are scenarios, such as the one for AS203214 (HulumTele) used as example four, where outages frequently occur at the same time of day, which would make the anomalous data become the expectation after a few days.

Whenever we detect an anomaly, we keep the same reference until the data comes back to normal; otherwise our reference would start including anomalous data. To determine when the traffic is back to normal, we use looser thresholds than α, and we require a time period (currently four hours) with no anomalies before closing the event. This avoids situations where we observe drops in traffic that bounce back to normal and then drop again: in such cases we want to detect a single anomaly and aggregate the drops to avoid sending multiple notifications, and semantically there’s a high chance they relate to the same anomaly.
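A sketch of that closing logic, with the four-hour quiet period expressed in 15-minute points (the closing threshold is an illustrative value, not the production one):

CLOSE_THRESHOLD = -0.2  # looser than alpha; illustrative value
QUIET_POINTS = 16       # four hours of 15-minute points with no anomaly

class AnomalyEvent:
    """Tracks an active anomaly until traffic stays normal long enough."""

    def __init__(self, start):
        self.start = start
        self.quiet = 0
        self.closed = False

    def update(self, relative_change: float) -> None:
        if self.closed:
            return
        if relative_change < CLOSE_THRESHOLD:
            self.quiet = 0           # traffic dropped again: same event, keep it open
        else:
            self.quiet += 1
            if self.quiet >= QUIET_POINTS:
                self.closed = True   # one aggregated event, one notification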

Conclusion

Internet traffic data is generally predictable, which in theory would allow us to build a very straightforward anomaly detection algorithm to detect Internet disruptions. However, due to the heterogeneity of the time series depending on the entity we are observing (Location or AS) and the presence of artifacts in the data, it also needs a lot of context that poses some challenges if we want to track it in real-time. Here we’ve shown particular examples of what makes this problem challenging, and we have explained how we approached this problem in order to overcome most of the hurdles. This approach has been shown to be very effective at detecting traffic anomalies while keeping a low false positive rate, which is one of our priorities. Since it is a static threshold approach, one of the downsides is that we are not detecting anomalies that are not as steep as the ones we’ve shown.

We will keep working on adding more entities and refining the algorithm to be able to cover a broader range of anomalies.

Visit Cloudflare Radar for additional insights around Internet disruptions, routing issues, Internet traffic trends, attacks, Internet quality, and more. Follow us on social media at @CloudflareRadar (Twitter), cloudflare.social/@radar (Mastodon), and radar.cloudflare.com (Bluesky), or contact us via e-mail.

Announcing Cloudflare Incident Alerts

Post Syndicated from Mia Malden original http://blog.cloudflare.com/incident-alerts/

Announcing Cloudflare Incident Alerts

Announcing Cloudflare Incident Alerts

A lot of people rely on Cloudflare. We serve over 46 million HTTP requests per second on average; millions of customers use our services, including 31% of the Fortune 1000. And these numbers are only growing.

Given the privileged position we sit in to help the Internet to operate, we’ve always placed a very large emphasis on transparency during incidents. But we’re constantly striving to do better.

That’s why today we are excited to announce Incident Alerts — available via email, webhook, or PagerDuty. These notifications are easily accessible in the Cloudflare dashboard, and they’re customizable to prevent notification overload. And best of all, they’re available to everyone; you simply need a free account to get started.

Lifecycle of an incident

Announcing Cloudflare Incident Alerts

Without proper transparency, incidents cause confusion and waste resources for anyone who relies on the Internet. With so many different entities working together to make the Internet operate, diagnosing and troubleshooting can be complicated and time-consuming. By far the best solution is for providers to offer transparent and proactive alerting, so any time something goes wrong, it’s clear exactly where the problem is.

Cloudflare incident response

We understand the importance of proactive and transparent alerting around incidents. We have worked to improve communications by directly alerting enterprise-level customers and allowing everyone to subscribe to an RSS feed or leverage the Cloudflare Status API. Additionally, we update the Cloudflare status page — which catalogs incident reports, updates, and resolutions — throughout an incident’s lifecycle, as well as tracking scheduled maintenance.

However, not everyone wants to use the Status API or subscribe to an RSS feed. Both of these options require some infrastructure and programmatic effort on the customer’s end, and neither offers simple configuration to filter out noise like scheduled maintenance. For those who don’t want to build anything themselves, visiting the status page is still a pull, rather than a push, model: customers need to take it upon themselves to monitor Cloudflare’s status — and timeliness in these situations can make a world of difference.

Without a proactive channel of communication, there can be a disconnect between Cloudflare and our customers during incidents. Although we update the status page as soon as possible, the lack of a push notification represents a gap in meeting our customers’ expectations. The new Cloudflare Incident Alerts aim to remedy that.

Simple, free, and fast notifications

We want to proactively notify you as soon as a Cloudflare incident may be affecting your service — without any programmatic steps on your end. Unlike the Status API and an RSS feed, Cloudflare Incident Alerts are configurable through just a few clicks in the dashboard, and you can choose to receive email, PagerDuty, or webhook alerts for incidents involving specific products at different levels of impact. The Status API will continue to be available.

With this multidimensional granularity, you can filter notifications by specific service and severity. If you are, for example, a Cloudflare for SaaS customer, you may want alerts for delays in custom hostname activation but not for increased latency on Stream. Likewise, you may only care about critical incidents instead of getting notified for minor incidents. Incident Alerts give you the ability to choose.

Announcing Cloudflare Incident Alerts
Lifecycle of an Incident

How to filter incidents to fit your needs

You can filter incident notifications with the following categories:

  • Cloudflare Sites and Services: get notified when an incident is affecting certain products or product areas.
  • Impact level: get notified for critical, major, and/or minor incidents.

These categories are not mutually exclusive. Here are a few possible configurations:

  • Notify me via email for all critical incidents.
  • Notify me via webhook for critical & major incidents affecting Pages.
  • Notify me via PagerDuty for all incidents affecting Stream.

With over fifty different alerts available via the dashboard, you can tailor your notifications to what you need. You can customize not only which alerts you are receiving but also how you would like to be notified. With PagerDuty, webhooks, and email integrated into the system, you have the flexibility of choosing what will work best with your working environment. Plus, with multiple configurations within many of the available notifications, we make it easy to only get alerts about what you want, when you want them.

Try it out

You can start to configure incident alerts on your Cloudflare account today. Here’s how:

  1. Navigate to the Cloudflare dashboard → Notifications.
  2. Select “Add”.
  3. Select “Incident Alerts”.
  4. Enter your notification name and description.
  5. Select the impact level(s) and component(s) for which you would like to be notified. If either field is left blank, it will default to all impact levels or all components, respectively.
  6. Select how you want to receive the notifications: check PagerDuty, add a webhook, and/or add an email recipient.
  7. Select “Save”.
  8. Test the notification by selecting “Test” on the right side of its row.
Announcing Cloudflare Incident Alerts

For more information on Cloudflare’s Alert Notification System, visit our documentation here.

Cloudflare Account Permissions, how to use them, and best practices

Post Syndicated from Joseph So original http://blog.cloudflare.com/permissions-best-practices/

Cloudflare Account Permissions, how to use them, and best practices

Cloudflare Account Permissions, how to use them, and best practices

In the dynamic landscape of modern web applications and organizations, access control is critical. Defining who can do what within your Cloudflare account ensures security and efficient workflow management. Whether you are a single developer, a small team, or a larger enterprise, we’re going to cover two changes we have developed to make user management easier, along with best practices for using these features, alongside existing ones, to scope access appropriately within your account and keep it secure while you work with others.

What are roles?

In the past year, Cloudflare has expanded the list of roles available to everyone from one to over 60, and we are continuing to build more, better roles. We have also made domain scoping a capability for all users. This prompts the question: what are roles, and why do they exist?

Roles are a set of permissions bundled under a name. Every API call made to Cloudflare has a required set of permissions; otherwise, the call returns a 403. We generally group permissions into a role to grant access to a set of capabilities that enable the use of a Cloudflare product, or that represent everything needed to fulfill a job function.

As of today, there are two entities to which we can assign roles. The first is a user, representing a profile or an actor, which we generally identify by an email address.

The second entity is a token, which represents delegation of a subset of a user’s permissions to be used for programmatic purposes.

What is scope?

Permissions are useless without an appropriate actor, and a scope. For every action a user can take, they must be directed to the appropriate resource, which is what we refer to as a scope.

When a user first signs up for Cloudflare, they are provided a Cloudflare user as well as an account. Accounts can be shared with others. Accounts act as our traditional resource boundary, and granting permissions at the account level means that those permissions apply to all zones and all other resources within the account.

Within accounts however, there are zones, R2 Buckets, Workers, and other resources. We are working on expanding the types of scopes that can be set, and as of now, you can scope access to a selective number of zones, or create tokens that only allow access into specified R2 buckets.

While our list of available roles is going to continue to grow, I want to go into some detail about how to use the roles right now, and how to use them to their full potential.

What are the different types of roles and scopes we have today?

For most of the Cloudflare users and use cases out there, our traditional role model at the account level is the easiest to use. These roles can be viewed by selecting the scope of All domains.

Cloudflare Account Permissions, how to use them, and best practices

As of today, there are 40+ roles available at the account level. These provide access to a capability across the whole account, with no further scoping. Once these roles are granted to a user, they are able to complete a limited set of actions across any zones in your account. We intend to cover more capabilities in this list, and will continue to add more roles.

When you want to grant access to a specific zone, or list of zones, the best way to go about that is to use a domain scoped role.

Cloudflare Account Permissions, how to use them, and best practices

A single domain can be added as shown above, and granting explicit scope to a domain implicitly denies access to other domains.

If you are looking to grant access to multiple domains simultaneously, in order to represent all staging zones for example, you can place them into a domain group. These can be revisited and edited within the Configurations → Lists page under Manage Account.

Cloudflare Account Permissions, how to use them, and best practices

Best practice for creating domain groups is to group a set of similar domains together, so that the group can be reused for every user on your account.

Other best practices when assigning memberships

It is always best practice to explicitly define what you are granting access to, as Cloudflare’s permissioning system follows a fail-closed design: if you do not grant access to something, users will not have access to it.

We model all roles additively, and we will continue to create more capability-specific roles. Multiple roles can be assigned within a given scope. We recommend against explicitly “excluding” objects, because it can lead to complex permission processing.

Take your organization’s billing administrator as an example. If you want them to have Billing and Analytics but no web administration capabilities, you grant exactly those two roles; everything you do not grant is implicitly denied.

Exciting changes you will see from us soon include the capability to “stack” multiple sets of policies on top of one another. We are currently rolling this out, and some users already have the ability to define one set of permissions for one set of domains, and an increased set of permissions for another.

This will come in handy if you are managing multiple environments within one account, and want to grant differing levels of access to, say, a development and a staging domain.

We also recognize that Cloudflare has many resources beyond accounts and zones, and we are currently experimenting with adding scoping to other objects. As of today, you can scope R2 tokens to access only certain buckets, and I look forward to adding this capability to more resources.

Best practices when delegating access to tokens

Memberships and users tend to use Cloudflare in an interactive capacity, but many organizations use Cloudflare programmatically.

A new capability we are rolling out to all users soon is the ability to limit API access, either on your account as a whole or on a per-user basis.

All programmatic access to Cloudflare at this time is managed on a per-user basis, representing a delegation of that user’s access to their set of accounts. Programmatic access is always bounded by a user’s access, and many of our users’ service accounts have broad access that is split into context-specific tokens.

If you are a Super Administrator and want to restrict programmatic access to your account, this toggle is available on the members page.

We recommend keeping this functionality turned off, unless you explicitly want to grant the ability to use the API to specific users, which can also be controlled via a dropdown per user. We have seen some organizations use this capability to centralize the creation of API Tokens into a single service user.


Cloudflare recommends the use of API Tokens wherever possible. API Tokens can be scoped down to a subset of a user’s access, instead of granting all of it.

When building out a set of permissions for an API Token, we have the same scoping capability that was visible in membership roles.

Tokens support both account-level scoping and domain/zone-level scoping.
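Tokens can also be created programmatically. As a hedged sketch (the request shape follows the public POST /user/tokens endpoint, and the zone ID and permission-group ID are placeholders you would look up via the API), creating a token restricted to a single zone looks roughly like this:

// Sketch: create an API Token scoped to a single zone via POST /user/tokens.
// The zone ID and permission-group ID are placeholders; real permission
// groups can be listed from the Cloudflare API to find their IDs.
const resp = await fetch("https://api.cloudflare.com/client/v4/user/tokens", {
  method: "POST",
  headers: {
    Authorization: "Bearer <token-allowed-to-create-tokens>",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    name: "staging-zone-dns-read",
    policies: [
      {
        effect: "allow",
        // The scope: one specific zone, nothing else.
        resources: { "com.cloudflare.api.account.zone.<zone-id>": "*" },
        permission_groups: [{ id: "<dns-read-permission-group-id>" }],
      },
    ],
  }),
});
console.log(await resp.json());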

Cloudflare’s roles are meant to give you the flexibility to grant the least privilege possible, in order to keep your Cloudflare resources safe. Recent improvements have included a number of capability-specific roles, as well as the ability to lock down API access. Future improvements will include the ability to grant multiple policies to individual users, as well as more scopes.

What’s next

All users are able to use our new roles, and there will be several rolling improvements, including the capability to lock down API access, as well as assign multiple policies to users.

Switching to Cloudflare can cut your network carbon emissions up to 96% (and we’re joining the SBTi)

Post Syndicated from Patrick Day original http://blog.cloudflare.com/switching-cloudflare-cut-your-network-carbon-emissions-sbti/


This post is also available in 简体中文, 日本語, 한국어, Deutsch, Español and Français.


Since our founding, Cloudflare has helped customers save on costs, increase security, and boost performance and reliability by migrating legacy hardware functions to the cloud. More recently, our customers have been asking about whether this transition can also improve the environmental impact of their operations.

We are excited to share an independent report published this week that found that switching enterprise network services from on-premises devices to Cloudflare services can cut related carbon emissions up to 96%, depending on your current network footprint. The majority of these gains come from consolidating services, which improves carbon efficiency by increasing the utilization of servers that are providing multiple network functions.

And we are not stopping there. Cloudflare is also proud to announce that we have applied to set carbon reduction targets through the Science Based Targets initiative (SBTi) in order to help continue to cut emissions across our operations, facilities, and supply chain.

As we wrap up the hottest summer on record, it's clear that we all have a part to play in understanding and reducing our carbon footprint. Partnering with Cloudflare on your network transformation journey is an easy way to get started. Come join us today!

Traditional vs. cloud-based networking and security

Historically, corporate networks relied on dedicated circuits and specialized hardware to connect and secure their infrastructure. Companies built or rented space in data centers that were physically located within or close to major office locations, and hosted business applications on servers in these data centers. Employees in offices connected to these applications through the local area network (LAN) or over private wide area network (WAN) links from branch locations. A stack of security hardware in each data center, including firewalls, intrusion detection systems, DDoS mitigation appliances, VPN concentrators, and more, enforced security for all traffic flowing in and out.

This architecture model broke down when applications shifted to the cloud and users left the office, requiring a new approach to connecting and securing corporate networks. Cloudflare’s model, which aligns with the SASE framework, shifts network and security functions from on-premises hardware to our distributed global network.

Traditional vs. cloud-based networking and security architecture

This approach improves performance by enforcing policy close to where users are, increases security with Zero Trust principles, and saves costs by delivering functions more efficiently. We are now excited to report that it materially reduces the total power consumption of the services required to connect and secure your organization, which reduces carbon emissions.

Reduced carbon emissions through cloud migration and consolidation

An independent study published this week by Analysys Mason outlines how shifting networking and security functions to the cloud, and particularly consolidating services in a unified platform, directly improves the sustainability of organizations’ network, security, and IT operations. You can read the full study here, but here are a few key points.

The study compared a typical hardware stack deployed in an enterprise data center or IT closet, and its associated energy consumption, to the energy consumption of comparable functions delivered by Cloudflare’s global network. The stack used for comparison included network firewall and WAF, DDoS mitigation, load balancing, WAN optimization, and SD-WAN. Researchers analyzed the average power consumption for devices with differing capacity and found that higher-capacity devices only consume incrementally more energy:

Power consumption across representative networking and security hardware devices with varying traffic capacity

The study noted that specialized hardware is more efficient per watt of electricity consumed at performing specific functions — in other words, a device optimized for intrusion detection will perform intrusion detection functions using less power per request processed than a generic server designed to host multiple different workloads. This can be seen in the bar labeled “impact of cloud processing efficiency” in the graph below.

However, these gains are only relevant when a specialized hardware device is consistently utilized close to its capacity, which most appliances in corporate environments are not. Network, security, and IT teams intentionally provision devices with higher capacity than they will need the majority of the time in order to be able to gracefully handle spikes or peaks.

For example, a security engineer might have traditionally specced a DDoS protection appliance that can handle up to 10 Gbps of traffic in case an attack of that size came in, but the vast majority of the time, the appliance is processing far less traffic (maybe only tens or hundreds of Mbps). This means that it is actually much more efficient for those functions to run on a generic device that is also running other kinds of processes and therefore can operate at a higher baseline utilization, using the same power to get more work done. These benefits are shown in the “utilization gains from cloud” bar in the following graph.

There are also some marginal efficiency gains from other aspects of cloud architecture, such as improved power usage effectiveness (PUE) and carbon intensity of data centers optimized for cloud workloads vs. traditional enterprise infrastructure. These are represented on the right of the graph below.

The analysis shows that processing efficiency in the cloud is lower than for specialized on-premises equipment; however, utilization gains through shared cloud services, combined with expected PUE and carbon intensity, yield potentially 86% emissions savings for large enterprises.

Researchers compared multiple examples of enterprise IT environments, from small to large traffic volume and complexity, and found that these factors contribute to overall carbon emissions reduction of 78-96% depending on the network analyzed.

One of the most encouraging parts of this study was that it did not include Cloudflare's renewable energy or offset purchases in its findings. A number of studies have concluded that migrating various applications and compute functions from on-premises hardware to the cloud can significantly cut carbon emissions. But, those studies also relied in part on carbon accounting benefits like renewable energy or carbon offsets to demonstrate those savings.

Cloudflare also powers its operations with 100% renewable energy and purchases high-quality offsets to account for its annual emissions footprint. This means the emissions savings from switching to Cloudflare are likely even higher than those reported.

Overall, consolidating and migrating to Cloudflare’s services and retiring legacy hardware can substantially reduce energy consumption and emissions. And while you are at it, make sure to consider sustainable end-of-life practices for those retired devices — we will even help you recycle them!

Cloudflare is joining the Science Based Targets initiative (SBTi)

We're incredibly proud that Cloudflare is helping move the Internet toward a zero emissions future. But, we know that we can do more.

Cloudflare is thrilled to announce that we have submitted our application to join SBTi and set science-based carbon reduction targets across our facilities, operations, and supply chain.

SBTi is one of the world's most ambitious corporate climate action commitments. It requires companies to achieve verifiable emissions reductions across their operations and supply chain without the use of carbon offsets. Companies' short- and long-term reduction goals must be consistent with the Paris Climate Agreement goal of limiting global warming to 1.5 degrees Celsius above pre-industrial levels.

Once approved, Cloudflare will work over the next 24 months with SBTi to develop and validate our short- and long-term reduction targets. Stay tuned to our blog and our Impact page for updates as we go.

Cloudflare's commitment to SBTi reduction targets builds on our ongoing commitments to 100% renewable energy, to offsetting or removing the historic carbon emissions associated with powering our network by 2025, and to reforestation efforts.

As we have said before, Cloudflare's original goal was not to reduce the Internet's environmental impact. But, that has changed.

Come join Cloudflare today and help us work towards a zero emissions Internet.


Cloudflare Stream Low-Latency HLS support now in Open Beta

Post Syndicated from Taylor Smith original http://blog.cloudflare.com/cloudflare-stream-low-latency-hls-open-beta/


Stream Live lets users easily scale their live-streaming apps and websites to millions of creators and concurrent viewers while focusing on the content rather than the infrastructure — Stream manages codecs, protocols, and bit rate automatically.

For Speed Week this year, we introduced a closed beta of Low-Latency HTTP Live Streaming (LL-HLS), which builds upon the high-quality, feature-rich HTTP Live Streaming (HLS) protocol. Lower latency brings creators even closer to their viewers, empowering customers to build more interactive features like chat and enabling the use of live-streaming in more time-sensitive applications like live e-learning, sports, gaming, and events.

Today, in celebration of Birthday Week, we’re opening this beta to all customers with even lower latency. With LL-HLS, you can deliver video to your audience faster, reducing the latency a viewer may experience on their player to as little as three seconds. Low Latency streaming is priced the same way, too: $1 per 1,000 minutes delivered, with zero extra charges for encoding or bandwidth.

Broadcast with latency as low as three seconds.

LL-HLS is an extension of the HLS standard that allows us to reduce glass-to-glass latency — the time between something happening on the broadcast end and a user seeing it on their screen. That includes factors like network conditions and transcoding for HLS and adaptive bitrates. We also include client-side buffering in our understanding of latency because we know the experience is driven by what a user sees, not when a byte is delivered into a buffer. Depending on encoder and player settings, broadcasters' content can be playing on viewers' screens in less than three seconds.

On the left, OBS Studio broadcasting from my personal computer to Cloudflare Stream. On the right, watching this livestream using our own built-in player playing LL-HLS with three second latency!

Same pricing, lower latency. Encoding is always free.

Our addition of LL-HLS support builds on all the best parts of Stream including simple, predictable pricing. You never have to pay for ingress (broadcasting to us), compute (encoding), or egress. This allows you to stream with peace of mind, knowing there are no surprise fees and no need to trade quality for cost. Regardless of bitrate or resolution, Stream costs $1 per 1,000 minutes of video delivered and $5 per 1,000 minutes of video stored, billed monthly.

Stream also provides both a built-in web player and HLS/DASH manifests to use in a compatible player of your choosing. This enables you or your users to go live using the same protocols and tools that broadcasters big and small use to go live to YouTube or Twitch, but gives you full control over access and presentation of live streams. We also provide access control with signed URLs and hotlinking prevention measures to protect your content.

Powered by the strength of the network

And of course, Stream is powered by Cloudflare's global network for fast delivery worldwide, with points of presence within 50ms of 95% of the Internet-connected population, a key factor in our quest to slash latency. We ingest live video close to broadcasters and move it rapidly through Cloudflare’s network. We run encoders on-demand and generate player manifests as close to viewers as possible.

Getting started with LL-HLS

Getting started with Stream Live only takes a few minutes, and by using Live Outputs for restreaming, you can even test it without changing your existing infrastructure. First, create or update a Live Input in the Cloudflare dashboard. While in beta, Live Inputs will have an option to enable LL-HLS called “Low-Latency HLS Support.” Activate this toggle to enable the new pipeline.


Stream will automatically provide the RTMPS and SRT endpoints to broadcast your feed to us, just as before. For the best results, we recommend the following broadcast settings:

  • Codec: h264
  • GOP size / keyframe interval: 1 second

Optionally, configure a Live Output to point to your existing video ingest endpoint via RTMPS or SRT to test Stream while rebroadcasting to an existing workflow or infrastructure.

Stream will automatically provide RTMPS and SRT endpoints to broadcast your feed to us as well as an HTML embed for our built-in player.


This connection information can be added easily to a broadcast application like OBS to start streaming immediately.


During the beta, our built-in player will automatically attempt to use low-latency for any enabled Live Input, falling back to regular HLS otherwise. If LL-HLS is being used, you’ll see “Low Latency” noted in the player.

During this phase of the beta, we are most closely focused on using OBS to broadcast and Stream’s built-in player to watch — which uses HLS.js under the hood for LL-HLS support. However, you may test the LL-HLS manifest in a player of your own by appending ?protocol=llhls to the end of the HLS manifest URL. This flag may change in the future and is not yet ready for production usage; watch for changes in DevDocs.
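For example, a minimal test page built on hls.js (the same library our built-in player uses) might look like the sketch below; the manifest URL is a placeholder for your own Live Input's HLS URL with the beta flag appended:

// Sketch: playing the beta LL-HLS manifest in your own player with hls.js.
// The manifest URL is a placeholder; note the ?protocol=llhls beta flag.
import Hls from "hls.js";

const manifestUrl =
  "https://customer-<code>.cloudflarestream.com/<input-id>/manifest/video.m3u8?protocol=llhls";
const video = document.querySelector("video") as HTMLVideoElement;

if (Hls.isSupported()) {
  const hls = new Hls({ lowLatencyMode: true }); // opt in to LL-HLS playback
  hls.loadSource(manifestUrl);
  hls.attachMedia(video);
} else {
  // Some browsers (like Safari) can play HLS natively.
  video.src = manifestUrl;
}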

Sign up today

Low-Latency HLS is Stream Live’s latest tool to bring your creators and audiences together. All new and existing Stream subscriptions are eligible for the LL-HLS open beta today, with no pricing changes or contract requirements — all part of building the fastest, simplest serverless live-streaming platform. Join our beta to start test-driving Low-Latency HLS!

Cloudflare Fonts: enhancing website font privacy and speed

Post Syndicated from Matt Bullock original http://blog.cloudflare.com/cloudflare-fonts-enhancing-website-privacy-speed/


We are thrilled to introduce Cloudflare Fonts! In the coming weeks, sites that use Google Fonts will be able to effortlessly load their fonts from the site’s own domain rather than from Google, all at the click of a button. This enhances both privacy and performance: it protects users' privacy by eliminating the need to load fonts from Google’s third-party servers, and it boosts a site's performance by bringing fonts closer to end users, reducing the time spent on DNS lookups and TLS connections.

Sites that currently use Google Fonts will not need to self-host fonts or make complex code changes to benefit – Cloudflare Fonts streamlines the entire process, making it a breeze.

Fonts and privacy

When you load fonts from Google, your website initiates a data exchange with Google's servers. This means that your visitors' browsers send requests directly to Google. Consequently, Google has the potential to accumulate a range of data, including IP addresses, user agents (formatted descriptions of the browser and operating system), the referer (the page on which the Google font is to be displayed) and how often each IP makes requests to Google. While Google states that they do not use this data for targeted advertising or set cookies, any time you can prevent sharing your end user’s personal data unnecessarily is a win for privacy.

With Cloudflare Fonts, you serve fonts directly from your own domain. This means no font requests are sent to third-party domains like Google, which some privacy regulators have found to be a problem in the past. Our pro-privacy approach means your end user’s IP address and other data are not sent to another domain. All that information stays within your control, within your domain. In addition, because Cloudflare Fonts eliminates data transmission to third-party servers like Google's, it can enhance your ability to comply with any potential data localization requirements.

Faster Google Font delivery through Cloudflare

Now that we have established that Cloudflare Fonts can improve your privacy, let's flip to the other side of the coin – how Cloudflare Fonts will improve your performance.

To do this, we first need to delve into how Google Fonts affects your website's performance. Subsequently, we'll explore how Cloudflare Fonts addresses and rectifies these performance challenges.

Google Fonts is a fantastic resource that offers website owners a range of royalty-free fonts for website usage. When you decide on the fonts you would like to incorporate, it’s super easy to integrate. You just add a snippet of HTML to your site. You then add styles to apply these fonts to various parts of your page:

<link href="https://fonts.googleapis.com/css?family=Open+Sans|Roboto+Slab" rel="stylesheet">
<style>
  body {
    font-family: 'Open Sans', sans-serif;
  }
  h1 {
    font-family: 'Roboto Slab', serif;
  }
</style>

But this ease of use comes with a performance penalty.

Upon loading your webpage, your visitors' browser fetches the CSS file as soon as the HTML starts to be parsed. Then, when the browser starts rendering the page and identifies the need for fonts in different text sections, it requests the required font files.

This is where the performance problem arises. Google Fonts employs a two-domain system: the CSS resides on one domain – fonts.googleapis.com – while the font files reside on another domain – fonts.gstatic.com.

This separation results in a minimum of four round trips to the third-party servers for each resource request: a DNS lookup, socket connection establishment, TLS negotiation (for HTTPS), and the final round trip for the actual resource request. Because this happens once for each of the two domains, getting a font from Google servers to a browser ultimately requires eight round trips.

You can see this yourself: if a site uses Google Fonts, open your browser's network tab and filter for these Google domains.


You can visually see the impact of the extra DNS request and TLS connection that these requests add to your website experience. For example, on my WordPress site, the theme's native use of Google Fonts adds an extra ~150ms.


Fast fonts

Cloudflare Fonts streamlines this process by reducing the number of round trips from eight to one. Two sets of DNS lookups, socket connections, and TLS negotiations are no longer required because there is no longer a third-party server involved in serving the CSS or the fonts. The only round trip involves serving the font files directly from the same domain where the HTML is hosted. This approach offers an additional advantage: it allows fonts to be transmitted over the same HTTP/2 or HTTP/3 connection as other page resources, benefiting from proper prioritization and preventing bandwidth contention.

The eagle-eyed amongst you might be thinking “Surely it is still two round trips – what about the CSS request?”. Well, with Cloudflare Fonts, we have also removed the need for a separate CSS request. This means there really is only one round-trip – fetching the font itself.

To achieve both the home-routing of font requests and the removal of the CSS request, we rewrite the HTML as it passes through Cloudflare’s global network: the CSS is inlined into the HTML response, and the font URLs are rewritten within that inlined CSS.

These transformations adjust the font URLs to align with the same domain as the HTML content. These modified responses seamlessly pass through Cloudflare's caching infrastructure, where they are automatically cached for a substantial performance boost. In the event of any cache misses, we use Fontsource and NPM to load these fonts and cache them within the Cloudflare infrastructure. This approach ensures that there's no inadvertent data exposure to Google's infrastructure, maintaining both performance and data privacy.
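Conceptually, the rewrite resembles the sketch below, expressed in TypeScript with the HTMLRewriter API from Cloudflare Workers purely for illustration; the production implementation is the Rust-based ROFL module described later in this post, and the /cf-fonts URL shape shown here is simplified:

// Illustrative sketch only: inline a Google Fonts stylesheet and rewrite its
// font URLs onto the first-party /cf-fonts path. Cloudflare Fonts itself is
// implemented in ROFL (Rust), not in a Worker, and real URLs may differ.
const rewriter = new HTMLRewriter().on('link[rel="stylesheet"]', {
  async element(link) {
    const href = link.getAttribute("href");
    if (!href?.startsWith("https://fonts.googleapis.com/")) return;
    // Fetch the CSS once (cached in production)...
    const css = await (await fetch(href)).text();
    // ...point the font files at the site's own domain, then inline the CSS,
    // removing the third-party requests entirely.
    const inlined = css.replace(/https:\/\/fonts\.gstatic\.com\//g, "/cf-fonts/");
    link.replace(`<style>${inlined}</style>`, { html: true });
  },
});

// Applied to an HTML response streaming through the edge:
// const transformed = rewriter.transform(originalResponse);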

With Cloudflare Fonts enabled, you are able to see within your Network Tab that font files are now loaded from your own hostname from the /cf-fonts path and served from Cloudflare’s closest cache to the user, as indicated by the cf-cache-status: HIT.


Additionally, you can see in the browser's timings section that no extra DNS lookup or TLS connection setup is needed. This is because the fonts are served from your own hostname, for which the browser has already cached the DNS response and holds an open TLS connection.


Finally, you can see the real-world performance benefits of Cloudflare Fonts. We conducted synthetic Google Lighthouse tests on a straightforward page that displays text, before and after enabling Cloudflare Fonts. First Contentful Paint (FCP), which represents the time it takes for the first content element to appear on the page, was measured at 0.9 seconds in the Google Fonts tests. After enabling Cloudflare Fonts, FCP was reduced to 0.3 seconds, and our overall Lighthouse performance score improved from 98 to a perfect 100 out of 100.


Making Cloudflare Fonts fast with ROFL

In order to make Cloudflare Fonts this performant, we needed to make blazing-fast HTML alterations as responses stream through Cloudflare’s network. This has been made possible by leveraging one of Cloudflare’s more recent technologies.

Earlier this year, we finished rewriting one of Cloudflare's oldest components, which played a crucial role in dynamically altering HTML content. But as described in this blog post, a new solution was required to replace the old: a memory-safe solution, able to scale to Cloudflare’s ever-increasing load.

This new module is known as ROFL (Response Overseer for FL). It now powers various Cloudflare products that need to alter HTML as it streams, such as Email Obfuscation, Rocket Loader, and HTML Minification.

ROFL was developed entirely in Rust. This decision was driven by Rust's memory safety, performance, and security. The memory-safety features of Rust are indispensable to ensure airtight protection against memory leaks while we process a staggering volume of requests, measuring in the millions per second. Rust's compiled nature allows us to finely optimize our code for specific hardware configurations, delivering impressive performance gains compared to interpreted languages.

ROFL paved the way for the development of Cloudflare Fonts. The performance of ROFL allows us to rewrite HTML on-the-fly and modify the Google Fonts links quickly, safely and efficiently. This speed helps us reduce any additional latency added by processing the HTML file and improve the performance of your website.

Unlock the power of Cloudflare Fonts today! 🚀

Cloudflare Fonts will be available to all Cloudflare customers in October. If you're using Google Fonts, you will be able to supercharge your site's privacy and speed. By enabling this feature, you can seamlessly enhance your website's performance while safeguarding your user’s privacy.

How Cloudflare’s systems dynamically route traffic across the globe

Post Syndicated from David Tuber original http://blog.cloudflare.com/meet-traffic-manager/


Picture this: you’re at an airport, and you’re going through an airport security checkpoint. There are a bunch of agents who are scanning your boarding pass and your passport and sending you through to your gate. All of a sudden, some of the agents go on break. Maybe there’s a leak in the ceiling above the checkpoint. Or perhaps a bunch of flights are leaving at 6pm, and a number of passengers turn up at once. Either way, this imbalance between localized supply and demand can cause huge lines and unhappy travelers — who just want to get through the line to get on their flight. How do airports handle this?

Some airports may not do anything and just let you suffer in a longer line. Some airports may offer fast-lanes through the checkpoints for a fee. But most airports will tell you to go to another security checkpoint a little farther away to ensure that you can get through to your gate as fast as possible. They may even have signs up telling you how long each line is, so you can make an easier decision when trying to get through.

At Cloudflare, we have the same problem. We are located in 300 cities around the world that are built to receive end-user traffic for all of our product suites. And in an ideal world, we always have enough computers and bandwidth to handle everyone at their closest possible location. But the world is not always ideal; sometimes we take a data center offline for maintenance, or a connection to a data center goes down, or some equipment fails, and so on. When that happens, we may not have enough attendants to serve every person going through security in every location. It’s not because we haven’t built enough kiosks, but something has happened in our data center that prevents us from serving everyone.

So, we built Traffic Manager: a tool that balances supply and demand across our entire global network. This blog is about Traffic Manager: how it came to be, how we built it, and what it does now.

The world before Traffic Manager

The job now done by Traffic Manager used to be a manual process carried out by network engineers: our network would operate as normal until something happened that caused user traffic to be impacted at a particular data center.

When such events happened, user requests would start to fail with 499 or 500 errors because there weren’t enough machines to handle the request load of our users. This would trigger a page to our network engineers, who would then remove some Anycast routes for that data center. The end result: by no longer advertising those prefixes in the impacted data center, user traffic would divert to a different data center. This is how Anycast fundamentally works: user traffic is drawn to the closest data center advertising the prefix the user is trying to connect to, as determined by Border Gateway Protocol. For a primer on what Anycast is, check out this reference article.

Depending on how bad the problem was, engineers would remove some or even all the routes in a data center. When the data center was again able to absorb all the traffic, the engineers would put the routes back and the traffic would return naturally to the data center.

As you might guess, this was a challenging task for our network engineers to do every single time any piece of hardware on our network had an issue. It didn’t scale.

Never send a human to do a machine’s job

But doing it manually wasn’t just a burden on our Network Operations team. It also resulted in a sub-par experience for our customers; our engineers would need to take time to diagnose and re-route traffic. To solve both these problems, we wanted to build a service that would immediately and automatically detect if users were unable to reach a Cloudflare data center, and withdraw routes from the data center until users were no longer seeing issues. Once the service received notifications that the impacted data center could absorb the traffic, it could put the routes back and reconnect that data center. This service is called Traffic Manager, because its job (as you might guess) is to manage traffic coming into the Cloudflare network.


Accounting for second order consequences

When a network engineer removes a route from a router, they can make the best guess at where the user requests will move to, and try to ensure that the failover data center has enough resources to handle the requests — if it doesn’t, they can adjust the routes there accordingly prior to removing the route in the initial data center. To be able to automate this process, we needed to move from a world of intuition to a world of data — accurately predicting where traffic would go when a route was removed, and feeding this information to Traffic Manager, so it could ensure it doesn’t make the situation worse.

Meet Traffic Predictor

Although we can adjust which data centers advertise a route, we are unable to influence what proportion of traffic each data center receives. Each time we add a new data center or a new peering session, the distribution of traffic changes, and as we are in over 300 cities with 12,500 peering sessions, it has become quite difficult for a human to keep track of, or predict, the way traffic will move around our network. Traffic Manager needed a buddy: Traffic Predictor.

In order to do its job, Traffic Predictor carries out an ongoing series of real world tests to see where traffic actually moves. Traffic Predictor relies on a testing system that simulates removing a data center from service and measuring where traffic would go if that data center wasn’t serving traffic. To help understand how this system works, let’s simulate the removal of a subset of a data center in Christchurch, New Zealand:

  • First, Traffic Predictor gets a list of all the IP addresses that normally connect to Christchurch. Traffic Predictor will send a ping request to hundreds of thousands of IPs that have recently made a request there.
  • Traffic Predictor records if the IP responds, and whether the response returns to Christchurch using a special Anycast IP range specifically configured for Traffic Predictor.
  • Once Traffic Predictor has a list of IPs that respond to Christchurch, it withdraws the route containing that special range from Christchurch, waits a few minutes for the Internet routing table to be updated, and runs the test again.
  • The responses, no longer able to reach Christchurch, instead arrive at the data centers around it. Traffic Predictor then records which data center received each response, and stores the results as the failover for Christchurch.

This allows us to simulate Christchurch going offline without actually taking Christchurch offline!
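In TypeScript-flavored pseudocode, that measurement loop might look like the sketch below; probe, withdrawRoute, restoreRoute, and sleep are hypothetical stand-ins for internal tooling, and the details are a simplification of the steps above:

// Simplified sketch of the Traffic Predictor measurement loop. All four
// declared functions are hypothetical stand-ins for internal tooling.
declare function probe(ip: string): Promise<string>; // resolves to the colo that saw the reply
declare function withdrawRoute(colo: string): Promise<void>;
declare function restoreRoute(colo: string): Promise<void>;
declare function sleep(ms: number): Promise<void>;

async function measureFailover(colo: string, clientIps: string[]) {
  // 1. Baseline: which IPs normally reach this data center over the test range?
  const respondents: string[] = [];
  for (const ip of clientIps) {
    if ((await probe(ip)) === colo) respondents.push(ip);
  }

  // 2. Withdraw the special test range and wait for Internet routing to settle.
  await withdrawRoute(colo);
  await sleep(5 * 60 * 1000);

  // 3. Re-probe: replies now land on surrounding data centers. Count them.
  const failover = new Map<string, number>();
  for (const ip of respondents) {
    const ingress = await probe(ip);
    failover.set(ingress, (failover.get(ingress) ?? 0) + 1);
  }

  await restoreRoute(colo);
  return failover; // e.g. Auckland receiving most of Christchurch's respondents
}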

But Traffic Predictor doesn’t just do this for any one data center. To add additional layers of resiliency, Traffic Predictor even calculates a second layer of indirection: for each data center failure scenario, Traffic Predictor also calculates failure scenarios and creates policies for when surrounding data centers fail.

Using our example from before, when Traffic Predictor tests Christchurch, it will run a series of tests that remove several surrounding data centers from service, including Christchurch, to calculate different failure scenarios. This ensures that even if something catastrophic happens which impacts multiple data centers in a region, we still have the ability to serve user traffic. If you think this data model is complicated, you’re right: it takes several days to calculate all of these failure paths and policies.

Here’s what those failure paths and failover scenarios look like for all of our data centers around the world when they’re visualized:

Failure paths and failover scenarios for Cloudflare’s data centers worldwide

This can be a bit complicated for humans to parse, so let’s dig into that above scenario for Christchurch, New Zealand to make this a bit more clear. When we take a look at failover paths specifically for Christchurch, we see they look like this:

Failover paths for Christchurch, New Zealand

In this scenario we predict that 99.8% of Christchurch’s traffic would shift to Auckland, which is able to absorb all Christchurch traffic in the event of a catastrophic outage.

Traffic Predictor allows us not only to see where traffic will move if something happens, but also to preconfigure Traffic Manager policies that move requests out of failover data centers, preventing a thundering herd scenario: a sudden influx of requests causing failures in a second data center when the first one has issues. With Traffic Predictor, Traffic Manager doesn’t just move traffic out of one data center when that one fails, but it also proactively moves traffic out of other data centers to ensure a seamless continuation of service.

From a sledgehammer to a scalpel

With Traffic Predictor, Traffic Manager can dynamically advertise and withdraw prefixes while ensuring that every data center can handle all the traffic. But withdrawing prefixes as a means of traffic management can be a bit heavy-handed at times. One of the reasons for this is that the only way we had to add or remove traffic to a data center was through advertising routes from our Internet-facing routers. Each one of our routes has thousands of IP addresses, so removing only one still represents a large portion of traffic.

Specifically, Internet applications will advertise prefixes to the Internet from a /24 subnet at an absolute minimum, but many will advertise prefixes larger than that. This is generally done to prevent things like route leaks or route hijacks: many providers will actually filter out routes that are more specific than a /24 (for more information on that, check out this blog here). If we assume that Cloudflare maps protected properties to IP addresses at a 1:1 ratio, then each /24 subnet would be able to service 256 customers, which is the number of IP addresses in a /24 subnet. If every IP address sent one request per second, we’d have to move 4 /24 subnets out of a data center if we needed to move 1,000 requests per second (RPS).

But in reality, Cloudflare maps a single IP address to hundreds of thousands of protected properties. So for Cloudflare, a /24 might take 3,000 requests per second, but if we needed to move 1,000 RPS out, we would have no choice but to move an entire /24 out, taking all 3,000 RPS with it. And that’s just assuming we advertise at a /24 level. If we used /20s to advertise, the amount we can withdraw gets less granular: at a 1:1 website to IP address mapping, that’s 4,096 requests per second for each prefix, and even more if the website to IP address mapping is many to one.

While withdrawing prefix advertisements improved the experience for those users who would have seen a 499 or 500 error, a significant portion of users who wouldn’t have been impacted by the issue were still moved away from the data center they should have gone to, probably slowing them down, even if only a little bit. This concept of moving more traffic out than is necessary is called “stranding capacity”: the data center is theoretically able to service more users in a region but cannot because of how Traffic Manager was built.

We wanted to improve Traffic Manager so that it only moved the absolute minimum of users out of a data center that was seeing a problem and not strand any more capacity. To do so, we needed to shift percentages of prefixes, so we could be extra fine-grained and only move the things that absolutely need to be moved. To solve this, we built an extension of our Layer 4 load balancer Unimog, which we call Plurimog.

A quick refresher on Unimog and layer 4 load balancing: every single one of our machines contains a service that determines whether that machine can take a user request. If the machine can take a user request then it sends the request to our HTTP stack which processes the request before returning it to the user. If the machine can’t take the request, the machine sends the request to another machine in the data center that can. The machines can do this because they are constantly talking to each other to understand whether they can serve requests for users.

Plurimog does the same thing, but instead of talking between machines, Plurimog talks between data centers and points of presence. If a request goes into Philadelphia and Philadelphia is unable to take the request, Plurimog will forward it to another data center that can, like Ashburn, where the request is decrypted and processed. Because Plurimog operates at layer 4, it can send individual TCP or UDP requests to other places, which allows it to be very fine-grained: it can send percentages of traffic to other data centers very easily, meaning that we only need to send away enough traffic to ensure that everyone can be served as fast as possible. Check out how that works in our Frankfurt data center, as we are able to shift progressively more traffic away to handle issues. This chart shows the number of actions taken on free traffic that cause it to be sent out of Frankfurt over time:

Traffic Manager actions on free traffic moved out of Frankfurt over time

But even within a data center, we can route traffic around to prevent it from leaving the data center at all. Our large data centers, called Multi-Colo Points of Presence (MCPs), contain logical sections of compute that are distinct from one another. These MCP data centers are enabled with another version of Unimog called Duomog, which allows traffic to be shifted between logical sections of compute automatically. This makes MCP data centers fault-tolerant without sacrificing performance for our customers, and allows Traffic Manager to work within a data center as well as between data centers.

When evaluating portions of requests to move, Traffic Manager does the following:

  • Traffic Manager identifies the proportion of requests that need to be removed from a data center or subsection of a data center so that all requests can be served.
  • Traffic Manager then calculates the aggregated space metrics for each target to see how many requests each failover data center can take.
  • Traffic Manager then identifies how much traffic in each plan we need to move, and moves either a proportion of the plan, or all of the plan through Plurimog/Duomog, until we've moved enough traffic. We move Free customers first, and if there are no more Free customers in a data center, we'll move Pro, and then Business customers if needed.

For example, let’s look at Ashburn, Virginia: one of our MCPs. Ashburn has nine different subsections of capacity that can each take traffic. On 8/28, one of those subsections, IAD02, had an issue that reduced the amount of traffic it could handle.

During this time period, Duomog sent more traffic from IAD02 to other subsections within Ashburn, ensuring that Ashburn was always online, and that performance was not impacted during this issue. Then, once IAD02 was able to take traffic again, Duomog shifted traffic back automatically. You can see these actions visualized in the time series graph below, which tracks the percentage of traffic moved over time between subsections of capacity within IAD02 (shown in green):

Percentage of traffic moved over time between subsections of capacity within IAD02

How does Traffic Manager know how much to move?

Although we used requests per second in the example above, requests per second isn’t an accurate enough metric when actually moving traffic. The reason is that different customers have different resource costs to our service: a website served mainly from cache with the WAF deactivated is much cheaper, CPU-wise, than a site with all WAF rules enabled and caching disabled. So we record the time that each request takes on the CPU. We can then aggregate the CPU time across each plan to find the CPU time usage per plan. We record the CPU time in ms and take a per-second value, resulting in a unit of milliseconds per second.

CPU time is an important metric because of the impact it can have on latency and customer performance. As an example, consider the time it takes for an eyeball request to make it entirely through the Cloudflare front line servers: we call this the cfcheck latency. If this number goes too high, then our customers will start to notice, and they will have a bad experience. When cfcheck latency gets high, it’s usually because CPU utilization is high. The graph below shows 95th percentile cfcheck latency plotted against CPU utilization across all the machines in the same data center, and you can see the strong correlation:

95th percentile cfcheck latency plotted against CPU utilization

So having Traffic Manager look at CPU time in a data center is a very good way to ensure that we’re giving customers the best experience and not causing problems.

After getting the CPU time per plan, we need to figure out how much of that CPU time to move to other data centers. To do this, we aggregate the CPU utilization across all servers to give a single CPU utilization across the data center. If a proportion of servers in the data center fail, due to network device failure, power failure, etc., then the requests that were hitting those servers are automatically routed elsewhere within the data center by Duomog. As the number of servers decreases, the overall CPU utilization of the data center increases. Traffic Manager has three thresholds for each data center: the maximum threshold, the target threshold, and the acceptable threshold:

  • Maximum: the CPU level at which performance starts to degrade, where Traffic Manager will take action
  • Target: the level to which Traffic Manager will try to reduce the CPU utilization to restore optimal service to users
  • Acceptable: the level below which a data center can receive requests forwarded from another data center, or revert active moves

When a data center goes above the maximum threshold, Traffic Manager takes the ratio of total CPU time across all plans to current CPU utilization, then applies that to the target CPU utilization to find the target CPU time. Doing it this way means we can compare a data center with 100 servers to a data center with 10 servers, without having to worry about the number of servers in each data center. This assumes that load increases linearly, which is close enough to true for the assumption to be valid for our purposes.

Target ratio equals current ratio:

\frac{\text{CPU time}_{\text{target}}}{\text{CPU utilization}_{\text{target}}} = \frac{\text{CPU time}_{\text{current}}}{\text{CPU utilization}_{\text{current}}}

Therefore:

\text{CPU time}_{\text{target}} = \frac{\text{CPU utilization}_{\text{target}}}{\text{CPU utilization}_{\text{current}}} \times \text{CPU time}_{\text{current}}

Subtracting the target CPU time from the current CPU time gives us the CPU time to move:

\text{CPU time to move} = \text{CPU time}_{\text{current}} - \text{CPU time}_{\text{target}}

For example, if the current CPU utilization was at 90% across the data center, the target was 85%, and the CPU time across all plans was 18,000, we would have:

\text{CPU time}_{\text{target}} = \frac{85\%}{90\%} \times 18{,}000 \,\text{ms/s} = 17{,}000 \,\text{ms/s}

This would mean Traffic Manager would need to move 1,000 CPU time:

\text{CPU time to move} = 18{,}000 - 17{,}000 = 1{,}000 \,\text{ms/s}

Now that we know the total CPU time to move, we can go through the plans until the required amount has been met.

What is the maximum threshold?

A frequent problem that we faced was determining at which point Traffic Manager should start taking action in a data center – what metric should it watch, and what is an acceptable level?

As said before, different services have different requirements in terms of CPU utilization, and there are many cases of data centers that have very different utilization patterns.

To solve this problem, we turned to machine learning. We created a service that will automatically adjust the maximum thresholds for each data center according to customer-facing indicators. For our main service-level indicator (SLI), we decided to use the cfcheck latency metric we described earlier.

But we also need to define a service-level objective (SLO) in order for our machine learning application to be able to adjust the threshold. We set the SLO at 20ms: comparing our SLO to our SLI, our 95th percentile cfcheck latency should never go above 20ms, and if it does, we need to do something. The below graph shows 95th percentile cfcheck latency over time, and customers start to get unhappy when cfcheck latency goes into the red zone:

95th percentile cfcheck latency over time, with the red zone indicating SLO breaches

If customers have a bad experience when CPU gets too high, then the goal of Traffic Manager’s maximum thresholds is to ensure that customer performance isn’t impacted, and to start redirecting traffic away before performance starts to degrade. At a scheduled interval, the Traffic Manager service fetches a number of metrics for each data center and applies a series of machine learning algorithms. After cleaning the data for outliers, we apply a simple quadratic curve fit, and we are currently testing a linear regression algorithm.

After fitting the models, we can use them to predict the CPU usage when the SLI is equal to our SLO, and then use that as our maximum threshold. If we plot the CPU values against the SLI, we can see clearly why these methods work so well for our data centers, as you can see for Barcelona in the graphs below, which show the quadratic curve fit and the linear regression fit respectively.

Barcelona: CPU utilization plotted against the SLI, with a quadratic curve fit
Barcelona: CPU utilization plotted against the SLI, with a linear regression fit

In these charts the vertical line is the SLO, and the intersection of this line with the fitted model represents the value that will be used as the maximum threshold. This model has proved to be very accurate, and we are able to significantly reduce the SLO breaches. Let’s take a look at when we started deploying this service in Lisbon:

cfcheck latency in Lisbon, before and after dynamic maximum thresholds were deployed

Before the change, cfcheck latency was constantly spiking, but Traffic Manager wasn’t taking actions because the maximum threshold was static. But after July 29, we see that cfcheck latency has never hit the SLO because we are constantly measuring to make sure that customers are never impacted by CPU increases.
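As a small worked illustration (with made-up fit coefficients, not real fit output from any Cloudflare data center), turning a fitted quadratic SLI model into a maximum threshold is just solving for the CPU utilization at which the predicted SLI reaches the 20ms SLO:

// Sketch: derive a maximum CPU threshold from a fitted quadratic model
// sli(cpu) = a*cpu^2 + b*cpu + c. Coefficients here are illustrative only.
function maxThreshold(a: number, b: number, c: number, slo: number): number {
  // Solve a*cpu^2 + b*cpu + (c - slo) = 0 for cpu.
  const disc = b * b - 4 * a * (c - slo);
  if (disc < 0) throw new Error("model never reaches the SLO");
  const roots = [
    (-b + Math.sqrt(disc)) / (2 * a),
    (-b - Math.sqrt(disc)) / (2 * a),
  ];
  // Keep the root in the meaningful 0-100% utilization range.
  const valid = roots.filter((r) => r > 0 && r <= 100);
  return Math.min(...valid);
}

// Example: sli(cpu) = 0.005*cpu^2 + 0.05*cpu + 2 with a 20ms SLO gives a
// maximum threshold of roughly 55% CPU utilization.
console.log(maxThreshold(0.005, 0.05, 2, 20));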

Where to send the traffic?

So now that we have a maximum threshold, we need to find the third CPU utilization threshold which isn’t used when calculating how much traffic to move – the acceptable threshold. When a data center is below this threshold, it has unused capacity which, as long as it isn’t forwarding traffic itself, is made available for other data centers to use when required. To work out how much each data center is able to receive, we use the same methodology as above, substituting target for acceptable:

\frac{\text{CPU time}_{\text{acceptable}}}{\text{CPU utilization}_{\text{acceptable}}} = \frac{\text{CPU time}_{\text{current}}}{\text{CPU utilization}_{\text{current}}}

Therefore:

\text{CPU time}_{\text{acceptable}} = \frac{\text{CPU utilization}_{\text{acceptable}}}{\text{CPU utilization}_{\text{current}}} \times \text{CPU time}_{\text{current}}

Subtracting the current CPU time from the acceptable CPU time gives us the amount of CPU time that a data center could accept:

\text{available CPU time} = \text{CPU time}_{\text{acceptable}} - \text{CPU time}_{\text{current}}

To find where to send traffic, Traffic Manager will find the available CPU time in all data centers, then order them by latency from the data center needing to move traffic. It moves through each of the data centers, using all available capacity based on the maximum thresholds before moving on to the next. When finding which plans to move, we move from the lowest-priority plan to the highest, but when finding where to send them, we move in the opposite direction.

To make this clearer let's use an example:

We need to move 1,000 CPU time from data center A, and we have the following usage per plan: Free: 500ms/s, Pro: 400ms/s, Business: 200ms/s, Enterprise: 1000ms/s.

We would move 100% of Free (500ms/s), 100% of Pro (400ms/s), 50% of Business (100ms/s), and 0% of Enterprise.

Nearby data centers have the following available CPU time: B: 300ms/s, C: 300ms/s, D: 1,000ms/s.

With latencies: A-B: 100ms, A-C: 110ms, A-D: 120ms.

Starting with the lowest latency and the highest-priority plan that requires action, we would be able to move all the Business CPU time to data center B and half of Pro. Next we would move on to data center C, and be able to move the rest of Pro and 20% of Free. The rest of Free could then be forwarded to data center D. This results in the following action: Business: 50% → B, Pro: 50% → B, 50% → C, Free: 20% → C, 80% → D.
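Here is a minimal TypeScript sketch of that allocation logic, reproducing the numbers above; it illustrates the described behavior rather than Traffic Manager's actual implementation:

// Sketch of the allocation described above: select plan CPU time to move
// (lowest-priority plans first), then place it in failover data centers
// (highest-priority plan to the lowest-latency target first). This is an
// illustration, not Traffic Manager's actual code.
type Move = { plan: string; target: string; msPerSec: number };

function planMoves(
  toMove: number, // total CPU time to move, in ms/s
  usagePerPlan: [string, number][], // ordered lowest to highest priority
  targets: { name: string; available: number }[], // ordered by latency
): Move[] {
  // Select CPU time per plan, lowest-priority plans first.
  const selected: [string, number][] = [];
  for (const [plan, usage] of usagePerPlan) {
    if (toMove <= 0) break;
    const amount = Math.min(usage, toMove);
    selected.push([plan, amount]);
    toMove -= amount;
  }
  // Place it: highest-priority selected plan goes to the lowest-latency target.
  const moves: Move[] = [];
  for (const [plan, total] of [...selected].reverse()) {
    let remaining = total;
    for (const t of targets) {
      if (remaining <= 0) break;
      const placed = Math.min(remaining, t.available);
      if (placed === 0) continue;
      moves.push({ plan, target: t.name, msPerSec: placed });
      t.available -= placed;
      remaining -= placed;
    }
  }
  return moves;
}

// Reproduces the example: Business 100 -> B; Pro 200 -> B, 200 -> C;
// Free 100 -> C, 400 -> D. Enterprise is untouched.
console.log(
  planMoves(
    1000,
    [["Free", 500], ["Pro", 400], ["Business", 200]],
    [
      { name: "B", available: 300 }, // 100ms away
      { name: "C", available: 300 }, // 110ms away
      { name: "D", available: 1000 }, // 120ms away
    ],
  ),
);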

Reverting actions

In the same way that Traffic Manager is constantly looking to keep data centers from going above the threshold, it is also looking to bring any forwarded traffic back into a data center that is actively forwarding traffic.

Above we saw how Traffic Manager works out how much traffic a data center is able to receive from another data center — it calls this the available CPU time. When there is an active move we use this available CPU time to bring back traffic to the data center — we always prioritize reverting an active move over accepting traffic from another data center.

When you put this all together, you get a system that is constantly measuring system and customer health metrics for every data center and spreading traffic around to make sure that each request can be served given the current state of our network. When we put all of these moves between data centers on a map, it looks something like a map of all Traffic Manager moves over a period of one hour. Such a map doesn’t show our full data center deployment, but it does show the data centers that are sending or receiving moved traffic during that period.

Data centers in red or yellow are under load and shifting traffic to other data centers until they become green, which means that all metrics are showing as healthy. The size of the circles represents how many requests are shifted from or to those data centers, and the direction of the lines denotes where the traffic is going. This is difficult to see at a world scale, so let’s zoom into the United States to see this in action for the same time period.

Here you can see Toronto, Detroit, New York, and Kansas City are unable to serve some requests due to hardware issues, so they will send those requests to Dallas, Chicago, and Ashburn until equilibrium is restored for users and data centers. Once data centers like Detroit are able to service all the requests they are receiving without needing to send traffic away, Detroit will gradually stop forwarding requests to Chicago until any issues in the data center are completely resolved, at which point it will no longer be forwarding anything. Throughout all of this, end users are online and are not impacted by any physical issues that may be happening in Detroit or any of the other locations sending traffic.

Happy network, happy products

Because Traffic Manager is plugged into the user experience, it is a fundamental component of the Cloudflare network: it keeps our products online and ensures that they’re as fast and reliable as they can be. It’s our real time load balancer, helping to keep our products fast by only shifting necessary traffic away from data centers that are having issues. Because less traffic gets moved, our products and services stay fast.

But Traffic Manager can also help keep our products online and reliable, because it allows our products to predict where reliability issues may occur and preemptively move those products elsewhere. For example, Browser Isolation works directly with Traffic Manager to help ensure the uptime of the product. When you connect to a Cloudflare data center to create a hosted browser instance, Browser Isolation first asks Traffic Manager if the data center has enough capacity to run the instance locally, and if so, the instance is created right then and there. If there isn’t sufficient capacity available, Traffic Manager tells Browser Isolation which nearby data center has sufficient available capacity, thereby helping Browser Isolation to provide the best possible experience for the user.

Happy network, happy users

At Cloudflare, we operate this huge network to service all of our different products and customer scenarios. We’ve built this network for resiliency: in addition to our MCP locations designed to reduce impact from a single failure, we are constantly shifting traffic around on our network in response to internal and external issues.

But that is our problem — not yours.

In the past, when human beings had to fix those issues, it was customers and end users who were impacted. To ensure that you’re always online, we’ve built a smart system that detects our hardware failures and preemptively balances traffic across our network to ensure it’s online and as fast as possible. This system works faster than any person, not only allowing our network engineers to sleep at night, but also providing a better, faster experience for all of our customers.

And finally: if these kinds of engineering challenges sound exciting to you, then please consider checking out the Traffic Engineering team's open position on Cloudflare’s Careers page!