Tag Archives: Developers

Introducing high-definition portrait video support for Cloudflare Stream

Post Syndicated from Alex Huang original https://blog.cloudflare.com/introducing-high-definition-portrait-video-support-for-cloudflare-stream


Cloudflare Stream is an end-to-end solution for video encoding, storage, delivery, and playback. Our focus has always been on simplifying all aspects of video for developers. This goal continues to motivate us as we introduce first-class portrait (vertical) video support today. Newly uploaded or ingested portrait videos will now automatically be processed in full HD quality.

Why portrait video

In the past few years, the popularity of portrait video has exploded, driven by short-form video applications such as TikTok and YouTube Shorts. However, Cloudflare customers have been confused as to why their portrait videos appear lower quality when viewed on portrait-first devices such as smartphones. This is because our video encoding pipeline previously did not support high-quality portrait video, leaving those videos grainy and soft. This pain point has now been addressed with the introduction of high-definition portrait video.

The current stream pipeline

When you upload a video to Stream, it is first encoded into several different “renditions” (sizes or resolutions) before delivery. This enables playback across a wide variety of network conditions and standardizes the way a video is experienced. By using these adaptive bitrate renditions, we can offer each viewer the highest quality stream that fits their network bandwidth: someone watching over a slow mobile connection is served a 240p video (a resolution of 320×240 pixels), while someone watching at home on a fiber Internet connection receives the 1080p version (a resolution of 1920×1080 pixels). This encoding pipeline follows one of two different paths:

The first path is our video on-demand (VOD) encoding pipeline, which generates and stores a set of encoded video segments at each of our standard video resolutions. The other path is our on-the-fly encoding (OTFE) pipeline, which uses the same process as Stream Live to generate resolutions upon user request. Both pipelines work with the set of standard resolutions, which are identified through a constrained target (output) height. This means that we encode every rendition to heights of 240 pixels, 360 pixels, etc. up to 1080 pixels.

When originally conceived, this encoding pipeline was not designed with portrait video in mind because portrait video was less common. As a result, portrait videos were encoded with lower quality dimensions that were consistent with landscape video encoding. For example, a portrait HD video would have the dimensions 1080×1920 — scaling that down to the height of a landscape HD video would result in the much smaller output of 606×1080. However, current smartphones all have HD displays, making the discrepancy clear when a portrait video is viewed in portrait mode on a phone. With this new change to our encoding pipeline, all newly uploaded portrait videos will now be automatically encoded with a constrained target width, using a new set of standard resolutions for portrait video. These resolutions are the same as the current set of landscape resolutions, but with the dimensions reversed: 240×426 up to 1080×1920.
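
To illustrate, here is a minimal TypeScript sketch of how such a rendition ladder can be derived. The standard sizes, helper names, and round-to-even rule are illustrative assumptions, not Stream's internal code:

// Standard constrained sizes, applied to height for landscape
// and to width for portrait (illustrative values).
const STANDARD_SIZES = [240, 360, 480, 720, 1080];

interface Rendition {
  width: number;
  height: number;
}

function renditionLadder(srcWidth: number, srcHeight: number): Rendition[] {
  const isPortrait = srcHeight > srcWidth;
  const aspect = srcWidth / srcHeight;
  return STANDARD_SIZES
    // Never upscale beyond the source's constraining dimension.
    .filter((size) => size <= (isPortrait ? srcWidth : srcHeight))
    .map((size) => {
      if (isPortrait) {
        // Constrain width; scale height to preserve the aspect ratio.
        return { width: size, height: roundToEven(size / aspect) };
      }
      // Constrain height; scale width to preserve the aspect ratio.
      return { width: roundToEven(size * aspect), height: size };
    });
}

// Encoders generally require even dimensions.
function roundToEven(n: number): number {
  return 2 * Math.round(n / 2);
}

// A 1080×1920 portrait source yields 240×426, 360×640, ... up to 1080×1920.
console.log(renditionLadder(1080, 1920));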

Technical details

As the Stream intern this summer, I was tasked with this project and with the expectation of shipping a long-requested change by the end of my internship. The first step in implementing this change was to familiarize myself with the complex architecture of Stream’s internal systems. After this, I began brainstorming a few different implementation decisions, like how to consistently track orientation through various stages of the pipeline. Following a group discussion to decide which choices would be the most scalable, least complex, and best for users, it was time to write the technical specification.

Due to the implementation method we chose, making this change involved tracing the life of a video from upload to delivery through both of our encoding pipelines and applying different logic for portrait videos. Previously, all video renditions were identified by their height at each stage of the pipeline, making certain parts of the pipeline completely agnostic to the orientation of a video. With the proposed changes, we would now be using the constraining dimension and orientation to identify a video rendition. Therefore, much of the work involved modifying the different portions of the pipeline to use these new parameters.

The first area of the pipeline to be modified was the Stream API service, the process that handles all Stream API calls. The API service enqueues the rendition encoding jobs for a video after it is uploaded, so it was necessary to introduce a new set of renditions designed for portrait videos and enqueue the corresponding encoding jobs. The queuing system is handled by our in-house queue management system, which handles jobs generically and therefore did not require any changes.

Following this, I tackled the on-the-fly encoding pipeline. The area of interest here was the delivery portion of our pipeline, which generated the set of encoding resolutions to pass on to our on-the-fly encoder. Here I also introduced a new set of portrait renditions and the corresponding logic to encode them for portrait videos. This part of the backend is written and hosted on Cloudflare Workers, which made it very easy and quick to deploy and test changes.

Finally, we wanted to change how we presented these quality levels to users in the Stream built-in player and thought that using the new constrained dimension rather than always showing the height would feel more familiar. For portrait videos, we now display the size of the constraining dimension, which also means quality selection for portrait videos encoded under our old system now more accurately reflects their quality, too. As an example, a 9:16 portrait video would have been encoded to a maximum size of 608×1080 by the previous pipeline. Now, such a rendition will be marked as 608p rather than the full-quality 1080p, which would be a 1080×1920 rendition.
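
A sketch of that labeling rule (the helper is hypothetical, but the logic matches the example above):

// Label a rendition by its constraining (smaller) dimension:
// 608×1080 -> "608p", 1080×1920 -> "1080p".
function qualityLabel(width: number, height: number): string {
  return `${Math.min(width, height)}p`;
}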

Stream as a whole is built on many of our own Developer Platform products, such as Workers for handling delivery, R2 for rendition storage, Workers AI for automatic captioning, and Durable Objects for bitrate observation, all of which enhance our ability to deploy and ship new updates quickly. Throughout my work on this project, I was able to see all of these pieces in action, as well as gain a new understanding of the powerful tools Cloudflare offers for developers.

Results and findings

After the change, portrait videos are now encoded at higher resolutions and are visibly higher quality. To confirm these differences, I analyzed the effect of the pipeline change on four different sample videos using the peak signal-to-noise ratio (PSNR), a mathematical measure of image quality. Since the old pipeline produced lower-resolution videos, the comparison here is between an upscaled version of the old pipeline rendition and the current pipeline rendition. In the graph below, higher values reflect higher quality relative to the unencoded original video.
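
For reference, PSNR compares the encoded frame to the original pixel by pixel. Here is a minimal TypeScript sketch of the computation, assuming 8-bit samples (illustrative, not the exact analysis code used):

// PSNR = 10 * log10(MAX^2 / MSE), where MAX is the peak channel
// value (255 for 8-bit) and MSE is the mean squared error between
// the original and encoded frames.
function psnr(original: Uint8Array, encoded: Uint8Array): number {
  if (original.length !== encoded.length) {
    throw new Error("frames must have identical dimensions");
  }
  let sumSquaredError = 0;
  for (let i = 0; i < original.length; i++) {
    const diff = original[i] - encoded[i];
    sumSquaredError += diff * diff;
  }
  const mse = sumSquaredError / original.length;
  // Identical frames: MSE is 0, so PSNR is infinite.
  return mse === 0 ? Infinity : 10 * Math.log10((255 * 255) / mse);
}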

According to this metric, we see an increase in quality from the pipeline changes as high as 8%. However, the quality increase is most noticeable to the human eye in videos that feature fine details or a high amount of movement, which is not always captured in the PSNR. For example, compare a side-by-side of a frame from the book sample video encoded both ways:

The difference between the old and new encodings is most clear when zoomed in:

Here’s another example (sourced from Mixkit):

Magnified:

Of course, watching these example clips is the clearest way to see:

Maximize the Stream player and look at the quality selector (in the gear menu) to see the new quality level labels – select the highest option to compare. Note the improved sharpness of the text in the book sample, as well as the finer detail in the hair and eye shadow of the hair and makeup sample.

Implementation challenges

Due to the complex nature of our encoding pipelines, there were several technical challenges to making a large change like this. Aside from simply uploading videos, many of the features we offer, like downloads or clipping, require tweaking to produce the correct video renditions. This involved modifying many parts of the encoding pipeline to ensure that portrait video logic was handled.

There were also some edge cases which were not caught until after release. One release of this feature contained a bug in the on-the-fly encoding logic which caused a subset of new portrait livestream renditions to have negative bitrates, making them unusable. This was due to an internal representation of video renditions’ constraining dimensions being mistakenly used for bitrate observation. We remedied this by increasing the scope of our end-to-end testing to include more portrait video tests and live recording interaction tests.

Another small bug caused downloads of very small videos to sometimes fail. For videos under 240p, our smallest encoding resolution, the non-constraining dimension was not being properly scaled to match the video's aspect ratio, causing some specific combinations of dimensions to be interpreted as portrait when they should have been landscape, and vice versa. This bug was fixed quickly, but it was not caught immediately after release because it required a very specific set of conditions to trigger. After this experience, I added a few more unit tests involving small videos.
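
For illustration, a unit test along these lines might look like the following sketch (the scaling helper and values are hypothetical, not Stream's actual test suite):

// The bug: scaling a 200×112 landscape source up to the smallest
// 240p rendition without adjusting the width would yield 200×240,
// which looks like portrait. Properly scaled, it stays landscape.
function isPortrait(width: number, height: number): boolean {
  return height > width;
}

function scaleToConstraint(srcW: number, srcH: number, size: number) {
  const aspect = srcW / srcH;
  return isPortrait(srcW, srcH)
    ? { width: size, height: Math.round(size / aspect) }
    : { width: Math.round(size * aspect), height: size };
}

const scaled = scaleToConstraint(200, 112, 240); // expect 429×240
console.assert(!isPortrait(scaled.width, scaled.height),
  "sub-240p landscape input must remain landscape after scaling");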

That’s a wrap

As my internship comes to a close, reflecting on the experience makes me grateful for all the team members who have helped me throughout this time. I am very glad to have shipped this project which addresses a long-standing concern and will have real-world customer impact. Support for high-definition portrait video is now available, and we will continue to make improvements to our video solutions suite. You can see the difference yourself by uploading a portrait video to Stream! Or, perhaps you’d like to help build a better Internet, too — our internship and early talent programs are a great way to jumpstart your own journey.

Sample video acknowledgements: The sample video of the book was created by the Stream Product Manager and shows the opening page of The Strange Wonder of Roots by Evan Griffith (HarperCollins). The hair and makeup fashion video was sourced from Mixkit, a great source of free media for video projects.

Meta Llama 3.1 now available on Workers AI

Post Syndicated from Michelle Chen original https://blog.cloudflare.com/meta-llama-3-1-available-on-workers-ai


At Cloudflare, we’re big supporters of the open-source community – and that extends to our approach for Workers AI models as well. Our strategy for our Cloudflare AI products is to provide a top-notch developer experience and toolkit that can help people build applications with open-source models.

We’re excited to be one of Meta’s launch partners to make their newest Llama 3.1 8B model available to all Workers AI users on Day 1. You can run their latest model by simply swapping out your model ID to @cf/meta/llama-3.1-8b-instruct or test out the model on our Workers AI Playground. Llama 3.1 8B is free to use on Workers AI until the model graduates out of beta.

Meta’s Llama collection of models has consistently shown high-quality performance in areas like general knowledge, steerability, math, tool use, and multilingual translation. Workers AI is excited to continue to distribute and serve the Llama collection of models on our serverless inference platform, powered by our globally distributed GPUs.

The Llama 3.1 model is particularly exciting, as it is released in a higher precision (bfloat16), incorporates function calling, and adds support across 8 languages. Having multilingual support built-in means that you can use Llama 3.1 to write prompts and receive responses directly in languages like English, French, German, Hindi, Italian, Portuguese, Spanish, and Thai. Expanding model understanding to more languages means that your applications have a bigger reach across the world, and it’s all possible with just one model.

const answer = await env.AI.run('@cf/meta/llama-3.1-8b-instruct', {
    stream: true,
    messages: [{
        "role": "user",
        "content": "Qu'est-ce que ç'est verlan en français?"
    }],
});

Llama 3.1 also introduces native function calling (also known as tool calls), which allows LLMs to generate structured JSON outputs that can then be fed into different APIs. This means that function calling is supported out-of-the-box, without the need for a fine-tuned variant of Llama that specializes in tool use. Having this capability built in means that you can use one model across various tasks.

Workers AI recently announced embedded function calling, which is now usable with Meta Llama 3.1 as well. Our embedded function calling gives developers a way to run their inference tasks far more efficiently than traditional architectures, leveraging Cloudflare Workers to reduce the number of requests that need to be made manually. It also makes use of our open-source ai-utils package, which helps you orchestrate the back-and-forth requests for function calling along with other helper methods that can automatically generate tool schemas. Below is an example function call to Llama 3.1 with embedded function calling that then stores key-values in Workers KV.

import { runWithTools } from '@cloudflare/ai-utils';

const response = await runWithTools(env.AI, "@cf/meta/llama-3.1-8b-instruct", {
    messages: [{ role: "user", content: "Greet the user and ask them a question" }],
    tools: [{
        name: "Store in memory",
        description: "Store everything that the user talks about in memory as a key-value pair.",
        parameters: {
            type: "object",
            properties: {
                key: {
                    type: "string",
                    description: "The key to store the value under.",
                },
                value: {
                    type: "string",
                    description: "The value to store.",
                },
            },
            required: ["key", "value"],
        },
        // Executed automatically when the model calls this tool.
        function: async ({ key, value }) => {
            await env.KV.put(key, value);
            return JSON.stringify({ success: true });
        },
    }],
})

We’re excited to see what you build with these new capabilities. As always, use of the new model should be conducted with Meta’s Acceptable Use Policy and License in mind. Take a look at our developer documentation to get started!

Embedded function calling in Workers AI: easier, smarter, faster

Post Syndicated from Harley Turan original https://blog.cloudflare.com/embedded-function-calling


Introducing embedded function calling and a new ai-utils package

Today, we’re excited to announce a novel way to do function calling that co-locates LLM inference with function execution, and a new ai-utils package that upgrades the developer experience for function calling.

This is a follow-up to our mid-June announcement of traditional function calling, which allows you to leverage a Large Language Model (LLM) to intelligently generate structured outputs and pass them to an API call. Function calling has been widely adopted and standardized in the industry as a way for AI models to help perform actions on behalf of a user.

Our goal is to make building with AI as easy as possible, which is why we’re introducing a new @cloudflare/ai-utils npm package that allows developers to get started quickly with embedded function calling. These helper tools drastically simplify your workflow by actually executing your function code and dynamically generating tools from OpenAPI specs. We’ve also open-sourced our ai-utils package, which you can find on GitHub. With both embedded function calling and our ai-utils, you’re one step closer to creating intelligent AI agents, and from there, the possibilities are endless.

Why Cloudflare’s AI platform?

OpenAI has been the gold standard when it comes to having performant model inference and a great developer experience. However, they mostly support their closed-source models, while we want to also promote the open-source ecosystem of models. One of our goals with Workers AI is to match the developer experience you might get from OpenAI, but with open-source models.

There are other inference providers out there, like Azure or Bedrock, that serve open-source models, but most of them are focused on serving inference and the underlying infrastructure rather than on being a developer toolkit. While there are external libraries and frameworks like AI SDK that help developers build quickly with simple abstractions, they rely on upstream providers to do the actual inference. With Workers AI, it’s the best of both worlds – we offer open-source model inference and a killer developer experience out of the box.

With the release of embedded function calling and ai-utils today, we’ve advanced how we do inference for function calling and improved the developer experience by making it dead simple for developers to start building AI experiences.

How does traditional function calling work?

Traditional LLM function calling allows customers to specify a set of function names and required arguments, along with a prompt, when running inference on an LLM. The LLM returns the names and arguments of the functions to call, and the customer then makes those calls to perform actions. These actions give LLMs the ability to do things like fetch fresh data not present in the training dataset and “perform actions” based on user intent.

Traditional function calling requires multiple back-and-forth requests passing through the network in order to get to the final output. This includes requests to your origin server, an inference provider, and external APIs. As a developer, you have to orchestrate all the back-and-forths and handle all the requests and responses. If you were building complex agents with multi-tool calls or recursive tool calls, it gets infinitely harder. Fortunately, this doesn’t have to be the case, and we’ve solved it for you.
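
To make those round-trips concrete, here is a hedged TypeScript sketch of the orchestration loop a developer traditionally maintains. The callLLM and executeTool helpers and the response shape are hypothetical stand-ins for a provider's API:

// Hypothetical response shape from a tool-calling LLM.
interface ToolCall {
  name: string;
  arguments: Record<string, unknown>;
}

declare function callLLM(messages: unknown[], tools: unknown[]): Promise<{
  content?: string;
  toolCalls?: ToolCall[];
}>;
declare function executeTool(call: ToolCall): Promise<string>;

async function answer(messages: unknown[], tools: unknown[]): Promise<string> {
  // First network round-trip: ask the model what to do.
  let response = await callLLM(messages, tools);

  // Keep looping while the model asks for more tool results.
  while (response.toolCalls?.length) {
    for (const call of response.toolCalls) {
      // More round-trips: hit an external API for each tool call.
      const result = await executeTool(call);
      messages.push({ role: "tool", name: call.name, content: result });
    }
    // Another round-trip: feed the results back to the model.
    response = await callLLM(messages, tools);
  }
  return response.content ?? "";
}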

Embedded function calling

With Workers AI, our inference runtime is the Workers platform, and the Workers platform can be seen as a global compute network of distributed functions (RPCs). With this model, we can run inference using Workers AI, and supply not only the function names and arguments, but also the runtime function code to be executed. Rather than performing multiple round-trips across networks, the LLM inference and function can run in the same execution environment, cutting out all the unnecessary requests.

Cloudflare is one of the few inference providers that is able to do this because we offer more than just inference – our developer platform has compute, storage, inference, and more, all within the same Workers runtime.

We made it easy for you with a new ai-utils package

And to make it as simple as possible, we created a @cloudflare/ai-utils package that you can use to get started. These powerful abstractions cut down on the logic you have to implement to do function calling – it just works out of the box.

runWithTools

runWithTools is our method that you use to do embedded function calling. You pass in your AI binding (env.AI), model, prompt messages, and tools. The tools array includes the description of the function, similar to traditional function calling, but you also pass in the function code that needs to be executed. This method makes the inference calls and executes the function code in one single step. runWithTools is also able to handle multiple function calls, recursive tool calls, validation for model responses, streaming for the final response, and other features.

Another feature to call out is a helper method called autoTrimTools that automatically selects the relevant tools and trims the tools array based on the names and descriptions. We do this by adding an initial LLM inference call to intelligently trim the tools array before the actual function-calling inference call is made. We found that autoTrimTools helped decrease the number of total tokens used in the entire process (especially when there’s a large number of tools provided) because there’s significantly fewer input tokens used when generating the arguments list. You can choose to use autoTrimTools by setting it as a parameter in the runWithTools method.

import { runWithTools, autoTrimTools } from '@cloudflare/ai-utils';

const response = await runWithTools(env.AI, "@hf/nousresearch/hermes-2-pro-mistral-7b",
  {
    messages: [{ role: "user", content: "What's the weather in Austin, Texas?" }],
    tools: [
      {
        name: "getWeather",
        description: "Return the weather for a latitude and longitude",
        parameters: {
          type: "object",
          properties: {
            latitude: {
              type: "string",
              description: "The latitude for the given location"
            },
            longitude: {
              type: "string",
              description: "The longitude for the given location"
            }
          },
          required: ["latitude", "longitude"]
        },
        // Function code to be executed after the tool call
        function: async ({ latitude, longitude }) => {
          const url = `https://api.weatherapi.com/v1/current.json?key=${env.WEATHERAPI_TOKEN}&q=${latitude},${longitude}`
          const res = await fetch(url).then((res) => res.json())

          return JSON.stringify(res)
        }
      }
    ]
  },
  {
    streamFinalResponse: true,
    maxRecursiveToolRuns: 5,
    trimFunction: autoTrimTools,
    verbose: true,
    strictValidation: true
  }
)

createToolsFromOpenAPISpec

For many use cases, users will need to call an external API during function calling to get the output needed. Instead of having to hardcode the exact API endpoints in your tools array, we made a helper function that takes in an OpenAPI spec and dynamically generates the corresponding tool schemas and API endpoints you’ll need for the function call. You call createToolsFromOpenAPISpec from within runWithTools, and it’ll dynamically populate everything for you.

const response = await runWithTools(env.AI, "@hf/nousresearch/hermes-2-pro-mistral-7b", {
  messages: [{ role: "user",content: "Can you name me 5 repos created by Cloudflare"}],
  tools: [
    ...(await createToolsFromOpenAPISpec(  "https://raw.githubusercontent.com/github/rest-api-description/main/descriptions-next/api.github.com/api.github.com.json"
    ))
  ]
})

Putting it all together

When you make a function calling inference request with runWithTools and createToolsFromOpenAPISpec, the only thing you need is the prompts – the rest is automatically handled. The LLM will choose the correct tool based on the prompt, the runtime will execute the function needed, and you’ll get a fast, intelligent response from the model. By leveraging our Workers runtime’s bindings and RPC calls along with our global network, we can execute everything from a single location close to the user, enabling developers to easily write complex agentic chains with fewer lines of code.

We’re super excited to help people build intelligent AI systems with our new embedded function calling and powerful tools. Check out our developer docs on how to get started, and let us know what you think on Discord.

Introducing Stream Generated Captions, powered by Workers AI

Post Syndicated from Mickie Betz original https://blog.cloudflare.com/stream-automatic-captions-with-ai


With one click, customers can now generate video captions effortlessly using Stream’s newest feature: AI-generated captions for on-demand videos and recordings of live streams. As part of Cloudflare’s mission to help build a better Internet, this feature is available to all Stream customers at no additional cost.

This solution is designed for simplicity, eliminating the need for third-party transcription services and complex workflows. For videos lacking accessibility features like captions, manual transcription can be time-consuming and impractical, especially for large video libraries. Traditionally, it has involved specialized services, sometimes even dedicated teams, to transcribe audio and deliver the text along with video, so it can be displayed during playback. As captions become more widely expected for a variety of reasons, including ethical obligation, legal compliance, and changing audience preferences, we wanted to relieve this burden.

With Stream’s integrated solution, the caption generation process is seamlessly integrated into your existing video management workflow, saving time and resources. Regardless of when you uploaded a video, you can easily add automatic captions to enhance accessibility. Captions can now be generated within the Cloudflare Dashboard or via an API request, all within the familiar and unified Stream platform.

This feature is designed with utmost consideration for privacy and data protection. Unlike other third-party transcription services that may share content with external entities, your data remains securely within Cloudflare’s ecosystem throughout the caption generation process. Cloudflare does not utilize your content for model training purposes. For more information about data protection, review Your Data and Workers AI.

Getting Started

Starting June 20th, 2024, this beta is available for all Stream customers as well as subscribers of the Professional and Business plans, which include 100 minutes of video storage.

To get started, upload a video to Stream (from the Cloudflare Dashboard or via API).

Next, navigate to the “Captions” tab on the video, click “Add Captions,” then select the language and “Generate captions with AI.” Finally, click save and within a few minutes, the new captions will be visible in the captions manager and automatically available in the player, too. Captions can also be generated via the API.
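
For example, here is a sketch of that API request using fetch. The endpoint path follows Stream's documented captions API; the account ID, video UID, and API token are placeholders:

// Placeholders for your account, video, and API token.
const ACCOUNT_ID = "<account id>";
const VIDEO_UID = "<video uid>";
const API_TOKEN = "<api token>";

// Ask Stream to generate English captions for an existing video.
const res = await fetch(
  `https://api.cloudflare.com/client/v4/accounts/${ACCOUNT_ID}/stream/${VIDEO_UID}/captions/en/generate`,
  {
    method: "POST",
    headers: { Authorization: `Bearer ${API_TOKEN}` },
  }
);
console.log(await res.json());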

Captions are usually generated in a few minutes. When captions are ready, the Stream player will automatically be updated to offer them to viewers. The HLS and DASH manifests are also updated, so third-party players that support text tracks can display them as well.

On-demand videos and recordings of live streams, regardless of when they were created, are supported. While in beta, only English captions can be generated, and videos must be shorter than 2 hours. The quality of the transcription is best on videos with clear speech and minimal background noise.

We’ve been pleased with how well the AI model transcribes different types of content during our tests. That said, there are times when the results aren’t perfect, and another method might work better for some use cases. It’s important to check whether the accuracy of the generated captions is right for your needs.

Technical Details

Built using Workers AI

The Stream engineering team built this new feature using Workers AI, allowing us to access the Whisper model – an open source Automatic Speech Recognition model – with a single API call. Using Workers AI radically simplified the AI model deployment, integration, and scaling with an out-of-the-box solution. We eliminated the need for our team to handle infrastructure complexities, enabling us to focus solely on building the automated captions feature.

Writing software that utilizes an AI model can involve several challenges. First, there’s the difficulty of configuring the appropriate hardware infrastructure. AI models require substantial computational resources to run efficiently and need specialized hardware, like GPUs, which can be expensive and complex to manage. There’s also the daunting task of deploying AI models at scale, which involves the complexities of balancing workload distribution, minimizing latency, optimizing throughput, and maintaining high availability. Not only does Workers AI solve the pain of managing underlying infrastructure, it also automatically scales as needed.

Using Workers AI transformed a daunting task into a Worker that transcribes audio files with less than 30 lines of code.

import { Ai } from '@cloudflare/ai'

export interface Env {
  AI: any
}

export type AiVTTOutput = {
  vtt?: string
}

export default {
  async fetch(request: Request, env: Env) {
    const blob = await request.arrayBuffer()

    const ai = new Ai(env.AI)
    const input = {
      audio: [...new Uint8Array(blob)],
    }

    try {
      const response: AiVTTOutput = (await ai.run(
        '@cf/openai/whisper-tiny-en',
        input
      )) as any
      return Response.json({ vtt: response.vtt })
    } catch (e) {
      const errMsg =
        e instanceof Error
          ? `${e.name}\n${e.message}\n${e.stack}`
          : 'unknown error type'
      return new Response(`${errMsg}`, {
        status: 500,
        statusText: 'Internal error',
      })
    }
  },
}

Quickly captioning videos at scale

The Stream team wanted to ensure this feature is fast and performant at scale, which required engineering work to process a high volume of videos regardless of duration.

First, our team needed to pre-process the audio prior to running AI inference to ensure the input is compatible with Whisper’s input format and requirements.

There is a wide spectrum of variability in video content, from a short grainy video filmed on a phone to a multi-hour high-quality Hollywood-produced movie. Videos may be silent or contain an action-driven cacophony. Also, Stream’s on-demand videos include recordings of live streams which are packaged differently from videos uploaded as whole files. With this variability, the audio inputs are stored in an array of different container formats, with different durations, and different file sizes. We ensured our audio files were properly formatted to be compatible with Whisper’s requirements.

One aspect of pre-processing is ensuring files are a sensible duration for optimized inference. Whisper has a “sweet spot” of 30 seconds for the duration of audio files for transcription. As they note in this GitHub discussion: “Too short, and you’d lack surrounding context. You’d cut sentences more often. A lot of sentences would cease to make sense. Too long, and you’ll need larger and larger models to contain the complexity of the meaning you want the model to keep track of.” Fortunately, Stream already splits videos into smaller segments to ensure fast delivery during playback on the web. We wrote functionality to concatenate those small segments into 30-second batches prior to sending them to Workers AI.

To optimize processing speed, our team parallelized as many operations as possible. By concurrently creating the 30-second audio batches and sending requests to Workers AI, we take full advantage of the scalability of the Workers AI platform. Doing this greatly reduces the time it takes to generate captions, but adds some additional complexity. Because we are sending requests to Workers AI in parallel, transcription responses may arrive out-of-order. For example, if a video is one minute in duration, the request to generate captions for the second 30 seconds of a video may complete before the request for the first 30 seconds of the video. The captions need to be sequential to align with the video, so our team had to maintain an understanding of the audio batch order to ensure our final combined WebVTT caption file is properly synced with the video. We sort the incoming Workers AI responses and re-order timestamps for a final accurate transcript.
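
Here is a hedged sketch of that fan-out and reordering step. The batch shape and both helper functions are illustrative stand-ins, not Stream's production code:

interface AudioBatch {
  index: number;      // position of this ~30-second batch in the video
  startTime: number;  // offset in seconds, used to re-base VTT cue timestamps
  audio: Uint8Array;
}

// Stand-ins for the Workers AI call and the VTT timestamp shift.
declare function transcribeBatch(batch: AudioBatch): Promise<string>;
declare function shiftVttTimestamps(vtt: string, offsetSeconds: number): string;

async function transcribeAll(batches: AudioBatch[]): Promise<string> {
  const results: { index: number; startTime: number; vtt: string }[] = [];

  // Fan out: every batch is in flight at once; completions arrive
  // in whatever order inference finishes, not batch order.
  await Promise.all(
    batches.map(async (batch) => {
      const vtt = await transcribeBatch(batch);
      results.push({ index: batch.index, startTime: batch.startTime, vtt });
    })
  );

  // Restore batch order, then re-base each batch's cue timestamps
  // so the merged WebVTT file stays in sync with the video.
  return results
    .sort((a, b) => a.index - b.index)
    .map((r) => shiftVttTimestamps(r.vtt, r.startTime))
    .join("\n");
}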

The end result is the ability to generate captions for longer videos quickly and efficiently at scale.

Try it now

We are excited to bring this feature to open beta for all of our subscribers as well as Pro and Business plan customers today! Get started by uploading a video to Stream. Review our documentation for tutorials and current beta limitations. Up next, we will be focused on adding more languages and supporting longer videos.

AI Gateway is generally available: a unified interface for managing and scaling your generative AI workloads

Post Syndicated from Kathy Liao original https://blog.cloudflare.com/ai-gateway-is-generally-available


During Developer Week in April 2024, we announced General Availability of Workers AI, and today, we are excited to announce that AI Gateway is Generally Available as well. Since its launch to beta in September 2023 during Birthday Week, we’ve proxied over 500 million requests and are now prepared for you to use it in production.

AI Gateway is an AI ops platform that offers a unified interface for managing and scaling your generative AI workloads. At its core, it acts as a proxy between your service and your inference provider(s), regardless of where your model runs. With a single line of code, you can unlock a set of powerful features focused on performance, security, reliability, and observability – think of it as your control plane for your AI ops. And this is just the beginning – we have a roadmap full of exciting features planned for the near future, making AI Gateway the tool for any organization looking to get more out of their AI workloads.

Why add a proxy and why Cloudflare?

The AI space moves fast, and it seems like every day there is a new model, provider, or framework. Given this high rate of change, it’s hard to keep track, especially if you’re using more than one model or provider. And that’s one of the driving factors behind launching AI Gateway – we want to provide you with a single consistent control plane for all your models and tools, even if they change tomorrow, and then again the day after that.

We’ve talked to a lot of developers and organizations building AI applications, and one thing is clear: they want more observability, control, and tooling around their AI ops. This is something many of the AI providers are lacking as they are deeply focused on model development and less so on platform features.

Why choose Cloudflare for your AI Gateway? Well, in some ways, it feels like a natural fit. We’ve spent the last 10+ years helping build a better Internet by running one of the largest global networks, helping customers around the world with performance, reliability, and security – Cloudflare is used as a reverse proxy by nearly 20% of all websites. With our expertise, it felt like a natural progression – change one line of code, and we can help with observability, reliability, and control for your AI applications – all in one control plane – so that you can get back to building.

Here is that one-line code change using the OpenAI JS SDK. And check out our docs to reference other providers, SDKs, and languages.

import OpenAI from 'openai';

const openai = new OpenAI({
  apiKey: 'my api key', // defaults to process.env["OPENAI_API_KEY"]
  baseURL: "https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_slug}/openai"
});

What’s included today?

After talking to customers, it was clear that we needed to focus on some foundational features before moving onto some of the more advanced ones. While we’re really excited about what’s to come, here are the key features available in GA today:

Analytics: Aggregate metrics from across multiple providers. See traffic patterns and usage including the number of requests, tokens, and costs over time.

Real-time logs: Gain insight into requests and errors as you build.

Caching: Enable custom caching rules and use Cloudflare’s cache for repeat requests instead of hitting the original model provider API, helping you save on cost and latency.

Rate limiting: Control how your application scales by limiting the number of requests your application receives to control costs or prevent abuse.

Support for your favorite providers: AI Gateway now natively supports Workers AI plus 10 of the most popular providers, including Groq and Cohere as of mid-May 2024.

Universal endpoint: In case of errors, improve resilience by defining request fallbacks to another model or inference provider.

curl https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_slug} -X POST \
  --header 'Content-Type: application/json' \
  --data '[
  {
    "provider": "workers-ai",
    "endpoint": "@cf/meta/llama-2-7b-chat-int8",
    "headers": {
      "Authorization": "Bearer {cloudflare_token}",
      "Content-Type": "application/json"
    },
    "query": {
      "messages": [
        {
          "role": "system",
          "content": "You are a friendly assistant"
        },
        {
          "role": "user",
          "content": "What is Cloudflare?"
        }
      ]
    }
  },
  {
    "provider": "openai",
    "endpoint": "chat/completions",
    "headers": {
      "Authorization": "Bearer {open_ai_token}",
      "Content-Type": "application/json"
    },
    "query": {
      "model": "gpt-3.5-turbo",
      "stream": true,
      "messages": [
        {
          "role": "user",
          "content": "What is Cloudflare?"
        }
      ]
    }
  }
]'

What’s coming up?

We’ve gotten a lot of feedback from developers, and there are some obvious things on the horizon such as persistent logs and custom metadata – foundational features that will help unlock the real magic down the road.

But let’s take a step back for a moment and share our vision. At Cloudflare, we believe our platform is much more powerful as a unified whole than as a collection of individual parts. This mindset applied to our AI products means that they should be easy to use, combine, and run in harmony.

Let’s imagine the following journey. You initially onboard onto Workers AI to run inference with the latest open source models. Next, you enable AI Gateway to gain better visibility and control, and start storing persistent logs. Then you want to start tuning your inference results, so you leverage your persistent logs, our prompt management tools, and our built-in eval functionality. Now you’re making analytical decisions to improve your inference results. With each data-driven improvement, you want more. So you implement our feedback API, which helps annotate inputs/outputs, in essence building a structured data set. At this point, you are one step away from a one-click fine-tune that can be deployed instantly to our global network, and it doesn’t stop there. As you continue to collect logs and feedback, you can continuously rebuild your fine-tune adapters in order to deliver the best results to your end users.

This is all just an aspirational story at this point, but this is how we envision the future of AI Gateway and our AI suite as a whole. You should be able to start with the most basic setup and gradually progress into more advanced workflows, all without leaving Cloudflare’s AI platform. In the end, it might not look exactly as described above, but you can be sure that we are committed to providing the best AI ops tools to help make Cloudflare the best place for AI.

How do I get started?

AI Gateway is available to use today on all plans. If you haven’t yet used AI Gateway, check out our developer documentation and get started now. AI Gateway’s core features available today are offered for free, and all it takes is a Cloudflare account and one line of code to get started. In the future, more premium features, such as persistent logging and secrets management will be available subject to fees. If you have any questions, reach out on our Discord channel.

New Consent and Bot Management features for Cloudflare Zaraz

Post Syndicated from Yo'av Moshe original https://blog.cloudflare.com/new-consent-and-bot-management-features-for-cloudflare-zaraz


Managing consent online can be challenging. After you’ve figured out the necessary regulations, you usually need to configure some Consent Management Platform (CMP) to load all third-party tools and scripts on your website in a way that respects these demands. Cloudflare Zaraz manages the loading of all of these third-party tools, so it was only natural that in April 2023 we announced the Cloudflare Zaraz CMP: the simplest way to manage consent in a way that seamlessly integrates with your third-party tools manager.

As more and more third-party tool vendors are required to handle consent properly, our CMP has evolved to integrate with these new technologies and standardization efforts. Today, we’re happy to announce that the Cloudflare Zaraz CMP is now compatible with the Interactive Advertising Bureau Transparency and Consent Framework (IAB TCF) requirements, and fully supports Google’s Consent Mode v2 signals. Separately, we’ve also improved the way Cloudflare Zaraz handles traffic coming from online bots.

IAB TCF Compatibility

Earlier this year, Google announced that websites that would like to use AdSense and other advertising solutions in the European Economic Area (EEA), the UK, and Switzerland will be required to use a CMP that is approved by IAB Europe, an association for digital marketing and advertising. Their Transparency and Consent Framework sets guidelines for how CMPs should operate. Since March 2024, the Cloudflare Zaraz CMP has been compliant with these guidelines, and Zaraz users in Europe can use Google’s advertising products without any restrictions.

Since the IAB TCF requirements can make the consent modal a little complex for users, we have made this compliance mode an opt-in feature. See the official documentation for information on how to enable it.

Google Consent Mode v2

Another new requirement from Google was the need to send “Consent Signals”. These signals are part of what is also known as “Consent Mode”, and later, Consent Mode v2. Together with each event sent to Google Analytics and Google Ads, they tell the Google servers about the consent status of the current visitor – did they agree to have their data used for personalized advertising? Did they accept cookies? These and other questions are answered by Consent Mode v2, telling the Google servers how to treat the data it receives.

Consent Mode v2 usually requires setting two values for each consent category – a default value and an updated one. The default value represents the consent status (granted or denied) that a certain category (e.g. using cookies) has before the user has submitted their personal preferences. Usually, and especially within the EU, the default value would be `denied`. Once the user submits their preferences, Consent Mode v2 sends an additional “updated” value that represents the choice the user made.
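
For context, here is what that default/updated pair looks like with Google's gtag API. This is a minimal sketch; which categories you set depends on your site's configuration:

// Declare the gtag global for this sketch.
declare function gtag(...args: unknown[]): void;

// Default values, set before the user makes a choice.
gtag('consent', 'default', {
  ad_storage: 'denied',
  analytics_storage: 'denied',
  ad_user_data: 'denied',       // added in Consent Mode v2
  ad_personalization: 'denied', // added in Consent Mode v2
});

// Updated values, sent once the user submits their preferences.
gtag('consent', 'update', {
  ad_storage: 'granted',
  analytics_storage: 'granted',
  ad_user_data: 'granted',
  ad_personalization: 'granted',
});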

Implementing Consent Mode v2 is quick and easy with Cloudflare Zaraz, although the specific implementation depends on your CMP. Examples, including integration with the Cloudflare Zaraz CMP, are available in our official documentation.

We believe that better standardization around online consent benefits everyone, and we are glad to be working on tools that respect users’ privacy and improve online user experience.

Bot Management

We also recently integrated better Bot Management support within Cloudflare Zaraz. You often want crawlers to be able to access your website, but you don’t want them to trigger your analytics and conversion pixels. Using the Bot Management feature in the Cloudflare Zaraz Settings page allows you to fine-tune which requests will make it to Cloudflare Zaraz and which ones will be skipped. Since Zaraz pricing is based on the total number of Zaraz Events, this can also be useful if you want more control over your Cloudflare Zaraz costs, ensuring you will not be paying for events triggered by bots.

Like all other Cloudflare Zaraz features, these new features are available to users on all plans, including the free plan. For us, it is part of making sure that everyone can benefit from a faster, safer, and more private way to manage third parties online. If you haven’t started using Cloudflare Zaraz already, now is a great time. Go to the Cloudflare dashboard and set it up in just a few clicks.

Using Fortran on Cloudflare Workers

Post Syndicated from John Graham-Cumming original https://blog.cloudflare.com/using-fortran-on-cloudflare-workers


In April 2020, we blogged about how to get COBOL running on Cloudflare Workers by compiling to WebAssembly. The ecosystem around WebAssembly has grown significantly since then, and it has become a solid foundation for all types of projects, be they client-side or server-side.

As WebAssembly support has grown, more and more languages are able to compile to WebAssembly for execution on servers and in browsers. As Cloudflare Workers uses the V8 engine and supports WebAssembly natively, we’re able to support languages that compile to WebAssembly on the platform.

Recently, work on LLVM has enabled Fortran to compile to WebAssembly. So, today, we’re writing about running Fortran code on Cloudflare Workers.

Before we dive into how to do this, here’s a little demonstration of number recognition in Fortran. Draw a number from 0 to 9 and Fortran code running somewhere on Cloudflare’s network will predict the number you drew.

Try it yourself on handwritten-digit-classifier.fortran.demos.cloudflare.com.

This is taken from the wonderful Fortran on WebAssembly post but instead of running client-side, the Fortran code is running on Cloudflare Workers. Read on to find out how you can use Fortran on Cloudflare Workers and how that demonstration works.

Wait, Fortran? No one uses that!

Not so fast! Or rather, actually pretty darn fast if you’re doing a lot of numerical programming or have scientific data to work with. Fortran (originally FORmula TRANslator) is very well suited for scientific workloads because of its native functionality for things like arithmetic and handling large arrays and matrices.

If you look at the ranking of the fastest supercomputers in the world you’ll discover that the measurement of “fast” is based on these supercomputers running a piece of software called LINPACK that was originally written in Fortran. LINPACK is designed to help with problems solvable using linear algebra.

The LINPACK benchmarks use LINPACK to solve an n x n system of linear equations using matrix operations and, in doing so, determine how fast supercomputers are. The code is available in Fortran, C and Java.

A related Fortran package, BLAS, also does linear algebra and forms the basis of the number-identifying code above. But other Fortran packages are still relevant. Back in 2017, NASA ran a competition to make FUN3D (used to perform calculations of airflow over simulated aircraft) faster. FUN3D is written in Fortran.

So, although Fortran (or at the time FORTRAN) first came to life in 1957, it’s alive and well and being used widely for scientific applications (there’s even Fortran for CUDA). One particular application left Earth 20 years after Fortran was born: Voyager. The Voyager probes use a combination of assembly language and Fortran to keep chugging along.

But back in our solar system, and back on Region: Earth, you can now use Fortran on Cloudflare Workers. Here’s how.

How to get your Fortran code running on Cloudflare Workers

To make it easy to run your Fortran code on Cloudflare Workers, we created a tool called Fortiche (translates to smart in French). It uses Flang and Emscripten under the hood.

Flang is LLVM’s Fortran frontend; as the Fortran on WebAssembly blog post explains, we currently have to patch LLVM to work around a few issues.

Emscripten is used to compile LLVM output and produce code that is compatible with Cloudflare Workers.

This is all packaged in the Fortiche Docker image. Let’s see a simple example.

add.f90:

SUBROUTINE add(a, b, res)
    INTEGER, INTENT(IN) :: a, b
    INTEGER, INTENT(OUT) :: res

    res = a + b
END

Here we defined a subroutine called add that takes a and b, sums them together and places the result in res.

Compile with Fortiche:

docker run -v $PWD:/input -v $PWD/output:/output xtuc/fortiche --export-func=add add.f90

Passing --export-func=add to Fortiche makes the Fortran add subroutine available to JavaScript.

The output folder contains the compiled WebAssembly module and JavaScript from Emscripten, and a JavaScript endpoint generated by Fortiche:

$ ls -lh ./output
total 84K
-rw-r--r-- 1 root root 392 avril 22 12:00 index.mjs
-rw-r--r-- 1 root root 27K avril 22 12:00 out.mjs
-rwxr-xr-x 1 root root 49K avril 22 12:00 out.wasm

And finally the Cloudflare Worker:

// Import what Fortiche generated
import {load} from "../output/index.mjs"

export default {
    async fetch(request: Request): Promise<Response> {
        // Load the Fortran program
        const program = await load();

        // Allocate space in memory for the arguments and result
        const aPtr = program.malloc(4);
        const bPtr = program.malloc(4);
        const outPtr = program.malloc(4);

        // Set argument values
        program.HEAP32[aPtr / 4] = 123;
        program.HEAP32[bPtr / 4] = 321;

        // Run the Fortran add subroutine
        program.add(aPtr, bPtr, outPtr);

        // Read the result
        const res = program.HEAP32[outPtr / 4];

        // Free everything
        program.free(aPtr);
        program.free(bPtr);
        program.free(outPtr);

        return Response.json({ res });
    },
};

Interestingly, the values we pass to Fortran are all passed as pointers. Therefore, we have to allocate space for each argument and the result (the Fortran integer type is four bytes wide) and pass the pointers to the `add` subroutine.

Running the Worker gives us the right answer:

$ curl https://fortran-add.cfdemos.workers.dev

{"res":444}

You can find the full example here.

Handwritten digit classifier

This example is taken from https://gws.phd/posts/fortran_wasm/#mnist. It relies on the BLAS library, which is available in Fortiche with the flag: --with-BLAS-3-12-0.

Note that the LAPACK library is also available in Fortiche with the flag: --with-LAPACK-3-12-0.

You can try it on https://handwritten-digit-classifier.fortran.demos.cloudflare.com and find the source code here.

Let us know what you write using Fortran and Cloudflare Workers!

Meta Llama 3 available on Cloudflare Workers AI

Post Syndicated from Michelle Chen original https://blog.cloudflare.com/meta-llama-3-available-on-cloudflare-workers-ai


Workers AI

Workers AI’s initial launch in beta included support for Llama 2, as it was one of the most requested open source models from the developer community. Since that initial launch, we’ve seen developers build all kinds of innovative applications, including knowledge-sharing chatbots, creative content generation, and automation for various workflows.

At Cloudflare, we know developers want simplicity and flexibility, with the ability to build with multiple AI models while optimizing for accuracy, performance, and cost, among other factors. Our goal is to make it as easy as possible for developers to use their models of choice without having to worry about the complexities of hosting or deploying models.

As soon as we learned about the development of Llama 3 from our partners at Meta, we knew developers would want to start building with it as quickly as possible. Workers AI’s serverless inference platform makes it extremely easy and cost-effective to start using the latest large language models (LLMs). Meta’s commitment to developing and growing an open AI ecosystem makes it possible for customers of all sizes to use AI at scale in production. All it takes is a few lines of code to run inference with Llama 3:

export interface Env {
  // If you set another name in wrangler.toml as the value for 'binding',
  // replace "AI" with the variable name you defined.
  AI: any;
}

export default {
  async fetch(request: Request, env: Env) {
    const response = await env.AI.run('@cf/meta/llama-3-8b-instruct', {
      messages: [
        { role: "user", content: "What is the origin of the phrase Hello, World?" }
      ]
    });

    return new Response(JSON.stringify(response));
  },
};

Built with Meta Llama 3

Llama 3 offers leading performance on a wide range of industry benchmarks. You can learn more about the architecture and improvements on Meta’s blog post. Cloudflare Workers AI supports Llama 3 8B, including the instruction fine-tuned model.

Meta’s testing shows that Llama 3 is the most advanced open LLM today on evaluation benchmarks such as MMLU, GPQA, HumanEval, GSM-8K, and MATH. Llama 3 was trained on an increased number of training tokens (15T), allowing the model to have a better grasp of language intricacies. A larger context window doubles the capacity of Llama 2 and allows the model to better understand lengthy passages with rich contextual data. Although the model supports a context window of 8k, we currently only support 2.8k, but we are looking to support 8k context windows through quantized models soon. The new model also introduces an efficient tiktoken-based tokenizer with a vocabulary of 128k tokens, encoding more characters per token and achieving better performance on English and multilingual benchmarks. This means that there are 4 times as many parameters in the embedding and output layers, making the model larger than the previous Llama 2 generation of models.

Under the hood, Llama 3 uses grouped-query attention (GQA), which improves inference efficiency for longer sequences and also renders their 8B model architecturally equivalent to Mistral-7B. For tokenization, it uses byte-level byte-pair encoding (BPE), similar to OpenAI’s GPT tokenizers. This allows tokens to represent any arbitrary byte sequence — even those without a valid utf-8 encoding. This makes the end-to-end model much more flexible in its representation of language, and leads to improved performance.

Along with the base Llama 3 models, Meta has released a suite of offerings with tools such as Llama Guard 2, Code Shield, and CyberSec Eval 2, which we are hoping to release on our Workers AI platform shortly.

Try it out now

Meta Llama 3 8B is available in the Workers AI Model Catalog today! Check out the documentation here and as always if you want to share your experiences or learn more, join us in the Developer Discord.

Developer Week 2024 wrap-up

Post Syndicated from Phillip Jones original https://blog.cloudflare.com/developer-week-2024-wrap-up


Developer Week 2024 has officially come to a close. Each day last week, we shipped new products and functionality geared towards giving developers the components they need to build full-stack applications on Cloudflare.

Even though Developer Week is now over, we are continuing to innovate alongside the more than two million developers who build on our platform. Building a platform is only as exciting as what developers build on it. Before we dive into a recap of the announcements, to send off the week, we wanted to share how a couple of companies are using Cloudflare to power their applications:

We have been using Workers for image delivery using R2 and have been able to maintain stable operations for a year after implementation. The speed of deployment and the flexibility of detailed configurations have greatly reduced the time and effort required for traditional server management. In particular, we have seen a noticeable cost savings and are deeply appreciative of the support we have received from Cloudflare Workers.
FAN Communications

Milkshake helps creators, influencers, and business owners create engaging web pages directly from their phone, to simply and creatively promote their projects and passions. Cloudflare has helped us migrate data quickly and affordably with R2. We use Workers as a routing layer between our users’ websites and their images and assets, and to build a personalized analytics offering affordably. Cloudflare’s innovations have consistently allowed us to run infrastructure at a fraction of the cost of other developer platforms and we have been eagerly awaiting updates to D1 and Queues to sustainably scale Milkshake as the product continues to grow.
Milkshake

In case you missed anything, here’s a quick recap of the announcements and in-depth technical explorations that went out last week:

Summary of announcements

Monday

Making state easy with D1 GA, Hyperdrive, Queues and Workers Analytics Engine updates – A core part of any full-stack application is storing and persisting data! We kicked off the week with announcements that help developers build stateful applications on top of Cloudflare, including making D1, Cloudflare’s SQL database, and Hyperdrive, our database accelerating service, generally available.

Building D1: a Global Database – D1, Cloudflare’s SQL database, is now generally available. With new support for 10GB databases, data export, and enhanced query debugging, we empower developers to build production-ready applications with D1 to meet all their relational SQL needs. To support Workers in global applications, we’re sharing a sneak peek of our design and API for D1 global read replication to demonstrate how developers scale their workloads with D1.

Why Workers environment variables contain live objects – Bindings don’t just reduce boilerplate. They are a core design feature of the Workers platform which simultaneously improves developer experience and application security in several ways. Usually these two goals are in opposition to each other, but bindings elegantly solve for both at the same time.

Tuesday

  • Leveling up Workers AI: General Availability and more new capabilities
    We made a series of AI-related announcements, including Workers AI, Cloudflare’s inference platform, becoming generally available; support for fine-tuned models with LoRAs; one-click deploys from HuggingFace; Python support for Cloudflare Workers; and more.
  • Running fine-tuned models on Workers AI with LoRAs
    Workers AI now supports fine-tuned models using LoRAs. But what is a LoRA and how does it work? In this post, we dive into fine-tuning, LoRAs, and even some math to share the details of how it all works under the hood.
  • Bringing Python to Workers using Pyodide and WebAssembly
    We introduced Python support for Cloudflare Workers, now in open beta. We’ve revamped our systems to support Python, from the Workers runtime itself to the way Workers are deployed to Cloudflare’s network. Learn about a Python Worker’s lifecycle, Pyodide, dynamic linking, and memory snapshots in this post.

Wednesday

  • R2 adds event notifications, support for migrations from Google Cloud Storage, and an infrequent access storage tier
    We announced three new features for Cloudflare R2: event notifications, support for migrations from Google Cloud Storage, and an infrequent access storage tier.
  • Data Anywhere with Pipelines, Event Notifications, and Workflows
    We’re making it easier to build scalable, reliable, data-driven applications on top of our global network, and so we announced a new Event Notifications framework; our take on durable execution, Workflows; and an upcoming streaming ingestion service, Pipelines.
  • Improving Cloudflare Workers and D1 developer experience with Prisma ORM
    Together, Cloudflare and Prisma make it easier than ever to deploy globally available apps with a focus on developer experience. To further that goal, Prisma ORM now natively supports Cloudflare Workers and D1 in Preview. With version 5.12.0 of Prisma ORM you can now interact with your data stored in D1 from your Cloudflare Workers with the convenience of the Prisma Client API. Learn more and try it out now.
  • How Picsart leverages Cloudflare’s Developer Platform to build globally performant services
    Picsart, one of the world’s largest digital creation platforms, encountered performance challenges in catering to its global audience. Adopting Cloudflare’s global-by-default Developer Platform emerged as the optimal solution, empowering Picsart to enhance performance and scalability substantially.

Thursday

  • Announcing Pages support for monorepos, wrangler.toml, database integrations and more!
    We launched four improvements to Pages that bring functionality previously restricted to Workers, with the goal of unifying the development experience between the two: support for monorepos, wrangler.toml, new additions to Next.js support, and database integrations!
  • New tools for production safety — Gradual Deployments, Stack Traces, Rate Limiting, and API SDKs
    Production readiness isn’t just about the scale and reliability of the services you build with. We announced five updates that put more power in your hands: Gradual Deployments, source-mapped stack traces in Tail Workers, a new Rate Limiting API, brand-new API SDKs, and updates to Durable Objects, each built with mission-critical production services in mind.
  • What’s new with Cloudflare Media: updates for Calls, Stream, and Images
    With Cloudflare Calls in open beta, you can build real-time, serverless video and audio applications. Cloudflare Stream lets your viewers instantly clip from ongoing streams. Finally, Cloudflare Images now supports automatic face cropping and has an upload widget that you can easily integrate into your application.
  • Cloudflare Calls: millions of cascading trees all the way down
    Cloudflare Calls is a serverless SFU and TURN service running at Cloudflare’s edge. It’s now in open beta and costs $0.05 per real-time GB. It’s 100% anycast WebRTC.

Friday

  • Browser Rendering API GA, rolling out Cloudflare Snippets, SWR, and bringing Workers for Platforms to all users
    Browser Rendering API is now available to all paid Workers customers with improved session management.
  • Cloudflare acquires Baselime to expand serverless application observability capabilities
    We announced that Cloudflare has acquired Baselime, a serverless observability company.
  • Cloudflare acquires PartyKit to allow developers to build real-time multi-user applications
    We announced that PartyKit, a trailblazer in enabling developers to craft ambitious real-time, collaborative, multiplayer applications, is now a part of Cloudflare. This acquisition marks a significant milestone in our journey to redefine the boundaries of serverless computing, making it more dynamic, interactive, and, importantly, stateful.
  • Blazing fast development with full-stack frameworks and Cloudflare
    Full-stack web development with Cloudflare is now faster and easier! You can now use your framework’s development server while accessing D1 databases, R2 object stores, AI models, and more. Iterate locally in milliseconds to build sophisticated web apps that run on Cloudflare. Let’s dev together!
  • We’ve added JavaScript-native RPC to Cloudflare Workers
    Cloudflare Workers now features a built-in RPC (Remote Procedure Call) system for use in Worker-to-Worker and Worker-to-Durable Object communication, with absolutely minimal boilerplate. We’ve designed an RPC system so expressive that calling a remote service can feel like using a library.
  • Community Update: empowering startups building on Cloudflare and creating an inclusive community
    We closed out Developer Week by sharing updates on our Workers Launchpad program, our latest Developer Challenge, and the work we’re doing to ensure our community spaces – like our Discord and Community forums – are safe and inclusive for all developers.

Continue the conversation

Thank you for being a part of Developer Week! Want to continue the conversation and share what you’re building? Join us on Discord. To get started building on Workers, check out our developer documentation.

Cloudflare acquires Baselime to expand serverless application observability capabilities

Post Syndicated from Boris Tane original https://blog.cloudflare.com/cloudflare-acquires-baselime-expands-observability-capabilities


Today, we’re thrilled to announce that Cloudflare has acquired Baselime.

The cloud is changing. Just a few years ago, serverless functions were revolutionary. Today, entire applications are built on serverless architectures, from compute to databases, storage, queues, etc. — with Cloudflare leading the way in making it easier than ever for developers to build, without having to think about their architecture. And while the adoption of serverless has made it simple for developers to run fast, it has also made one of the most difficult problems in software even harder: how the heck do you unravel the behavior of distributed systems?

When I started Baselime 2 years ago, our goal was simple: enable every developer to build, ship, and learn from their serverless applications such that they can resolve issues before they become problems.

Since then, we built an observability platform that enables developers to understand the behaviour of their cloud applications. It’s designed for high cardinality and dimensionality data, from logs to distributed tracing with OpenTelemetry. With this data, we automatically surface insights from your applications, and enable you to quickly detect, troubleshoot, and resolve issues in production.

In parallel, Cloudflare has been busy the past few years building the next frontier of cloud computing: the connectivity cloud. The team is building primitives that enable developers to build applications with a completely new set of paradigms, from Workers to D1, R2, Queues, KV, Durable Objects, AI, and all the other services available on the Cloudflare Developers Platform.

This synergy makes Cloudflare the perfect home for Baselime. Our core mission has always been to simplify and innovate around observability for the future of the cloud, and Cloudflare’s ecosystem offers the ideal ground to further this cause. With Cloudflare, we’re positioned to deeply integrate into a platform that tens of thousands of developers trust and use daily, enabling them to quickly build, ship, and troubleshoot applications. We believe that every Worker, Queue, KV, Durable Object, AI call, etc. should have built-in observability by default.

That’s why we’re incredibly excited about the potential of what we can build together and the impact it will have on developers around the world.

To give you a preview of what’s ahead, I wanted to dive deeper into the three core concepts we followed while building Baselime.

High Cardinality and Dimensionality

Cardinality and dimensionality are best described using examples. Imagine you’re playing a board game with a deck of cards. High cardinality is like playing a game where every card is a unique character, making it hard to remember or match them. High dimensionality is like each card having tons of details, like strength, speed, magic, and aura, making the game’s strategy complex because there’s so much to consider.

This also applies to the data your application emits. For example, consider the log line for an HTTP request that makes database calls:

  • High cardinality means that your logs can have a unique userId or requestId (which can take millions of distinct values). Those are high cardinality fields.
  • High dimensionality means that your logs can have thousands of possible fields. You can record each HTTP header of your request and the details of each database call. Any log can be a key-value object with thousands of individual keys.

The ability to query on high cardinality and dimensionality fields is key to modern observability. You can surface all errors or requests for a specific user, compute the duration of each of those requests, and group by location. You can answer all of those questions with a single tool.
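
To make this concrete, here’s a minimal sketch of what a single wide, structured log event might look like; the field names and values are illustrative, not a prescribed Baselime schema:

// One log event as a key-value object: userId and requestId are
// high-cardinality fields, while the sheer number of keys (headers,
// database calls, timings) is what makes the event high-dimensional.
console.log(JSON.stringify({
  message: "GET /checkout completed",
  level: "info",
  userId: "usr_8f2e91c4",      // can take millions of distinct values
  requestId: "req_01HT5XW9K3", // unique per request
  http: {
    method: "GET",
    path: "/checkout",
    status: 200,
    headers: { "user-agent": "Mozilla/5.0", "accept-language": "en-US" },
  },
  db: {
    calls: 3,
    slowestQuery: "SELECT * FROM carts WHERE user_id = ?",
    totalDurationMs: 42,
  },
  durationMs: 87,
}));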

OpenTelemetry

OpenTelemetry provides a common set of tools, APIs, SDKs, and standards for instrumenting applications. It is a game-changer for debugging and understanding cloud applications. You get to see the big picture: how fast your HTTP APIs are, which routes are experiencing the most errors, or which database queries are slowest. You can also get into the details by following the path of a single request or user across your entire application.
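
As a rough illustration, instrumenting a function with the standard @opentelemetry/api package looks something like this; the tracer, span, and attribute names are invented for the example:

import { trace } from "@opentelemetry/api";

const tracer = trace.getTracer("checkout-service");

async function handleCheckout(userId) {
  // startActiveSpan runs the callback with the span active, so any
  // nested spans (e.g. database calls) are linked to this one.
  return tracer.startActiveSpan("checkout", async (span) => {
    try {
      span.setAttribute("user.id", userId); // a high-cardinality attribute
      // ... perform the actual work here ...
      return "ok";
    } finally {
      span.end(); // always end the span so it gets exported
    }
  });
}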

Baselime is OpenTelemetry native, and it is built from the ground up to leverage OpenTelemetry data. To support this, we built a set of OpenTelemetry SDKs compatible with several serverless runtimes.

Cloudflare is building the cloud of tomorrow and has developed workerd, a modern JavaScript runtime for Workers. With Cloudflare, we are considering embedding OpenTelemetry directly in the Workers runtime. That’s one more reason we’re excited to grow further at Cloudflare, enabling more developers to understand their applications, even in the most unconventional scenarios.

Developer Experience

Observability without action is just storage. I have seen too many developers pay for tools to store logs and metrics they never use, and the key reason is how opaque these tools are.

The crux of the issue in modern observability isn’t the technology itself, but rather the developer experience. Many tools are complex, with a significant learning curve. This friction reduces the speed at which developers can identify and resolve issues, ultimately affecting the reliability of their applications. Improving developer experience is key to unlocking the full potential of observability.

We built Baselime to be an exploratory solution that surfaces insights to you rather than requiring you to dig for them. For example, we notify you in real time when errors are discovered in your application, based on your logs and traces. You can quickly search through all of your data with full-text search or with our powerful query engine, which makes it easy to correlate logs and traces for increased visibility, or you can ask our AI debugging assistant for insights on the issue you’re investigating.

It is always possible to go from one insight to another, asking questions about the state of your app iteratively until you get to the root cause of the issue you are troubleshooting.

Cloudflare has always prioritised the developer experience of its developer platform, especially with Wrangler, and we are convinced it’s the right place to solve the developer experience problem of observability.

What’s next?

Over the next few months, we’ll work to bring the core of Baselime into the Cloudflare ecosystem, starting with OpenTelemetry, real-time error tracking, and all the developer experience capabilities that make a great observability solution. We will keep building and improving observability for applications deployed outside Cloudflare because we understand that observability should work across providers.

But we don’t want to stop there. We want to push the boundaries of what modern observability looks like. For instance, directly connecting to your codebase and correlating insights from your logs and traces to functions and classes in your codebase. We also want to enable more AI capabilities beyond our debugging assistant. We want to deeply integrate with your repositories such that you can go from an error in your logs and traces to a Pull Request in your codebase within minutes.

We also want to enable everyone building on top of Large Language Models to do all of their LLM observability directly within Cloudflare, so that they can optimise their prompts, improve latencies, and reduce error rates directly within their cloud provider. These are just a handful of the capabilities we can now build with the support of the Cloudflare platform.

Thanks

We are incredibly thankful to our community for its continued support, from day 0 to today. With your continuous feedback, you’ve helped us build something we’re incredibly proud of.

To all the developers currently using Baselime, you’ll be able to keep using the product and will receive ongoing support. Also, we are now making all the paid Baselime features completely free.

Baselime products remain available to sign up for while we work on integrating with the Cloudflare platform. We anticipate sunsetting the Baselime products towards the end of 2024 when you will be able to observe all of your applications within the Cloudflare dashboard. If you’re interested in staying up-to-date on our work with Cloudflare, we will release a signup link in the coming weeks!

We are looking forward to continuing to innovate with you.

Blazing fast development with full-stack frameworks and Cloudflare

Post Syndicated from Igor Minar original https://blog.cloudflare.com/blazing-fast-development-with-full-stack-frameworks-and-cloudflare


Hello web developers! Last year we released a slew of improvements that made deploying web applications on Cloudflare much easier, and in response we’ve seen strong growth in the number of Astro, Next.js, Nuxt, Qwik, Remix, SolidStart, SvelteKit, and other web apps hosted on Cloudflare. Today we are announcing major improvements to our integration with these web frameworks that make it easier to develop sophisticated applications using our D1 SQL database, R2 object store, AI models, and other powerful features of Cloudflare’s developer platform.

In the past, if you wanted to develop a web framework-powered application with D1 and run it locally, you’d have to build a production build of your application, and then run it locally using `wrangler pages dev`. While this worked, each of your code iterations would take seconds, or tens of seconds for big applications. Iterating using production builds is simply too slow, pulls you out of the flow, and doesn’t allow you to take advantage of all the DX optimizations that framework authors have put a lot of hard work into. This is changing today!

Our goal is to integrate with web frameworks in the most natural way possible, without developers having to learn and adopt significant workflow changes or custom APIs when deploying their app to Cloudflare. Whether you are a Next.js developer, a Nuxt developer, or prefer another framework, you can now keep on using the blazing fast local development workflow familiar to you, and ship your application on Cloudflare.

All full-stack web frameworks come with a local development server (dev server) that is custom tailored to the framework and often provides an excellent development experience, with one notable exception: they don’t natively support some important features of Cloudflare’s development platform, especially our storage solutions.

So up until recently, you had to make a tough choice. You could use the framework-specific dev server to develop your application, but forgo access to many of Cloudflare’s features. Alternatively, you could take full advantage of Cloudflare’s platform including various resources like D1 or R2, but you would have to give up using the framework specific developer tooling. In that case, your iteration cycle would slow down, and it would take seconds rather than milliseconds for you to see results of your code changes in the browser. But not anymore! Let’s take a look.

Let’s build an application

Let’s create a new application using C3 — our create-cloudflare CLI. We could use any npm client of our choice (pnpm anyone?!?), but to keep things simple in this post, we’ll stick with the default npm client. To get started, just run:

$ npm create cloudflare@latest

Provide a name for your app, or stick with the randomly generated one. Then select the “Website or web app” category, and pick a full-stack framework of your choice. We support many: Astro, Next.js, Nuxt, Qwik, Remix, SolidStart, and SvelteKit.

Since C3 delegates the application scaffolding to the latest version of the framework-specific CLI, you will scaffold the application exactly as the framework authors intended without missing out on any of the framework features or options. C3 then adds to your application everything necessary for integrating and deploying to Cloudflare so that you don’t have to configure it yourself.

With our application scaffolded, let’s get it to display a list of products stored in a database with just a few steps. First, we add the configuration for our database to our wrangler.toml config file:

[[d1_databases]]
binding = "DB"
database_name = "blog-products-db"
database_id = "XXXXXXXXXXXXXXXX"

Yes, that’s right! You can now configure your bound resources via the wrangler.toml file, even for full-stack apps deployed to Pages. We’ll share much more about configuration enhancements to Pages in a dedicated announcement.

Now let’s create a simple schema.sql file representing our database schema:

CREATE TABLE products(product_id INTEGER PRIMARY KEY, name TEXT, price INTEGER);
INSERT INTO products (product_id, name, price) VALUES (1, 'Apple', 250), (2, 'Banana', 100), (3, 'Cherry', 375);

And initialize our database:

$ npx wrangler d1 execute blog-products-db --local --file schema.sql

Notice that we used the --local flag of wrangler d1 execute to apply the changes to our local D1 database. This is the database that our dev server will connect to.

Next, if you use TypeScript, let TypeScript know about your database by running:

$ npm run build-cf-types

This command is preconfigured for all full-stack applications created via C3 and executes wrangler types to update the interface of Cloudflare’s environment containing all configured bindings.

We can now start the dev server provided by your framework via a handy shortcut:

$ npm run dev

This shortcut will start your framework’s dev server, whether it’s powered by next dev, nitro, or vite.
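
For a Next.js project, for instance, the relevant scripts section of package.json might look roughly like the following; the exact commands are wired up by C3 and vary by framework, so treat this as illustrative:

{
  "scripts": {
    "dev": "next dev",
    "build-cf-types": "wrangler types"
  }
}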

Now, to access our database and list the products, we can use a framework-specific approach. For example, in a Next.js application that uses the App router, we could update app/api/hello/route.ts with the following:

import { getRequestContext } from '@cloudflare/next-on-pages';
export async function GET() {
  const db = getRequestContext().env.DB;
  const productsResults = await db.prepare('SELECT * FROM products').all();
  return Response.json(productsResults.results);
}

Or in a Nuxt application, we can create a server/api/hello.ts file and populate it with:

export default defineEventHandler(async ({ context }) => {
  const db = context.cloudflare.env.DB;
  const productsResults = await db.prepare('SELECT * FROM products').all();
  return productsResults.results;
});

Assuming that the framework dev server is running on port 3000, you can test the new API route in either framework by navigating to http://localhost:3000/api/hello. For simplicity, we picked API routes in these examples, but the same applies to any UI-generating routes as well.

Each web framework has its own way to define routes and pass contextual information about the request throughout the application, so how you access your databases, object stores, and other resources will depend on your framework. You can read our updated full-stack framework guides to learn more.

Now that you know how to access Cloudflare’s resources in the framework of your choice, everything else you know about your framework remains the same. You can now develop your application locally, using the development server optimized for your framework, which often includes support for hot module replacement (HMR), custom dev tools, enhanced debugging support and more, all while still benefiting from Cloudflare-specific APIs and features. Win-win!

What has actually changed to enable these development workflows?

To decrease the development latency and preserve the custom framework-specific experiences, we needed to enable web frameworks and their dev servers to integrate with wrangler and miniflare in a seamless, almost invisible way.

Miniflare is a key component in this puzzle. It is our local simulator for Cloudflare-specific resources, which is powered by workerd, our JavaScript (JS) runtime. By relying on workerd, we ensure that Cloudflare’s JavaScript APIs run locally in a way that faithfully simulates our production environment. The trouble is that framework dev servers already rely on Node.js to run the application, so bringing another JS runtime into the mix breaks many assumptions in how these dev servers have been architected.

Our team however came up with an interesting approach to bridging the gap between these two JS runtimes. We call it the getPlatformProxy() API, which is now part of wrangler and is super-powered by miniflare’s magic proxy. This API exposes a JS proxy object that behaves just like the usual Workers env object containing all bound resources. The proxy object enables code from Node.js to transparently invoke JavaScript code running in workerd, as well as access Cloudflare-specific runtime APIs.

With this bridge between the Node.js and workerd runtimes, your application can now access Cloudflare simulators for D1, R2, KV and other storage solutions directly while running in a dev server powered by Node.js. Or you could even write a Node.js script to do the same:

import { getPlatformProxy } from 'wrangler';

const { env } = await getPlatformProxy();
console.dir(env);
const db = env.DB;

// Now let’s execute a DB query that runs in a local D1 db
// powered by miniflare/workerd and access the result from Node.js
const productsResults = await db.prepare('SELECT * FROM products').all();
console.log(productsResults.results);

With the getPlatformProxy() API available, the remaining work was all about updating all framework adapters, plugins, and in some cases frameworks themselves to make use of this API. We are grateful for the support we received from framework teams on this journey, especially Alex from Astro, pi0 from Nuxt, Pedro from Remix, Ryan from Solid, Ben and Rich from Svelte, and our collaborator on the next-on-pages project, James Anderson.

Future improvements to development workflows with Vite

While the getPlatformProxy() API is a good solution for many scenarios, we can do better. If we could run the entire application in our JS runtime rather than Node.js, we could even more faithfully simulate the production environment and reduce developer friction and production surprises.

In the ideal world, we’d like you to develop against the same runtime that you deploy to in production, and this can only be achieved by integrating workerd directly into the dev servers of all frameworks, which is not a small feat considering the number of frameworks out there and the differences between them.

We did, however, get a bit lucky. As we kicked off this effort, we quickly realized that Vite, a popular dev server used by many full-stack frameworks, was rapidly gaining adoption. In fact, Remix recently switched over to Vite, confirming Vite’s position as a common foundation for web development today.

If Vite had first-class support for running a full-stack application in an alternative JavaScript runtime, we could enable anyone using Vite to develop their applications locally with complete access to the Cloudflare developer platform. No more framework specific custom integrations and workarounds — all the features of a full-stack framework, Vite, and Cloudflare accessible to all developers.

Sounds too good to be true? Maybe. We are very stoked to be working with the Vite team on the Vite environments proposal, which could enable just that. This proposal is still evolving, so stay tuned for updates.

What will you build today?

We aim to make Cloudflare the best development platform for web developers. Making it quick and easy to develop your application with frameworks and tools you are already familiar with is a big part of our story. Start your journey with us by running a single command:

$ npm create cloudflare@latest

We’ve added JavaScript-native RPC to Cloudflare Workers

Post Syndicated from Kenton Varda original https://blog.cloudflare.com/javascript-native-rpc


Cloudflare Workers now features a built-in RPC (Remote Procedure Call) system enabling seamless Worker-to-Worker and Worker-to-Durable Object communication, with almost no boilerplate. You just define a class:

export class MyService extends WorkerEntrypoint {
  sum(a, b) {
    return a + b;
  }
}

And then you call it:

let three = await env.MY_SERVICE.sum(1, 2);

No schemas. No routers. Just define methods of a class. Then call them. That’s it.

But that’s not it

This isn’t just any old RPC. We’ve designed an RPC system so expressive that calling a remote service can feel like using a library – without any need to actually import a library! This is important not just for ease of use, but also security: fewer dependencies means fewer critical security updates and less exposure to supply-chain attacks.

To this end, here are some of the features of Workers RPC:

  • For starters, you can pass Structured Clonable types as the params or return value of an RPC. (That means that, unlike JSON, Dates just work, and you can even have cycles.)
  • You can additionally pass functions in the params or return value of other functions. When the other side calls the function you passed to it, they make a new RPC back to you (see the sketch just after this list).
  • Similarly, you can pass objects with methods. Method calls become further RPCs.
  • RPC to another Worker (over a Service Binding) usually does not even cross a network. In fact, the other Worker usually runs in the very same thread as the caller, reducing latency to zero. Performance-wise, it’s almost as fast as an actual function call.
  • When RPC does cross a network (e.g. to a Durable Object), you can invoke a method and then speculatively invoke further methods on the result in a single network round trip.
  • You can send a byte stream over RPC, and the system will automatically stream the bytes with proper flow control.
  • All of this is secure, based on the object-capability model.
  • The protocol and implementation are fully open source as part of workerd.
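
To make the function-passing bullet above concrete, here is a minimal sketch; the Notifier class and NOTIFIER binding are invented for illustration:

import { WorkerEntrypoint } from "cloudflare:workers";

// A hypothetical service that accepts a callback from its caller.
export class Notifier extends WorkerEntrypoint {
  async watch(onEvent) {
    // Calling the function the client passed makes a new RPC back to the caller.
    await onEvent("something happened");
  }
}

// In the calling Worker (bound as NOTIFIER), pass a plain function:
//   await env.NOTIFIER.watch((msg) => console.log("got event:", msg));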

Workers RPC is a JavaScript-native RPC system. Under the hood, it is built on Cap’n Proto. However, unlike Cap’n Proto, Workers RPC does not require you to write a schema. (Of course, you can use TypeScript if you like, and we provide tools to help with this.)

In general, Workers RPC is designed to “just work” using idiomatic JavaScript code, so you shouldn’t have to spend too much time looking at docs. We’ll give you an overview in this blog post. But if you want to understand the full feature set, check out the documentation.

Why RPC? (And what is RPC anyway?)

Remote Procedure Calls (RPC) are a way of expressing communications between two programs over a network. Without RPC, you might communicate using a protocol like HTTP. With HTTP, though, you must format and parse your communications as an HTTP request and response, perhaps designed in REST style. RPC systems try to make communications look like a regular function call instead, as if you were calling a library rather than a remote service. The RPC system provides a “stub” object on the client side which stands in for the real server-side object. When a method is called on the stub, the RPC system figures out how to serialize and transmit the parameters to the server, invoke the method on the server, and then transmit the return value back.

The merits of RPC have been subject to a great deal of debate. RPC is often accused of committing many of the fallacies of distributed computing.

But this reputation is outdated. When RPC was first invented some 40 years ago, async programming barely existed. We did not have Promises, much less async and await. Early RPC was synchronous: calls would block the calling thread waiting for a reply. At best, latency made the program slow. At worst, network failures would hang or crash the program. No wonder it was deemed “broken”.

Things are different today. We have Promise and async and await, and we can throw exceptions on network failures. We even understand how RPCs can be pipelined so that a chain of calls takes only one network round trip. Many large distributed systems you likely use every day are built on RPC. It works.

The fact is, RPC fits the programming model we’re used to. Every programmer is trained to think in terms of APIs composed of function calls, not in terms of byte stream protocols nor even REST. Using RPC frees you from the need to constantly translate between mental models, allowing you to move faster.

Example: Authentication Service

Here’s a common scenario: You have one Worker that implements an application, and another Worker that is responsible for authenticating user credentials. The app Worker needs to call the auth Worker on each request to check the user’s cookie.

This example uses a Service Binding, which is a way of configuring one Worker with a private channel to talk to another, without going through a public URL. Here, we have an application Worker that has been configured with a service binding to the Auth worker.

Before RPC, all communications between Workers needed to use HTTP. So, you might write code like this:

// OLD STYLE: HTTP-based service bindings.
export default {
  async fetch(req, env, ctx) {
    // Call the auth service to authenticate the user's cookie.
    // We send it an HTTP request using a service binding.

    // Construct a JSON request to the auth service.
    let authRequest = {
      cookie: req.headers.get("Cookie")
    };

    // Send it to env.AUTH_SERVICE, which is our service binding
    // to the auth worker.
    let resp = await env.AUTH_SERVICE.fetch(
        "https://auth/check-cookie", {
      method: "POST",
      headers: {
        "Content-Type": "application/json; charset=utf-8",
      },
      body: JSON.stringify(authRequest)
    });

    if (!resp.ok) {
      return new Response("Internal Server Error", {status: 500});
    }

    // Parse the JSON result.
    let authResult = await resp.json();

    // Use the result.
    if (!authResult.authorized) {
      return new Response("Not authorized", {status: 403});
    }
    let username = authResult.username;

    return new Response(`Hello, ${username}!`);
  }
}

Meanwhile, your auth server might look like:

// OLD STYLE: HTTP-based auth server.
export default {
  async fetch(req, env, ctx) {
    // Parse URL to decide what endpoint is being called.
    let url = new URL(req.url);
    if (url.pathname == "/check-cookie") {
      // Parse the request.
      let authRequest = await req.json();

      // Look up cookie in Workers KV.
      let cookieInfo = await env.COOKIE_MAP.get(
          hash(authRequest.cookie), "json");

      // Construct the response.
      let result;
      if (cookieInfo) {
        result = {
          authorized: true,
          username: cookieInfo.username
        };
      } else {
        result = { authorized: false };
      }

      return Response.json(result);
    } else {
      return new Response("Not found", {status: 404});
    }
  }
}

This code has a lot of boilerplate involved in setting up an HTTP request to the auth service. With RPC, we can instead express this as a function call:

// NEW STYLE: RPC-based service bindings
export default {
  async fetch(req, env, ctx) {
    // Call the auth service to authenticate the user's cookie.
    // We invoke it using a service binding.
    let authResult = await env.AUTH_SERVICE.checkCookie(
        req.headers.get("Cookie"));

    // Use the result.
    if (!authResult.authorized) {
      return new Response("Not authorized", {status: 403});
    }
    let username = authResult.username;

    return new Response(`Hello, ${username}!`);
  }
}

And the server side becomes:

// NEW STYLE: RPC-based auth server.
import { WorkerEntrypoint } from "cloudflare:workers";

export class AuthService extends WorkerEntrypoint {
  async checkCookie(cookie) {
    // Look up cookie in Workers KV.
    let cookieInfo = await this.env.COOKIE_MAP.get(
        hash(cookie), "json");

    // Return result.
    if (cookieInfo) {
      return {
        authorized: true,
        username: cookieInfo.username
      };
    } else {
      return { authorized: false };
    }
  }
}

This is a pretty nice simplification… but we can do much more!

Let’s get fancy! Or should I say… classy?

Let’s say we want our auth service to do a little more. Instead of just checking cookies, it provides a whole API around user accounts. In particular, it should let you:

  • Get or update the user’s profile info.
  • Send the user an email notification.
  • Append to the user’s activity log.

But, these operations should only be allowed after presenting the user’s credentials.

Here’s what the server might look like:

import { WorkerEntrypoint, RpcTarget } from "cloudflare:workers";

// `User` is an RPC interface to perform operations on a particular
// user. This class is NOT exported as an entrypoint; it must be
// received as the result of the checkCookie() RPC.
class User extends RpcTarget {
  constructor(uid, env) {
    super();

    // Note: Instance members like these are NOT exposed over RPC.
    // Only class (prototype) methods and properties are exposed.
    this.uid = uid;
    this.env = env;
  }

  // Get/set user profile, backed by Worker KV.
  async getProfile() {
    return await this.env.PROFILES.get(this.uid, "json");
  }
  async setProfile(profile) {
    await this.env.PROFILES.put(this.uid, JSON.stringify(profile));
  }

  // Send the user a notification email.
  async sendNotification(message) {
    let addr = await this.env.EMAILS.get(this.uid);
    await this.env.EMAIL_SERVICE.send(addr, message);
  }

  // Append to the user's activity log.
  async logActivity(description) {
    // (Please excuse this somewhat problematic implementation,
    // this is just a dumb example.)
    let timestamp = new Date().toISOString();
    await this.env.ACTIVITY.put(
        `${this.uid}/${timestamp}`, description);
  }
}

// Now we define the entrypoint service, which can be used to
// get User instances -- but only by presenting the cookie.
export class AuthService extends WorkerEntrypoint {
  async checkCookie(cookie) {
    // Look up cookie in Workers KV.
    let cookieInfo = await this.env.COOKIE_MAP.get(
        hash(cookie), "json");

    if (cookieInfo) {
      return {
        authorized: true,
        user: new User(cookieInfo.uid, this.env),
      };
    } else {
      return { authorized: false };
    }
  }
}

Now we can write a Worker that uses this API while displaying a web page:

export default {
  async fetch(req, env, ctx) {
    // `using` is a new JavaScript feature. Check out the
    // docs for more on this:
    // https://developers.cloudflare.com/workers/runtime-apis/rpc/lifecycle/
    using authResult = await env.AUTH_SERVICE.checkCookie(
        req.headers.get("Cookie"));
    if (!authResult.authorized) {
      return new Response("Not authorized", {status: 403});
    }

    let user = authResult.user;
    let profile = await user.getProfile();

    await user.logActivity("You visited the site!");
    await user.sendNotification(
        `Thanks for visiting, ${profile.name}!`);

    return new Response(`Hello, ${profile.name}!`);
  }
}

Finally, this worker needs to be configured with a service binding pointing at the AuthService class. Its wrangler.toml may look like:

name = "app-worker"
main = "./src/app.js"

# Declare a service binding to the auth service.
[[services]]
binding = "AUTH_SERVICE"    # name of the binding in `env`
service = "auth-service"    # name of the worker in the dashboard
entrypoint = "AuthService"  # name of the exported RPC class

Wait, how?

What exactly happened here? The server created an instance of the class User and returned it to the client. It has methods that the client can then just call? Are we somehow transferring code over the wire?

No, absolutely not! All code runs strictly in the isolate where it was originally loaded. What actually happens is, when the return value is passed over RPC, all class instances are replaced with RPC stubs. The stub, when called, makes a new RPC back to the server, where it calls the method on the original User object that was created there.

But then you might ask: how does the RPC stub know what methods are available? Is a list of methods passed over the wire?

In fact, no. The RPC stub is a special object called a “Proxy”. It implements a “wildcard method”; that is, it appears to have an infinite number of methods of every possible name. When you try to call a method, the name you called is sent to the server. If the original object has no such method, an exception is thrown.
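
To illustrate the idea, here is a simplified sketch of a wildcard method using a plain JavaScript Proxy; sendRpc is a hypothetical transport function, and this is not the actual workerd implementation:

// The `get` trap fires for any property access, so the stub appears to
// have a method of every possible name; we simply forward the name and
// the arguments, and the server throws if no such method exists there.
function makeStub(sendRpc) {
  return new Proxy({}, {
    get(_target, methodName) {
      return (...args) => sendRpc(String(methodName), args);
    },
  });
}

// const stub = makeStub(mySendRpc);
// stub.anyMethodNameAtAll(1, 2); // "anyMethodNameAtAll" + args go to the server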

Did you spot the security?

In the above example, we see that RPC is easy to use. We made an object! We called it! It all just felt natural, like calling a local API! Hooray!

But there’s another extremely important property that the AuthService API has which you may have missed: As designed, you cannot perform any operation on a user without first checking the cookie. This is true despite the fact that the individual method calls do not require sending the cookie again, and the User object itself doesn’t store the cookie.

The trick is, the initial checkCookie() RPC is what returns a User object in the first place. The AuthService API does not provide any other way to obtain a User instance. The RPC client cannot create a User object out of thin air, and cannot call methods of an object without first explicitly receiving a reference to it.

This is called capability-based security: we say that the User reference received by the client is a “capability”, because receiving it grants the client the ability to perform operations on the user. The checkCookie() method grants this capability only when the client has presented the correct cookie.

Capability-based security is often like this: security can be woven naturally into your APIs, rather than feel like an additional concern bolted on top.

More security: Named entrypoints

Another subtle but important detail to call out: in the above example, the auth service’s RPC API is exported as a named class:

export class AuthService extends WorkerEntrypoint {

And in our wrangler.toml for the calling worker, we had to specify an “entrypoint”, matching the class name:

entrypoint = "AuthService"  # name of the exported RPC class

In the past, service bindings would bind to the “default” entrypoint, declared with export default {. But, the default entrypoint is also typically exposed to the Internet, e.g. automatically mapped to a hostname under workers.dev (unless you explicitly turn that off). It can be tricky to safely assume that requests arriving at this entrypoint are in any way trusted.

With named entrypoints, this all changes. A named entrypoint is only accessible to Workers which have explicitly declared a binding to it. By default, only Workers on your own account can declare such bindings. Moreover, the binding must be declared at deploy time; a Worker cannot create new service bindings at runtime.

Thus, you can trust that requests arriving at a named entrypoint can only have come from Workers on your account and for which you explicitly created a service binding. In the future, we plan to extend this pattern further with the ability to lock down entrypoints, audit which Workers have bindings to them, tell the callee information about who is calling at runtime, and so on. With these tools, there is no need to write code in your app itself to authenticate access to internal APIs; the system does it for you.

What about type safety?

Workers RPC works in an entirely dynamically-typed way, just as JavaScript itself does. But just as you can apply TypeScript on top of JavaScript in general, you can apply it to Workers RPC.

The @cloudflare/workers-types package defines the type Service<MyEntrypointType>, which describes the type of a service binding. MyEntrypointType is the type of your server-side interface. Service<MyEntrypointType> applies all the necessary transformations to turn this into a client-side type, such as converting all methods to async, replacing functions and RpcTargets with (properly-typed) stubs, and so on.

It is up to you to share the definition of MyEntrypointType between your server app and its clients. You might do this by defining the interface in a separate shared TypeScript file, or by extracting a .d.ts type declaration file from your server code using tsc --declaration.

With that done, you can apply types to your client:

import { WorkerEntrypoint } from "cloudflare:workers";

// The interface that your server-side entrypoint implements.
// (This would probably be imported from a .d.ts file generated
// from your server code.)
declare class MyEntrypointType extends WorkerEntrypoint {
  sum(a: number, b: number): number;
}

// Define an interface Env specifying the bindings your client-side
// worker expects.
interface Env {
  MY_SERVICE: Service<MyEntrypointType>;
}

// Define the client worker's fetch handler with typed Env.
export default <ExportedHandler<Env>> {
  async fetch(req, env, ctx) {
    // Now env.MY_SERVICE is properly typed!
    const result = await env.MY_SERVICE.sum(1, 2);
    return new Response(result.toString());
  }
}

RPC to Durable Objects

Durable Objects allow you to create a “named” worker instance somewhere on the network that multiple other workers can then talk to, in order to coordinate between them. Each Durable Object also has its own private on-disk storage where it can store state long-term.

Previously, communications with a Durable Object had to take the form of HTTP requests and responses. With RPC, you can now just declare methods on your Durable Object class, and call them on the stub. One catch: to opt into RPC, you must declare your Durable Object class with extends DurableObject, like so:

import { DurableObject } from "cloudflare:workers";

export class Counter extends DurableObject {
  async increment() {
    // Increment our stored value and return it.
    let value = await this.ctx.storage.get("value");
    value = (value || 0) + 1;
    await this.ctx.storage.put("value", value);
    return value;
  }
}

Now we can call it like:

let stub = env.COUNTER_NAMESPACE.get(id);
let value = await stub.increment();

TypeScript is supported here too, by defining your binding with type DurableObjectNamespace<ServerType>:

interface Env {
  COUNTER_NAMESPACE: DurableObjectNamespace<Counter>;
}

Eliding awaits with speculative calls

When talking to a Durable Object, the object may be somewhere else in the world from the caller. RPCs must cross the network. This takes time: despite our best efforts, we still haven’t figured out how to make information travel faster than the speed of light.

When you have a complex RPC interface where one call returns an object on which you wish to make further method calls, it’s easy to end up with slow code that makes too many round trips over the network.

// Makes three round trips.
let foo = await stub.foo();
let baz = await foo.bar.baz();
let corge = await baz.qux[3].corge();

Workers RPC features a way to avoid this: If you know that a call will return a value containing a stub, and all you want to do with it is invoke a method on that stub, you can skip awaiting it:

// Same thing, only one round trip.
let foo = stub.foo();
let baz = foo.bar.baz();
let corge = await baz.qux[3].corge();

Whoa! How does this work?

RPC methods do not return normal promises. Instead, they return special RPC promises. These objects are “custom thenables”, which means you can use them in all the ways you’d use a regular Promise, like awaiting it or calling .then() on it.

But an RPC promise is more than just a thenable. It is also a proxy. Like an RPC stub, it has a wildcard property. You can use this to express speculative RPC calls on the eventual result, before it has actually resolved. These speculative calls will be sent to the server immediately, so that they can begin executing as soon as the first RPC has finished there, before the result has actually made its way back over the network to the client.

This feature is also known as “Promise Pipelining”. Although it isn’t explicitly a security feature, it is commonly provided by object-capability RPC systems like Cap’n Proto.

The future: Custom Bindings Marketplace?

For now, Service Bindings and Durable Objects only allow communication between Workers running on the same account. So, RPC can only be used to talk between your own Workers.

But we’d like to take it further.

We have previously explained why Workers environments contain live objects, also known as “bindings”. Today, only Cloudflare can add new binding types to the Workers platform – like Queues, KV, or D1. But what if anyone could invent their own binding type, and give it to other people?

Previously, we thought this would require creating a way to automatically load client libraries into the calling Workers. That seemed scary: it meant using someone’s binding would require trusting their code to run inside your isolate. With RPC, no such trust is needed: the binding only sees exactly what you explicitly pass to it. It cannot compromise the rest of your Worker.

Could Workers RPC provide the basis for a “bindings marketplace”, where people can offer rich JavaScript APIs to each other in an easy and secure way? We’re excited to explore and find out.

Try it now

Workers RPC is available today for all Workers users. To get started, check out the docs.

Browser Rendering API GA, rolling out Cloudflare Snippets, SWR, and bringing Workers for Platforms to all users

Post Syndicated from Tanushree Sharma original https://blog.cloudflare.com/browser-rendering-api-ga-rolling-out-cloudflare-snippets-swr-and-bringing-workers-for-platforms-to-our-paygo-plans


Browser Rendering API is now available to all paid Workers customers with improved session management

In May 2023, we announced the open beta program for the Browser Rendering API. Browser Rendering allows developers to programmatically control and interact with a headless browser instance and create automation flows for their applications and products.

At the same time, we launched a version of the Puppeteer library that works with Browser Rendering. With that, developers can use a familiar API on top of Cloudflare Workers to create all sorts of workflows, such as taking screenshots of pages or automatic software testing.

Today, we take Browser Rendering one step further, taking it out of beta and making it available to all paid Workers plans. Furthermore, we are enhancing our API and introducing a new feature that we’ve been discussing for a long time in the open beta community: session management.

Session Management

Session management allows developers to reuse previously opened browsers across Workers scripts. Reusing browser sessions means that you don’t need to instantiate a new browser for every request and every task, drastically increasing performance and lowering costs.

Before, to keep a browser instance alive and reuse it, you’d have to implement complex code using Durable Objects. Now, we’ve simplified that for you by keeping your browsers running in the background and extending the Puppeteer API with new session management methods that give you access to all of your running sessions, activity history, and active limits.

Here’s how you can list your active sessions:

const sessions = await puppeteer.sessions(env.RENDERING);
console.log(sessions);
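// Example output: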
[
   {
      "connectionId": "2a2246fa-e234-4dc1-8433-87e6cee80145",
      "connectionStartTime": 1711621704607,
      "sessionId": "478f4d7d-e943-40f6-a414-837d3736a1dc",
      "startTime": 1711621703708
   },
   {
      "sessionId": "565e05fb-4d2a-402b-869b-5b65b1381db7",
      "startTime": 1711621703808
   }
]

We have added a Worker script example on how to use session management to the Developer Documentation.
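
As a rough sketch of how you might reuse one of those sessions (assuming the binding is named RENDERING, as above; the exact method names are covered in the Browser Rendering documentation):

// Pick an idle session (one with no active connection) and reconnect
// to it; otherwise launch a fresh browser.
const sessions = await puppeteer.sessions(env.RENDERING);
const freeSession = sessions.find((s) => !s.connectionId);
const browser = freeSession
  ? await puppeteer.connect(env.RENDERING, freeSession.sessionId)
  : await puppeteer.launch(env.RENDERING);

const page = await browser.newPage();
await page.goto('https://example.com/');
await page.screenshot();

// disconnect() keeps the browser running for the next request,
// whereas close() would end the session.
await browser.disconnect();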

Analytics and logs

Observability is an essential part of any Cloudflare product. You can find detailed analytics and logs of your Browser Rendering usage in the dashboard under your account’s Workers & Pages section.

Browser Rendering is now available to all customers with a paid Workers plan. Each account is limited to running two new browsers per minute and two concurrent browsers at no cost during this period. Check our developers page to get started.

We are rolling out access to Cloudflare Snippets

Powerful, programmable, and free of charge, Snippets are the best way to perform complex HTTP request and response modifications on Cloudflare. What was once too advanced to achieve using Rules products is now possible with Snippets. Since the initial announcement during Developer Week 2022, the promise of extending out-of-the-box Rules functionality by writing simple JavaScript code is keeping the Cloudflare community excited.

During the first 3 months of 2024 alone, the amount of traffic going through Snippets increased over 7x, from an average of 2,200 requests per second in early January to more than 17,000 in March.

However, instead of opening the floodgates and letting millions of Cloudflare users in to test (and potentially break) Snippets in the most unexpected ways, we are going to pace ourselves and opt for a phased rollout, much like the newly released Gradual Rollouts for Workers.

In the next few weeks, 5% of Cloudflare users will start seeing “Snippets” under the Rules tab of the zone-level menu in their dashboard. If you happen to be part of the first 5%, snip into action and try out how fast and powerful Snippets are, even for advanced use cases like dynamically changing the date in headers or A/B testing leveraging the `Math.random` function. Whatever you use Snippets for, just keep one thing in mind: this is still an alpha, so please do not use Snippets for production traffic just yet.
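
To give a flavor of the A/B testing idea, here is a minimal sketch in the Workers-style syntax that Snippets use; the paths and the split ratio are made up for illustration:

export default {
  async fetch(request) {
    const url = new URL(request.url);
    // Send roughly half of the landing-page traffic to an alternate version.
    if (url.pathname === "/" && Math.random() < 0.5) {
      url.pathname = "/variant-b";
    }
    return fetch(url, request);
  }
};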

Until then, keep an eye out for the new Snippets tab in the Cloudflare dashboard, and learn more about how powerful and flexible Snippets are in the developer documentation.

Coming soon: asynchronous revalidation with stale-while-revalidate

One of the features our customers request most is asynchronous revalidation with the stale-while-revalidate (SWR) cache directive, and we will be bringing it to you in the second half of 2024. This functionality will be available by design as part of our new CDN architecture, which is being built in Rust with performance and memory safety top of mind.

Currently, when a client requests a resource, such as a web page or an image, Cloudflare checks to see if the asset is in cache and provides a cached copy if available. If the file is not in the cache or has expired and become stale, Cloudflare connects to the origin server to check for a fresh version of the file and forwards this fresh version to the end user. This wait time adds latency to these requests and impacts performance.

Stale-while-revalidate is a cache directive that allows the expired or stale version of the asset to be served to the end user while simultaneously allowing Cloudflare to check the origin to see if there’s a fresher version of the resource available. If an updated version exists, the origin forwards it to Cloudflare, updating the cache in the process. This mechanism allows the client to receive a response quickly from the cache while ensuring that it always has access to the most up-to-date content. Stale-while-revalidate strikes a balance between serving content efficiently and ensuring its freshness, resulting in improved performance and a smoother user experience.
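
For reference, stale-while-revalidate is a standard Cache-Control extension (RFC 5861). An origin response opting in might carry a header like the following, where the values are just an example: the response is fresh for 10 minutes and may then be served stale for up to 5 more minutes while it is revalidated in the background.

Cache-Control: max-age=600, stale-while-revalidate=300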

Customers who want to be part of our beta testers and “cache” in on the fun can register here, and we will let you know when the feature is ready for testing!

Coming on April 16, 2024: Workers for Platforms for our pay-as-you-go plan

Today, we’re excited to share that on April 16th, Workers for Platforms will be available to all developers through our new $25 pay-as-you-go plan!

Workers for Platforms is changing the way we build software – it gives you the ability to embed personalization and customization directly into your product. With Workers for Platforms, you can deploy custom code on behalf of your users or let your users directly deploy their own code to your platform, without you or your users having to manage any infrastructure. You can use Workers for Platforms with all the exciting announcements that have come out this Developer Week – it supports all the bindings that come with Workers (including Workers AI, D1 and Durable Objects) as well as Python Workers.  

Here’s what some of our customers – ranging from enterprises to startups – are building on Workers for Platforms:

  • Shopify Oxygen is a hosting platform for their Remix-based eCommerce framework Hydrogen, and it’s built on Workers for Platforms! The Hydrogen/Oxygen combination gives Shopify merchants control over their buyer experience without the restrictions of generic storefront templates.
  • Grafbase is a data platform for developers to create a serverless GraphQL API that unifies data sources across a business under one endpoint. They use Workers for Platforms to give their developers the control and flexibility to deploy their own code written in JavaScript/TypeScript or WASM.
  • Triplit is an open-source database that syncs data between server and browser in real-time. It allows users to build low latency, real-time applications with features like relational querying, schema management and server-side storage built in. Their query and sync engine is built on top of Durable Objects, and they’re using Workers for Platforms to allow their customers to package custom Javascript alongside their Triplit DB instance.

Tools for observability and platform level controls

Workers for Platforms doesn’t just allow you to deploy Workers to your platform – we also know how important it is to have observability and control over your users’ Workers. We have a few solutions that help with this:

  • Custom Limits: Set CPU time or subrequest caps on your users’ Workers. These limits can be used to control your costs on Cloudflare and/or to shape your own pricing and packaging model; for example, if you run a freemium model on your platform, you can lower the CPU time limit for customers on your free tier (see the sketch after this list).
  • Tail Workers: Tail Worker events contain metadata about the Worker, console.log() messages, and capture any unhandled exceptions. They can be used to provide your developers with live logging in order to monitor for errors and troubleshoot in real time.
  • Outbound Workers: Get visibility into all outgoing requests from your users’ Workers. Outbound Workers sit between user Workers and the fetch() requests they make, so you get full visibility over the request before it’s sent out to the Internet.
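
To make the dispatch model concrete, here is a minimal sketch of a dispatch Worker, the Worker that routes incoming requests to your users’ Workers in a dispatch namespace. The DISPATCHER binding name and the hostname-to-script mapping are illustrative choices, not fixed parts of the API:

// Minimal dispatch Worker sketch. Assumes a dispatch namespace bound
// as DISPATCHER in wrangler.toml, and that each customer's Worker is
// named after their subdomain (both illustrative assumptions).
interface Env {
  DISPATCHER: DispatchNamespace;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    // e.g. "acme.example-platform.com" -> script name "acme"
    const scriptName = new URL(request.url).hostname.split(".")[0];
    try {
      // Optionally cap CPU time and subrequests per invocation
      // (the Custom Limits feature described above).
      const userWorker = env.DISPATCHER.get(scriptName, {}, {
        limits: { cpuMs: 50, subRequests: 10 },
      });
      return await userWorker.fetch(request);
    } catch (e) {
      return new Response("Customer Worker not found", { status: 404 });
    }
  },
};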

Pricing

We wanted to make sure that Workers for Platforms was affordable for hobbyists, solo developers, and indie developers. Workers for Platforms is part of a new $25 pay-as-you-go plan, and it includes the following:

Included Amounts:
  • Requests: 20 million requests/month, then $0.30 per additional million
  • CPU time: 60 million CPU milliseconds/month, then $0.02 per additional million CPU milliseconds
  • Scripts: 1,000 scripts, then $0.02 per additional script/month

Workers for Platforms will be available to purchase under the Workers for Platforms tab on the Cloudflare Dashboard on April 16, 2024!

In the meantime, to learn more about Workers for Platforms, check out our starter project and developer documentation.

Community Update: empowering startups building on Cloudflare and creating an inclusive community

Post Syndicated from Ricky Robinett original https://blog.cloudflare.com/2024-community-update


With millions of developers around the world building on Cloudflare, we are constantly inspired by what you all are building with us. Every Developer Week, we’re excited to put new products and features in your hands that can help you be more productive, and more creative, with what you’re building. But our job doesn’t end there. We also need to cultivate welcoming community spaces where you can get help, share what you’re building, and meet other developers building with Cloudflare.

We’re excited to close out Developer Week by sharing updates on our Workers Launchpad program, our latest Developer Challenge, and the work we’re doing to ensure our community spaces – like our Discord and Community forums – are safe and inclusive for all developers.

Helping startups go further with Workers Launchpad

In late 2022, we initiated the $2 billion Workers Launchpad Funding Program aimed at aiding the more than one million developers who use Cloudflare’s Developer Platform. This initiative particularly benefits startups that are investing in building on Cloudflare to propel their business growth.

The Workers Launchpad Program offers a variety of resources to help builders scale faster and reach more customers. The program includes, but is not limited to:

  • A community of like-minded founders
  • Introductions to the Launchpad’s VC partner network of 40+ leading investors
  • Company-building support and mentorship through virtual Founders Bootcamp sessions
  • Technical office hours with our engineers
  • Early access to upcoming Cloudflare products and the Product Managers behind them
  • A culminating Demo Day, where participants share their stories with investors and prospective customers around the world

So far, 50 amazing startups from 13 countries have graduated from the Workers Launchpad program: Cohort #1 finished in March 2023, and Cohort #2 wrapped up in August 2023.

Meet Cohort #3 of the Workers Launchpad!

Since the end of Cohort #2, we have received hundreds of new applications from startups across the globe, showcasing incredible tools and software across a variety of industries, including AI, SaaS, Supply Chain, Media, Gaming, Hospitality, and Developer Productivity. We were encouraged by the quality of the entire wave of applicants, but a few stood out for how they leverage Cloudflare’s Developer Suite to scale their business.

With that being said, we would like to introduce you to the 29 startups chosen to participate in Cohort #3 of Workers Launchpad, along with a brief summary of the problem each is looking to solve:

  • Autoblocks: AI Evaluation & testing platform to improve AI product quality.
  • BEAM (By Mass Luminosity): A next-gen live streaming platform that elevates creator-viewer interactions to the next level.
  • BentoML: Run any AI model in your cloud.
  • bohr.io: An easy and fast development platform.
  • Cloneable: Provides low/no-code tools to build and deploy applications to any device, instantly.
  • CleverEV: Manage EV charging infrastructure and experience for clients.
  • Dulia: Managed edge-powered serverless AI platform.
  • Erxes: Open-source experience operating system empowering businesses.
  • Exporio: AI-first A/B testing and personalization platform.
  • Helicone: GenAI observability platform built for developers.
  • Houdini: An end-to-end solution for building and deploying GraphQL applications.
  • Intelligage: Humanize your AI for customers.
  • Langbase: Ship hyper-personalized AI apps in seconds — any LLM, any data, any developer.
  • Milkshake: Create websites via mobile within minutes.
  • Nadrama: Kubernetes PaaS in your cloud account, in minutes.
  • NuxtHub (by NuxtLabs): Build full-stack applications on Cloudflare with zero configuration.
  • Panaptico: High-performance networking fabrics for specialized workloads.
  • Playroom: Platform for game developers to build multiplayer games in minutes.
  • Puzzle Labs: P2P, prompt-first knowledge base for teams to collaborate with AI.
  • Resilis: Intelligent edge caching for REST APIs.
  • Scrappi: A better way to collect, create and collaborate.
  • Starlight Labs: AI-native game studio.
  • T4 Stack: Ship feature parity on universal apps.
  • TextCortex AI: AI copilot platform to leverage the power of easy customization and integration.
  • Toothless: Build GenAI-powered workflow automation and internal tools in minutes.
  • Unfetch: Generate and run scripts with AI to complete tasks within seconds.
  • Unkey: Redefines API management for developers. Add authentication, analytics, and rate-limiting to your APIs in minutes.
  • Unnug: Transformative cloud compiler with an emphasis on users’ on-premises, cloud, and edge resources.
  • Wope: AI-powered marketing agent that leverages Gen AI to optimize businesses’ online presence and drive more traffic.

The Cloudflare team is looking forward to working with Cohort #3 participants and sharing what they are building on Cloudflare. To follow along with Cohort #3 of Workers Launchpad, follow @CloudflareDev and join our Developer Discord server.

Are you a startup and interested in joining Cohort #4? Apply here!

AI developer challenge

Now that Workers AI is GA, we’re excited to see what our community can build. We’ve teamed up with our friends at DEV, who are running an AI Developer challenge that officially launched on Wednesday, April 3, and runs until Sunday, April 14, 2024, when submissions close.

For this challenge, you will build a Workers AI application that makes use of AI task types from Cloudflare’s growing catalog of Open Models. Apps will be evaluated on innovation, creativity, and demonstration of underlying technology with prizes awarded by DEV for the best overall app, as well as projects leveraging multiple models and tasks. For more information and details on how to participate, including DEV’s rules and requirements, head over to the official challenge page.

Creating an inclusive community

Our community has been growing fast over the past year, and Developer Week has always been one of the main drivers of that growth. It has grown so fast, in fact, that it is becoming difficult to individually welcome each new member who joins our Discord server every day.

When you come into the Cloudflare developer community, it’s important to us that you’re entering a space that is safe and welcoming. We already have rules for the server and community forums, but we also needed guidelines for our community programs. That’s why we’ve created a new Code of Conduct that promotes inclusivity and respect, and that will help us create a better community for everyone.

Do you want to be part of this and help us create a more inclusive and helpful community? Then please share your feedback and tell us what you would like to see improved in our community and our Discord server in this thread.

What’s new with Cloudflare Media: updates for Calls, Stream, and Images

Post Syndicated from Deanna Lam original https://blog.cloudflare.com/whats-next-for-cloudflare-media


Our customers use Cloudflare Calls, Stream, and Images to build live, interactive, and real-time experiences for their users. We want to reduce friction by making it easier to get data into our products. This also means providing transparent pricing, so customers can be confident that costs make economic sense for their business, especially as they scale.

Today, we’re introducing four new improvements to help you build media applications with Cloudflare:

  • Cloudflare Calls is in open beta with transparent pricing
  • Cloudflare Stream has a Live Clipping API to let your viewers instantly clip from ongoing streams
  • Cloudflare Images has a pre-built upload widget that you can embed in your application to accept uploads from your users
  • Cloudflare Images lets you crop and resize images of people at scale with automatic face cropping

Build real-time video and audio applications with Cloudflare Calls

Cloudflare Calls is now in open beta, and you can activate it from your dashboard. Your usage will be free until May 15, 2024. Starting May 15, 2024, customers with a Calls subscription will receive the first terabyte each month for free, with any usage beyond that charged at $0.05 per real-time gigabyte. Additionally, there are no charges for inbound traffic to Cloudflare.

To get started, read the developer documentation for Cloudflare Calls.

Live Instant Clipping: create clips from live streams and recordings

Live broadcasts often include short bursts of highly engaging content within a longer stream. Creators and viewers alike enjoy being able to make a “clip” of these moments to share across multiple channels. Being able to generate that clip rapidly enables our customers to offer instant replays, showcase key pieces of recordings, and build audiences on social media in real-time.

Today, Cloudflare Stream is launching Live Instant Clipping in open beta for all customers. With the new Live Clipping API, you can let your viewers instantly clip and share moments from an ongoing stream – without re-encoding the video.

When planning this feature, we considered a typical user flow for generating clips from live events. Consider users watching a stream of a video game: something wild happens and users want to save and share a clip of it to social media. What will they do?

First, they’ll need to be able to review the preceding few minutes of the broadcast, so they know what to clip. Next, they need to select a start time and clip duration or end time, possibly as a visualization on a timeline or by scrubbing the video player. Finally, the clip must be available quickly in a way that can be replayed or shared across multiple platforms, even after the original broadcast has ended.

That ideal user flow implies some heavy lifting in the background. We now offer a manifest to preview recent live content in a rolling window, and we provide the timing information in that response to determine the start and end times of the requested clip relative to the whole broadcast. Finally, on request, we generate that clip on the fly as a standalone video file for easy sharing, as well as an HLS manifest for embedding into players.

Live Instant Clipping is available in beta to all customers starting today! Live clips are free to make; they do not count toward storage quotas, and playback is billed just like minutes of video delivered. To get started, check out the Live Clipping API in developer documentation.

Integrate Cloudflare Images into your application with only a few lines of code

Building applications with user-uploaded images is even easier with the upload widget, a pre-built, interactive UI that lets users upload images directly into your Cloudflare Images account.

Many developers use Cloudflare Images as an end-to-end image management solution to support applications that center around user-generated content, from AI photo editors to social media platforms. Our APIs connect the frontend experience – where users upload their images – to the storage, optimization, and delivery operations in the backend.

But building an application can take time. Our team saw a huge opportunity to take away as much extra work as possible, and we wanted to provide off-the-shelf integration to speed up the development process.

With the upload widget, you can seamlessly integrate Cloudflare Images into your application within minutes. The widget can be integrated in two ways: by embedding a script into a static HTML page or by installing a package that works with your favorite framework. We provide a ready-made Worker template that you can deploy directly to your account to connect your frontend application with Cloudflare Images and authorize users to upload through the widget.
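
While the widget itself is in closed beta, the pattern it automates can be sketched today with the direct creator upload API: a small Worker asks Cloudflare Images for a one-time upload URL and hands it to the frontend, so users can upload directly without your API token ever reaching the browser. The binding and secret names below are illustrative assumptions:

// Sketch of a Worker that authorizes browser uploads via the Images
// "direct creator upload" endpoint. ACCOUNT_ID and API_TOKEN are
// illustrative names for an env var and a secret.
interface Env {
  ACCOUNT_ID: string;
  API_TOKEN: string;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const resp = await fetch(
      `https://api.cloudflare.com/client/v4/accounts/${env.ACCOUNT_ID}/images/v2/direct_upload`,
      { method: "POST", headers: { Authorization: `Bearer ${env.API_TOKEN}` } }
    );
    const { result } = (await resp.json()) as { result: { uploadURL: string } };
    // The frontend (or the upload widget) POSTs the image file to uploadURL.
    return Response.json({ uploadURL: result.uploadURL });
  },
};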

To try out the upload widget, sign up for our closed beta.

Optimize images of people with automatic face cropping for Cloudflare Images

Cloudflare Images lets you dynamically manipulate images in different aspect ratios and dimensions for various use cases. With face cropping for Cloudflare Images, you can now crop and resize images of people’s faces at scale. For example, if you’re building a social media application, you can apply automatic face cropping to generate profile picture thumbnails from user-uploaded images.

Our existing gravity parameter uses saliency detection to set the focal point of an image based on the most visually interesting pixels, which determines how the image will be cropped. We expanded this feature by using a machine learning model called RetinaFace, which classifies images that have human faces. We’re also introducing a new zoom parameter that you can combine with face cropping to specify how closely an image should be cropped toward the face.

To apply face cropping to your image optimization, sign up for our closed beta.

For example, the following URL crops an image to a 500×500 square centered on the detected face, with a zoom factor of 0.6 (photo by Eye for Ebony on Unsplash):

https://example.com/cdn-cgi/image/fit=crop,width=500,height=500,gravity=face,zoom=0.6/https://example.com/images/picture.jpg

Meet the Media team over Discord

As we’re working to build the next set of media tools, we’d love to hear what you’re building for your users. Come say hi to us on Discord. You can also learn more by visiting our developer documentation for Calls, Stream, and Images.

Cloudflare Calls: millions of cascading trees all the way down

Post Syndicated from Renan Dincer original https://blog.cloudflare.com/cloudflare-calls-anycast-webrtc


Following its initial announcement in September 2022, Cloudflare Calls is now in open beta and available in your Cloudflare Dashboard. Cloudflare Calls lets developers build real-time audio/video apps using WebRTC, and it abstracts away the complexity by turning the Cloudflare network into a singular SFU. In this post, we dig into how we make this possible.

WebRTC growing pains

WebRTC is the only way to send UDP traffic out of a web browser – everything else uses TCP.

As a developer, you need a UDP-based transport layer for applications demanding low latency and real-time feedback, such as audio/video conferencing and interactive gaming. This is because unlike WebSocket and other TCP-based solutions, UDP is not subject to head-of-line blocking, a frequent topic on the Cloudflare Blog.

When building a new video conferencing app, you typically start with a peer-to-peer web application using WebRTC, where clients exchange data directly. This approach is efficient for small-scale demos, but scalability issues arise as the number of participants increases. Because each client needs to send its data to the n-1 other clients, the total amount of data transmitted grows roughly quadratically with the number of participants.

Selective Forwarding Units (SFUs) play a pivotal role in scaling WebRTC applications. An SFU functions by receiving multiple media or data flows from participants and deciding which streams should be forwarded to other participants, thus acting as a media stream routing hub. This mechanism significantly reduces bandwidth requirements and improves scalability by managing stream distribution based on network conditions and participant needs. It hasn’t always been this way since video calling on computers first became popular, but today SFUs usually run in the cloud, rather than on clients’ home computers, because of the superior connectivity a data center offers.

A modern audio/video application thus quickly becomes complicated with the addition of this server side element. Since all clients connect to this central SFU server, there are numerous things to consider when you’re architecting and scaling a real-time application:

  • How close are the SFU server locations to end-user clients, and how is a client assigned to a server?
  • Where is the SFU hosted, and if it’s hosted in the cloud, what are the egress costs from VMs?
  • How many participants can fit in a “room”? Are all participants sending and receiving data? With cameras on? Audio only?
  • Some SFUs require the use of custom SDKs. Which platforms do these run on and are they compatible with the application you’re trying to build?
  • Monitoring/reliability/other issues that come with running infrastructure

Some of these concerns, and the complexity of WebRTC infrastructure in general, have made the community look in different directions. However, it is clear that in 2024, WebRTC is alive and well with plenty of new and old uses. AI startups build characters that converse in real time, cars leverage WebRTC to stream live footage from their cameras to smartphones, and video conferencing tools are going strong.

WebRTC has been interesting to us for a while. Cloudflare Stream implemented WHIP and WHEP WebRTC video streaming protocols in 2022, which remain the lowest latency way to broadcast video. OBS Studio implemented WHIP broadcasting support as have a variety of software and hardware vendors alongside Cloudflare. In late 2022, we launched Cloudflare Calls in closed beta. When we blogged about it back then, we were very impressed with how WebRTC fared, and spoke to many customers about their pain points as well as creative ideas the existing browser APIs can foster. We also saw other WebRTC-based apps like Clubhouse rise in popularity and Twitter Spaces play a role in popular culture. Today, we see real-time applications of a different sort. Many AI projects have impressive demos with voice/video interactions. All of these apps are built with the same WebRTC APIs and system architectures.

We are confident that Cloudflare Calls is a new kind of WebRTC infrastructure you should try. When we set out to build Cloudflare Calls, we had a few ideas that we weren’t sure would work, but were worth trying:

  • Build every WebRTC component on Anycast with a single IP address for DTLS, ICE, STUN, SRTP, SCTP, etc.
  • Don’t force an SDK – WebRTC APIs by themselves are enough, and allow the most novel uses to shine, because the best developers always find ways to hit the limits of SDKs.
  • Deploy in all 310+ cities Cloudflare operates in – use every Cloudflare server, not just a subset
  • Exchange offer and answer over HTTP between Cloudflare and the WebRTC client. This way there is only a single PeerConnection to manage.

Now we know this is all possible, because we made it happen, and we think it’s the best experience a developer can get with pure WebRTC.

Is Cloudflare Calls a real SFU?

Cloudflare is in the business of having computers in numerous places. Historically, our core competency was operating a caching HTTP reverse proxy, and we are very good at this. With Cloudflare Calls, we asked ourselves “how can we build a large distributed system that brings together our global network to form one giant stateful system that feels like a single machine?”

When using Calls, every PeerConnection automatically connects to the closest Cloudflare data center instead of a single server. Rather than connecting every client that needs to communicate with each other to a single server, anycast spreads out connections as much as possible to minimize last mile latency sourced from your ISP between your client and Cloudflare.

It’s good to minimize last mile latency because after the data enters Cloudflare’s control, the underlying media can be managed carefully and routed through the Cloudflare backbone. This is crucial for WebRTC applications where millisecond delays can significantly impact user experience. To give you a sense of the latency between Cloudflare’s data centers and end users, about 95% of the Internet connected population is within 50ms of a Cloudflare data center. As I write this, I am about 20ms away, but in the past, I have been lucky enough to be connected to a great home Wi-Fi network less than 1ms away in Manhattan. “But you are just one user!” you might be thinking, so here is a chart from Cloudflare Radar showing recent global latency measurements:

This setup creates more opportunities for lost packets to be answered with retransmissions closer to users, and more opportunities for bandwidth adjustments.

Eliminating SFU region selection

A traditional challenge in WebRTC infrastructure involves the manual selection of Selective Forwarding Units (SFUs) based on geographic location to minimize latency. Some systems solve this problem by selecting a location for the SFU after the first user joins the “room”. This makes routing inefficient when the rest of the participants in the conversation are clustered elsewhere. The anycast architecture of Calls eliminates this issue. When a client initiates a connection, BGP dynamically determines the closest data center. Each selected server only becomes responsible for the PeerConnections of the clients closest to it.

One might see this as actually a simpler way of managing servers, as there is no need to maintain a WebRTC load-balancing layer for traffic or CPU capacity between servers. However, anycast has its own challenges, and we couldn’t take a laissez-faire approach.

Steps to establishing a PeerConnection

One of the challenging parts in assigning a server to a client PeerConnection is supporting dual stack networking for backwards compatibility with clients that only support the old version of the Internet Protocol, IPv4.

Cloudflare Calls uses a single IP address per protocol, and our L4 load balancer directs packets to a single server per client by hashing the 4-tuple {client IP, client port, destination IP, destination port}. This means that a client’s ICE connectivity check packets can arrive at two different servers: one for IPv4 and one for IPv6.
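
As a simplified illustration of how a stateless L4 load balancer pins a flow to a server (not Cloudflare’s actual implementation), the 4-tuple is hashed and reduced modulo the number of servers, so every packet of the same flow lands on the same machine:

// Illustrative only: derive a server index from the connection 4-tuple
// using FNV-1a. Every packet of the same UDP flow hashes identically.
function pickServer(
  clientIp: string,
  clientPort: number,
  destIp: string,
  destPort: number,
  serverCount: number
): number {
  const key = `${clientIp}:${clientPort}:${destIp}:${destPort}`;
  let h = 2166136261; // FNV-1a offset basis
  for (const ch of key) {
    h = (h ^ ch.charCodeAt(0)) >>> 0;
    h = Math.imul(h, 16777619) >>> 0; // FNV prime
  }
  return h % serverCount;
}

Because a client’s IPv4 and IPv6 packets carry different source addresses, the two protocols can hash to different servers, which is exactly the dual-stack problem described above.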

ICE is not the only protocol used for WebRTC; there are also STUN and TURN for connectivity establishment. Actual media bits are encrypted using DTLS, which carries most of the data during a session.

DTLS packets don’t have any identifiers in them that would indicate they belong to a specific connection (unlike QUIC’s connection ID field), so every server should be able to handle DTLS packets and get the necessary certificates to be able to decrypt them for processing. DTLS encryption is negotiated at the SDP layer using the HTTPS API.

The HTTPS API for Calls also lands on a different server than the DTLS packets and ICE connectivity checks. Since DTLS packets need information from the SDP exchanged using the HTTPS API, and ICE connectivity checks depend on the HTTPS API for the userFragment and password fields in the connectivity check packets, it would be very useful for all of these to be available on a single server. Yet in our setup, they’re not.

Fippo and Gustavo of WebRTCHacks complained (gracefully, it should be noted) about slow replies to ICE connectivity checks in their great article as they were digging into our WHIP implementation right around our announcement in 2022:

Looking at the Wireshark dumps we see a surprisingly large amount of time pass between the first STUN request and the first STUN response – it was 1.8 seconds in the screenshot below.

In other tests, it was shorter, but still 600ms long.

After that, the DTLS packets do not get an immediate response, requiring multiple attempts. This ultimately leads to a call setup time of almost three seconds – way above the global average of 800ms Fippo has measured previously (for the complete handshake, 200ms for the DTLS handshake). For Cloudflare with their extensive network, we expected this to be way below that average.

Gustavo and Fippo observed our solution to this problem of different parts of the WebRTC negotiation landing on different servers. Since Cloudflare Calls unbundles the WebRTC protocol to make the entire network act like a single computer, at this critical moment, we need to form consensus across the network. We form consensus by configuring every server to handle any incoming PeerConnection just in time. When a packet arrives, if the server doesn’t know about it, it quickly learns about the negotiated parameters from another server, such as the ufrag and the DTLS fingerprint from the SDP, and responds with the appropriate response.

Getting faster

Even though we’ve sped up the process of forming consensus across the Cloudflare network, any delays incurred can still have weird side effects. For example, up until a few months ago, delays of a few hundred milliseconds caused slow connections in Chrome.

A connectivity check packet delayed by a few hundred milliseconds signals to Chrome that this is a high latency network, even though every other STUN message after that was replied to in less than 5-10ms. Chrome thus delays sending a USE-CANDIDATE attribute in the responses for a few seconds, degrading the user experience.

Fortunately, Chrome also sends DTLS ClientHello before USE-CANDIDATE (behavior we’ve seen only on Chrome), so to help speed up Chrome, Calls uses DTLS packets in place of STUN packets with USE-CANDIDATE attributes.

After solving this issue with Chrome, PeerConnections globally now take about 100-250ms to get connected. This includes all consensus management, STUN packets, and a complete DTLS handshake.

Sessions and Tracks are the building blocks of Cloudflare’s SFU, not rooms

Once a PeerConnection is established to Cloudflare, we call this a Session. Many media Tracks or DataChannels can be published using a single Session, which returns a unique ID for each. These then can be subscribed to over any other PeerConnection anywhere around the world using the unique ID. The tracks can be published or subscribed anytime during the lifecycle of the PeerConnection.
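
To make the Session and Track model concrete, here is a rough sketch of publishing a track over the Calls HTTPS API. The endpoint paths and payload shapes follow the beta-era API, and APP_ID, APP_TOKEN, and the track name are placeholders; treat the details as illustrative and check the current documentation:

// Illustrative sketch: publish a local track through Cloudflare Calls.
declare const APP_ID: string;    // your Calls app ID (placeholder)
declare const APP_TOKEN: string; // your Calls app token (placeholder)
declare const localTrack: MediaStreamTrack;

const API = `https://rtc.live.cloudflare.com/v1/apps/${APP_ID}`;
const headers = {
  Authorization: `Bearer ${APP_TOKEN}`,
  "Content-Type": "application/json",
};

// One PeerConnection per client; offer/answer travels over HTTPS,
// so there is no separate signaling channel to operate.
const pc = new RTCPeerConnection();
const transceiver = pc.addTransceiver(localTrack, { direction: "sendonly" });

// 1. Create a Session (anycast lands this on the closest data center).
const { sessionId } = (await (
  await fetch(`${API}/sessions/new`, { method: "POST", headers })
).json()) as { sessionId: string };

// 2. Publish the track: send our SDP offer, apply Cloudflare's answer.
const offer = await pc.createOffer();
await pc.setLocalDescription(offer);
const res = (await (
  await fetch(`${API}/sessions/${sessionId}/tracks/new`, {
    method: "POST",
    headers,
    body: JSON.stringify({
      sessionDescription: { type: "offer", sdp: offer.sdp },
      tracks: [{ location: "local", mid: transceiver.mid, trackName: "camera" }],
    }),
  })
).json()) as { sessionDescription: { type: "answer"; sdp: string } };
await pc.setRemoteDescription(new RTCSessionDescription(res.sessionDescription));

Another client anywhere in the world can then subscribe by posting a track request with location "remote", the publisher’s session ID, and the same track name.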

In the background, Cloudflare takes care of scaling through a fan-out architecture with cascading trees that are unique per track. This structure works by creating a hierarchy of nodes where the root node distributes the stream to intermediate nodes, which then fan out to end-users. This significantly reduces the bandwidth required at the source and ensures scalability by distributing the load across the network. This simple but powerful architecture allows developers to build anything from 1:1 video calls to large 1:many or many:many broadcasting scenarios with Calls.

There is no “room” concept in Cloudflare Calls. Each client can add as many tracks into a PeerConnection as they’d like. The limit is the bandwidth available between Cloudflare and the client, which in practice is constrained on the client side. The signaling, and the concept of a “room”, is left to the application developer, who can pull as many tracks as they’d like from the tracks they have pushed elsewhere into a PeerConnection. This lets developers move participants into breakout rooms, back into a plenary room, and then into 1:1 rooms, all while keeping the same PeerConnection and MediaTracks active.

Cloudflare offers an unopinionated approach to bandwidth management, allowing for greater control in customizing logic to suit your business needs. There is no active bandwidth management or restriction on the number of tracks. The WebRTC Stats API provides a standardized way to access data on packet loss and possible congestion, enabling you to incorporate client-side logic based on this information. For instance, if poor Wi-Fi connectivity leads to degraded service, your front-end could inform the user through a notice and automatically reduce the number of video tracks for that client.
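
For example, a client could periodically sample the standard Stats API and thin out its video subscriptions when loss climbs. The 5% threshold and the reaction below are application-level choices, not anything Calls enforces:

// Client-side sketch: watch inbound packet loss via the WebRTC Stats
// API and react when it crosses an (arbitrary, illustrative) threshold.
declare function reduceVideoTracks(): void; // placeholder for app logic

async function checkCongestion(pc: RTCPeerConnection): Promise<void> {
  const stats = await pc.getStats();
  stats.forEach((report) => {
    if (report.type === "inbound-rtp" && report.kind === "video") {
      const received = report.packetsReceived ?? 0;
      const lost = report.packetsLost ?? 0;
      const lossRatio = lost / Math.max(received + lost, 1);
      if (lossRatio > 0.05) {
        // e.g. notify the user and unsubscribe from some video tracks
        reduceVideoTracks();
      }
    }
  });
}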

“NACK shield” at the edge

The Internet can’t guarantee timely and orderly delivery of packets, leading to the necessity of retransmission mechanisms, particularly in protocols like TCP. This ensures data eventually reaches its destination, despite possible delays. Real-time systems, however, need special consideration of these delays. A packet that is delayed past its deadline for rendering on the screen is worthless, but a packet that is lost can be recovered if it can be retransmitted within a very short period of time, on the order of milliseconds. This is where NACKs come into play.

A WebRTC client receiving data constantly checks for packet loss. When one or more packets don’t arrive at the expected time or a sequence number discontinuity is seen on the receiving buffer, a special NACK packet is sent back to the source in order to ask for a packet retransmission.
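
The receiver-side logic, reduced to a minimal sketch (real implementations also handle the 16-bit sequence number wrapping around, plus a reordering window), looks like this:

// Minimal sketch of receiver-side loss detection: anything between the
// sequence number we expected next and the one that actually arrived
// is a candidate for a NACK.
function missingSequences(expectedNext: number, arrived: number): number[] {
  const missing: number[] = [];
  for (let seq = expectedNext; seq < arrived; seq++) {
    missing.push(seq);
  }
  return missing;
}

// We expected packet 101 but received 104: NACK 101, 102, and 103.
console.log(missingSequences(101, 104)); // [101, 102, 103]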

In a peer-to-peer topology, the source of the data has to handle any NACK packets it receives and retransmit packets for every participant. When an SFU is used, the SFU can either send NACKs back to the source or keep a buffer for each client to handle retransmissions itself.

This gets more complicated with Cloudflare Calls, since both the publisher and the subscriber connect to Cloudflare, likely to different servers and also probably in different locations. In addition, there is a possibility of other Cloudflare data centers in the middle, either through Argo, or just as part of scaling to many subscribers on the same track.

It is common for SFUs to backpropagate NACK packets to the source, losing valuable time to recover packets. Calls goes beyond this and can handle NACK packets in the location closest to the user, which decreases overall latency. That latency advantage gives the packet a better chance of being recovered than with a centralized SFU or no NACK handling at all.

Since there may be a number of Cloudflare data centers between clients, packet loss within the Cloudflare network is also possible. We handle this by generating NACK packets within the network: at each hop the packets take, the receiving end can generate NACKs, which are then recovered locally or backpropagated toward the publisher for recovery.

Cloudflare Calls does TURN over Anycast too

Separately from the SFU, Calls also offers a TURN service. TURN servers act as relay points for traffic between WebRTC clients, like the browser, and SFUs, particularly in scenarios where direct communication is obstructed by NATs or firewalls. TURN maintains an allocation of public IP addresses and ports for each session, ensuring connectivity even in restrictive network environments.

Cloudflare Calls’ TURN service supports a few ports to help with misbehaving middleboxes and firewalls:

  • TURN-over-UDP over port 3478 (standard), and also port 53
  • TURN-over-TCP over ports 3478 and 80
  • TURN-over-TLS over ports 5349 and 443

TURN works the same way as Calls: it is available over anycast and always connects to the closest data center.
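
Pointing a WebRTC client at the service is a one-line change to the standard iceServers configuration. The hostname and credentials below are placeholders; use the values issued for your TURN key:

// Standard WebRTC API; hostname and credentials are placeholders.
const pc = new RTCPeerConnection({
  iceServers: [
    {
      urls: [
        "turn:turn.example.com:3478?transport=udp",
        "turn:turn.example.com:80?transport=tcp",
        "turns:turn.example.com:5349?transport=tcp",
      ],
      username: "<username from the Calls API>",
      credential: "<credential from the Calls API>",
    },
  ],
});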

Pricing and how to get started

Cloudflare Calls is now in open beta and available in your Cloudflare Dashboard. Depending on your use case, you can set up an SFU application and/or a TURN service with only a few clicks.

To kick off its open beta phase, Calls is available at no cost for a limited time. Starting May 15, 2024, customers will receive the first terabyte each month for free, with any usage beyond that charged at $0.05 per real-time gigabyte. Beta customers will be given at least 30 days to upgrade from the free beta to a paid subscription. Additionally, there are no charges for inbound traffic to Cloudflare. For volume pricing, talk to your account manager.

Cloudflare Calls is ideal if you are building new WebRTC apps. If you have existing SFUs or TURN infrastructure, you may still consider using Calls alongside your existing infrastructure. Building a bridge to Calls from other places is not difficult as Cloudflare Calls supports standard WebRTC APIs and acts like just another WebRTC peer.

We understand that getting started with a new platform is difficult, so we’re also open sourcing our internal video conferencing app, Orange Meets. Orange Meets supports small and large conference calls by maintaining room state in Workers Durable Objects. It has screen sharing, client-side noise-canceling, and background blur. It is written with TypeScript and React and is available on GitHub.

We’re hiring

We think the current state of Cloudflare Calls enables many use cases. Calls already supports publishing and subscribing to media tracks and DataChannels. Soon, it will support features like simulcasting.

But we’re just scratching the surface and there is so much more to build on top of this foundation.

If you are passionate about WebRTC (and other real-time protocols!!), the Media Platform team building the Calls product at Cloudflare is hiring and would love to talk to you.

Improving Cloudflare Workers and D1 developer experience with Prisma ORM

Post Syndicated from Jon Harrell (Guest Author) original https://blog.cloudflare.com/prisma-orm-and-d1


Working with databases can be difficult. Developers face increasing data complexity and needs beyond simple create, read, update, and delete (CRUD) operations. Unfortunately, these issues also compound on themselves: developers have a harder time iterating in an increasingly complex environment. Cloudflare Workers and D1 help by reducing time spent managing infrastructure and deploying applications, and Prisma provides a great experience for your team to work and interact with data.  

Together, Cloudflare and Prisma make it easier than ever to deploy globally available apps with a focus on developer experience. To further that goal, Prisma Object Relational Mapper (ORM) now natively supports Cloudflare Workers and D1 in Preview. With version 5.12.0 of Prisma ORM you can now interact with your data stored in D1 from your Cloudflare Workers with the convenience of the Prisma Client API. Learn more and try it out now.

What is Prisma?

From writing to debugging, SQL queries take a long time and slow developer productivity. Even before writing queries, modeling tables can quickly become unwieldy, and migrating data is a nerve-wracking process. Prisma ORM looks to resolve all of these issues by providing an intuitive data modeling language, an automated migration workflow, and a developer-friendly and type-safe client for JavaScript and TypeScript, allowing developers to focus on what they enjoy: developing!

Prisma is focused on making working with data easy. Alongside an ORM, Prisma offers Accelerate and Pulse, products built on Cloudflare that cover needs from connection pooling, to query caching, to real-time type-safe database subscriptions.

How to get started with Prisma ORM, Cloudflare Workers, and D1

To get started with Prisma ORM and D1, first create a basic Cloudflare Workers app. This guide starts with the “Hello World” Worker example app, but any Workers example app will work. If you don’t have a project yet, start by creating a new one. Name your project something memorable, like my-d1-prisma-app, and select the “Hello World” Worker and TypeScript. For now, choose not to deploy; we will wait until after we have set up D1 and Prisma ORM.

npm create cloudflare@latest

Next, move into your newly created project and make sure that dependencies are installed:

cd my-d1-prisma-app && npm install

After dependencies are installed, we can move on to the D1 setup.

First, create a new D1 database for your app.

npx wrangler d1 create prod-prisma-d1-app
.
.
.

[[d1_databases]]
binding = "DB" # i.e. available in your Worker on env.DB
database_name = "prod-prisma-d1-app"
database_id = "<unique-ID-for-your-database>"

The section starting with [[d1_databases]] is the binding configuration needed in your wrangler.toml for your Worker to communicate with D1. Add that now:

# wrangler.toml
name="my-d1-prisma-app"
main = "src/index.ts"
compatibility_date = "2024-03-20"
compatibility_flags = ["nodejs_compat"]

[[d1_databases]]
binding = "DB" # i.e. available in your Worker on env.DB
database_name = "prod-prisma-d1-app"
database_id = "<unique-ID-for-your-database>"

Your application now has D1 available! Next, add Prisma ORM to manage your queries, schema and migrations! To add Prisma ORM, first make sure the latest version is installed. Prisma ORM versions 5.12.0 and up support Cloudflare Workers and D1.

npm install prisma@latest @prisma/client@latest @prisma/adapter-d1

Now run npx prisma init in order to create the necessary files to start with. Since D1 uses SQLite’s SQL dialect, we set the provider to be sqlite.

npx prisma init --datasource-provider sqlite

This will create a few files, but the one to look at first is your Prisma schema file, available at prisma/schema.prisma.

// schema.prisma
// This is your Prisma schema file,
// learn more about it in the docs: https://pris.ly/d/prisma-schema

generator client {
  provider = "prisma-client-js"
}

datasource db {
  provider = "sqlite"
  url  = env("DATABASE_URL")
}

Before you can create any models, first enable the driverAdapters Preview feature. This will allow the Prisma Client to use an adapter to communicate with D1.

// schema.prisma
// This is your Prisma schema file,
// learn more about it in the docs: https://pris.ly/d/prisma-schema

generator client {
  provider = "prisma-client-js"
+ previewFeatures = ["driverAdapters"]
}

datasource db {
  provider = "sqlite"
  url      = env("DATABASE_URL")
}

Now you are ready to create your first model! In this app, you will be creating a “ticker”, a mainstay of many classic Internet sites.

Add a new model to your schema, Visit, which will track that an individual visited your site. A Visit is a simple model that will have a unique ID and the time at which an individual visited your site.

// This is your Prisma schema file,
// learn more about it in the docs: https://pris.ly/d/prisma-schema

generator client {
  provider        = "prisma-client-js"
  previewFeatures = ["driverAdapters"]
}

datasource db {
  provider = "sqlite"
  url      = env("DATABASE_URL")
}

+ model Visit {
+   id        Int      @id @default(autoincrement())
+   visitTime DateTime @default(now())
+ }

Now that you have a schema and a model, let’s create a migration. First use wrangler to generate an empty migration file and prisma migrate to fill it. If prompted, select “yes” to create a migrations folder at the root of your project.

npx wrangler d1 migrations create prod-prisma-d1-app init
 ⛅️ wrangler 3.36.0
-------------------
✔ No migrations folder found. Set `migrations_dir` in wrangler.toml to choose a different path.
Ok to create /path/to/your/project/my-d1-prisma-app/migrations? … yes
✅ Successfully created Migration '0001_init.sql'!

The migration is available for editing here
/path/to/your/project/my-d1-prisma-app/migrations/0001_init.sql
npx prisma migrate diff --script --from-empty --to-schema-datamodel ./prisma/schema.prisma >> migrations/0001_init.sql

The npx prisma migrate diff command takes the difference between your database (which is currently empty) and the Prisma schema. It then saves this difference to a new file in the migrations directory.

-- 0001_init.sql
-- Migration number: 0001 	 2024-03-21T22:15:50.184Z
-- CreateTable
CREATE TABLE "Visit" (
    "id" INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
    "visitTime" DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP
);
Now you can migrate your local and remote D1 database instances using wrangler and re-generate your Prisma Client to begin making queries.

npx wrangler d1 migrations apply prod-prisma-d1-app --local
npx wrangler d1 migrations apply prod-prisma-d1-app --remote
npx prisma generate

Make sure to import PrismaClient and PrismaD1, define the binding for your D1 database, and you’re ready to use Prisma in your application.

// src/index.ts
import { PrismaClient } from "@prisma/client";
import { PrismaD1 } from "@prisma/adapter-d1";

export interface Env {
  DB: D1Database,
}

export default {
  async fetch(request: Request, env: Env, ctx: ExecutionContext): Promise<Response> {
    const adapter = new PrismaD1(env.DB);
    const prisma = new PrismaClient({ adapter });
    const { pathname } = new URL(request.url);

    if (pathname === '/') {
      const numVisitors = await prisma.visit.count();
      return new Response(
        `You have had ${numVisitors} visitors!`
      );
    }

    return new Response('');
  },
};

You may notice that there are always 0 visitors. Add another route to create a new visitor whenever someone visits the /visit route:

// src/index.ts
import { PrismaClient } from "@prisma/client";
import { PrismaD1 } from "@prisma/adapter-d1";

export interface Env {
  DB: D1Database,
}

export default {
  async fetch(request: Request, env: Env, ctx: ExecutionContext): Promise<Response> {
    const adapter = new PrismaD1(env.DB);
    const prisma = new PrismaClient({ adapter });
    const { pathname } = new URL(request.url);

    if (pathname === '/') {
      const numVisitors = await prisma.visit.count();
      return new Response(
        `You have had ${numVisitors} visitors!`
      );
    } else if (pathname === '/visit') {
      const newVisitor = await prisma.visit.create({ data: {} });
      return new Response(
        `You visited at ${newVisitor.visitTime}. Thanks!`
      );
    }

    return new Response('');
  },
};

Your app is now set up to record visits and report how many visitors you have had!

Summary and further reading

We were able to build a simple app easily with Cloudflare Workers, D1 and Prisma ORM, but the benefits don’t stop there! Check the official documentation for information on using Prisma ORM with D1 along with workflows for migrating your data, and even extending the Prisma Client for your specific needs.

Data Anywhere with Pipelines, Event Notifications, and Workflows

Post Syndicated from Matt Silverlock original https://blog.cloudflare.com/data-anywhere-events-pipelines-durable-execution-workflows


Data is fundamental to any real-world application: the database storing your user data and inventory, the analytics tracking sales events and/or error rates, the object storage with your web assets and/or the Parquet files driving your data science team, and the vector database enabling semantic search or AI-powered recommendations for your users.

When we first announced Workers back in 2017, and then Workers KV, Cloudflare R2, and D1, it was obvious that the next big challenge to solve for developers would be in making it easier to ingest, store, and query the data needed to build scalable, full-stack applications.

To that end, as part of our quest to make building stateful, distributed-by-default applications even easier, we’re launching our new Event Notifications service; a preview of our upcoming streaming ingestion product, Pipelines; and a sneak peek into our take on durable execution, Workflows.

Event-based architectures

When you’re writing data — whether that’s new data, changing existing data, or deleting old data — you often want to trigger other, asynchronous work to run in response. That could be processing user-driven uploads, updating search indexes as the underlying data changes, or removing associated rows in your SQL database when content is removed.

In order to make these event-driven workflows far easier to build, we’re launching the first step towards a wider Event Notifications platform across Cloudflare, starting with notifications support in R2.

You can read more in the deep-dive on Event Notifications for R2, but in a nutshell: you can configure changes to content in any R2 bucket to write directly to a Queue, allowing you to reliably consume those events in a Worker or to pull from compute in a legacy cloud.

Event Notifications for R2 are just the beginning, though. There are many kinds of events you might want to trigger as a developer — these are just some of the event types we’re planning to support:

  • Changes (writes) to key-value pairs in your Workers KV namespaces.
  • Updates to your D1 databases, including changed rows or triggers.
  • Deployments to your Cloudflare Workers.

Consuming event notifications from a single Worker is just one approach, though. As you start to consume events, you may want to trigger multi-step workflows that execute reliably, resume from errors or exceptions, and ensure that previous steps aren’t duplicated or repeated unnecessarily. An event notification framework turns out to be just the thing needed to drive a workflow engine that executes durably.

Making it even easier to ingest data

When we launched Cloudflare R2, our object storage service, we knew that supporting the de facto standard S3 API was critical in order to allow developers to bring the tooling and services they already had over to R2. But the S3 API is designed to be simple: at its core, it provides APIs for upload, download, multipart, and metadata operations, and many tools don’t support the S3 API at all.

What if you want to batch clickstream data from your web services so that it’s efficient (and cost-effective) to query by your analytics team? Or partition data by customer ID, merchant ID, or locale within a structured data format like JSON?

Well, we want to help solve this problem too, and so we’re announcing Pipelines, an upcoming streaming ingestion service designed to ingest data at scale, aggregate it, and write it directly to R2, without you having to manage infrastructure, partitions, runners, or worry about durability.

With Pipelines, creating a globally scalable ingestion endpoint that can ingest tens-of-thousands of events per second doesn’t require any code:

$ wrangler pipelines create clickstream-ingest-prod --batch-size="1MB" --batch-timeout-secs=120 --batch-on-json-key=".merchantId" --destination-bucket="prod-cs-data"

✅ Successfully created new pipeline "clickstream-ingest-prod"
📥 Created endpoints:
➡ HTTPS: https://d458dbe698b8eef41837f941d73bc5b3.pipelines.cloudflarestorage.com/clickstream-ingest-prod
➡ WebSocket: wss://d458dbe698b8eef41837f941d73bc5b3.pipelines.cloudflarestorage.com:8443/clickstream-ingest-prod
➡ Kafka: d458dbe698b8eef41837f941d73bc5b3.pipelines.cloudflarestorage.com:9092 (topic: clickstream-ingest-prod)

As you can see here, we’re already thinking about how to make Pipelines protocol-agnostic: write from an HTTP client, stream events over a WebSocket, and/or redirect your existing Kafka producer (and stop having to manage and scale Kafka) directly to Pipelines.

But that’s just the beginning of our vision here. Scalable ingestion and simple batching is one thing, but what about if you have more complex needs? Well, we have a massively scalable compute platform (Cloudflare Workers) that can help address this too.

The code below is just an initial exploration for how we’re thinking about an API for running transforms over streaming data. If you’re aware of projects like Apache Beam or Flink, this programming model might even look familiar:

export default {
   // The pipeline handler is invoked when batch criteria are met
   async pipeline(stream: StreamPipeline, env: Env, ctx: ExecutionContext): Promise<StreamPipeline> {
      // ...
      return stream
         // Type: transform(label: string, transformFunc: TransformFunction): Promise<StreamPipeline>
         // Each transform has a label that is used in metrics to provide
         // per-transform observability and debugging
         .transform("human readable label", (events: Array<StreamEvent>) => {
            return events.map((e) => ...)
         })
         .transform("another transform", (events: Array<StreamEvent>) => {
            return events.map((e) => ...)
         })
         .writeToR2({
            format: "json",
            bucket: "MY_BUCKET_NAME",
            prefix: somePrefix,
            batchSize: "10MB"
         })
   }
}

Specifically:

  • The Worker describes a pipeline of transformations (mapping, reducing, filtering) that operates over each subset of events (records)
  • You can call out to other services — including D1 or KV — in order to synchronously or asynchronously hydrate data or lookup values during your stream processing
  • We take care of scaling horizontally based on records-per-second and/or any concurrency settings you configure to meet your processing latency requirements.

We’ll be bringing Pipelines into open beta later in 2024, and it will initially launch with support for HTTP ingestion and R2 as a destination (sink), but we’re already thinking bigger.

We’ll be sharing more as Pipelines gets closer to release. In the meantime, you can register your interest and share your use-case, and we’ll reach out when Pipelines reaches open beta.

Durable Execution

If the term “Durable Execution” is new to you, don’t worry: the term comes from the desire to run applications that can resume execution from where they left off, even if the underlying host or compute fails (where the “durable” part comes from).

As we’ve continued to build out our data and AI platforms, we’ve been acutely aware that developers need ways to create reliable, repeatable workflows that operate over that data, turn unstructured data into structured data, trigger on fresh data (or periodically), and automatically retry, restart, and export metrics for each step along the way. The industry calls this Durable Execution: we’re just calling it Workflows.

What makes Workflows different from other takes on Durable Execution is that we provide the underlying compute as part of the platform. You don’t have to bring-your-own compute, or worry about scaling it or provisioning it in the right locations. Workflows runs on top of Cloudflare Workers – you write the workflow, and we take care of the rest.

Here’s an early example of writing a Workflow that generates text embeddings using Workers AI and stores them (ready to query) in Vectorize as new content is written to (or updated within) R2.

  • Each Workflow run is triggered by an Event Notification consumed from a Queue, but could also be triggered by an HTTP request, another Worker, or even scheduled on a timer.
  • Individual steps within the Workflow allow us to define individually retriable units of work: in this case, we’re reading the new objects from R2, creating text embeddings using Workers AI, and then inserting the embeddings into Vectorize.
  • State is durably persisted between steps: each step can emit state, and Workflows will automatically persist that so that any underlying failures, uncaught exceptions or network retries can resume execution from the last successful step.
  • Every call to step() automatically emits metrics associated with the unique Workflow run, making it easier to debug within each step and/or break down your application into its smallest units of execution, without having to worry about observability.

Transforming this series of steps into real code, here’s what this would look like with Workflows:

import { Ai } from "@cloudflare/ai";
import { Workflow } from "cloudflare:workers";

export interface Env {
  R2: R2Bucket;
  AI: any;
  VECTOR_INDEX: VectorizeIndex;
}

export default class extends Workflow {
  async run(event: Event) {
    const ai = new Ai(this.env.AI);

    // List of keys to fetch from our incoming event notification
    const keysToFetch = event.messages.map((val) => {
      return val.object.key;
    });

    // The return value of each step is stored (the "durable" part
    // of "durable execution")
    // This ensures that state can be persisted between steps, reducing
    // the need to recompute results ($$, time) should subsequent
    // steps fail.
    const inputs = await this.ctx.run(
      // Each step has a user-defined label.
      // Metrics are emitted as each step runs (to success or failure)
      // with this label attached and available within per-Workflow
      // analytics in near-real-time.
      "read objects from R2", async () => {
      const objects = [];

      for (const key of keysToFetch) {
        const object = await this.env.R2.get(key);
        objects.push(await object.text());
      }

      return objects;
    });


    // Persist the output of this step.
    const embeddings = await this.ctx.run(
      "generate embeddings",
      async () => {
        const { data } = await ai.run("@cf/baai/bge-small-en-v1.5", {
          text: inputs,
        });

        if (data.length) {
          return data;
        } else {
          // Uncaught exceptions trigger an automatic retry of the step.
          // Retries and timeouts have sane defaults and can be
          // overridden per step.
          throw new Error("Failed to generate embeddings");
        }
      },
      {
        retries: {
          limit: 5,
          delayMs: 1000,
          backoff: "exponential",
        },
      }
    );

    await this.ctx.run("insert vectors", async () => {
      const vectors = [];

      keysToFetch.forEach((key, index) => {
        vectors.push({
          id: crypto.randomUUID(),
          // Our embeddings from the previous step
          values: embeddings[index].values,
          // The path to each R2 object, to map back to during
          // vector search
          metadata: { r2Path: key },
        });
      });

      // Pass the assembled vectors to Vectorize
      return this.env.VECTOR_INDEX.upsert(vectors);
    });
  }
}

This is just one example of what a Workflow can do. The ability to durably execute an application, modeled as a series of steps, applies to a wide number of domains. You can apply this model of execution to a number of use-cases, including:

  • Deploying software: each step can define a build step and subsequent health check, gating further progress until your deployment meets your criteria for “healthy”.
  • Post-processing user data: triggering a workflow based on user uploads (e.g. to Cloudflare R2) that then subsequently parses that data asynchronously, redacts PII or sensitive data, writes the sanitized output, and triggers a notification via email, webhook, or mobile push.
  • Payment and batch workflows: aggregating raw customer usage data on a periodic schedule by querying your data warehouse (or Workers Analytics Engine), triggering usage or spend alerts, and/or generating PDF invoices.

Each of these use cases model tasks that you want to run to completion, minimize redundant retries by persisting intermediate state, and (importantly) easily observe success and failure.

We’ll be sharing more about Workflows during the second quarter of 2024 as we work towards an open (public!) beta. This includes how we’re thinking about idempotency and interactions with our storage, per-instance observability and metrics, local development, and templates to bootstrap common workflows.

Putting it together

We’ve often thought of Cloudflare’s own network as one massively scalable parallel data processing cluster: data centers in 310+ cities, with the ability to run compute close to users and/or close to data, keep it within the bounds of regulatory or compliance requirements, and most importantly, use our massive scale to enable our customers to scale as well.

Recapping, a fully-fledged data platform needs to enable three things:

  1. Ingesting data: getting data into the platform (in the right format, from the right sources)
  2. Storing data: securely, reliably, and durably.
  3. Querying data: understanding and extracting insights from the data, and/or transforming it for use by other tools.

When we launched R2 we tackled the second part, but knew that we’d need to follow up with the first and third parts in order to make it easier for developers to get data in and make use of it.

If we look at how we can build a system that helps us solve each of these three parts together with Pipelines, Event Notifications, R2, and Workflows, we end up with an architecture that resembles this:

Specifically, we have Pipelines (1) scaling out to ingest data, batch it, filter it, and then durably store it in R2 (2) in a format that’s ready and optimized for querying. Workflows, ClickHouse, Databricks, or the query engine of your choice can then query (3) that data as soon as it’s ready — with “ready” being automatically triggered by an Event Notification as soon as the data is ingested and written to R2.

There’s no need to poll, no need to batch after the fact, no need to have your query engine slow down on data that wasn’t pre-aggregated or filtered, and no need to manage and scale infrastructure in order to keep up with load or data jurisdiction requirements. Create a Pipeline, write your data directly to R2, and query directly from it.

If you’re also looking at this and wondering about the costs of moving this data around, then we’re holding to one important principle: zero egress fees across all of our data products. Just as we set the stage for this with our R2 object storage, we intend to apply this to every data product we’re building, Pipelines included.

Start Building

We’ve shared a lot of what we’re building so that developers have an opportunity to provide feedback (including via our Developer Discord), share use-cases, and think about how to build their next application on Cloudflare.

R2 adds event notifications, support for migrations from Google Cloud Storage, and an infrequent access storage tier

Post Syndicated from Matt DeBoard original https://blog.cloudflare.com/r2-events-gcs-migration-infrequent-access


We’re excited to announce three new features for Cloudflare R2, our zero egress fee object storage platform: event notifications in open beta, Super Slurper support for migrations from Google Cloud Storage, and a new Infrequent Access storage class in private beta.

Event Notifications Open Beta

The lifecycle of data often doesn’t stop immediately after upload to an R2 bucket – event data may need to be transformed and loaded into a data warehouse, media files may need to go through a post-processing step, etc. We’re releasing event notifications for R2 in open beta to enable building applications and workflows driven by your changing data.

Event notifications work by sending messages to your queue each time there is a change to your data. These messages are received by a consumer Worker, where you can define any subsequent action that needs to be taken.
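A minimal consumer sketch is shown below. The exact message schema during the beta is best confirmed in the documentation; the action and object fields here are illustrative assumptions:

export default {
  async queue(batch: MessageBatch, env: Env): Promise<void> {
    for (const message of batch.messages) {
      // Illustrative shape: each message is assumed to describe the affected
      // object and the type of change that occurred.
      const event = message.body as { action: string; object: { key: string } };
      console.log(`Received ${event.action} for ${event.object.key}`);
      // Define any subsequent action here: post-processing the object,
      // loading it into a warehouse, sending a webhook, etc.
    }
  },
};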

To get started enabling event notifications on your R2 bucket, you can run the following Wrangler command (replacing bucket_name and queue_name with your bucket and queue names respectively):

wrangler r2 bucket notification create <bucket_name> --event-type object-create --queue <queue_name>

For more information on how to set up event notifications on your R2 buckets today and limitations during beta, please refer to the documentation.

Super Slurper for Google Cloud Storage

Super Slurper can now migrate data from Google Cloud Storage (GCS) to Cloudflare R2. We released Super Slurper last year with the goal of making one-time comprehensive data migrations fast, reliable, and easy: there’s no need to spin up migration VMs or implement complicated retry logic. Since then, thousands of developers have used Super Slurper to migrate petabytes of data from AWS S3 to R2. Now Google Cloud Storage customers can migrate data to Cloudflare R2 and benefit from zero egress fees, whether they are permanently moving off their current provider or not.

To get started migrating data from GCS:

  1. From the Cloudflare dashboard, select R2 > Data Migration.
  2. Select Migrate files.
  3. Select Google Cloud Storage for the source bucket provider.
  4. Enter your bucket name and associated credentials and select Next.
  5. Enter your R2 bucket name and associated credentials and select Next.
  6. After you finish reviewing the details of your migration, select Migrate files.

You can view the status of your migration job at any time on the dashboard. For more information on how to use Super Slurper, please refer to the documentation.

Infrequent Access Private Beta

We’re excited to introduce the private beta of our new Infrequent Access storage class. For use cases that involve data that isn’t frequently accessed (long tail user-generated content, logs, etc), Infrequent Access gives you the ability to pay less for storage while maintaining performance and durability.

Here’s an example of how you can upload an object to your R2 bucket with the new Infrequent Access storage class using Workers:

# wrangler.toml
[[r2_buckets]]
binding = 'MY_BUCKET'
bucket_name = '<YOUR_BUCKET_NAME>'

# index.ts
export default {
   async fetch(request: Request, env: Env): Promise<Response> {
      if (request.method === "PUT") {
         await env.MY_BUCKET.put("myobject", request.body, {
            storageClass: "InfrequentAccess",
         });
         return new Response("Put object successfully!");
      }
      return new Response("Not a PUT!");
   }
}

In addition to uploading objects directly to Infrequent Access, you can define an object lifecycle policy to move data to Infrequent Access after a period of time, once you no longer need to access it as often. In the future, we plan to automatically optimize storage classes for data so you can avoid manually creating rules and better adapt to changing data access patterns.
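During the private beta, storage class transitions may be manageable from the dashboard only, but as a hedged sketch, a rule moving objects to Infrequent Access after 30 days could look like this through R2’s S3-compatible PutBucketLifecycleConfiguration API (the STANDARD_IA identifier is assumed to be the S3-compatible name for Infrequent Access; verify against the documentation):

aws s3api put-bucket-lifecycle-configuration \
  --bucket <YOUR_BUCKET_NAME> \
  --endpoint-url https://<ACCOUNT_ID>.r2.cloudflarestorage.com \
  --lifecycle-configuration '{
    "Rules": [{
      "ID": "move-to-infrequent-access",
      "Status": "Enabled",
      "Filter": { "Prefix": "" },
      "Transitions": [{ "Days": 30, "StorageClass": "STANDARD_IA" }]
    }]
  }'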

For data stored in the Infrequent Access storage class, the pricing components will be similar to what you’re used to with R2: storage, Class A operations (writes, lists), Class B operations (reads), and data retrieval (processing). Data retrieval is charged per GB when data in the Infrequent Access storage class is retrieved and is what allows us to provide storage at a lower price. It reflects the additional computational resources required to fetch data from underlying storage optimized for less frequent access. And when the time comes, and you do need to use your data, there are still no egress fees.

Component                     Price
Storage                       $0.01 / GB-month
Class A Operations            $9.00 / million requests
Class B Operations            $0.90 / million requests
Data Retrieval (Processing)   $0.01 / GB
Egress (or Data Transfer)     $0 – No Charge

Are you interested in participating in the private beta for Infrequent Access?

Join the private beta waitlist to get access.

Have any feedback?

We would love to hear from you! To share your feedback about R2 and our data migration services, please join the Cloudflare Developer Discord. If you’re interested in learning more about R2, get started by visiting R2’s developer documentation or see how much you could save with our pricing calculator.

How Picsart leverages Cloudflare’s Developer Platform to build globally performant services

Post Syndicated from Mark Dembo original https://blog.cloudflare.com/picsart-move-to-workers-huge-performance-gains


Delivering great user experiences with a global user base can be challenging. While serving requests quickly when you start out in a local market is straightforward, doing so for a global audience is much more difficult. Why? Even under optimal conditions, you cannot be faster than the speed of light, which brings single data center solutions to their performance limits.

In this post, we will cover how Picsart improved the performance of one of its most critical services by moving from a centralized architecture to a globally distributed service built on Cloudflare. Our serverless compute platform, Workers, distributed throughout 310+ cities around the world, and our globally distributed Workers KV storage allowed them to improve their performance significantly and drive real business impact.

Success driven by data-driven insights

Picsart is one of the world’s largest digital creation platforms and a long-standing Cloudflare partner. At its core, an advanced tech stack powers its comprehensive features, including AI-driven photo and video editing tools and community-driven content sharing. With its infrastructure spanning across multiple cloud environments and on-prem deployments, Picsart is engineered to handle billions of daily requests from its huge mobile and web user base and API integrations. For over a decade, Cloudflare has been integral to Picsart, providing support for performant content delivery and securing its digital ecosystem.  

Similar to many other tech giants, Picsart approaches product development in a data-driven way. At the core of the innovation is Picsart’s remote configuration and experimentation platform, which enables product managers, UX researchers, and others to segment their user base into different test groups. These test groups might get to see slightly different implementations of features or designs of the Picsart app. Users might also get early access to experimental features or see different in-app promotions. In combination with constant monitoring of relevant KPIs, this allows for informed product decisions based on user preference and business impact.

On each app start, the client device sends a request to the remote configuration service for the latest setup tailored to the user’s session. The assignment of experiments relies on factors like the operating system and previous sessions, making each request unique and uncachable. Picsart’s app showcases extensive remote configuration capabilities, enabling adjustments to nearly every element. This results in a response containing a 1.5 MB configuration file for mobile clients. While the long-term solution is to reduce the file size, which has grown over time as more teams adopted the powerful service, this is not possible in the near or mid-term as it requires a major rewrite of all clients.

This setup request is blocking in the “hot path” during app start, as the result of this request decides how the app itself looks and behaves. Hence, performance is critical. To ensure users are not waiting too long, Picsart apps on mobile will wait 1500ms for the request to complete – if it does not, the user will not be assigned a test group and the app will fall back to default settings.

The clock is ticking

While a 1500ms round trip time seems like a sufficiently large time budget, the data suggested otherwise. Before the improvements were implemented, a staggering 50% of devices could not complete the requests in time. How come? In these 1.5 seconds the following steps need to complete:

  1. The request must travel from the users’ devices to the centralized backend servers
  2. The server processes the request based on dozens of user attributes provided in the request and thousands of defined remote configuration variations, running experiments, and segment metadata. Using all this information, the server selects the right variation of each remote setting entry and builds the response payload.
  3. The response must travel from the centralized backend servers to the user devices.

Looking at the data, it was clear to the Picsart team that their backend service was already well-optimized, taking only 30 milliseconds, a tiny fraction of the available time budget, to process each of the billions of monthly requests. The bulk of the request time came from network latency. Especially with mobile devices, last mile performance can be very volatile, eating away at a significant share of the available time budget. Not only that, but the data was clear: users closer to the origin server had a much higher chance of making the round trip in time than users out of region. It quickly became obvious that Picsart, fueled by its global success, had outgrown a single-region setup.

To the drawing board

A solution that comes to mind would be to replicate the existing cloud infrastructure in multiple regions and use global load balancing to minimize the distance a request needs to travel. However, this introduces significant overhead and cost. On the infrastructure side, it is not only the additional compute instances and database clusters that incur cost, but also cross-region data transfer to keep data in sync. Moreover, technical teams would need to operate and monitor infrastructure in multiple regions, which can add a lot to the complexity and cognitive load, leading to decreased development velocity and productivity loss.

Picsart instead looked to Cloudflare – we already had a long-lasting relationship for Application Performance and Security, and they aimed to use our Developer Platform to tackle the problem.

Workers and Workers KV seemed like the ideal solution. Both compute and data are globally distributed in 310+ locations around the world, resulting in a shorter distance between end users and the experimentation service. Not only that, but Cloudflare’s global-by-default approach allows for deployment with minimal overhead, and in contrast to other considered solutions, no additional fees to distribute the data around the globe.

No race without a clock

The objective for the refactor of the experimentation service was to increase the share of devices that successfully receive experimentation configuration within the set time budget.

But how to measure success? While synthetic testing can be useful in many situations, Picsart opted to come up with another clever solution:

During development, the Picsart engineers had already added logic to the web and mobile versions of their app that sends a duplicate request to the new endpoint, discarding the response and swallowing all potential errors. This allows them to collect timing data based on real-user metrics without impacting the app’s performance and reliability.

A simplified version of this pattern for a web client could look like this:

// API endpoint URLs
const prodUrl = 'https://prod.example.com/';
const devUrl = 'https://new.example.com/';

// Function to collect metrics
const collectMetrics = (duration) => {
    console.log('Request duration:', duration);
    // …
};

// Function to fetch data from an endpoint and call collectMetrics
const fetchData = async (url, options) => {
    const startTime = performance.now();
    
    try {
        const response = await fetch(url, options);
        const endTime = performance.now();
        const duration = endTime - startTime;
        collectMetrics(duration);
        return await response.json();
    } catch (error) {
        console.error('Error fetching data:', error);
    }
};

// Fetching data from both endpoints
async function fetchDataFromBothEndpoints() {
    try {
        const result1 = await fetchData(prodUrl, { method: 'POST', ... });
        console.log('Result from endpoint 1:', result1);

        // Fetching data from the second endpoint without awaiting its completion
        fetchData(devUrl, { method: 'POST', ... });
    } catch (error) {
        console.error('Error fetching data from both endpoints:', error);
    }
}

fetchDataFromBothEndpoints();

Using existing data analytics tools, Picsart was able to analyze the performance of the new service from day one, starting with a dummy endpoint and a ‘hello world’ response. With that, a v0 was created that did not yet have the correct logic, but simulated reading multiple values from KV and returned a response of realistic size to the end user.

The need for a do-over

In the initial phase, outcomes fell short of expectations. Surprisingly, requests were slower despite the service’s proximity to end users. What caused this setback? Subsequent investigation revealed multiple culprits and design patterns in need of optimization.

Data segmentation

The previous, stateful solution operated on a single massive “blob” of data exceeding 100MB. Loading it into memory incurred a few seconds of initial startup time, but once the VM completed the task, request handling was fast, benefiting from the readily available data in memory.

However, this approach doesn’t seamlessly transition to the serverless realm. Unlike long-running VMs, Worker isolates have short lifespans. Repeatedly parsing large JSON objects led to prolonged compute durations. Simply parsing four KV entries of 25MB each (KV maximum value size is 25MB) on each request was not a feasible option.

The Picsart team went back to solution design and embarked on a journey to optimize their system’s execution time, resulting in a series of impactful improvements.

The fundamental insight that guided the solution was the unnecessary overhead involved in loading and parsing data irrelevant to the user’s specific context. The 100MB configuration file contained configurations for all platforms and locations worldwide – a setup that was far from efficient in a globally distributed, serverless compute environment. For instance, when processing requests from users in the United States, there was no need to fetch configurations targeted at users in other countries, or at different platforms.

To address this inefficiency, the Picsart team stored the configuration of each platform and country in separate KV records. This targeted strategy meant that for a request originating from a US user on an Android device, our system would only fetch and parse the KV record specific to Android users in the US, thereby excluding all irrelevant data. This resulted in approximately 600 KV records, each with a maximum size of 10MB. While this leads to data duplication on the KV storage side, it decreases the amount of data that needs to be parsed upon request. As Cloudflare operates in over 120 countries around the world, only a subset of records were needed in each location. Hence, the increase in cardinality had minimal impact on KV cache performance, as demonstrated by more than 99.5% of KV reads being served from local cache.

Key                   Size
settings_part1.json   25MB
settings_part2.json   25MB
…

Before (simplified)

Key                                  Size
com.picsart.studio_apple_us.json     6.1MB
com.picsart.studio_apple_de.json     6.1MB
com.picsart.studio_android_us.json   5.9MB

After (simplified)
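In a Worker, selecting the right record then becomes a matter of deriving the key from the request context. A minimal sketch, assuming a CONFIG_KV binding and the illustrative key naming scheme shown above:

export default {
  async fetch(request: Request, env: { CONFIG_KV: KVNamespace }): Promise<Response> {
    // The platform is assumed to be sent by the client; the country comes
    // from Cloudflare's request metadata.
    const platform = new URL(request.url).searchParams.get("platform") ?? "android";
    const country = ((request.cf?.country as string) ?? "us").toLowerCase();

    // Fetch and parse only the record relevant to this user's context.
    const key = `com.picsart.studio_${platform}_${country}.json`;
    const config = await env.CONFIG_KV.get(key, { type: "json" });
    return Response.json(config);
  },
};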

This approach was a significant move for Picsart as they transitioned from a regional cloud setup to Cloudflare’s globally distributed connectivity cloud. By serving data from close proximity to end user locations, they were able to combat the high network latency of their previous setup. This strategy radically transformed the data-handling process, which unlocked two major benefits:

  • Performance Gains: By ensuring that only the relevant subset of data is fetched and parsed based on the user’s platform and geographical location, wall time and compute resources required for these operations could be significantly reduced.
  • Scalability and Flexibility: The granular segmentation of data enables effortless scaling of the service for new features or regional content. Adding support for new applications now only requires inserting new, standalone KV records, in contrast to the previous solution, where this would require increasing the size of the single record.

Immutable updates

Now that changes to the configuration were segmented by app, country, and platform, this also allowed for individual updates of the configuration in KV. KV storage showcases its best performance when records are updated infrequently but read very often. This pattern leverages KV’s fundamental design to cache values at edge locations upon reads, ensuring that subsequent queries for the same record are swiftly served by local caches rather than requiring a trip back to KV’s centralized data centers. This architecture is fundamental for minimizing latency and maximizing the speed of data retrieval across a globally distributed platform.

A crucial requirement for Picsart’s experimentation system was the ability to propagate updates of remote configuration values immediately. Updating existing records would require very short cache TTLs, and even the minimum KV cache TTL of 60 seconds was considered unacceptable for the dynamic nature of feature flagging. Moreover, short TTLs also hurt the cache hit ratio and overall KV performance, specifically in regions with low traffic.

To reconcile the need for both rapid updates and efficient caching, Picsart adopted an innovative approach: making KV records immutable. Instead of modifying existing records, they opted to create new records with each configuration change. By appending the content hash to the KV key and writing new records after each update, Picsart ensured that each record was unique and immutable. This allowed them to leverage higher cache TTLs, as these records would never be updated.

Key                                Cache TTL
com.picsart.studio_apple_us.json   60s
…

Before (simplified)

Key                                       Cache TTL
com.picsart.studio_apple_us_b58b59.json   86400s
com.picsart.studio_apple_us_273678.json   86400s

After (simplified)

There was a catch, though. The service must now keep track of the correct KV keys to use. The Picsart team addressed this challenge by storing references to the latest KV keys in the environment variables of the Worker.

Each configuration change triggers a new KV pair to be written and the services’ environment variables to be updated. As global Workers deployments take mere seconds, changes to the experimentation and configuration data are near-instantaneously globally available.
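A sketch of the read path under this scheme, with illustrative binding and variable names: the Worker reads the current key from an environment variable, and because a given key’s value never changes, it can safely request a long cache TTL.

export default {
  async fetch(request: Request, env: { CONFIG_KV: KVNamespace; LATEST_CONFIG_KEY: string }): Promise<Response> {
    // LATEST_CONFIG_KEY is updated on each deploy whenever a new immutable
    // record is written, e.g. "com.picsart.studio_apple_us_b58b59.json".
    const key = env.LATEST_CONFIG_KEY;

    // Immutable records make a long cacheTtl safe: repeated reads are
    // served from the local edge cache instead of central storage.
    const config = await env.CONFIG_KV.get(key, { type: "text", cacheTtl: 86400 });
    return new Response(config, { headers: { "content-type": "application/json" } });
  },
};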

JSON serialization & alternatives

Following the previous improvements, the Picsart team made another significant discovery: only a small fraction of configuration data is needed to assign the experiments, while the remaining majority of the data comprises JSON values for the remote configuration payloads. While the service must deliver the latter in the response, the data is not required during the initial processing phase.

The initial implementation used KV’s get() method to retrieve the configuration data with the parameter type=json, which converts the KV value to an object. This process is very compute-intensive compared to using the get() method with the parameter type=text, which simply returns the value as a string. In the context of Picsart’s data, the bulk of the CPU cycles were wasted on parsing JSON data that is not needed to perform the required business logic.

What if the data structure and code could be changed in such a way that only the data needed to assign experiments was parsed as JSON, while the configuration values were treated as text? Picsart went ahead with a new approach: splitting the KV records into two, creating a small 300KB record for the metadata, which can be quickly parsed to an object, and a 9.7MB record of configuration values. The extracted configuration values are delimited by newline characters. The respective line number is used as reference in the metadata entry, so that the respective configuration value for an experiment can be merged back into the payload response later.


{
  "name": "shape_replace_items",
  "default_value": "<large json object>",
  "segments": [
    {
      "id": "f1244",
      "value": "<another large json object>"
    },
    {
      "id": "a2lfd",
      "value": "<yet another large json object>"
    }
  ]
}
Before: Metadata and Values in one JSON object (simplified)

// com.picsart.studio_apple_am_metadata

1 {
2   "name": "shape_replace_items",
3   "default_value": 1,
4   "segments": [
5     {
6       "id": "f1244",
7       "value": 2
8     },
9     {
10       "id": "a2lfd",
11      "value": 3
12     }
13   ]
14 }


// com.picsart.studio_apple_am_values

1 "<large json object>"
2 "<Another json object>"
3 "<Yet another json object>"

After: Metadata and Values are split (simplified)

After calculating the experiments and selecting the correct variants based solely on the small metadata entry, the service constructs a JSON string for the response containing placeholders for the actual values that reference the corresponding line numbers in the separate text file. To finalize the response, the server replaces the placeholders with the corresponding serialized JSON strings from the text file. This approach circumvents the need for parsing and re-serializing large JSON objects and helps to avoid a significant computational overhead.

Note that parsing the metadata JSON and determining the correct experiments happens in parallel with loading the large file of configuration values, saving additional milliseconds.
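A simplified sketch of the technique, with illustrative names (the segment choice below is a trivial stand-in for the actual experiment-assignment logic): both records are fetched in parallel, only the small metadata record is parsed, and the pre-serialized value is spliced into the response by line number.

interface Metadata {
  name: string;
  default_value: number; // 1-based line number of the default payload
  segments: { id: string; value: number }[];
}

async function buildConfigResponse(env: { CONFIG_KV: KVNamespace }): Promise<Response> {
  // Fetch the small metadata record and the large values record in parallel.
  const [metadata, valuesText] = await Promise.all([
    env.CONFIG_KV.get("com.picsart.studio_apple_am_metadata", { type: "json" }) as Promise<Metadata>,
    env.CONFIG_KV.get("com.picsart.studio_apple_am_values", { type: "text" }),
  ]);

  // Each line of the values record is one pre-serialized JSON string.
  const lines = (valuesText ?? "").split("\n");

  // Stand-in for experiment assignment: pick a segment's line number,
  // falling back to the default.
  const chosen = metadata.segments[0]?.value ?? metadata.default_value;

  // Splice the raw string into the response without parsing or
  // re-serializing the large payload.
  const body = `{"${metadata.name}": ${lines[chosen - 1]}}`;
  return new Response(body, { headers: { "content-type": "application/json" } });
}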

By minimizing the size of the JSON data that needed to be parsed and leveraging a more efficient method for constructing the final response, the Picsart team managed to not only reduce the response times but also optimize the compute resource usage. This approach reflects a broader principle applicable across the tech industry: that efficiency, particularly in serverless architectures, can often be dramatically improved by rethinking data structure and utilization.

Getting a head start

The changes on the server-side, moving from a single region setup to Cloudflare’s global architecture, paid off massively. Median response times globally dropped by more than 1 second, which was already a huge improvement for the team. However, in looking at the new data, two more paths for client-side optimizations were found.

As the web and mobile apps call the service at startup, most of the time no active connection to the servers is alive, and establishing that connection at request time costs valuable milliseconds.

For the web version, setting a preconnect hint on initial page load showed a positive impact. For the mobile app version, the Picsart team took it a step further. Investigation showed that before the connection could be established, three modules had to complete initialization: the error tracker, the HTTP client, and the SDK. Reordering the modules to initialize the HTTP client first allowed the connection to be established in parallel with the initialization of the SDK and error tracker, again saving time. This resulted in another 200ms improvement for end users.
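On the web, such a hint can be delivered as a <link rel="preconnect"> tag or as an HTTP Link header. A sketch of the latter in a Worker, with config.example.com standing in for the real configuration endpoint:

export default {
  async fetch(request: Request): Promise<Response> {
    const response = await fetch(request);

    // Copy the response so its headers are mutable, then hint the browser
    // to open a connection to the config service before it is first used.
    const hinted = new Response(response.body, response);
    hinted.headers.append("Link", "<https://config.example.com>; rel=preconnect");
    return hinted;
  },
};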

Setting a new personal best

The day had come, and it was time for the phased rollout: web first, mobile apps second. With suspense, the team watched the dashboards and was pleasantly surprised: the rollout was successful, and billions of requests were handled with ease.

Share of successfully delivered experiments

The result? The Picsart apps are loading faster than ever for millions of users worldwide, while the share of successfully delivered experiments increased from 50% to 85%. Median response time dropped from 1500 ms to 280 ms, and on the web it dropped to 70 ms, since the response size is smaller there than on mobile. This translates to real business impact for Picsart, as they can now deliver more personalized and data-driven experiences to even more of their users.

A bright future ahead

Picsart is already thinking of the next generation of experimentation. To integrate with Cloudflare even further, the plan is to use Durable Objects to store hundreds of millions of user data records in a decentralized fashion, enabling even more powerful experiments without impacting performance. This is possible thanks to Durable Objects’ underlying architecture that stores the user data in-region, close to the respective end user device.

Beyond that, Picsart’s experimentation team is also planning to onboard external B2B customers to their experimentation platform as Cloudflare’s developer platform provides them with the scale and global network to handle more traffic and data with ease.

Get started yourself

If you’re also working on or with an application that would benefit from Cloudflare’s speed and scale, check out our developer documentation and tutorials, and join our developer Discord to get community support and inspiration.