Tag Archives: Zaraz

Cloudflare Zaraz supports Managed Components and DLP to make third-party tools private

Post Syndicated from Yo'av Moshe original https://blog.cloudflare.com/zaraz-uses-managed-components-and-dlp-to-make-tools-private/

Cloudflare Zaraz supports Managed Components and DLP to make third-party tools private

Cloudflare Zaraz supports Managed Components and DLP to make third-party tools private

When it comes to privacy, much is in your control as a website owner. You decide what information to collect, how to transmit it, how to process it, and where to store it. If you care for the privacy of your users, you’re probably taking action to ensure that these steps are handled sensitively and carefully. If your website includes no third party tools at all – no analytics, no conversion pixels, no widgets, nothing at all – then it’s probably enough! But… If your website is one of the other 94% of the Internet, you have some third-party code running in it. Unfortunately, you probably can’t tell what exactly this code is doing.

Third-party tools are great. Your product team, marketing team, BI team – they’re all right when they say that these tools make a better website. Third-party tools can help you understand your users better, embed information such as maps, chat widgets, or measure and attribute conversions more accurately. The problem doesn’t lay with the tools themselves, but with the way they are implemented – third party scripts.

Third-party scripts are pieces of JavaScript that your website is loading, often from a remote web server. Those scripts are then parsed by the browser, and they can generally do everything that your website can do. They can change the page completely, they can write cookies, they can read form inputs, URLs, track visitors and more. It is mostly a restrictions-less system. They were built this way because it used to be the only way to create a third-party tool.

Over the years, companies have suffered a lot of third party scripts. Those scripts were sometimes hacked, and started hijacking information from visitors to websites that were using them. More often, third party scripts are simply collecting information that could be sensitive, exposing the website visitors in ways that the website owner never intended.

Recently we announced that we’re open sourcing Managed Components. Managed Components are a new API to load third-party tools in a secure and privacy-aware way. It changes the way third-party tools load, because by default there are no more third-party scripts in it at all. Instead, there are components, which are controlled with a Components Manager like Cloudflare Zaraz.

In this blogpost we will discuss how to use Cloudflare Zaraz for granting and revoking permissions from components, and for controlling what information flows into components. Even more exciting, we’re also announcing the upcoming DLP features of Cloudflare Zaraz, that can report, mask and remove PII from information shared with third-parties by mistake.

How are Managed Components better

Because Managed Components run isolated inside a Component Manager, they are more private by design. Unlike a script that gets unlimited access to everything on your website, a Managed Component is transparent about what kind of access it needs, and operates under a Component Manager that grants and revokes permissions.

Cloudflare Zaraz supports Managed Components and DLP to make third-party tools private
Cloudflare Zaraz supports Managed Components and DLP to make third-party tools private

When you add a Managed Component to your website, the Component Manager will list all the permissions required for this component. Such permissions could be “setting cookies”, “making client-side network requests”, “installing a widget” and more. Depending on the tool, you’ll be able to remove permissions that are optional, if your website maintains a more restrictive approach to privacy.

Aside from permissions, the Component Manager also lets you choose what information is exposed to each Managed Component. Perhaps you don’t want to send IP addresses to Facebook? Or rather not send user agent strings to Mixpanel? Managed Components put you in control by telling you exactly what information is consumed by each tool, and letting you filter, mask or hide it according to your needs.

Data Loss Prevention with Cloudflare Zaraz

Another area we’re working on is developing DLP features that let you decide what information to forward to different Managed Components not only by the field type, e.g. “user agent header” or “IP address”, but by the actual content. DLP filters can scan all information flowing into a Managed Component and detect names, email addresses, SSN and more – regardless of which field they might be hiding under.

Our DLP Filters will be highly flexible. You can decide to only enable them for users from specific geographies, for users on specific pages, for users with a certain cookie, and you can even mix-and-match different rules. After configuring your DLP filter, you can set what Managed Components you want it to apply for – letting you filter information differently according to the receiving target.

For each DLP filter you can choose your action type. For example, you might want to not send any information in which the system detected a SSN, but to only report a warning if a first name was detected. Masking will allow you to replace an email address like [email protected] with [email protected], making sure events containing email addresses are still sent, but without exposing the address itself.

While there are many DLP tools available in the market, we believe that the integration between Cloudflare Zaraz’s DLP features and Managed Components is the safest approach, because the DLP rules are effectively fencing the information not only before it is being sent, but before the component even accesses it.

Getting started with Managed Components and DLP

Cloudflare Zaraz is the most advanced Component Manager, and you can start using it today. We recently also announced an integrated Consent Management Platform. If your third-party tool of course is missing a Managed Component, you can always write a Managed Component of your own, as the technology is completely open sourced.

While we’re working on bringing advanced permissions handling, data masking and DLP Filters to all users, you can sign up for the closed beta, and we’ll contact you shortly.

Building and using Managed Components with WebCM

Post Syndicated from Yo'av Moshe original https://blog.cloudflare.com/building-using-managed-components-webcm/

Building and using Managed Components with WebCM

Building and using Managed Components with WebCM

Managed Components are here to shake up the way third-party tools integrate with websites. Two months ago we announced that we’re open sourcing parts of the most innovative technologies behind Cloudflare Zaraz, making them accessible and usable to everyone online. Since then, we’ve been working hard to make sure that the code is well documented and all pieces are fun and easy to use. In this article, I want to show you how Managed Components can be useful for you right now, if you manage a website or if you’re building third-party tools. But before we dive in, let’s talk about the past.

Third-party scripts are a threat to your website

For decades, if you wanted to add an analytics tool to your site, a chat widget, a conversion pixel or any other kind of tool – you needed to include an external script. That usually meant adding some code like this to your website:

<script src=”https://example.com/script.js”></script>

If you think about it – it’s a pretty terrible idea. Not only that you’re now asking the browser to connect to another server, fetch and execute more JavaScript code – you’re also completely giving up the control on your website. How much do you really trust this script? And how much do you trust that the script’s server wasn’t hacked, or will never get hacked in the future? In the previous blog post we showed how including one script usually results in more network calls, hogging the browser and slowing everything down. But the worst thing about these scripts is that they are completely unrestricted: JavaScript code running in the browser can hijack users, steal their credentials or credit card information, use their devices to mine cryptocurrencies, manipulate your website, leak PII, and more. Since the code in those scripts is usually minified, it’s practically impossible for you to read it and figure out what it’s doing.

Managed Components change all that. Like apps on your phone, they’re built around a permissions system. You decide if you allow a component to connect to a remote server, if you allow it to use cookies, to run code, to inject a widget to pages and more. Unlike the world of minified external scripts, it is a framework that promotes transparency. Website owners can toggle permissions on and off, and if a Managed Component wasn’t granted a permission, it will not have access to the relevant API.

But Managed Components do more than wrapping the current system with permissions – they also provide functionality that was never available before: making server-side connections, caching information, using a key-value store, manipulating responses before they are handed to the browser and more. The core of this functionality comes from the ability to execute code outside the browser. Freeing the browser from running code that was previously executed in the browser, means that your website can become approximately 40% faster. It also results in a smaller attack surface in case your tool’s vendor gets hacked.

All of this is possible thanks to the new Managed Components API. We designed it together with vendors, to make sure you can use them to write any tool, while keeping performance, security and privacy a top priority. At its core, a Managed Component is just a JavaScript module, and so every JavaScript developer should feel right at home when building one. Check out the two examples in our previous blog post to see how they actually look like, or see some Managed Components we already open sourced on GitHub.

WebCM is the open source Component Manager

When tools are loaded with a <script> tag, their code is executed by the browser. Since Managed Components don’t run in the browser, their code needs to be executed somewhere else. This is the Component Manager. We designed the APIs around Managed Components deliberately to not be tied to a specific Component Manager, and in fact, there are already two in the world: Cloudflare Zaraz, and WebCM.

WebCM, Web-based Component Manager, is our open source reference implementation of a Component Manager. If you run a website, you can use WebCM today to run Managed Components on your website, even if you’re not a Cloudflare user. If you want to create a Managed Component, you can use it like an SDK.

Over the last few months, we’ve been helping vendors to write their own Managed Components, and we will continue to do so. We open sourced WebCM to ensure that Managed Components are a technology of the Web as a whole. Everyone should be able to use WebCM to load and create Managed Components. Let’s see how to use it!

Getting started with WebCM in 5 minutes

Getting started with WebCM is easier than you think. Because WebCM works like a proxy, you can use it regardless of how your website is built. In a new folder, create a simple HTML file and call it index.html:

<!DOCTYPE html>
<html lang=”en”>
  <head>
	<meta charset="UTF-8">
	<title>My Website</title>
  </head>
  <body>
    	<h1>WebCM test website</h1>  
  </body>
</html>

Let’s serve this file by launching an HTTP serve in the same folder:

You can use Node.js:

npx http-server -p 8000

You can use Python:

python3 -m http.server

Or anything else that would serve our HTML file on http://localhost:8000/index.html.

Next, create a configuration file for WebCM. In a new directory, create a file called webcm.config.ts.

export default {
  components: [
    {
      name: 'demo',
      permissions: [
        'access_client_kv',
        'provide_server_functionality',
        'provide_widget',
        'serve_static_files',
        'client_network_requests',
      ],
    },
  ],
  target: 'http://127.0.0.1:8000',
  hostname: 'localhost',
  trackPath: '/webcm/track',
  ecommerceEventsPath: '/webcm/ecommerce',
  clientEventsPath: '/webcm/system',
  port: 8080
}

Let’s go over this configuration file:

  • components is an array that lists all the Managed Components you want to load. For now, we will load the demo component. Note that all we needed was to specify “demo”, and WebCM will go and get it from NPM for us. Other Managed Components are available on NPM too, and you can install components from other sources too. For each component, we’re defining what permissions we are giving it. You can read more about the permissions in the specifications. If we try to add the component without granting it its required permissions, WebCM will alert us.
  • target is where our origin HTTP server runs. In the previous step, we set it to run on port 8000.
  • port is the port under which WebCM will serve our website.
  • hostname is the host WebCM will bind to.
  • trackPath, clientEventsPath, ecommerceEventsPath are paths that WebCM will use to send events from the browser to its backend. We can leave these paths as they are for now, and will see how they’re used later.

! Note: Node version 17 or higher is needed for the next section

While keeping your HTTP server running, and in the same directory as webcm.config.ts, run npx webcm. Node will fetch WebCM and start it for you, and WebCM will read the configuration. First, it will fetch the required components to a components folder. When done, it will start another server that proxies your origin.

If you open http://localhost:8080/index.html in your browser now, you’d see the same page you saw at http://localhost:8000/index.html. While the pages might look similar, the one running on port 8080 has our demo Managed Component running. Moving your mouse and clicking around the page should result in messages being printed in your WebCM terminal, showing that the component was loaded and that it is receiving data. You will also notice that the page now displays a simple weather widget – this a Managed Component Widget that got appended to your page. The weather information was fetched without the browser needing to contact any additional server, and WebCM can cache that information to make sure it is served fast. Lastly, if you go to http://localhost:8080/webcm/demo/cheese, you’ll see that the component is serving a static image of a cheese. This is an example of how Managed Components can expose new endpoints on your servers, if you allow them.

The Demo Component, like its name suggests, is just a demo. We use it to showcase and test the Managed Components APIs. What if we want to add a real Managed Component to our site? Google Analytics is used by more than half of the websites on the internet, so let’s see how we edit our webcm.config.ts file to load it.

export default {
  components: [
    {
      name: 'demo',
      permissions: [
        'access_client_kv',
        'provide_server_functionality',
        'provide_widget',
        'serve_static_files',
        'client_network_requests',
      ],
    },
    {
      name: 'google-analytics',
      settings: { 'tid': 'UA-XXXXXX' },
      permissions: [
        'access_client_kv',
      ],
    },
  ],
  target: 'http://127.0.0.1:8000',
  hostname: 'localhost',
  trackPath: '/webcm/track',
  ecommerceEventsPath: '/webcm/ecommerce',
  clientEventsPath: '/webcm/system',
  port: 8080
}

In the above example, we just replaced our demo component with the Google Analytics Managed Component. Note that this component requires much fewer permissions to run – that’s because it is running 100% server-side. Remember to replace UA-XXXXXX with your real Google Universal Analytics (version 3) account identifier.

Re-running `npx webcm` will now fetch the google-analytics Managed Component and run it, with the settings you provided. If you go now to your proxied website, you won’t see anything changed. But if you go to your Google Analytics dashboard, you will start seeing page views appearing on the Real Time view. WebCM loaded the component and is sending information server-side to Google Analytics.

There are many other components you can play around with, and we’re adding more all the time. Check out the Managed Components organization on NPM or on GitHub to see the full list.

Build your own Managed Component

Managed Components isn’t a closed library of tools you can use. As mentioned before – we are gradually open sourcing more tools from our library on GitHub. If you think our components are doing something weird – please let us know with an issue, or even make a PR. Managed Components are for the benefit of everyone. Over the past few months, the Cloudflare Zaraz community on Discord and on the Cloudflare Community Forum was extremely helpful in actively reporting issues, and so we’re excited to give them the option to take one step closer to the internals behind Cloudflare Zaraz.

While improving existing Managed Components is great, the thing we’re most thrilled about is that you can now build your own Managed Components too. If you’re a third-party tool vendor – this is a way for you to create a version of your tool that is safe and performant, so customers can discover and adopt your tool easily. If you’re a website developer, you might want to tinker with Managed Components to see what kind of things you can easily move away from the browser, for performance gains.

Let’s see how easy it is to create a Managed Component that listens to every page view on our website. Run npm init managed-component in the components directory that WebCM created, and create-managed-component will take you through the process of scaffolding your component files. To start with, our component will not use any special permissions, so you can select none.

Once we’re done with the wizard, we can open our src/index.ts file. By default, our component will add a listener to all page views:

import { ComponentSettings, Manager } from '@managed-components/types'

export default async function (manager: Manager, settings: ComponentSettings) {
  manager.addEventListener('pageview', event => {
    // do the things
  })
}

Let’s edit the comment line so that we can see whenever a page view happens. Note we also prefixed settings with a _ because we’re not using it now:

import { ComponentSettings, Manager } from '@managed-components/types'

export default async function (manager: Manager, _settings: ComponentSettings) {
  manager.addEventListener('pageview', event => {
    console.log(`New pageview at ${event.client.url}`)
  })
}

With these changes, the component will print the current URL whenever a page is viewed on the website. Before trying it out, we need to build our component. In the folder of your component run npm i && npm run build. Then, use the namespace of your component to add it to your webcm.config.ts file and restart WebCM:

export default {
  components: [
    {
      name: 'demo',
      permissions: [
        'access_client_kv',
        'provide_server_functionality',
        'provide_widget',
        'serve_static_files',
        'client_network_requests',
      ],
    },
    {
      name: 'google-analytics',
      settings: { 'tid': 'UA-123456' },
      permissions: [
        'access_client_kv',
      ],
    },
    {
      name: 'your-component-namespace',
      settings: {},
      permissions: [],
    },
  ],
  target: 'http://127.0.0.1:8000',
  hostname: 'localhost',
  trackPath: '/webcm/track',
  ecommerceEventsPath: '/webcm/ecommerce',
  clientEventsPath: '/webcm/system',
  port: 8080
}

This is a very simple component, but it shows how easy it is to build functionality that was previously only available in the browser. You can easily extend your component: use fetch next to the console.log statement and send information to your own analytics warehouse whenever a pageview happens on your site. Read about all the other Managed Components APIs to create widgets, listen to clicks, store cookies, use cache, and much more. These APIs allow you to build richer tools than it was ever possible before.

Your tool is better as a Managed Component

When we started working on Managed Components, many people were asking what would be the motivation of a tool vendor to build a Managed Component. During these last few months, we’ve learned that vendors are often excited about Managed Components for the same reasons we are – it provides a safe way to use their tools, and a streamlined way to integrate their tools in websites. Customers care deeply for these things, so having a Managed Component means that customers are more likely to try out your technology. Vendors will also get huge discoverability benefits, as their tools could be featured in the Cloudflare Zaraz dashboard, exposing them to tens of thousands of Zaraz-enabled websites. We are getting a lot of interest from major vendors in building a Managed Component, and we’re doing our best in actively supporting them in the process. If your company is interested in having a Managed Component, contact us.

We strongly believe that Managed Components can change the way third-party tools are used online. This is only the beginning of making them faster, secure and private. Together with users, and vendors, we will work on constantly improving the capabilities of Managed Components as a community, for the benefit of every user of the World Wide Web. To get started with building your Managed Component, head to managedcomponents.dev and start building. Our team is available to help you at [email protected]

Cloudflare Zaraz launches new privacy features in response to French CNIL standards

Post Syndicated from Yair Dovrat original https://blog.cloudflare.com/zaraz-privacy-features-in-response-to-cnil/

Cloudflare Zaraz launches new privacy features in response to French CNIL standards

Cloudflare Zaraz launches new privacy features in response to French CNIL standards

Last week, the French national data protection authority (the Commission Nationale de l’informatique et des Libertés or “CNIL”), published guidelines for what it considers to be a GDPR-compliant way of loading Google Analytics and similar marketing technology tools. The CNIL published these guidelines following notices that the CNIL and other data protection authorities issued to several organizations using Google Analytics stating that such use resulted in impermissible data transfers to the United States. Today, we are excited to announce a set of features and a practical step-by-step guide for using Zaraz that we believe will help organizations continue to use Google Analytics and similar tools in a way that will help protect end user privacy and avoid sending EU personal data to the United States. And the best part? It takes less than a minute.

Enter Cloudflare Zaraz.

The new Zaraz privacy features

What we are releasing today is a new set of privacy features to help our customers enhance end user privacy. Starting today, on the Zaraz dashboard, you can apply the following configurations:

  • Remove URL query parameters: when toggled-on, Zaraz will remove all query parameters from a URL that is reported to a third-party server. It will turn https://example.com/?q=hello to https://example.com. This will allow users to remove  query parameters, such as UTM, gclid, and the sort that can be used for fingerprinting. This setting will apply to all of your Zaraz integrations.
  • Hide originating IP address: using Zaraz to load tools like Google Analytics entirely server-side while hiding visitor IP addresses from Google and Facebook has been doable for quite some time now. This will prevent sending the visitor IP address to a third-party tool provider’s server. This feature is configured at a tool level, currently offered for Google Analytics Universal, Google Analytics 4, and Facebook Pixel. We will add this capability to more and more tools as we go. In addition to hiding visitors’ IP addresses from specific tools, you can use Zaraz to trim visitors’ IP addresses across all tools to avoid sending originating IP addresses to third-party tool servers. This option is available on the Zaraz setting page, and is considered less strict.
  • Clear user agent strings: when toggled on, Zaraz will clear sensitive information from the User Agent String. The User-Agent is a request header that includes information about the operating system, browser, extensions and more of the site visitor. Zaraz clears this string by removing pieces of information (such as versions, extensions, and more) that could lead to user tracking or fingerprinting. This setting will apply only to server-side integrations.
  • Removal of external referrers: when toggled-on, Zaraz will hide the URL of the referring page from third-party servers. If the referring URL is on the same domain, it will not hide it, to keep analytics accurate and avoid the session from “splitting”. This setting will apply to all of your Zaraz integrations.
Cloudflare Zaraz launches new privacy features in response to French CNIL standards

How to set up Google Analytics with the new privacy features

We wrote this guide to help you implement our new features when using Google Analytics. We will use Google Analytics (Universal) as the example of this guide, because Google Analytics is widely used by Zaraz customers. You can follow the same principles to set up your Facebook Pixel, or other server-side integration that Zaraz offers.

Step 1: Install Zaraz on your website

Zaraz loads automatically for every website proxied by Cloudflare (Orange Clouded), no code changes are needed. If your website is not proxied by Cloudflare, you can load Zaraz manually with a JavaScript code snippet. If you are new to Cloudflare, or unsure if your website is proxied by Cloudflare, you can use this Chrome extension to find out if your site is Orange Clouded or not.

Step 2: Add Google Analytics via the Zaraz dashboard

Cloudflare Zaraz launches new privacy features in response to French CNIL standards

All customers have access to the Zaraz dashboard. By default, when you add Google Analytics using the Zaraz tools library, it will load server-side. You do not need to set up any cloud environment or proxy server. Zaraz handles this for you. When you add a tool, Zaraz will start loading on your website, and a request will leave from the end user’s browser to a Cloudflare Worker that sits on your own domain. Cloudflare Workers is our edge computing platform, and this Worker will communicate directly with Google Analytics’ servers. There will be no direct communication between an end user’s browser and Google’s servers. If you wish to learn more about how Zaraz works, please read our previous posts about the unique Zaraz architecture and how we use Workers. Note that “proxying” Google Analytics, by itself, is not enough, according to the CNIL’s guidance. You will have to take more actions to make sure you set up Google Analytics properly.

Step 3: Configure Google Analytics and hide IP addresses

Cloudflare Zaraz launches new privacy features in response to French CNIL standards

All you need to do to set up Google Analytics is to enter your Tracking ID. On the tools setting screen, you would also need to toggle-on the “Hide Originating IP Address” feature. This will prevent Zaraz from sending the visitor’s IP address to Google. Zaraz will remove the IP address on the Edge, before it hits Google’s servers. If you want to make sure Zaraz will run only in the EU, review Cloudflare’s Data Localization Suite.

According to your needs, you can of course set up more complex configurations of Google Analytics, including Ecommerce tracking, Custom Dimension, fields to set, Custom Metrics, etc. Follow this guide for more instructions.

Step 4: Toggle-on Zaraz’s new privacy features

Cloudflare Zaraz launches new privacy features in response to French CNIL standards

Next, you will need to toggle-on all of our new privacy features mentioned above. You can do this on the Zaraz Settings page, under the Privacy section.

Step 5: Clean your Google Analytics configuration

In this step, you would need to take actions to clean your specific Google Analytics setting. We gathered a list of suggestions for you to help preserve end user privacy:

  • Do not include any personal identifiable information. You will want to review the CNIL’s guidance on anonymization and determine how to apply it on your end. It is likely that such anonymization will make the unique identifier pretty much useless with most analytics tools. For example, according to our findings, features like Google Analytics’ User ID View, won’t work well with such anonymization. In such cases, you may want to stop using such analytics tools to avoid discrepancies and assure accuracy.
  • If you wish to hide Google Analytics’ Client ID, on the Google Analytics setting page, click “add field” and choose “Client ID”. To override the Client ID, you can insert any string as the field’s constant value. Please note that this will likely limit Google’s ability to aggregate data and will likely create discrepancies in session and user counts. Still, we’ve seen customers that are using Google Analytics to count events, and to our knowledge that should still be doable with this setting.
  • Clean your implementation from cross-site identifiers. This could include things like your CRM tool unique identifier, or URL query parameters passing identifiers to share them between different domains (avoid “cross-domain tracking” also known as “site linking”).
  • You would need to make sure not to include any personal data in your customized configuration and implementation. We recommend you go over the list of Custom Dimension, Event parameters/properties, Ecommerce Data, and User Properties to make sure they do not contain personal data. While this still demands some manual work, the good news is that soon we are about to announce a new set of Privacy features, Zaraz Data Loss Prevention, that will help you do that automatically, at scale. Stay tuned!

Step 6 – you are done! 🎉

A few more things you will want to consider is that implementing this guide will result in some limitations in your ability to use Google Analytics. For example, not collecting UTM parameters and referrers will disable your ability to track traffic sources and campaigns. Not tracking User ID, will prevent you from using the User ID View, and so on. Some companies will find these limitations extreme, but like most things in life, there is a trade-off. We’re taking a step towards a more privacy-oriented web, and this is just the beginning. In the face of new regulatory constraints, new technologies will appear which will unlock new abilities and features. Zaraz is dedicated to leading the way, offering privacy-focused tools that empower website operators and protect end users.

We recommend you learn more about Cloudflare’s Data Localization Suite, and how you can use Zaraz to keep analytics data in the EU.

To wrap up, we would really appreciate any feedback on this announcement, or new feature requests you might have. You can reach out to your Cloudflare account manager, or directly to us on our Discord channel. Privacy is at the heart of everything our team is building.

We always take a proactive approach towards privacy, and we believe privacy is not only about responding to different regulations, it is about building technology that helps customers do a better job protecting their users. It is about simplifying what it takes to respect and protect user privacy and personal information. It is about helping build a better Internet.

Open source Managed Components for Cloudflare Zaraz

Post Syndicated from Yo'av Moshe original https://blog.cloudflare.com/zaraz-open-source-managed-components-and-webcm/

Open source Managed Components for Cloudflare Zaraz

Open source Managed Components for Cloudflare Zaraz

In early 2020, we sat down and tried thinking if there’s a way to load third-party tools on the Internet without slowing down websites, without making them less secure, and without sacrificing users’ privacy. In the evening, after scanning through thousands of websites, our answer was “well, sort of”. It seemed possible: many types of third-party tools are merely collecting information in the browser and then sending it to a remote server. We could theoretically figure out what it is that they’re collecting, and then instead just collect it once efficiently, and send it server-side to their servers, mimicking their data schema. If we do this, we can get rid of loading their JavaScript code inside websites completely. This means no more risk of malicious scripts, no more performance losses, and fewer privacy concerns.

But the answer wasn’t a definite “YES!” because we realized this is going to be very complicated. We looked into the network requests of major third-party scripts, and often it seemed cryptic. We set ourselves up for a lot of work, looking at the network requests made by tools and trying to figure out what they are doing – What is this parameter? When is this network request sent? How is this value hashed? How can we achieve the same result more securely, reliably and efficiently? Our team faced these questions on a daily basis.

When we joined Cloudflare, the scale of everything changed. Suddenly we were on thousands of websites, serving more than 10,000 requests per second. Users are writing to us every single day over our Discord channel, the community forum, and sometimes even directly on Twitter. More often than not, their messages would be along the lines of “Hi! Can you please add support for X?” Cloudflare Zaraz launched with around 30 tools in its library, but this market is vast and new tools are popping up all the time.

Changing our trust model

In my previous blog post on how Zaraz uses Cloudflare Workers, I included some examples of how tool integrations are written in Zaraz today. Usually, a “tool” in Zaraz would be a function that prepares a payload and sends it. This function could return one thing – clientJS, JavaScript code that the browser would later execute. We’ve done our best so that tools wouldn’t use clientJS, if it wasn’t really necessary, and in reality most Zaraz-built tool integrations are not using clientJS at all.

This worked great, as long as we were the ones coding all tool integrations. Customers trusted us that we’d write code that is performant and safe, and they trusted the results they saw when trying Zaraz. Upon joining Cloudflare, many third-party tool vendors contacted us and asked to write a Zaraz integration. We quickly realized that our system wasn’t enforcing speed and safety – vendors could literally just dump their old browser-side JavaScript into our clientJS variable, and say “We have a Cloudflare Zaraz integration!”, and that wasn’t our vision at all.

We want third-party tool vendors to be able to write their own performant, safe server-side integrations. We want to make it possible for them to reimagine their tools in a better way. We also want website owners to have transparency into what is happening on their website, to be able to manage and control it, and to trust that if a tool is running through Zaraz, it must be a good tool — not because of who wrote it, but because of the technology it is constructed within. We realized that to achieve that we needed a new format for defining third-party tools.

Introducing Managed Components

We started rethinking how third-party code should be written. Today, it’s a black box – you usually add a script to your site, and you have zero clue what it does and when. You can’t properly read or analyze the minified code. You don’t know if the way it behaves for you is the same way it behaves for everyone else. You don’t know when it might change. If you’re a website owner, you’re completely in the dark.

Tools do many different things. The simple ones just collected information and sent it somewhere. Often, they’d set some cookies. Sometimes, they’d install some event listeners on the page. And widget-based tools can literally manipulate the page DOM, providing new functionality like a social media embed or a chatbot. Our new format needed to support all of this.

Managed Components is how we imagine the future of third-party tools online. It provides vendors with an API that allows them to do much more than a normal script can, including keeping code execution outside the browser. We designed this format together with vendors, for vendors, while having in mind that users’ best interest is everyone’s best interest long-term.

From the get-go, we built Managed Components to use a permission-based system. We want to provide even more transparency than Zaraz does today. As the new API allows tools to set cookies, change the DOM or collect IP addresses, all those abilities require being granted a permission. Installing a third-party tool on your site is similar to installing an app on your phone – you get an explanation of what the tool can and can’t do, and you can allow or disallow features to a granular level. We previously wrote about how you can use Zaraz to not send IP addresses to Google Analytics, and now we’re doubling down in this direction. It’s your website, and it’s your decision to make.

Every Managed Component is a JavaScript module at its core. Unlike today, this JavaScript code isn’t sent to the browser. Instead, it is executed by a Components Manager. This manager implements the APIs that are then used by the component. It dispatches server-side events that originate in the browser, providing the components with access to information while keeping them sandboxed and performant. It handles caching, storage and more — all so that the Managed Components can implement their logic without worrying so much about the surrounding.

An example analytics Managed Component can look something like this:

export default function (manager) {
  manager.addEventListener("pageview", ({ context, client }) => {
    fetch("https://example.com/collect", {
  	method: "POST",
  	data: {
    	  url: context.page.url.href,
    	  userAgent: client.device.userAgent,
  	},
    });
  });
}

The above component gets notified whenever a page view occurs, and it then creates some payload with the visitor user-agent and page URL and sends that as a POST request to the vendor’s server. This is very similar to how things are done today, except this doesn’t require running any code at all in the browser.

But Managed Components aren’t just doing what was previously possible but better, they also provide dramatic new functionality. See for example how we’re exposing server-side endpoints:

export default function (manager) {
  const api = manager.proxy("/api", "https://api.example.com");
  const assets = manager.serve("/assets", "assets");
  const ping = manager.route("/ping", (request) => new Response(204));
}

These three lines are a complete shift in what’s possible for third-parties. If granted the permissions, they can proxy some content, serve and expose their own endpoints – all under the same domain as the one running the website. If a tool needs to do some processing, it can now off-load that from the browser completely without forcing the browser to communicate with a third-party server.

Exciting new capabilities

Every third-party tool vendor should be able to use the Managed Components API to build a better version of their tool. The API we designed is comprehensive, and the benefits for vendors are huge:

  • Same domain: Managed Components can serve assets from the same domain as the website itself. This allows a faster and more secure execution, as the browser needs to trust and communicate with only one server instead of many. This can also reduce costs for vendors as their bandwidth will be lowered.
  • Website-wide events system: Managed Components can hook to a pre-existing events system that is used by the website for tracking events. Not only is there no need to provide a browser-side API to your tool, it’s also easier for your users to send information to your tool because they don’t need to learn your methods.
  • Server logic: Managed Components can provide server-side logic on the same domain as the website. This includes proxying a different server, or adding endpoints that generate dynamic responses. The options are endless here, and this, too, can reduce the load on the vendor servers.
  • Server-side rendered widgets and embeds: Did you ever notice how when you’re loading an article page online, the content jumps when some YouTube or Twitter embed suddenly appears between the paragraphs? Managed Components provide an API for registering widgets and embed that render server-side. This means that when the page arrives to the browser, it already includes the widget in its code. The browser doesn’t need to communicate with another server to fetch some tweet information or styling. It’s part of the page now, so expect a better CLS score.
  • Reliable cross-platform events: Managed Components can subscribe to client-side events such as clicks, scroll and more, without needing to worry about browser or device support. Not only that – those same events will work outside the browser too – but we’ll get to that later.
  • Pre-Response Actions: Managed Components can execute server-side actions before the network response even arrives in the browser. Those actions can access the response object, reading it or altering it.
  • Integrated Consent Manager support: Managed Components are predictable and scoped. The Component Manager knows what they’ll need and can predict what kind of consent is needed to run them.

The right choice: open source

As we started working with vendors on creating a Managed Component for their tool, we heard a repeating concern – “What Components Managers are there? Will this only be useful for Cloudflare Zaraz customers?”. While Cloudflare Zaraz is indeed a Components Manager, and it has a generous free tier plan, we realized we need to think much bigger. We want to make Managed Components available for everyone on the Internet, because we want the Internet as a whole to be better.

Today, we’re announcing much more than just a new format.

WebCM is a reference implementation of the Managed Components API. It is a complete Components Manager that we will soon release and maintain. You will be able to use it as an SDK when building your Managed Component, and you will also be able to use it in production to load Managed Components on your website, even if you’re not a Cloudflare customer. WebCM works as a proxy – you place it before your website, and it rewrites your pages when necessary and adds a couple of endpoints. This makes WebCM 100% framework-agnostic – it doesn’t matter if your website uses Node.js, Python or Ruby behind the scenes: as long as you’re sending out HTML, it supports that.

That’s not all though! We’re also going to open source a few Managed Components of our own. We converted some of our classic Zaraz integrations to Managed Components, and they will soon be available for you to use and improve. You will be able to take our Google Analytics Managed Component, for example, and use WebCM to run Google Analytics on your website, 100% server-side, without Cloudflare.

Tech-leading vendors are already joining

Revolutionizing third-party tools on the internet is something we could only do together with third-party vendors. We love third-party tools, and we want them to be even more popular. That’s why we worked very closely with a few leading companies on creating their own Managed Components. These new Managed Components extend Zaraz capabilities far beyond what’s possible now, and will provide a safe and secure onboarding experience for new users of these tools.

Open source Managed Components for Cloudflare Zaraz

DriftDrift helps businesses connect with customers in moments that matter most.  Drift’s integration will let customers use their fully-featured conversation solution while also keeping it completely sandboxed and without making third-party network connections, increasing privacy and security for our users.

Open source Managed Components for Cloudflare Zaraz

CrazyEggCrazy Egg helps customers make their websites better through visual heatmaps, A/B testing, detailed recordings, surveys and more. Website owners, Cloudflare, and Crazy Egg all care deeply about performance, security and privacy. Managed Components have enabled Crazy Egg to do things that simply aren’t possible with third-party JavaScript, which means our mutual customers will get one of the most performant and secure website optimization tools created.

We also already have customers that are eager to implement Managed Components:

Open source Managed Components for Cloudflare Zaraz

Hopin Quote:

“I have been really impressed with Cloudflare’s Zaraz ability to move Drift’s JS library to an Edge Worker while loading it off the DOM. My work is much more effective due to the savings in page load time. It’s a pleasure to work with two companies that actively seek better ways to increase both page speed and load times with large MarTech stacks.”
– Sean Gowing, Front End Engineer, Hopin

If you’re a third-party vendor, and you want to join these tech-leading companies, do reach out to us, and we’d be happy to support you on writing your own Managed Component.

What’s next for Managed Components

We’re working on Managed Components on many fronts now. While we develop and maintain WebCM, work with vendors and integrate Managed Components into Cloudflare Zaraz, we’re already thinking about what’s possible in the future.

We see a future where many open source runtimes exist for Managed Components. Perhaps your infrastructure doesn’t allow you to use WebCM? We want to see Managed Components runtimes created as service workers, HTTP servers, proxies and framework plugins. We’re also working on making Managed Components available on mobile applications. We’re working on allowing unofficial Managed Components installs on Cloudflare Zaraz. We’re fixing a long-standing issue of the WWW, and there’s so much to do.

We will very soon publish the full specs of Managed Components. We will also open source WebCM, the reference implementation server, as well as many components you can use yourself. If this is interesting to you, reach out to us at [email protected], or join us on Discord.

Cloudflare Zaraz supports CSP

Post Syndicated from Simona Badoiu original https://blog.cloudflare.com/cloudflare-zaraz-supports-csp/

Cloudflare Zaraz supports CSP

Cloudflare Zaraz supports CSP

Cloudflare Zaraz can be used to manage and load third-party tools on the cloud, achieving significant speed, privacy and security improvements. Content Security Policy (CSP) configuration prevents malicious content from being run on your website.

If you have Cloudflare Zaraz enabled on your website, you don’t have to ask yourself twice if you should enable CSP because there’s no harmful collision between CSP & Cloudflare Zaraz.

Why would Cloudflare Zaraz collide with CSP?

Cloudflare Zaraz, at its core, injects a <script> block on every page where it runs. If the website enforces CSP rules, the injected script can be automatically blocked if inline scripts are not allowed. To prevent this, at the moment of script injection, Cloudflare Zaraz adds a nonce to the script-src policy in order for everything to work smoothly.

Cloudflare Zaraz supports CSP enabled by using both Content-Security-Policy headers or Content-Security-Policy <meta> blocks.

What is CSP?

Content Security Policy (CSP) is a security standard meant to protect websites from Cross-site scripting (XSS) or Clickjacking by providing the means to list approved origins for scripts, styles, images or other web resources.

Although CSP is a reasonably mature technology with most modern browsers already implementing the standard, less than 10% of websites use this extra layer of security. A Google study says that more than 90% of the websites using CSP are still leaving some doors open for attackers.

Why this low adoption and poor configuration? An early study on ‘why CSP adoption is failing’ says that the initial setup for a website is tedious and can generate errors, risking doing more harm for the business than an occasional XSS attack.

We’re writing this to confirm that Cloudflare Zaraz will comply with your CSP settings without any other additional configuration from your side and to share some interesting findings from our research regarding how CSP works.

What is Cloudflare Zaraz?

Cloudflare Zaraz aims to make the web faster by moving third-party script bloat away from the browser.

There are tens of third-parties loaded by almost every website on the Internet (analytics, tracking, chatbots, banners, embeds, widgets etc.). Most of the time, the third parties have a slightly small role on a particular website compared to the high-volume and complex code they provide (for a good reason, because it’s meant to deal with all the issues that particular tool can tackle). This code, loaded a huge number of times on every website is simply inefficient.

Cloudflare Zaraz reduces the amount of code executed on the client side and the amount of time consumed by the client with anything that is not directly serving the user and their experience on the web.

At the same time, Cloudflare Zaraz does the integration between website owners and third-parties by providing a common management interface with an easy ‘one-click’ method.

Cloudflare Zaraz & CSP

When auto-inject is enabled, Cloudflare Zaraz ‘bootstraps’ on your website by running a small inline script that collects basic information from the browser (Screen Resolution, User-Agent, Referrer, page URL) and also provides the main `track` function. The `track` function allows you to track the actions your users are taking on your website, and other events that might happen in real time. Common user actions a website will probably be interested in tracking are successful sign-ups, calls-to-action clicks, and purchases.

Before adding CSP support, Cloudflare Zaraz would’ve been blocked on any website that implemented CSP and didn’t include ‘unsafe-inline’ in the script-src policy or default-src policy. Some of our users have signaled collisions between CSP and Cloudflare Zaraz so we decided to make peace with CSP once and forever.

To solve the issue, when Cloudflare Zaraz is automatically injected on a website implementing CSP, it enhances the response header by appending a nonce value in the script-src policy. This way we make sure we’re not harming any security that was already enforced by the website owners, and we are still able to perform our duties – making the website faster and more secure by asynchronously running third parties on the server side instead of clogging the browser with irrelevant computation from the user’s point of view.

CSP – Edge Cases

In the following paragraphs we’re describing some interesting CSP details we had to tackle to bring Cloudflare Zaraz to a trustworthy state. The following paragraphs assume that you already know what CSP is, and you know how a simple CSP configuration looks like.

Dealing with multiple CSP headers/<meta> elements

The CSP standard allows multiple CSP headers but, on a first look, it’s slightly unclear how the multiple headers will be handled.

You would think that the CSP rules will be somehow merged and the final CSP rule will be a combination of all of them but in reality the rule is much more simple – the most restrictive policy among all the headers will be applied. Relevant examples can be found in the w3c’s standard description and in this multiple CSP headers example.

The rule of thumb is that “A connection will be allowed only if all the CSP headers will allow it”.

In order to make sure that the Cloudflare Zaraz script is always running, we’re adding the nonce value to all the existing CSP headers and/or <meta http-equiv=”Content-Security-Policy”> blocks.

Adding a nonce to a CSP header that already allows unsafe-inline

Just like when sending multiple CSP headers, when configuring one policy with multiple values, the most restrictive value has priority.

An illustrative example for a given CSP header:

Content-Security-Policy: default-src ‘self’; script-src ‘unsafe-inline’ ‘nonce-12345678’

If ‘unsafe-inline’ is set in the script-src policy, adding nonce-123456789 will disable the effect of unsafe-inline and will allow only the inline script that mentions that nonce value.

This is a mistake we already made while trying to make Cloudflare Zaraz compliant with CSP. However, we solved it by adding the nonce only in the following situations:

  • if ‘unsafe-inline’ is not already present in the header
  • If ‘unsafe-inline’ is present in the header but next to it, a ‘nonce-…’ or ‘sha…’ value is already set (because in this situation ‘unsafe-inline’ is practically disabled)

Adding the script-src policy if it doesn’t exist already

Another interesting case we had to tackle was handling a CSP header that didn’t include the script-src policy, a policy that was relying only on the default-src values. In this case, we’re copying all the default-src values to a new script-src policy, and we’re appending the nonce associated with the Cloudflare Zaraz script to it(keeping in mind the previous point of course)

Notes

Cloudflare Zaraz is still not 100% compliant with CSP because some tools still need to use eval() – usually for setting cookies, but we’re already working on a different approach so, stay tuned!

The Content-Security-Policy-Report-Only header is not modified by Cloudflare Zaraz yet – you’ll be seeing error reports regarding the Cloudflare Zaraz <script> element if Cloudflare Zaraz is enabled on your website.

Content-Security-Policy Report-Only can not be set using a <meta> element

Conclusion

Cloudflare Zaraz supports the evolution of a more secure web by seamlessly complying with CSP.

If you encounter any issue or potential edge-case that we didn’t cover in our approach, don’t hesitate to write on Cloudflare Zaraz’s Discord Channel, we’re always there fixing issues, listening to feedback and announcing exciting product updates.

For more details on how Cloudflare Zaraz works and how to use it, check out the official documentation.

Resources:

[1] https://wiki.mozilla.org/index.php?title=Security/CSP/Spec&oldid=133465
[2] https://storage.googleapis.com/pub-tools-public-publication-data/pdf/45542.pdf
[3] https://en.wikipedia.org/wiki/Content_Security_Policy
[4] https://www.w3.org/TR/CSP2/#enforcing-multiple-policies
[5] https://chrisguitarguy.com/2019/07/05/working-with-multiple-content-security-policy-headers/
[6]https://www.bitsight.com/blog/content-security-policy-limits-dangerous-activity-so-why-isnt-everyone-doing-it
[7]https://wkr.io/publication/raid-2014-content_security_policy.pdf

Need to Keep Analytics Data in the EU? Cloudflare Zaraz Can Offer a Solution

Post Syndicated from Yair Dovrat original https://blog.cloudflare.com/keep-analytics-tracking-data-in-the-eu-cloudflare-zaraz/

Need to Keep Analytics Data in the EU? Cloudflare Zaraz Can Offer a Solution

Need to Keep Analytics Data in the EU? Cloudflare Zaraz Can Offer a Solution

A recent decision from the Austrian Data Protection Authority (the Datenschutzbehörde) has network engineers scratching their heads and EU companies that use Google Analytics scrambling. The Datenschutzbehörde found that an Austrian website’s use of Google Analytics violates the EU General Data Protection Regulation (GDPR) as interpreted by the “Schrems II” case because Google Analytics can involve sending full or truncated IP addresses to the United States.

While disabling such trackers might be one (extreme) solution, doing so would leave website operators blind to how users are engaging with their site. A better approach: find a way to use tools like Google Analytics, but do so with an approach that protects the privacy of personal information and keeps it in the EU, avoiding a data transfer altogether. Enter Cloudflare Zaraz.

But before we get into just how Cloudflare Zaraz can help, we need to explain a bit of the background for the Datenschutzbehörde’s ruling, and why it’s a big deal.

What are the privacy and data localization issues?

The GDPR is a comprehensive data privacy law that applies to EU residents’ personal data, regardless of where it is processed. The GDPR itself does not insist that personal data must be processed only in Europe. Instead, it provides a number of legal mechanisms to ensure that GDPR-level protections are available for EU personal data if it is transferred outside the EU to a third country like the United States. Data transfers from the EU to the US were, until the 2020 “Schrems II” decision, permitted under an agreement called the EU-US Privacy Shield Framework.

The Schrems II decision refers to the July 2020 decision by the Court of Justice of the European Union that invalidated the EU-US Privacy Shield. The Court found that the Privacy Shield was not an effective means to protect EU data from US government surveillance authorities once data was transferred to the US, and therefore that under the Privacy Shield, EU personal data would not receive the level of protection guaranteed by the GDPR. However, the court upheld other valid transfer mechanisms designed to allow EU personal data to be transferred to the US in a way that is consistent with the GDPR that ensure EU personal data won’t be accessed by US government authorities in a way that violates the GDPR. One of those was the use of Standard Contractual Clauses, which are legal agreements approved by the EU Commission that enable data transfers – but they can only be used if supplementary measures are also in place.

Following the Schrems II case, the “NOYB” advocacy group founded by Max Schrems (the lawyer and activist who brought the legal action against Facebook that ultimately ended with the Schrems II ruling) filed 101 complaints against European websites that used Google Analytics and Facebook Connect trackers on the grounds that use of these trackers violates the Schrems II ruling because they send EU personal data to the United States without putting in place sufficient supplementary measures.

That issue of supplementary measures figured prominently in the Austrian data regulator’s decision. In its decision, the Datenschutzbehörde said that a European company could not use Google Analytics on its Austrian website because Google Analytics was sending the IP addresses of visitors to that website to Google’s servers in the United States. The Datenschutzbehörde reiterated earlier case law out of the EU that IP addresses can be sufficiently linked to individuals and therefore constitute personal data, so the GDPR applies. The regulator also found that IP addresses are not pseudonymous, and that Google doesn’t have sufficient supplementary measures in place to prevent US government authorities from accessing the data. As a result, the regulator found the use of Google Analytics and the transmission of IP addresses to the United States in this case violated the GDPR as interpreted by the Schrems II case. Since the Datenschutzbehörde announced its decision, Norway’s data protection authority announced it is joining the Austrian decision.

Google Analytics decision sets worrisome precedent

It’s important to remember that the Austrian ruling relates to one website’s use and implementation of Google Analytics. It is not a ban on Google Analytics throughout Europe. But is it a harbinger of more sweeping actions from data regulators? Any website might use dozens of third-party tools. If any of the third-party tools are transferring personal data to the US, they could attract the attention of an EU data regulator. Even if those tools are not collecting personal data or sensitive information intentionally, there remains a concern with the use of third-party tools, which evolves from how the Internet is built and operates.

Every time a user loads a website, those tools load and establish a connection between the end user’s browser and the third-party server. This connection is used for multiple purposes, such as requesting a script, reporting analytics data, or downloading an image pixel. In every such communication, the IP address of the visitor is exposed. This is how communication between a browser and a server has worked over the Internet since the Internet’s infancy.

The implications of the decision are therefore profound. If other European regulators adopt the Austrian ruling, and its conclusion that even the transfer of truncated IP addresses to the United States could constitute transfers of personal data that violate GDPR, the industry will likely need to fundamentally rethink current Internet architecture and the way IP addresses are used. Cloudflare increasingly believes that we’ll eventually solve these challenges by completely disassociating IP addresses from identity. We’ve partnered with others in the industry to pioneer new protocols like Oblivious DNS over HTTPS that divorce IP addresses from content being queried online to help begin to make this future a reality.

While we can envision this future, our customers need immediate ways to address regulators’ concerns. The median website in 2021 used 21 third-party solutions on mobile and 23 on desktop. At the 90th percentile, these numbers climbed to 89 third-party solutions on mobile, and 91 on desktop. Taking into account the Austrian DPA ruling, according to which the EU company itself is responsible for making sure no personal data is transmitted to the United States without proper handling, we can conclude that companies may soon become responsible for every one of their third-party solutions implemented on their website. And since this is a staggering amount of tools, it demands a scalable solution. Luckily, that is exactly what we have built.

Zaraz’s solution leverages Cloudflare’s global network and Workers platform

Zaraz is a third-party manager, built for speed, privacy and security. With Zaraz, customers can load analytics tools, advertising pixels, interactive widgets, and many other types of third-party tools without making any changes to their code.

Zaraz loads third party tools on the cloud, using Cloudflare Workers. There are multiple reasons why we chose to build on Workers, and you can read more about it in this blog post. By using Workers to offload third-party tools to the cloud and away from the browser, Zaraz creates an extra layer of security and control over Personal Identifiable Information (PII), Protected Health Information (PHI), or other sensitive pieces of information that are often unintentionally passed to third-party vendors.

Need to Keep Analytics Data in the EU? Cloudflare Zaraz Can Offer a Solution

In the traditional way of loading third-party tools, either via a Tag Management Software (TMS), a Customer Data Platform (CDP) or by including JavaScript snippets directly in the HTML, the browser always sends requests to the third-party domain. This is problematic for a bunch of reasons, but mainly because even if you wanted to, you can’t hide the user’s IP address. It is revealed with every HTTP request. It is also problematic because those tools execute remote JavaScript resources, and you have almost no visibility over the actions they take in the browser or the data they transmit.

We can use the Google Analytics example to illustrate the difference. When a website is loading Google Analytics either via Google Tag Manager or directly from the HTML, the browser downloads the analytics.js file that loads Google Analytics. It then sends an HTTP POST request from the browser to Google’s endpoint: https://www.google-analytics.com/collect. Both of these requests reveal the end-user’s IP address and might append to the URL some personal data, such as the Google Client ID, as query parameters for example.

Need to Keep Analytics Data in the EU? Cloudflare Zaraz Can Offer a Solution

In comparison, when you use Zaraz to load Google Analytics, there’s simply no communication at all between the browser and Google’s endpoint. Instead, Zaraz works as an intermediary, and the entire communication is between Zaraz (which runs on Workers servers, isolated from the browser) and the third party. You can think of Zaraz as an extra protection layer between the browser and the third-party endpoint, and this extra layer allows us to include some powerful privacy features.

For example, Zaraz allows customers to decide whether to transfer an end user’s IP address to Google Analytics or not. As simple as that. When configuring a new third-party tool like Google Analytics, you can choose in the tools settings page to hide IP addresses.

Need to Keep Analytics Data in the EU? Cloudflare Zaraz Can Offer a Solution

You can use this feature currently with Google Analytics and the Facebook Pixel/Conversion API. But with more and more tools opening up their API and allowing server-to-server integrations, we expect the number of tools you can apply this on to grow rapidly.

A somewhat similar feature Zaraz offers is the Zaraz Data Loss Prevention (DLP) feature, currently used by several of our Enterprise customers. The DLP feature scans every request going to a third-party endpoint to make sure it doesn’t include sensitive information such as names, email addresses, social  security number, credit card numbers, IP addresses, and more. Using this feature, customers can either mask the data or simply be alerted when a tool is collecting such personal data. It gives full visibility and control over the information shared with third parties.

How Zaraz Can Help with Data Localization

Right now, you might be asking yourself, “wait, but how is Cloudflare different from Google, and won’t end users’ logs go to Cloudflare’s US servers as well?” This is a great question, and where the combination of Zaraz with the Cloudflare global network makes us shine. We offer Enterprise customers Zaraz in combination with two powerful features of Cloudflare’s Data Localisation Suite: Regional Services, and the Customer Metadata Boundary.

Cloudflare Regional Services allows you to choose where you want the Cloudflare services to run, including the Zaraz service. To meet your compliance obligations, you may need control over where your data is inspected. Cloudflare Regional Services helps you decide where your data should be handled, without losing the performance benefits our network provides.

Let’s say you run a website for a European bank. Let’s also assume you enabled the Data Localisation Suite for the EU. When a person in the EU visits your website, an HTTP request is sent to activate Zaraz. Since Zaraz is running in a first-party context, meaning under your own domain, all the Data Localisation settings will apply on it as well. So the network will direct the traffic to the EU, without inspecting its content, and run Zaraz there.

The EU Customer Metadata Boundary expands the Data Localisation Suite to ensure that a customer’s end-user traffic metadata stays in the EU. “Metadata” can be a scary term, but it’s a simple concept — it just means “data about data.” In other words, it’s a description of activity that happened on our network. Using the EU Customer Metadata Boundary means that this type of metadata would be saved only in the EU.

And what about the end user’s personal data handled by Zaraz? By default, Zaraz doesn’t log or save any piece of information about the end user, with one exception in the case of error logging. To make our service better, we are saving logs of errors, so we can fix any issues. For customers that are using the Data Localisation Suite, this is something we can toggle off, which means that no log data whatsoever will be saved by Zaraz.

What Does the Future Hold for Privacy Features?

Since the Zaraz acquisition, we have been talking to hundreds of Cloudflare enterprise customers, and thousands of users using the beta for the free version of Zaraz. And we have gathered a shortlist of features that we plan to develop in 2022.

  • The Zaraz Consent Manager. Zaraz is fundamentally changing the way third-party tools are implemented on the web. So, in order to provide our customers with full control over user consent management, we realized we should build our own tool to allow customers to do so easily. The Zaraz consent manager will be fully integrated with Zaraz and will allow customers to take actions according to the user choices in a few clicks.
  • Geolocation Triggers. We are planning to add the option to create trigger rules based on an end user’s current location. This means you could configure tools to only load if the user is visiting your site from a specific region. You’d be able to even send specific events or properties according to the end-user’s location. This feature should help global companies to set granular configurations that meet the requirements of their global operations.
  • DLP pattern templates. At the moment, our DLP feature can scan requests going to third-party tools according to the patterns that enterprise customers create themselves. In the near future, we will introduce templates to help customers scan for common PII with more ease.

This is just a taste of what’s coming. If you have any ideas for privacy features you’d like to see, reach out to [email protected] – we would love to hear from you!

If you would like to explore the free beta version, please click here. Provided you are an Enterprise customer and want to learn more about Zaraz’s privacy features, please click here to join the waitlist. To join our Discord channel, click here.

Why Cloudflare Bought Zaraz

Post Syndicated from Matthew Prince original https://blog.cloudflare.com/why-cloudflare-bought-zaraz/

Why Cloudflare Bought Zaraz

Why Cloudflare Bought Zaraz

Today we’re excited to announce that Cloudflare has acquired Zaraz. The Zaraz value proposition aligns with Cloudflare’s mission. They aim to make the web more secure, more reliable, and faster. And they built their solution on Cloudflare Workers. In other words, it was a no-brainer that we invite them to join our team.

Be Careful Who Takes Out the Trash

To understand Zaraz’s value proposition, you need to understand one of the biggest risks to most websites that people aren’t paying enough attention to. And, to understand that, let me use an analogy.

Imagine you run a business. Imagine that business is, I don’t know, a pharmacy. You have employees. They have a process and way they do things. They’re under contract, and you conduct background checks before you hire them. They do their jobs well and you trust them. One day, however, you realize that no one is emptying the trash. So you ask your team to find someone to empty the trash regularly.

Your team is busy and no one has the time to add this to their regular duties. But one plucky employee has an idea. He goes out on the street and hails down a relative stranger. “Hey,” your employee says to the stranger. “I’ve seen you walking by this way every day. Would you mind stopping in and taking out the trash when you do?”

“Uh”, the stranger says. “Sure?!”

“Great,” your employee says. “Here’s a badge that will let you into the building. The trash is behind the secure area of the pharmacy, but, don’t worry, just use the badge, and you can get back there. You look trustworthy. This will work out great!!”

And for a while it does. The stranger swings by every day. Takes out the trash. Behaves exactly as hoped. And no one thinks much about the trash again.

But one day you walk in, and the pharmacy has been robbed. Drugs stolen, patient records missing. Logs indicate that it was the stranger’s badge that had been used to access the pharmacy. You track down the stranger, and he says “Hey, that sucks, but it wasn’t me”. I handed off that trash responsibility to someone else long ago when I stopped walking past the pharmacy every day.”

And you never track down the person who used the privileged access to violate your trust.

The Keys to the Kingdom

Now, of course, this is crazy. No one would go pick a random stranger off the street and give them access to their physical store. And yet, in the virtual world, a version of this happens all the time.

Every day, front end developers, marketers, and even security teams embed third-party scripts directly on their web pages. These scripts perform basic tasks — the metaphorical equivalent of taking out the trash. When performing correctly, they can be valuable at bringing advanced functionality to sites, helping track marketing conversions, providing analytics, or stopping fraud. But, if they ever go bad, they can cause significant problems and even steal data.

At the most mundane, poorly configured scripts can slow down the rendering pages. While there are ways to make scripts non-blocking, the unfortunate reality is that their developers don’t always follow the best practices. Often when we see slow websites, the biggest cause of slowness is all the third-party scripts that have been embedded.

But it can be worse. Much worse. At Cloudflare, we’ve seen this first hand. Back in 2019 a hacker compromised a third-party service that Cloudflare used and modified the third-party JavaScript that was loaded into a page on cloudflare.com. Their aim was to steal login cookies, usernames and passwords. They went so far as to automatically create username and password fields that would autocomplete.

Here’s a snippet of the actual code injected:

        var cf_form = document.createElement("form");
        cf_form.style.display = "none";
        document.body.appendChild(cf_form);
        var cf_email = document.createElement("input");
        cf_email.setAttribute("type", "text");
        cf_email.setAttribute("name", "email");
        cf_email.setAttribute("autocomplete", "username");
        cf_email.setAttribute("id", "_email_");
        cf_email.style.display = "none";
        cf_form.appendChild(cf_email);
        var cf_password = document.createElement("input");
        cf_password.setAttribute("type", "password");
        cf_password.setAttribute("name", "password");
        cf_password.setAttribute("autocomplete", "current-password");
        cf_password.setAttribute("id", "_password_");
        cf_password.style.display = "none";
        cf_form.appendChild(cf_password);

Luckily, this attack caused minimal damage because it was caught very quickly by the team, but it highlights the very real danger of third-party JavaScript. Why should code designed to count clicks even be allowed to create a password field?

Put simply, third-party JavaScript is a security nightmare for the web. What looks like a simple one-line change (“just add this JavaScript to get free page view tracking!”) opens a door to malicious code that you simply don’t control.

And worse is that third-party JavaScript can and does load other JavaScript from other unknown parties. Even if you trust the company whose code you’ve chosen to embed, you probably don’t trust (or even know about!) what they choose to include.

And even worse these scripts can change any time. Security threats can come and go. The attacker who went after Cloudflare compromised the third-party and modified their service to only attack Cloudflare and included anti-debugging features to try to stop developers spotting the hack. If you’re a CIO and this doesn’t freak you out already, ask your web development team how many third-party scripts are on your websites. Do you trust them all?

The practice of adding third-party scripts to handle simple tasks is the literal equivalent of pulling a random stranger off the street, giving them physical access to your office, and asking them to stop by once a day to empty the trash. It’s completely crazy in the physical world, and yet it’s common practice in web development.

Sandboxing the Strangers

At Cloudflare, our solution was draconian. We ordered that all third-party scripts be stripped from our websites. Different teams at Cloudflare were concerned. Especially our marketing team, who used these scripts to assess whether the campaigns they were running were successful. But we made the decision that it was more important to protect the integrity of our service than to have visibility into things like marketing campaigns.

It was around this time that we met the team behind Zaraz. They argued there didn’t need to be such a drastic choice. What if, instead, you could strictly control what the scripts that you insert on your page did. Make sure if ever they were compromised they wouldn’t have access to anything they weren’t authorized to see. Ensure that if they failed or were slow they wouldn’t keep a page from rendering.

We’ve spent the last half year testing Zaraz, and it’s magical. It gives you the best of the flexible, extensible web while ensuring that CIOs and CISOs can sleep well at night knowing that even if a third-party script provider is compromised, it won’t result in a security incident.

To put a fine point on it, had Cloudflare been running Zaraz then the threat from the compromised script we saw in 2019 would have been completely and automatically eliminated. There’s no way for the attacker to create those username and password fields, no access to cookies that are stored in the user’s browser. The attack surface would have been completely removed.

We’ve published two other posts today outlining how Zaraz works as well as examples of how companies are using it to ensure their web presence is secure, reliable, and fast. We are making Zaraz available to our Enterprise customers immediately, and all other customers can access a free beta version on their dashboard starting today.

If you’re a third-party script developer, be on notice that if you’re not properly securing your scripts, then as Zaraz rolls out across more of the web your scripts will stop working. Today, Cloudflare sits in front of nearly 20% of all websites and, before long, we expect Zaraz’s technology will help protect all of them. We want to make sure all scripts running on our customers’ sites meet modern security, reliability, and performance standards. If you need help getting there, please reach out, and we’ll be standing ready to help: [email protected]

In the meantime, we encourage you to read about how the Zaraz technology works and how customers like Instacart are using it to build a better web presence.

It’s terrific to have Zaraz on board, furthering Cloudflare’s mission to help build a better Internet. Welcome to the team. And in that vein: we’d like to welcome you to Zaraz! We’re excited for you to get your hands on this piece of technology that makes the web better.

Zaraz use Workers to make third-party tools secure and fast

Post Syndicated from Yo'av Moshe original https://blog.cloudflare.com/zaraz-use-workers-to-make-third-party-tools-secure-and-fast/

Zaraz use Workers to make third-party tools secure and fast

Zaraz use Workers to make third-party tools secure and fast

We decided to create Zaraz around the end of March 2020. We were working on another product when we noticed everyone was asking us about the performance impact of having many third-parties on their website. Third-party content is an important part of the majority of websites today, powering analytics, chatbots, conversion pixels, widgets — you name it. The definition of third-party is an asset, often JavaScript, hosted outside the primary site-user relationship, that is not under the direct control of the site owner but is present with ‘approval’. Yair wrote in detail about the process of measuring the impact of these third-party tools, and how we pivoted our startup, but I wanted to write about how we built Zaraz and what it actually does behind the scenes.

Third parties are great in that they let you integrate already-made solutions with your website, and you barely need to do any coding. Analytics? Just drop this code snippet. Chat widget? Just add this one. Third-party vendors will usually instruct you on how to add their tool, and from that point on things should just be working. Right? But when you add third-party code, it usually fetches even more code from remote sources, meaning you have less and less control over whatever is happening in your visitors’ browsers. How can you guarantee that none of the multitude of third parties you have on your website wasn’t hacked, and started stealing information, mining cryptocurrencies or logging key presses on your visitors’ computers?

It doesn’t even have to be a deliberate hack. As we investigated more and more third-party tools, we noticed a pattern — sometimes it’s easier for a third-party vendor to collect everything, rather than being selective or careful about it. More often than not, user emails would find their way into a third-party tool, which could very easily put the website owner in trouble due to GDPR, CCPA, or similar.

How third-party tools work today

Usually, when you add a third party to your page, you’re asked to add a piece of JavaScript code to the <head> of your HTML. Google Analytics is by far the most popular third-party, so let’s see how it’s done there:

<!-- Google Analytics -->
<script>
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','https://www.google-analytics.com/analytics.js','ga');

ga('create', 'UA-XXXXX-Y', 'auto');
ga('send', 'pageview');
</script>
<!-- End Google Analytics -->

In this case, and in most other cases, the snippet that you’re pasting actually calls more JavaScript code to be executed. The snippet above creates a new <script> element, gives it the https://www.google-analytics.com/analytics.js src attribute, and appends it to the DOM. The browser then loads the analytics.js script, which includes more JavaScript code than the snippet itself, and sometimes asks the browser to download even more scripts, some of them bigger than analytics.js itself. So far, however, no analytics data has been captured at all, although this is why you’ve added Google Analytics in the first place.

The last line in the snippet, ga('send', 'pageview');, uses a function defined in the analytics.js file to finally send the pageview. The function is needed because it is what is capturing the analytics data — it fetches the kind of browser, the screen resolution, the language, etc…  Then, it constructs a URL that includes all the data, and  sends a request to this URL. It’s only after this step that the analytics information gets captured. Every user behavior event you record using Google Analytics will result in another request.

The reality is that the vast majority of tools consist of more than one resource file, and that it’s practically impossible to know in advance what a tool is going to load without testing it on your website. You can use Request Map Generator to get a visual representation of all the resources loaded on your website, including how they call each other. Below is a Request Map of a demo e-commerce website we created:

Zaraz use Workers to make third-party tools secure and fast

That big blue circle is our website’s resources, and all other circles are third-party tools. You can see how the big green circle is actually a sub-request of the main Facebook pixel (fbevents.js), and how many tools, like LinkedIn on top right, are creating a redirect chain in order to sync some data, on the expense of forcing the browser to make more and more network requests.

A new place to run a tag manager — the edge

Since we want to make third-parties faster, more secure, and private, we had to develop a fundamental new way of thinking about them and a new system for how they run. We came up with a plan: build a platform where third-parties can run code outside the browser, while still getting access to the information they need and being able to talk with the DOM when necessary. We don’t believe third parties are evil: they never intended to slow down the Internet for everyone, they just didn’t have another option. Being able to run code on the edge and run it fast opened up new possibilities and changed all that, but the transition is hard.

By moving third-party code to run outside the browser, we get multiple wins.

  • The website will load faster and be more interactive. The browser rendering your website can now focus on the most important thing — your website. The downloading, parsing and execution of all the third-party scripts will no longer compete or even block the rendering and interactivity of your website.
  • Control over the data sent to third-parties. Third-party tools often automatically collect information from the page and from the browser to, for example, measure site behaviour/usage. In many cases, this information should stay private. For example, most tools collect the document.location, but we often see a “reset password” page including the user email in the URL, meaning emails are unknowingly being sent and saved by third-party providers, usually without consent. Moving the execution of the third parties to the edge means we have full visibility into what is being sent. This means we can provide alerts and filters in case tools are trying to collect Personally Identifiable Information or mask the private parts of the data before they reach third-party servers. This feature is currently not available on the public beta, but contact us if you want to start using it today.
  • By reducing the amount of code being executed in the browser and by scanning all code that is executed in it, we can continuously verify that the code hasn’t been tampered with and that it only does what it is intended to do. We are working to connect Zaraz with Cloudflare Page Shield to do this automatically.

When you configure a third-party tool through a normal tag manager, a lot happens in the browsers of your visitors which is out of your control. The tag manager will load and then evaluate all trigger rules to decide which tools to load. It would then usually append the script tags of those tools to the DOM of the page, making the browser fetch the scripts and execute them. These scripts come from untrusted or unknown origins, increasing the risk of malicious code execution in the browser. They can also block the browser from becoming interactive until they are completely executed. They are generally free to do whatever they want in the browser, but most commonly they would then collect some information and send it to some endpoint on the third-party server. With Zaraz, the browser essentially does none of that.

Zaraz use Workers to make third-party tools secure and fast

Choosing Cloudflare Workers

When we set about coding Zaraz, we quickly understood that our infrastructure decisions would have a massive impact on our service. In fact, choosing the wrong one could mean we have no service at all. The most common alternative to Zaraz is traditional Tag Management software. They generally have no server-side component: whenever a user “publishes” a configuration, a JavaScript file is rendered and hosted as a static asset on a CDN. With Zaraz the idea is to move most of the evaluation of code out of the browser, and respond with a dynamically generated JavaScript code each time. We needed to find a solution that would allow us to have a server-side component, but would be as fast as a CDN. Otherwise, there was a risk we might end up slowing down websites instead of making them faster.

We needed Zaraz to be served from a place close to the visiting user. Since setting up servers all around the world seemed like too big of a task for a very young startup, we looked at a few distributed serverless platforms. We approached this search with a small list of requirements:

  • Run JavaScript: Third-party tools all use JavaScript. If we were to port them to run in a cloud environment, the easiest way to do so would be to be able to use JavaScript as well.
  • Secure: We are processing sensitive data. We can’t afford the risk of someone hacking into our EC2 instance. We wanted to make sure that data doesn’t stay on some server after we sent our HTTP response.
  • Fully programmable: Some CDNs allow setting complicated rules for handling a request, but altering HTTP headers, setting redirects or HTTP response codes isn’t enough. We need to generate JavaScript code on the fly, meaning we need full control over the responses. We also need to use some external JavaScript libraries.
  • Extremely fast and globally distributed: In the very early stages of the company, we already had customers in the USA, Europe, India, and Israel. As we were preparing to show them a Proof of Concept, we needed to be sure it would be fast wherever they are. We were competing with tag managers and Customer Data Platforms that have a pretty fast response time, so we need to be able to respond as fast as if our content was statically hosted on a CDN, or faster.

Initially we thought we would need to create Docker containers that would run around the globe and would use their own HTTP server, but then a friend from our Y Combinator batch said we should check out Cloudflare Workers.

At first, we thought it wouldn’t work — Workers doesn’t work like a Node.js application, and we felt that limitation would prevent us from building what we wanted. We planned to let Workers handle the requests coming from users’ browsers, and then use an AWS Lambda for the heavy lifting of actually processing data and sending it to third-party vendors.

Our first attempt with Workers was very simple: just confirming we could use it to actually return dynamic browser-side JavaScript that is generated on-the-fly:

addEventListener('fetch', (event) => {
 event.respondWith(handleRequest(event.request))
})
 
async function handleRequest(request) {
   let code = '(function() {'
  
   if (request.headers.get('user-agent').includes('Firefox')) {
     code += `console.log('Hello Firefox!');`
   } else {
     code += `console.log('Hey other browsers...');`
   }
  
   code += '})();'
  
   return new Response(code, {
     headers: { 'content-type': 'text/javascript' }
   });
}

It was a tiny example, but I remember calling Yair afterwards and saying “this could actually work!”. It proved the flexibility of Workers. We just created an endpoint that served a JavaScript file, this JavaScript file was dynamically generated, and the response time was less than 10ms. We could now put <script src="path/to/worker.js"> in our HTML and treat this Worker like a normal JavaScript file.

As we took a deeper look, we found Workers answering demand after demand from our list, and learned we could even do the most complicated things inside Workers. The Lambda function started doing less and less, and was eventually removed. Our little Node.js proof-of-concept was easily converted to Workers.

Using the Cloudflare Workers platform: “standing on the shoulders of giants”

When we raised our seed round we heard many questions like “if this can work, how come it wasn’t built before?” We often said that while the problem has been a long standing one, accessible edge computing is a new possibility. Later, on our first investors update after creating the prototype, we told them about the unbelievably fast response time we managed to achieve and got much praise for it — talk about “standing on the shoulders of giants”. Workers simply checked all our boxes. Running JavaScript and using the same V8 engine as the browser meant that we could keep the same environment when porting tools to run on the cloud (it also helped with hiring). It also opened the possibility of later on using WebAssembly for certain tasks. The fact that Workers are serverless and stateless by default was a selling point for our own trustworthiness: we told customers we couldn’t save their personal data even by mistake, which was true. The integration between webpack and Wrangler meant that we could write a full-blown application — with modules and external dependencies — to shift 100% of our logic into our Worker. And the performance helped us ace all our demos.

As we were building Zaraz, the Workers platform got more advanced. We ended up using Workers KV for storing user configuration, and Durable Objects for communicating between Workers. Our main Worker holds server-side implementations of more than 50 popular third-party tools, replacing hundreds of thousands of JavaScript lines of code that traditionally run inside browsers. It’s an ever growing list, and we recently also published an SDK that allows third-party vendors to build support for their tools by themselves. For the first time, they can do it in a secure, private, and fast environment.

A new way to build third-parties

Most third-party tools do two fundamental things: First, they collect some information from the browser such as screen resolution, current URL, page title or cookie content. Second, they send it to their server. It is often simple, but when a website has tens of these tools, and each of them query for the information it needs and then sends its requests, it can cause a real slowdown. On Zaraz, this looks very different: Every tool provides a run function, and when Zaraz evaluates the user request and decides to load a tool, it executes this run function. This is how we built integrations for over 50 different tools, all from different categories, and this is how we’re inviting third-party vendors to write their own integrations into Zaraz.

run({system, utils}) { 
  // The `system` object includes information about the current page, browser, and more 
  const { device, page, cookies } = system
  // The `utils` are a set of functions we found useful across multiple tools
  const { getCookieString, waitUntil } = utils

  // Get the existing cookie content, or create a new UUID instead
  const cookieName = 'visitor-identifier'
  const sessionCookie = cookies[cookieName] || crypto.randomUUID()

  // Build the payload
  const payload = {
    session: sessionCookie,
    ip: device.ip,
    resolution: device.resolution,
    ua: device.userAgent,
    url: page.url.href,
    title: page.title,
  }

  // Construct the URL
  const baseURL = 'https://example.com/collect?'
  const params = new URLSearchParams(payload)
  const finalURL = baseURL + params

  // Send a request to the third-party server from the edge
  waitUntil(fetch(finalURL))
  
  // Save or update the cookie in the browser
  return getCookieString(cookieName, sessionCookie)
}

The above code runs in our Cloudflare Worker, instead of the browser. Previously, having 10x more tools meant 10x more requests browsers rendering your website needed to make, and 10x more JavaScript code they needed to evaluate. This code would often be repetitive, for example, almost every tool implements their own “get cookie” function. It’s also 10x more origins you have to trust no one is tampering with. When running tools on the edge, this doesn’t affect the browser at all: you can add as many tools as you want, but they wouldn’t be loading in the browser, so they will have no effect.

In this example, we first check for the existence of a cookie that identifies the session, called “visitor-identifier”. If it exists, we read its value; if not, we generate a new UUID for it. Note that the power of Workers is all accessible here: we use crypto.randomUUID() just like we can use any other Workers functionality. We then collect all the information our example tool needs — user agent, current URL, page title, screen resolution, client IP address — and the content of the “visitor-identifier” cookie. We construct the final URL that the Worker needs to send a request to, and we then use waitUntil to make sure the request gets there. Zaraz’s version of fetch gives our tools automatic logging, data loss prevention and retries capabilities.

Lastly, we return the value of the getCookieString function. Whatever string is returned by the run function is passed to the visitor as browser-side JavaScript. In this case, getCookieString returns something like document.cookie = 'visitor-identifier=5006e6fa-7ce6-45ef-8724-c846f1953369; Path=/; Max-age=31536000';, causing the browser to create a first-party cookie. The next time a user loads a page, the visitor-identifier cookie should exist, causing Zaraz to reuse the UUID instead of creating a new one.

This system of run functions allows us to separate and isolate each tool to run independently of the rest of the system, while still providing it with all the required context and data coming from the browser, and the capabilities of Workers. We are inviting third-party vendors to work with us to build the future of secure, private and fast third-party tools.

A new events system

Many third-party tools need to collect behavioral information during a user visit. For example, you might want to place a conversation pixel right after a user clicked “submit” on the credit card form. Since we moved tools to the cloud, you can’t access their libraries from the browser context anymore. For that we created zaraz.track() — a method that allows you to call tools programmatically, and optionally provide them with more information:

document.getElementById("credit-card-form").addEventListener("submit", () => {
  zaraz.track("card-submission", {
    value: document.getElementById("total").innerHTML,
    transaction: "X-98765",
  });
});

In this example, we’re letting Zaraz know about a trigger called “card-submission”, and we associate some data with it — the value of the transaction that we’re taking from an element with the ID total, and a transaction code that is hardcoded and gets printed directly from our backend.

In the Zaraz interface, configured tools can be subscribed to different and multiple triggers. When the code above gets triggered, Zaraz checks, on the edge, what tools are subscribed to the card-submission trigger, and it then calls them with the right additional data supplied, populating their requests with the transaction code and its value.

This is different from how traditional tag managers work: GTM’s dataLayer.push serves a similar purpose, but is evaluated client-side. The result is that GTM itself, when used intensively, will grow its script so much that it can become the heaviest tool a website loads. Each event sent using dataLayer.push will cause repeated evaluation of code in the browser, and each tool that will match the evaluation will execute code in the browser, and might call more external assets again. As these events are usually coupled with user interactions, this often makes interacting with a website feel slow, because running the tools is occupying the main thread. With Zaraz, these tools exist and are evaluated only at the edge, improving the website’s speed and security.

Zaraz use Workers to make third-party tools secure and fast

You don’t have to be coder to use triggers. The Zaraz dashboard allows you to choose from a predefined set of templates like click listeners, scroll events and more, that you can attach to any element on your website without touching your code. When you combine zaraz.track() with the ability to program your own tools, what you get is essentially a one-liner integration of Workers into your website. You can write any backend code you want and Zaraz will take care of calling it exactly at the right time with the right parameters.

Joining Cloudflare

When new customers started using Zaraz, we noticed a pattern: the best teams we worked with chose Cloudflare, and some were also moving parts of their backend infrastructure to Workers. We figured we could further improve performance and integration for companies using Cloudflare as well. We could inline parts of the code inside the page and then further reduce the amount of network requests. Integration also allowed us to remove the time it takes to DNS resolve our script, because we could use Workers to proxy Zaraz into our customers’ domains. Integrating with Cloudflare made our offering even more compelling.

Back when we were doing Y Combinator in Winter 2020 and realized how much third parties could affect a websites’ performance, we saw a grand mission ahead of us: creating a faster, private, and secure web by reducing the amount of third-party bloat. This mission remained the same to this day. As our conversations with Cloudflare got deeper, we were excited to realize that we’re talking with people who share the same vision. We are thrilled for the opportunity to scale our solutions to millions of websites on the Internet, making them faster and safer and even reducing carbon emissions.

If you would like to explore the free beta version, please click here. If you are an enterprise and have additional/custom requirements, please click here to join the waitlist. To join our Discord channel, click here.