Tag Archives: javascript

JAMstack at the Edge: How we built Built with Workers… on Workers

Post Syndicated from Kristian Freeman original https://blog.cloudflare.com/jamstack-at-the-edge-how-we-built-built-with-workers-on-workers/

JAMstack at the Edge: How we built Built with Workers… on Workers

I’m extremely stoked to announce Built with Workers today – it’s an awesome resource for exploring what you can build with Cloudflare Workers. As Adam explained in our launch post, showcasing developers building incredible projects with tools like Workers KV or our streaming HTML rewriter is a great way to celebrate users of our platform. It also helps encourage developers to try building their dream app on top of Workers. In this post, I’ll explore some of the architectural and implementation designs we made while building the site.

When we first started planning Built with Workers, we wanted to use the site as an opportunity to build a new greenfield application, showcasing the strength of the Workers platform. The Workers Developer Experience team is cross-functional: while we might spend most of our time improving our docs, or developing features for our command-line interface Wrangler, most of us have spent years developing on the web. The prospect of starting a new application is always fun, but in this instance, it was a prime chance to ask (and answer) the question, “If I could build this site on Workers with whatever tools I want, what would I choose?”

A guiding principle for the Workers platform is ease-of-use. The programming model is simple: it’s just JavaScript (or, via WASM, Rust, C, and C++), and you have complete control over the requests coming in and the requests going out from your Workers script. In the same way, while building Built with Workers, it was crucial to find a set of tools that could enable something like this throughout the process of building the entire application. To enable this, we’ve embraced JAMstack – a software stack comprised of JavaScript, APIs, and markup – with Built with Workers, deploying always up-to-date static builds of the site directly to the edge, using Workers Sites. Our framework of choice, Gatsby.js, provides a set of sane defaults to build a modern web application. To manage content and the layout of the site, we’ve chosen Sanity.io, a powerful headless CMS that allows us to model the entire website without needing to deploy any databases or spin up any additional infrastructure.

Personally, I’m excited about JAMstack as a methodology for building web applications because of this emphasis on reducing infrastructure: it’s incredibly similar to the motivations behind deploying serverless applications using Cloudflare Workers, and as we developed Built with Workers, we discovered a number of these philosophical similarities in JAMstack and Cloudflare Workers – exciting! To help encourage developers to explore building their own JAMstack applications on Workers, I’m also announcing today that we’ve made the Built with Workers codebase open-source on GitHub – you can check out how the application is developed, built and deployed from start to finish.

In this post, we’ll dig into Built with Workers, exploring how it works, the technical decisions we’ve made, and some of the most fascinating aspects of what it means to build applications on the edge.

JAMstack at the Edge: How we built Built with Workers… on Workers
A screenshot of the Built with Workers homepage

Exploring the JAMstack

My first encounter with tooling that would ultimately become part of “JAMstack” was in 2013. I noticed the huge proliferation of developers building personal “static” sites – taking blog posts written primarily in Markdown, and pushing them through frameworks like Jekyll to build full websites that could easily be deployed to a number of CDNs and file hosting platforms. These static sites were fast – they are just HTML, CSS, and JavaScript – and easy to update. The average developer spends their days maintaining large and complex software systems, so it was relaxing to just write Markdown, plug in some re-usable HTML and CSS, and deploy your website. The advent of static sites, of course, isn’t new – but after years of increasingly complex full-stack technology, the return to simplicity was a promising development for many kinds of websites that didn’t need databases, or any dynamic content.

In the last couple years, JAMstack has built upon that resurgence, and represents an approach to building complete, complex applications using the same tooling that has become the first choice for developers building their simple personal sites. JAMstack is comprised of three conceptual pieces – JavaScript, APIs, and Markup – each of which is a crucial aspect of simplifying our web applications and making them easy to write, build, and deploy.

J is for JavaScript

The JAMstack architecture relies heavily on the ubiquity of JavaScript as the language of the web. Many modern web applications use powerful, dynamic front-end frameworks like React and Vue to render user interfaces and process state on the client for users. On the backend, or in Workers’ case, on the edge, any dynamic functionality in your JAMstack application should be built on top of JavaScript, often working in the request-response model that full-stack developers are accustomed to.

The Workers platform is perfectly suited to this requirement! As a developer building on Workers, you have total control of incoming requests and outgoing responses, using the JavaScript Service Worker APIs you know and love. We built Workers Sites as an extension of the Workers platform (and Workers KV as a storage mechanism at the edge), making it possible to deploy your site assets using a single command in Wrangler: wrangler publish.

When your Workers Site receives a new request, we’ll execute JavaScript at the edge to look up a piece of content from Workers KV, and serve it back to the client at lightning speed. Remarkably, you can deploy JAMstack applications on Workers with no additional configuration besides generating your Workers Siteby design, Workers Sites is built to serve as an exceptional JAMstack deployment platform.

A is for APIs

The advent of static site tooling for personal sites makes sense: your site is a few pages: a few blog posts, for instance, and the classic “About” or “Contact” page. When it’s compiled to HTML, the footprint is quite small! This small footprint is what makes static sites easy to reason about: they’re trivial to host in terms of bandwidth and storage costs, and they rarely change, so they’re easily cacheable.

While that works for personal sites, complex applications actually have data requirements! We need to talk to the user data in our databases, and analytics information in our data warehouses. JAMstack apps tackle this by definitively stating that these data sources should be accessible via HTTPS APIs, consumable by the application as a way to provide dynamic information to clients.

Workers is a fascinating platform in regards to JAMstack APIs. It can serve as a gateway to your data, or as a place to persist and return data itself. I can, for instance, expose an API endpoint via my Workers script without giving clients access to my origin. I can also use tooling like Workers KV to persist data directly on the edge, and when a user requests that data, I can resolve the data by returning JSON directly from my application.

This flexibility has been an unexpected part of the experience of developing Built with Workers. In a later section of this post, I’ll talk about how we developed an integral feature of the site based on the unique strengths of Workers as a way to host static assets and as a dynamic JavaScript execution platform. This has remarkable implications that blur the lines between classic static sites and dynamic applications, and I’m really excited about it.

M is for Markup

A breakthrough moment in my understanding of JAMstack came at the beginning of this year. I was working on a job board for frontend developers, using the static site framework Gatsby.js and Sanity.io, a headless CMS tool that allows developers to model content without maintaining a database or any infrastructure. (As a reminder – this set of tools is identical to what we ultimately used to develop Built with Workers. It’s a very good stack!)

SEO is crucial to a job board, and as I began to explore how to drive more traffic to my job board, I landed on the idea of generating a huge amount of search-oriented content automatically from the job data I already had. For instance, if I had job posts with the keywords “React”, “Europe”, and “Senior” (as in “senior developer”), what if I created pages with titles like “Senior React developer jobs in Europe”, or “Remote Angular jobs”? This approach would allow the site to begin ranking for a variety of job positions, locations, and experience levels, and as more jobs were posted on the site, each of these pages would be enriched with more useful information and relevant content, helping it rank higher in search.

“But static sites are… static!”, I told myself. Would I need to build an entire dynamic API on top of my static site, just to be able to serve these search-engine optimized pages? This led me to a “eureka” moment with Gatsby – I could define markup (the “M” in JAMstack), and when I’m building my site, I could look at all the available job data I had, cycling through every available keyword combination and inserting it into my markup to generate thousands of these pages. As I later learned, this idea is not necessarily unique to Gatsby – it is possible, for instance, to automate getting data from your API and writing it to data files in earlier static site frameworks like Hugo – but it is a first-class citizen in Gatsby. There are a ton of data sources available via Gatsby plugins, and because they’re all exposed via HTTPS, the workflow is standardized inside of the framework.

In Built with Workers, we connect to the Sanity.io CMS instance at build-time: crucially, by the time that the site has been deployed to Workers, the application effectively has no idea what Sanity even is! Our Gatsby application connects to Sanity.io via an HTTPS API, and using GraphQL, we look at all the data that we have in our CMS, and make decisions about what pages to generate and how to render the site’s interface, ultimately resulting in a statically-built application that is derived from dynamic data.

This emphasis on the build step in JAMstack is quite different than the classic web application. In the past, a user requested data, a web server looked at what the user was requesting, and then the user waits, as the server gets that data, returns JSON, or interpolates it into templates written in tools like Pug or ERB. With JAMstack, the pages are already built, and the deployed application is just a collection of plain HTML, CSS, and JavaScript.

Why Cloudflare Workers?

Cloudflare’s network is a fascinating place to deploy JAMstack applications. Yes, Cloudflare’s edge network can act as a CDN for your static assets, like your CSS stylesheets, or your client-side JavaScript code. But with Workers, we now have the ability to run JavaScript side-by-side with our static assets. In most JAMstack applications, the CDN is simply a bucket where your application ends up. Usually, the CDN is the most boring part of the stack! With Cloudflare Workers, we don’t just have a CDN: we also have access to an extremely low-latency, fully-featured JavaScript runtime.

The implications of this on the standard JAMstack workflow are, frankly, mind-boggling, and as part of developing Built with Workers, we’ve been exploring what it means to have this runtime available side-by-side with our statically-built JAMstack application.

To demonstrate this, we’ve implemented a bookmarking feature, which allows users of Built with Workers to bookmark projects. If you look at a project’s usage of our streaming HTML rewriter and say “Wow, that’s cool!”, you can also bookmark for the project to show your support. This feature, rendered as a button tag is deceptively simple: it’s a single piece of the user interface that makes use of the entirety of the Workers platform, to provide user-specific dynamic functionality. We’ll explore this in greater detail later in the post – see “Enhancing static sites at the edge”.


A modern development and content workflow

In the announcement post for Workers Sites, Rita outlined the motivations behind launching Workers Sites as a modern way to deploy sites:

“Born on the edge, Workers Sites is what we think modern development on the web should look like, natively secure, fast, and massively scalable. Less of your time is spent on configuration, and more of your time is spent on your code, and content itself.”

A few months later, I can say definitively that Workers Sites has enabled us to develop Built with Workers and spend almost no time on configuration. Using our GitHub Action for deploying Workers applications with Wrangler, the site has been continuously deploying to a staging environment for the past couple weeks. The simplicity around this continuous deployment workflow has allowed us to focus on the important aspects of the project: development and content.

The static site framework ecosystem is fairly competitive, but as we considered our options for this site, I advocated strongly for Gatsby.js. It’s an incredible tool for building JAMstack applications, with a great set of default for performant applications. It’s common to see Gatsby sites with Lighthouse scores in the upper 90s, and the decision to use React for implementing the UI makes it straightforward to onboard new developers if they’re familiar with React.

As I mentioned in a previous section, Gatsby shines at build-time. Gatsby’s APIs for creating pages during the build process based on API data are incredibly powerful, allowing developers to concretely define every statically-generated page on their web application, as well as any relevant data that needs to be passed in.

With Gatsby decided upon as our static site framework, we needed to evaluate where our content would live. Built with Workers has two primary data models, used to generate the UI:

  • Projects: websites, applications, and APIs created by developers using Cloudflare Workers. For instance, Built with Workers!
  • Features: features available on the Workers platform used to build applications. For instance, Workers KV, or our streaming HTML rewriter/parser.

Given these requirements, there were a number of potential approaches to take to store this data, and make it accessible. Keeping in line with JAMstack, we know that we probably should expose it via an HTTPS API, but from where? In what format?

As a full-stack developer who’s comfortable with databases, it’s easy to envision a world where we spin up a PostgreSQL instance, write a REST API, and write all kinds of fetch('/api/projects') to get the information we need. This method works, but we can do better! In the same way we built Workers Sites to simplify the deployment process, it was worthwhile to explore the JAMstack ecosystem and see what solutions exist for modeling data without being on the hook for more infrastructure.

Of the different tools in the ecosystem – databases, whether SQL or NoSQL, key-value stores (such as our own, Workers KV), etc. – the growth of “headless CMS” tools has made the largest impact on my development workflow.

On CSS Tricks, Chris Coyier wrote about the rise of headless CMS tools back in March 2016, and summarizes their function well:

[Headless CMSes are] very related to The Big Conversation™ on the web the last many years. How are we going to handle bringing Our Stuff™ all these different devices/screens/inputs.
Responsive design says "let’s let our design and media accommodate as much variation in screens as possible."
Progressive enhancement says "let’s make the functionality of this site work no matter what."
Designing for accessibility says "let’s ensure everyone can use this regardless of their capabilities as a person."
A headless CMS says "let’s not tie our data to any one way of doing things."

Using our headless CMS, Sanity.io, we can get every project inside our dataset, and call Gatsby’s createPage function to create a new page for each project, using a pre-defined project template file:

// gatsby-node.js

exports.createPages = async ({ graphql, actions }) => {
  const { createPage } = actions;

  const result = await graphql(`
    {
      allSanityProject {
        edges {
          node {
            slug
          }
        }
      }
    }
  `);

  if (result.errors) {
    throw result.errors;
  }

  const {
    data: { allSanityProject }
  } = result;

  const projects = allSanityProject.edges.map(({ node }) => node);
  projects.forEach((node, _index) => {
    const path = `/built-with/projects/${node.slug}`;

    createPage({
      path,
      component: require.resolve("./src/templates/project.js"),
      context: { slug: node.slug }
    });
  });
};

Using Sanity to drive the content for Built with Workers has been a huge win for our team. We’re no longer constrained by code deploys to make changes to content on the site – we don’t need to make a pull request to create a new project, and edits to a project’s name or description aren’t constrained by someone with the ability to deploy the project. Instead, we can empower members of our team to log in directly to the CMS and make changes, and be confident that once the corresponding deploy has completed (see “The CDN is the deployment platform” below), their changes will be live on the site.

Dynamic JAMstack layouts

As our team got up and running with Sanity.io, we found that the flexibility of a headless CMS platform was useful not just for creating our original data requirements – projects and features – but in rethinking and innovating on how we actually format the application itself.

With our previous objective of empowering non-technical folks to make changes to the site without deploying any code in consideration, we’ve also taken the entire homepage of Built with Workers and defined it as an instance of the “layout” data model in Sanity.io. By doing this, we can define corresponding “collections”, which are sets of projects. When a layout has many collections defined inside of the CMS, we can rapidly re-order, re-arrange, and experiment with new collections on the homepage, seeing the updated version of the site reflected immediately, and live on the production site within only a few minutes, after our continuous deployment process has finished.

JAMstack at the Edge: How we built Built with Workers… on Workers
Updating the layout of Built with Workers live from Sanity’s studio

With this work implemented, it’s easy to envision a world where our React code is purely concerned with rendering each individual aspect of the application’s interface – for instance, the project title component, or the “card” for an individual project – and the CMS drives the entire layout of the site. In the future, I’d like to continue exploring this idea in other pages in Built with Workers, including the project pages and any other future content we put on the site.

Enhancing static sites at the edge

Much of what we’ve discussed so far can be thought of as features and workflows that have great DX (developer experience), but not specific to Workers. Gatsby and Sanity.io are great, and although Workers Sites is a great platform for deploying JAMstack applications due to the Workers platform’s low-latency and performance characteristics, you could deploy the site to a number of different providers with no real differentiating features.

As we began building a JAMstack application on top of Built with Workers, we also wanted to explore how the Workers platform could allow developers to combine the simplicity of static site deployments with the dynamism of having a JavaScript runtime immediately available.

In particular, our recently-released streaming HTML rewriter seems like a perfect fit for “enhancing” our static sites. Our application is being served by Workers Sites, which itself is a Workers template that can be customized. By passing each HTML page through the HTML rewriter on its way to the client, we had an opportunity to customize the content without any negative performance implications.

As I mentioned previously, we landed on a first exploration of this platform advantage via the “bookmark” button. Users of Built with Workers can “bookmark” for a project – this action sends a request back up to the Workers application, storing the bookmark data as JSON in Workers KV.

// User-specific data stored in Workers KV, representing
// per-project bookmark information

{
  "bytesized_scraper_bookmarked": false,
  "web_scraper_bookmarked": true
}

When a user returns to Built with Workers, we can make a request to Workers KV, looking for corresponding data for that user and the project they’re currently viewing. If that data exists, we can embed the “edge state” directly into the HTML using the streaming HTML rewriter.

// workers-site/index.js

import { getAssetFromKV } from "@cloudflare/kv-asset-handler"

addEventListener("fetch", event => { 
  event.respondWith(handleEvent(event)) 
})

class EdgeStateEmbed {
  constructor(state) {
    this._state = state
  }
  
  element(element) {
    const edgeStateElement = `
      <script id='edge_state' type='application/json'>
        ${JSON.stringify(this._state)}
      </script>
    `
    element.prepend(edgeStateElement, { html: true })
  }
}

const hydrateEdgeState = async ({ state, response }) => {
  const rewriter = new HTMLRewriter().on(
    "body",
    new EdgeStateEmbed(await state)
  )
  return rewriter.transform(await response)
}

async function handleEvent(event) {
  return hydrateEdgeState({
    response: getAssetFromKV(event, options),
    // Get associated state for a request, based on the user and URL
    state: transformBookmark(event.request),
  })
}

When the React application is rendered on the client, it can then check for that embedded edge state, which influences how the “bookmark” icon is rendered – either as “bookmarked”, or “bookmarked”. To support this, we’ve leaned on React’s useContext, which allows any component inside of the application component tree to pull out the edge state and use it inside of the component:

// edge_state.js

import React from "react"
import { useSSR } from "../utils"

const parseDocumentState = () => {
  const edgeStateElement = document.querySelector("#edge_state")
  return edgeStateElement ? JSON.parse(edgeStateElement.innerText) : {}
}

export const EdgeStateContext = React.createContext([{}, () => {}])
export const EdgeStateProvider = ({ children }) => {
  const { isBrowser } = useSSR()
  if (!isBrowser) {
    return <>{children}</>
  }
  
  const edgeState = parseDocumentState()
  const [state, setState] = React.useState(edgeState)
  const updateState = (newState, options = { immutable: true }) => options.immutable
    ? setState(Object.assign({}, state, newState))
    : setState(newState)
  
  return (
    <EdgeStateContext.Provider value={[state, updateState]}>
      {children}
    </EdgeStateContext.Provider>
  )
}

// Inside of a React component
const Bookmark = ({ bookmarked, project, setBookmarked, setLoaded }) => {
const [state, setState] = React.useContext(EdgeStateContext)
// `bookmarked` value is a simplification of actual code
return <BookmarkButton bookmarked={state[project.id]} />
}

The combination of a straightforward JAMstack deployment platform with dynamic key-value object storage and a streaming HTML rewriter is really, really cool. This is an initial exploration into what I consider to be a platform-defining feature, and if you’re interested in this stuff and want to continue to explore how this will influence how we write web applications, get in touch with me on Twitter!

The CDN is the deployment platform

While it doesn’t appear in the acronym, an unsung hero of the JAMstack architecture is deployment. In my local terminal, when I run gatsby build inside of the Built with Workers project, the result is a folder of static HTML, CSS, and JavaScript. It should be easy to deploy!

The recent release of GitHub Actions has proven to be a great companion to building JAMstack applications with Cloudflare Workers – we’ve open-sourced our own wrangler-action, which allows developers to build their Workers applications and deploy directly from GitHub.

The standard workflows in the continuous deployment world – deploy every hour, deploy on new changes to the master branch, etc – are possible and already being used by many developers who make use of our wrangler-action workflow in their projects. Particular to JAMstack and to headless CMS tools is the idea of “build-on-change”: namely, when someone publishes a change in Sanity.io, we want to do a new deploy of the site to immediately reflect our new content in production.

The versatility of Workers as a place to deploy JavaScript code comes to the rescue, again! By telling Sanity.io to make a GET request to a deployed Workers webhook, we can trigger a repository_event on GitHub Actions for our repository, allowing new deploys to happen immediately after a change is detected in the CMS:

const headers = {
  Accept: 'application/vnd.github.everest-preview+json',
  Authorization: 'Bearer $token',
}

const body = JSON.stringify({ event_type: 'repository_dispatch' })

const url = `https://api.github.com/repos/cloudflare/built-with-workers/dispatches`

const handleRequest = async evt => {
  await fetch(url, { method: 'POST', headers, body })
  return new Response('OK')
}

addEventListener('fetch', handleRequest)

In doing this, we’ve made it possible to completely abstract away every deployment task around the Built with Workers project. Not only does the site deploy on a schedule, and on new commits to master, but it can also do additional deploys as the content changes, so that the site is always reflective of the current content in our CMS.

JAMstack at the Edge: How we built Built with Workers… on Workers
The GitHub Actions deployment workflow for Built with Workers

Conclusion

We’re super excited about Built with Workers, not only because it will serve as a great place to showcase the incredible things people are building with the Cloudflare Workers platform, but because it also has allowed us to explore what the future of web development may look like. I’ve been advocating for what I’ve seen referred to as “full-stack serverless” development throughout 2019, and I couldn’t be happier to start 2020 with launching a project like Built with Workers. The full-stack serverless stack feels like the future, and it’s actually fun to build with on a daily basis!

If you’re building something awesome with Cloudflare Workers, we’re looking for submissions to the site! Get in touch with us via this form – we’re excited to speak with you!

Finally, if topics like JAMstack on Cloudflare Workers, “edge state” and dynamic static site hydration, or continuous deployment interest you, the Built with Workers repository is open-source! Check it out, and if you’re inspired to build something cool with Workers after checking out the code, make sure to let us know!

JavaScript Libraries Are Almost Never Updated Once Installed

Post Syndicated from Zack Bloom original https://blog.cloudflare.com/javascript-libraries-are-almost-never-updated/

JavaScript Libraries Are Almost Never Updated Once Installed

Cloudflare helps run CDNJS, a very popular way of including JavaScript and other frontend resources on web pages. With the CDNJS team’s permission we collect anonymized and aggregated data from CDNJS requests which we use to understand how people build on the Internet. Our analysis today is focused on one question: once installed on a site, do JavaScript libraries ever get updated?

Let’s consider jQuery, the most popular JavaScript library on Earth. This chart shows the number of requests made for a selected list of jQuery versions over the past 12 months:

JavaScript Libraries Are Almost Never Updated Once Installed

Spikes in the CDNJS data as you see with version 3.3.1 are not uncommon as very large sites add and remove CDNJS script tags.

We see a steady rise of version 3.4.1 following its release on May 2nd, 2019. What we don’t see is a substantial decline of old versions. Version 3.2.1 shows an average popularity of 36M requests at the beginning of our sample, and 29M at the end, a decline of approximately 20%. This aligns with a corpus of research which shows the average website lasts somewhere between two and four years. What we don’t see is a decline in our old versions which come close to the volume of growth of new versions when they’re released. In fact the release of 3.4.1, as popular as it quickly becomes, doesn’t change the trend of old version deprecation at all.

If you’re curious, the oldest version of jQuery CDNJS includes is 1.10.0, released on May 25, 2013. The project still gets an average of 100k requests per day, and the sites which use it are growing in popularity:

JavaScript Libraries Are Almost Never Updated Once Installed

To confirm our theory, let’s consider another project, TweenMax:

JavaScript Libraries Are Almost Never Updated Once Installed

As this package isn’t as popular as jQuery, the data has been smoothed with a one week trailing average to make it easier to identify trends.

Version 1.20.4 begins the year with 18M requests, and ends it with 14M, a decline of about 23%, again in alignment with the loss of websites on the Internet. The growth of 2.1.3 shows clear evidence that the release of a new version has almost no bearing on the popularity of old versions, the trend line for those older versions doesn’t change even as 2.1.3 grows to 29M requests per day.

JavaScript Libraries Are Almost Never Updated Once Installed

Clearly the new version cannot be replacing many of the legacy installations.

The clear conclusion is whatever libraries you publish will exist on websites forever. The underlying web platform consequently must support aged conventions indefinitely if it is to continue supporting the full breadth of the web. Cloudflare is also, of course, very interested in how we can contribute to a web which is kept up-to-date. Please make suggestions in the comments below.

An Update on CDNJS

Post Syndicated from Zack Bloom original https://blog.cloudflare.com/an-update-on-cdnjs/

An Update on CDNJS

When you loaded this blog, a file was delivered to your browser called jquery-3.2.1.min.js. jQuery is a library which makes it easier to build websites, and was at one point included on as many as 74.1% of all websites. A full eighteen million sites include jQuery and other libraries using one of the most popular tools on Earth: CDNJS. Beginning about a month ago Cloudflare began to take a more active role in the operation of CDNJS. This post is here to tell you more about CDNJS’ history and explain why we are helping to manage CDNJS.

What CDNJS Does

Virtually every site is composed of not just the code written by its developers, but also dozens or hundreds of libraries. These libraries make it possible for websites to extend what a web browser can do on its own. For example, libraries can allow a site to include powerful data visualizations, respond to user input, or even get more performant.

These libraries created wondrous and magical new capabilities for web browsers, but they can also cause the size of a site to explode. Particularly a decade ago, connections were not always fast enough to permit the use of many libraries while maintaining performance. But if so many websites are all including the same libraries, why was it necessary for each of them to load their own copy?

If we all load jQuery from the same place the browser can do a much better job of not actually needing to download it for every site. When the user visits the first jQuery-powered site it will have to be downloaded, but it will already be cached on the user’s computer for any subsequent jQuery-powered site they might visit.

An Update on CDNJS

The first visit might take time to load:

An Update on CDNJS

But any future visit to any website pointing to this common URL would already be cached:

An Update on CDNJS

<!-- Loaded only on my site, will need to be downloaded by every user -->
<script src="./jquery.js"></script>

<!-- Loaded from a common location across many sites -->
<script src="https://cdnjs.cloudflare.com/jquery.js"></script>

Beyond the performance advantage, including files this way also made it very easy for users to experiment and create. When using a web browser as a creation tool users often didn’t have elaborate build systems (this was also before npm), so being able to include a simple script tag was a boon. It’s worth noting that it’s not clear a massive performance advantage was ever actually provided by this scheme. It is becoming even less of a performance advantage now that browser vendors are beginning to use separate cache’s for each website you visit, but with millions of sites using CDNJS there’s no doubt it is a critical part of the web.

A CDN for all of us

My first Pull Request into the CDNJS project was in 2013. Back then if you created a JavaScript project it wasn’t possible to have it included in the jQuery CDN, or the ones provided by large companies like Google and Microsoft. They were only for big, important, projects. Of course, however, even the biggest project starts small. The community needed a CDN which would agree to host nearly all JavaScript projects, even the ones which weren’t world-changing (yet). In 2011, that project was launched by Ryan Kirkman and Thomas Davis as CDNJS.

The project was quickly wildly successful, far beyond their expectations. Their CDN bill quickly began to skyrocket (it would now be over a million dollars a year on AWS). Under the threat of having to shut down the service, Cloudflare was approached by the CDNJS team to see if we could help. We agreed to support their efforts and created cdnjs.cloudflare.com which serves the CDNJS project free of charge.

CDNJS has been astonishingly successful. The project is currently installed on over eighteen million websites (10% of the Internet!), offers files totaling over 1.5 billion lines of code, and serves over 173 billion requests a month. CDNJS only gets more popular as sites get larger, with 34% of the top 10k websites using the service. Each month we serve almost three petabytes of JavaScript, CSS, and other resources which power the web via cdnjs.cloudflare.com.

An Update on CDNJS
Spikes can happen when a very large or popular site installs CDNJS, or when a misbehaving web crawler discovers a CDNJS link.

The future value of CDNJS is now in doubt, as web browsers are beginning to use a separate cache for every website you visit. It is currently used on such a wide swath of the web, however, it is unlikely it will be disappearing any time soon.

How CDNJS Works

CDNJS starts with a Github repo. That project contains every file served by CDNJS, at every version which it has ever offered. That’s 182 GB without the commit history, over five million files, and over 1.5 billion lines of code.

Given that it stores and delivers versioned code files, in many ways it was the Internet’s first JavaScript package manager. Unlike other package managers and even other CDNs everything CDNJS serves is publicly versioned. All 67,724 commits! This means you as a user can verify that you are being served files which haven’t been tampered with.

To make changes to CDNJS a commit has to be made. For new projects being added to CDNJS, or when projects change significantly, these commits are made by humans, and get reviewed by other humans. When projects just release new versions there is a bot made by Peter and maintained by Sven which sucks up changes from npm and automatically creates commits.

Within Cloudflare’s infrastructure there is a set of machines which are responsible for pulling the latest version of the repo periodically. Those machines then become the origin for cdnjs.cloudflare.com, with Cloudflare’s Global Load Balancer automatically handling failures. Cloudflare’s cache automatically stores copies of many of the projects making it possible for us to deliver them quickly from all 195 of our data centers.

An Update on CDNJS

The Internet on a Shoestring Budget

The CDNJS project has always been administered independently of Cloudflare. In addition to the founders, the project has additionally been maintained by exceptionally hard-working caretakers like Peter and Matt Cowley. Maintaining a single repo of nearly every frontend project on Earth is no small task, and it has required a substantial amount of both manual work and bot development.

Unfortunately approximately thirty days ago one of those bots stopped working, preventing updated projects from appearing in CDNJS. The bot’s open-source maintainer was not able to invest the time necessary to keep the bot running. After several weeks we were asked by the community and the CDNJS founders to take over maintenance of the CDNJS repo itself. This means the Cloudflare engineering team is taking responsibility for keeping the contents of github.com/cdnjs/cdnjs up to date, in addition to ensuring it is correctly served on cdnjs.cloudflare.com.

We agreed to do this because we were, frankly, scared. Like so many open-source projects CDNJS was a critical part of our world, but wasn’t getting the attention it needed to survive. The Internet relies on CDNJS as much as on any other single project, losing it or allowing it to be commandeered would be catastrophic to millions of websites and their visitors. If it began to fail, some sites would adapt and update, others would be broken forever.

CDNJS has always been, and remains, a project for and by the community. We are invested in making all decisions in a transparent and inclusive manner. If you are interested in contributing to CDNJS or in the topics we’re currently discussing please visit the CDNJS Github Issues page.

An Update on CDNJS

A Plan for the Future

One example of an area where we could use your help is in charting a path towards a CDNJS which requires less manual moderation. Nothing can replace the intelligence and creativity of a human (yet), but for a task like managing what resources go into a CDN, it is error prone and time consuming. At present a human has to review every new project to be included, and often has to take additional steps to include new versions of a project.

As a part of our analysis of the project we examined a snapshot of the still-open PRs made against CDNJS for several months:

An Update on CDNJS

The vast majority of these PRs were changes which ultimately passed the automated review but nevertheless couldn’t be merged without manual review.

There is consensus that we should move to a model which does not require human involvement in most cases. We would love your input and collaboration on the best way for that to be solved. If this is something you are passionate about, please contribute here.

Our plan is to support the CDNJS project in whichever ways it requires for as long as the Internet relies upon it. We invite you to use CDNJS in your next project with the full assurance that it is backed by the same network and team who protect and accelerate over twenty million of your favorite websites across the Internet. We are also planning more posts diving further into the CDNJS data, subscribe to this blog if you would like to be notified upon their release.

A History of HTML Parsing at Cloudflare: Part 2

Post Syndicated from Andrew Galloni original https://blog.cloudflare.com/html-parsing-2/

A History of HTML Parsing at Cloudflare: Part 2

A History of HTML Parsing at Cloudflare: Part 2

The second blog post in the series on HTML rewriters picks up the story in 2017 after the launch of the Cloudflare edge compute platform Cloudflare Workers. It became clear that the developers using workers wanted the same HTML rewriting capabilities that we used internally, but accessible via a JavaScript API.

This blog post describes the building of a streaming HTML rewriter/parser with a CSS-selector based API in Rust. It is used as the back-end for the Cloudflare Workers HTMLRewriter. We have open-sourced the library (LOL HTML) as it can also be used as a stand-alone HTML rewriting/parsing library.

The major change compared to LazyHTML, the previous rewriter, is the dual-parser architecture required to overcome the additional performance overhead of wrapping/unwrapping each token when propagating tokens to the workers runtime. The remainder of the post describes a CSS selector matching engine inspired by a Virtual Machine approach to regular expression matching.

v2 : Give it to everyone and make it faster

In 2017, Cloudflare introduced an edge compute platform – Cloudflare Workers. It was no surprise that customers quickly required the same HTML rewriting capabilities that we were using internally. Our team was impressed with the platform and decided to migrate some of our features to Workers. The goal was to improve our developer experience working with modern JavaScript rather than statically linked NGINX modules implemented in C with a Lua API.

It is possible to rewrite HTML in Workers, though for that you needed a third party JavaScript package (such as Cheerio). These packages are not designed for HTML rewriting on the edge due to the latency, speed and memory considerations described in the previous post.

JavaScript is really fast but it still can’t always produce performance comparable to native code for some tasks – parsing being one of those. Customers typically needed to buffer the whole content of the page to do the rewriting resulting in considerable output latency and memory consumption that often exceeded the memory limits enforced by the Workers runtime.

We started to think about how we could reuse the technology in Workers. LazyHTML was a perfect fit in terms of parsing performance, but it had two issues:

  1. API ergonomics: LazyHTML produces a stream of HTML tokens. This is sufficient for our internal needs. However, for an average user, it is not as convenient as the jQuery-like API of Cheerio.
  2. Performance: Even though LazyHTML is tremendously fast, integration with the Workers runtime adds even more limitations. LazyHTML operates as a simple parse-modify-serialize pipeline, which means that it produces tokens for the whole content of the page. All of these tokens then have to be propagated to the Workers runtime and wrapped inside a JavaScript object and then unwrapped and fed back to LazyHTML for serialization. This is an extremely expensive operation which would nullify the performance benefit of LazyHTML.

A History of HTML Parsing at Cloudflare: Part 2
LazyHTML with V8

LOL HTML

We needed something new, designed with Workers requirements in mind, using a language with the native speed and safety guarantees (it’s incredibly easy to shoot yourself in the foot doing parsing). Rust was the obvious choice as it provides the native speed and the best guarantee of memory safety which minimises the attack surface of untrusted input. Wherever possible the Low Output Latency HTML rewriter (LOL HTML) uses all the previous optimizations developed for LazyHTML such as tag name hashing.

Dual-parser architecture

Most developers are familiar and prefer to use CSS selector-based APIs (as in Cheerio, jQuery or DOM itself) for HTML mutation tasks. We decided to base our API on CSS selectors as well. Although this meant additional implementation complexity, the decision created even more opportunities for parsing optimizations.

As selectors define the scope of the content that should be rewritten, we realised we can skip the content that is not in this scope and not produce tokens for it. This not only significantly speeds up the parsing itself, but also avoids the performance burden of the back and forth interactions with the JavaScript VM. As ever the best optimization is not to do something.

A History of HTML Parsing at Cloudflare: Part 2

Considering the tasks required, LOL HTML’s parser consists of two internal parsers:

  • Lexer – a regular full parser, that produces output for all types of content that it encounters;
  • Tag scanner – looks for start and end tags and skips parsing the rest of the content. The tag scanner parses only the tag name and feeds it to the selector matcher. The matcher will switch parser to the lexer if there was a match or additional information about the tag (such as attributes) are required for matching.

The parser switches back to the tag scanner as soon as input leaves the scope of all selector matches. The tag scanner may also sometimes switch the parser to the Lexer – if it requires additional tag information for the parsing feedback simulation.

A History of HTML Parsing at Cloudflare: Part 2
LOL HTML architecture

Having two different parser implementations for the same grammar will increase development costs and is error-prone due to implementation inconsistencies. We minimize these risks by implementing a small Rust macro-based DSL which is similar in spirit to Ragel. The DSL program describes Nondeterministic finite automaton states and actions associated with each state transition and matched input byte.

An example of a DSL state definition:

tag_name_state {
   whitespace => ( finish_tag_name?; --> before_attribute_name_state )
   b'/'       => ( finish_tag_name?; --> self_closing_start_tag_state )
   b'>'       => ( finish_tag_name?; emit_tag?; --> data_state )
   eof        => ( emit_raw_without_token_and_eof?; )
   _          => ( update_tag_name_hash; )
}

The DSL program gets expanded by the Rust compiler into not quite as beautiful, but extremely efficient Rust code.

We no longer need to reimplement the code that drives the parsing process for each of our parsers. All we need to do is to define different action implementations for each. In the case of the tag scanner, the majority of these actions are a no-op, so the Rust compiler does the NFA optimization job for us: it optimizes away state branches with no-op actions and even whole states if all of the branches have no-op actions. Now that’s cool.

Byte slice processing optimisations

Moving to a memory-safe language provided new challenges. Rust has great memory safety mechanisms, however sometimes they have a runtime performance cost.

The task of the parser is to scan through the input and find the boundaries of lexical units of the language – tokens and their internal parts. For example, an HTML start tag token consists of multiple parts: a byte slice of input that represents the tag name and multiple pairs of input slices that represent attributes and values:

struct StartTagToken<'i> {
   name: &'i [u8],
   attributes: Vec<(&'i [u8], &'i [u8])>,
   self_closing: bool
}

As Rust uses bound checks on memory access, construction of a token might be a relatively expensive operation. We need to be capable of constructing thousands of them in a fraction of second, so every CPU instruction counts.

Following the principle of doing as little as possible to improve performance we use a “token outline” representation of tokens: instead of having memory slices for token parts we use numeric ranges which are lazily transformed into a byte slice when required.

struct StartTagTokenOutline {
   name: Range<usize>,
   attributes: Vec<(Range<usize>, Range<usize>)>,
   self_closing: bool
}

As you might have noticed, with this approach we are no longer bound to the lifetime of the input chunk which turns out to be very useful. If a start tag is spread across multiple input chunks we can easily update the token that is currently in construction, as new chunks of input arrive by just adjusting integer indices. This allows us to avoid constructing a new token with slices from the new input memory region (it could be the input chunk itself or the internal parser’s buffer).

This time we can’t get away with avoiding the conversion of input character encoding; we expose a user-facing API that operates on JavaScript strings and input HTML can be of any encoding. Luckily, as we can still parse without decoding and only encode and decode within token bounds by a request (though we still can’t do that for UTF-16 encoding).

So, when a user requests an element’s tag name in the API, internally it is still represented as a byte slice in the character encoding of the input, but when provided to the user it gets dynamically decoded. The opposite process happens when a user sets a new tag name.

For selector matching we can still operate on the original encoding representation – because we know the input encoding ahead of time we preemptively convert values in a selector to the page’s character encoding, so comparisons can be done without decoding fields of each token.

As you can see, the new parser architecture along with all these optimizations produced great performance results:

A History of HTML Parsing at Cloudflare: Part 2
Average parsing time depending on the input size – lower is better

LOL HTML’s tag scanner is typically twice as fast as LazyHTML and the lexer has comparable performance, outperforming LazyHTML on bigger inputs. Both are a few times faster than the tokenizer from html5ever – another parser implemented in Rust used in the Mozilla’s Servo browser engine.

CSS selector matching VM

With an impressively fast parser on our hands we had only one thing missing – the CSS selector matcher. Initially we thought we could just use Servo’s CSS selector matching engine for this purpose. After a couple of days of experimentation it turned out that it is not quite suitable for our task.

It did not work well with our dual parser architecture. We first need to to match just a tag name from the tag scanner, and then, if we fail, query the lexer for the attributes. The selectors library wasn’t designed with this architecture in mind so we needed ugly hacks to bail out from matching in case of insufficient information. It was inefficient as we needed to start matching again after the bailout doing twice the work. There were other problems, such as the integration of lazy character decoding and integration of tag name comparison using tag name hashes.

Matching direction

The main problem encountered was the need to backtrack all the open elements for matching. Browsers match selectors from right to left and traverse all ancestors of an element. This StackOverflow has a good explanation of why they do it this way. We would need to store information about all open elements and their attributes – something that we can’t do while operating with tight memory constraints. This matching approach would be inefficient for our case – unlike browsers, we expect to have just a few selectors and a lot of elements. In this case it is much more efficient to match selectors from left to right.

And this is when we had a revelation. Consider the following CSS selector:

body > div.foo  img[alt] > div.foo ul

It can be split into individual components attributed to a particular element with hierarchical combinators in between:

body > div.foo img[alt] > div.foo  ul
---    ------- --------   -------  --

Each component is easy to match having a start tag token – it’s just a matter of comparison of token fields with values in the component. Let’s dive into abstract thinking and imagine that each such component is a character in the infinite alphabet of all possible components:

Selector componentCharacter
bodya
div.foob
img[alt]c
uld

Let’s rewrite our selector with selector components replaced by our imaginary characters:

a > b c > b d

Does this remind you of something?

A   `>` combinator can be considered a child element, or “immediately followed by”.

The ` ` (space) is a descendant element can be thought of as there might be zero or more elements in between.

There is a very well known abstraction to express these relations – regular expressions. The selector replacing combinators can be replaced with a regular expression syntax:

ab.*cb.*d

We transformed our CSS selector into a regular expression that can be executed on the sequence of start tag tokens. Note that not all CSS selectors can be converted to such a regular grammar and the input on which we match has some specifics, which we’ll discuss later. However, it was a good starting point: it allowed us to express a significant subset of selectors.

Implementing a Virtual Machine

Next, we started looking at non-backtracking algorithms for regular expressions. The virtual machine approach seemed suitable for our task as it was possible to have a non-backtracking implementation that was flexible enough to work around differences between real regular expression matching on strings and our abstraction.

VM-based regular expression matching is implemented as one of the engines in many regular expression libraries such as regexp2 and Rust’s regex. The basic idea is that instead of building an NFA or DFA for a regular expression it is instead converted into DSL assembly language with instructions later executed by the virtual machine – regular expressions are treated as programs that accept strings for matching.

Since the VM program is just a representation of NFA with ε-transitions it can exist in multiple states simultaneously during the execution, or, in other words, spawns multiple threads. The regular expression matches if one or more states succeed.

For example, consider the following VM instructions:

  • expect c – waits for next input character, aborts the thread if doesn’t equal to the instruction’s operand;
  • jmp L – jump to label ‘L’;
  • thread L1, L2 – spawns threads for labels L1 and L2, effectively splitting the execution;
  • match – succeed the thread with a match;

For example, using this instructions set regular expression “ab*c” can be translated into:

    expect a
L1: thread L2, L3
L2: expect b
    jmp L1
L3: expect c
    match

Let’s try to translate the regular expression ab.*cb.*d from the selector we saw earlier:

    expect a
    expect b
L1: thread L2, L3
L2: expect [any]
    jmp L1
L3: expect c
    expect b
L4: thread L5, L6
L5: expect [any]
    jmp L4
L6: expect d
    match

That looks complex! Though this assembly language is designed for regular expressions in general, and regular expressions can be much more complex than our case. For us the only kind of repetition that matters is “.*”. So, instead of expressing it with multiple instructions we can use just one called hereditary_jmp:

    expect a
    expect b
    hereditary_jmp L1
L1: expect c
    expect b
    hereditary_jmp L2
L2: expect d
    match

The instruction tells VM to memoize instruction’s label operand and unconditionally spawn a thread with a jump to this label on each input character.

There is one significant distinction between the string input of regular expressions and the input provided to our VM. The input can shrink!

A regular string is just a contiguous sequence of characters, whereas we operate on a sequence of open elements. As new tokens arrive this sequence can grow as well as shrink. Assume we represent <div> as ‘a’ character in our imaginary language, so having <div><div><div> input we can represent it as aaa, if the next token in the input is </div> then our “string” shrinks to aa.

You might think at this point that our abstraction doesn’t work and we should try something else. What we have as an input for our machine is a stack of open elements and we needed a stack-like structure to store our hereditrary_jmp instruction labels that VM had seen so far. So, why not store it on the open element stack? If we store the next instruction pointer on each of stack items on which the expect instruction was successfully executed, we’ll have a full snapshot of the VM state, so we can easily roll back to it if our stack shrinks.

With this implementation we don’t need to store anything except a tag name on the stack, and, considering that we can use the tag name hashing algorithm, it is just a 64-bit integer per open element. As an additional small optimization, to avoid traversing of the whole stack in search of active hereditary jumps on each new input we store an index of the first ancestor with a hereditary jump on each stack item.

For example, having the following selector “body > div span” we’ll have the following VM program (let’s get rid of labels and just use instruction indices instead):

0| expect <body>
1| expect <div>
2| hereditary_jmp 3
3| expect <span>
4| match

Having an input “<body><div><div><a>” we’ll have the following stack:

A History of HTML Parsing at Cloudflare: Part 2

Now, if the next token is a start tag <span> the VM will first try to execute the selectors program from the beginning and will fail on the first instruction. However, it will also look for any active hereditary jumps on the stack. We have one which jumps to the instructions at index 3. After jumping to this instruction the VM successfully produces a match. If we get yet another <span> start tag later it will much as well following the same steps which is exactly what we expect for the descendant selector.

If we then receive a sequence of “</span></span></div></a></div>” end tags our stack will contain only one item:

A History of HTML Parsing at Cloudflare: Part 2

which instructs VM to jump to instruction at index 1, effectively rolling back to matching the div component of the selector.

We mentioned earlier that we can bail out from the matching process if we only have a tag name from the tag scanner and we need to obtain more information by running the lexer? With a VM approach it is as easy as stopping the execution of the current instruction and resuming it later when we get the required information.

Duplicate selectors

As we need a separate program for each selector we need to match, how can we stop the same simple components doing the same job? The AST for our selector matching program is a radix tree-like structure whose edge labels are simple selector components and nodes are hierarchical combinators.
For example for the following selectors:

body > div > link[rel]
body > span
body > span a

we’ll get the following AST:

A History of HTML Parsing at Cloudflare: Part 2

If selectors have common prefixes we can match them just once for all these selectors. In the compilation process, we flatten this structure into a vector of instructions.

[not] JIT-compilation

For performance reasons compiled instructions are macro-instructions – they incorporate multiple basic VM instruction calls. This way the VM can execute only one macro instruction per input token. Each of the macro instructions compiled using the so-called “[not] JIT-compilation” (the same approach to the compilation is used in our other Rust project – wirefilter).

Internally the macro instruction contains expect and following jmp, hereditary_jmp and match basic instructions. In that sense macro-instructions resemble microcode making it easy to suspend execution of a macro instruction if we need to request attributes information from the lexer.

What’s next

It is obviously not the end of the road, but hopefully, we’ve got a bit closer to it. There are still multiple bits of functionality that need to be implemented and certainly, there is a space for more optimizations.

If you are interested in the topic don’t hesitate to join us in development of LazyHTML and LOL HTML at GitHub and, of course, we are always happy to see people passionate about technology here at Cloudflare, so don’t hesitate to contact us if you are too :).

A History of HTML Parsing at Cloudflare: Part 1

Post Syndicated from Andrew Galloni original https://blog.cloudflare.com/html-parsing-1/

A History of HTML Parsing at Cloudflare: Part 1

A History of HTML Parsing at Cloudflare: Part 1

To coincide with the launch of streaming HTML rewriting functionality for Cloudflare Workers we are open sourcing the Rust HTML rewriter (LOL  HTML) used to back the Workers HTMLRewriter API. We also thought it was about time to review the history of HTML rewriting at Cloudflare.

The first blog post will explain the basics of a streaming HTML rewriter and our particular requirements. We start around 8 years ago by describing the group of ‘ad-hoc’ parsers that were created with specific functionality such as to rewrite e-mail addresses or minify HTML. By 2016 the state machine defined in the HTML5 specification could be used to build a single spec-compliant HTML pluggable rewriter, to replace the existing collection of parsers. The source code for this rewriter is now public and available here: https://github.com/cloudflare/lazyhtml.

The second blog post will describe the next iteration of rewriter. With the launch of the edge compute platform Cloudflare Workers we came to realise that developers wanted the same HTML rewriting capabilities with a JavaScript API. The post describes the thoughts behind a low latency streaming HTML rewriter with a CSS-selector based API. We open-sourced the Rust library as it can also be used as a stand-alone HTML rewriting/parsing library.

What is a streaming HTML rewriter ?

A streaming HTML rewriter takes either a HTML string or byte stream input, parses it into tokens or any other structured intermediate representation (IR) – such as an Abstract Syntax Tree (AST). It then performs transformations on the tokens before converting back to HTML. This provides the ability to modify, extract or add to an existing HTML document as the bytes are being  processed. Compare this with a standard HTML tree parser which needs to retrieve the entire file to generate a full DOM tree. The tree-based rewriter will both take longer to deliver the first processed bytes and require significantly more memory.

A History of HTML Parsing at Cloudflare: Part 1
HTML rewriter

For example; consider you own a large site with a lot of historical content that you want to now serve over HTTPS. You will quickly run into the problem of resources (images, scripts, videos) being served over HTTP. This ‘mixed content’ opens a security hole and browsers will warn or block these resources. It can be difficult or even impossible to update every link on every page of a website. With a streaming HTML rewriter you can select the URI attribute of any HTML tag and change any HTTP links to HTTPS. We built this very feature Automatic HTTPS rewrites back in 2016 to solve mixed content issues for our customers.

The reader may already be wondering: “Isn’t this a solved problem, aren’t there many widely used open-source browsers out there with HTML parsers that can be used for this purpose?”. The reality is that writing code to run in 190+ PoPs around the world with a strict low latency requirement turns even seemingly trivial problems into complex engineering challenges.

The following blog posts will detail the journey of how starting with a simple idea of finding email addresses within an HTML page led to building an almost spec compliant HTML parser and then on to a CSS selector matching Virtual Machine. We learned a lot on this journey. I hope you find some of this as interesting as we did.

Rewriting at the edge

When rewriting content through Cloudflare we do not want to impact site performance. The balance in designing a streaming HTML rewriter is to minimise the pause in response byte flow by holding onto as little information as possible whilst retaining the ability to rewrite matching tokens.

The difference in requirements compared to an HTML parser used in a browser include:

Output latency

For browsers, the Document Object Model (DOM) is the end product of the parsing process but in our case we have to parse, rewrite and serialize back to HTML. In the case of Cloudflare’s reverse proxy any content processing on the edge server results in latency between the server and an eyeball. It is desirable to minimize the latency impact of HTML handling, which involves parsing, rewriting and serializing back to HTML. In all of these stages we want to be as fast as possible to minimize latency.

Parser throughput

Let’s assume that usually browsers rarely need to deal with HTML pages bigger than 1Mb in size and an average page load time is somewhere around 3s at best. HTML parsing is not the main bottleneck of the page loading process as the browser will be blocked on running scripts and loading other render-critical resources. We can roughly estimate that ~3Mbps is an acceptable throughput for browser’s HTML parser. At Cloudflare we have hundreds of megabytes of traffic per CPU, so we need a parser that is faster by an order of magnitude.

Memory limitations

As most users must realise, browsers have the luxury of being able to consume memory. For example, this simple HTML markup when opened in a browser will consume a significant chunk of your system memory before eventually killing a browser tab (and all this memory will be consumed by the parser) :

<script>
   document.write('<');
   while(true) {
      document.write('aaaaaaaaaaaaaaaaaaaaaaaa');
   }
</script>

Unfortunately, buffering of some fraction of the input is inevitable even for streaming HTML rewriting. Consider these 2 HTML snippets:

<div foo="bar" qux="qux">

<div foo="bar" qux="qux"

These seemingly similar fragments of HTML will be treated completely differently when encountered at the end of an HTML page. The first fragment will be parsed as a start tag and the second one will be ignored. By just seeing a `<` character followed by a tag name, the parser can’t determine if it has found a start tag or not. It needs to traverse the input in the search of the closing `>` to make a decision, buffering all content in between, so it can later be emitted to the consumer as a start tag token.

This requirement forces browsers to indefinitely buffer content before eventually giving up with the out-of-memory error.

In our case, we can’t afford to spend hundreds of megabytes of memory parsing a single HTML file (actual constraints are even tighter – even using a dozen kilobytes for each request would be unacceptable). We need to be much more sophisticated than other implementations in terms of memory usage and gracefully handle all the situations where provided memory capacity is insufficient to accomplish parsing.

v0 : “Ad-hoc parsers”

As usual with big projects, it all started pretty innocently.

Find and obfuscate an email

In 2010, Cloudflare decided to provide a feature that would stop popular email scrapers. The basic idea of this protection was to find and obfuscate emails on pages and later decode them back in the browser with injected JavaScript code. Sounds easy, right? You search for anything that looks like an email, encode it and then decode it with some JavaScript magic and present the result to the end-user.

However, even such a seemingly simple task already requires solving several issues. First of all, we need to define what an email is, and there is no simple answer. Even the infamous regex supposedly covering the entire RFC is, in fact, outdated and incomplete as the new RFC added lots of valid email constructions, including Unicode support. Let’s not go down that rabbit hole for now and instead focus on a higher-level issue: transforming streaming content.

Content from the network comes in packets, which have to be buffered and parsed as HTTP by our servers. You can’t predict how the content will be split, which means you always need to buffer some of it because content that is going to be replaced can be present in multiple input chunks.

Let’s say we decided to go with a simple regex like `[\w.][email protected][\w.]+`. If the content that comes through contains the email “[email protected]”, it might be split in the following chunks:

A History of HTML Parsing at Cloudflare: Part 1

In order to keep good Time To First Byte (TTFB) and consistent speed, we want to ensure that the preceding chunk is emitted as soon as we determine that it’s not interesting for replacement purposes.

The easiest way to do that is to transform our regex into a state machine, or a finite automata. While you could do that by hand, you will end up with hard-to-maintain and error-prone code. Instead, Ragel was chosen to transform regular expressions into efficient native state machine code. Ragel doesn’t try to take care of buffering or anything other than traversing the state machine. It provides a syntax that not only describes patterns, but can also associate custom actions (code in a host language) with any given state.

In our case we can pass through buffers until we match the beginning of an email. If we subsequently find out the pattern is not an email we can bail out from buffering as soon as the pattern stops matching. Otherwise, we can retrieve the matched email and replace it with new content.

To turn our pattern into a streaming parser we can remember the position of the potential start of an email and, unless it was already discarded or replaced by the end of the current input, store the unhandled part in a permanent buffer. Then, when a new chunk comes, we can process it separately, resuming from a state Ragel remembers itself, but then use both the buffered chunk and a new one to either emit or obfuscate.

Now that we have solved the problem of matching email patterns in text, we need to deal with the fact that they need to be obfuscated on pages. This is when the first hints of HTML “parsing” were introduced.

I’ve put “parsing” in quotes because, rather than implementing the whole parser, the email filter (as the module was called) didn’t attempt to replicate the whole HTML grammar, but rather added custom Ragel patterns just for skipping over comments and tags where emails should not be obfuscated.

This was a reasonable approach, especially back in 2010 – four years before the HTML5 specification, when all browsers had their own quirks handling of HTML. However, as you can imagine, this approach did not scale well. If you’re trying to work around quirks in other parsers, you start gaining more and more quirks in your own, and then work around these too. Simultaneously, new features started to be added, which also required modifying HTML on the fly (like automatic insertion of Google Analytics script), and an existing module seemed to be the best place for that. It grew to handle more and more tags, operations and syntactic edge cases.

Now let’s minify..

In 2011, Cloudflare decided to also add minification to allow customers to speed up their websites even if they had not employed minification themselves. For that, we decided to use an existing streaming minifier – jitify. It already had NGINX bindings, which made it a great candidate for integration into the existing pipeline.

Unfortunately, just like most other parsers from that time as well as ours described above, it had its own processing rules for HTML, JavaScript and CSS, which weren’t precise but rather tried to parse content on a best-effort basis. This led to us having two independent streaming parsers that were incompatible and could produce bugs either individually or only in combination.

v1 : “(Almost) HTML5 Spec compliant parser”

Over the years engineers kept adding new features to the ever-growing state machines, while fixing new bugs arising from imprecise syntax implementations, conflicts between various parsers, and problems in features themselves.

By 2016, it was time to get out of the multiple ad hoc parsers business and do things ‘the right way’.

The next section(s) will describe how we built our HTML5 compliant parser starting from the specification state machine. Using only this state machine it should have been straight-forward to build a parser. You may be aware that historically the parsing of HTML had not been entirely strict which meant to not break existing implementations the building of an actual DOM was required for parsing. This is not possible for a streaming rewriter so a simulator of the parser feedback was developed. In terms of performance, it is always better not to do something. We  then describe why the rewriter can be ‘lazy’ and not perform the expensive encoding and decoding of text when rewriting HTML. The surprisingly difficult problem of deciding if a response is HTML is then detailed.

HTML5

By 2016, HTML5 had defined precise syntax rules for parsing and compatibility with legacy content and custom browser implementations. It was already implemented by all browsers and many 3rd-party implementations.

The HTML5 parsing specification defines basic HTML syntax in the form of a state machine. We already had experience with Ragel for similar use cases, so there was no question about what to use for the new streaming parser. Despite the complexity of the grammar, the translation of the specification to Ragel syntax was straightforward. The code looks simpler than the formal description of the state machine, thanks to the ability to mix regex syntax with explicit transitions.

A History of HTML Parsing at Cloudflare: Part 1
A visualisation of a small fraction of the HTML state machine. Source: https://twitter.com/RReverser/status/715937136520916992

HTML5 parsing requires a ‘DOM’

However, HTML has a history. To not break existing implementations HTML5 is specified with  recovery procedures for incorrect tag nesting, ordering, unclosed tags, missing attributes and all the other possible quirks that used to work in older browsers.  In order to resolve these issues, the specification expects a tree builder to drive the lexer, essentially meaning you can’t correctly tokenize HTML (split into separate tags) without a DOM.

A History of HTML Parsing at Cloudflare: Part 1
HTML parsing flow as defined by the specification

For this reason, most parsers don’t even try to perform streaming parsing and instead take the input as a whole and produce a document tree as an output. This is not something we could do for streaming transformation without adding significant delays to page loading.

An existing HTML5 JavaScript parser – parse5 – had already implemented spec-compliant tree parsing using a streaming tokenizer and rewriter. To avoid having to create a full DOM the concept of a “parser feedback simulator” was introduced.

Tree builder feedback

As you can guess from the name, this is a module that aims to simulate a full parser’s feedback to the tokenizer, without actually building the whole DOM, but instead preserving only the required information and context necessary for correctly driving the state machine.

After rigorous testing and upstreaming a test runner to parse5, we found this technique to be suitable for the majority of even poorly written pages on the Internet, and employed it in LazyHTML.

A History of HTML Parsing at Cloudflare: Part 1
LazyHTML architecture

Avoiding decoding – everything is ASCII

Now that we had a streaming tokenizer working, we wanted to make sure that it was fast enough so that users didn’t notice any slowdowns to their pages as they go through the parser and transformations. Otherwise it would completely circumvent any optimisations we’d want to attempt on the fly.

It would not only cause a performance hit due to decoding and re-encoding any modified HTML content, but also significantly complicates our implementation due to multiple sources of potential encoding information  required to determine the character encoding, including sniffing of the first 1 KB of the content.

The “living” HTML Standard specification permits only encodings defined in the Encoding Standard. If we look carefully through those encodings, as well as a remark on Character encodings section of the HTML spec, we find that all of them are ASCII-compatible with the exception of UTF-16 and ISO-2022-JP.

This means that any ASCII text will be represented in such encodings exactly as it would be in ASCII, and any non-ASCII text will be represented by bytes outside of the ASCII range. This property allows us to safely tokenize, compare and even modify original HTML without decoding or even knowing which particular encoding it contains. It is possible as all the token boundaries in HTML grammar are represented by an ASCII character.

We need to detect UTF-16 by sniffing and either decode or skip such documents without modification. We chose the latter to avoid potential security-sensitive bugs which are common with UTF-16, and because the character encoding is seen in less than 0.1% of known character encodings luckily.

The only issue left with this approach is that in most places the HTML tokenization specification  requires you to replace U+0000 (NUL) characters with U+FFFD (replacement character) during parsing. Presumably, this was added as a security precaution against bugs in C implementations of old engines which could treat NUL character, encoded in ASCII / UTF-8 / … as a 0x00 byte, as the end of the string (yay, null-terminated strings…). It’s problematic for us because U+FFFD is outside of the ASCII range, and will be represented by different sequences of bytes in different encodings. We don’t know the encoding of the document, so this will lead to corruption of the output.

Luckily, we’re not in the same business as browser vendors, and don’t worry about NUL characters in strings as much – we use “fat pointer” string representation, in which the length of the string is determined not by the position of the NUL character, but stored along with the data pointer as an integer field:

typedef struct {
   const char *data;
   size_t length;
} lhtml_string_t;

Instead, we can quietly ignore these parts of the spec (sorry!), and keep U+0000 characters as-is and add them as such to tag, attribute names, and other strings, and later re-emit to the document. This is safe to do, because it doesn’t affect any state machine transitions, but merely preserves original 0x00 bytes and delegates their replacement to the parser in the end user’s browser.

Content type madness

We want to be lazy and minimise false positives. We only want to spend time parsing, decoding and rewriting actual HTML rather than breaking images or JSON. So the question is how do you decide if something is a HTML document. Can you just use the Content-Type for example ? A comment left in the source code best describes the reality.

/*
Dear future generations. I didn't like this hack either and hoped
we could do the right thing instead. Unfortunately, the Internet
was a bad and scary place at the moment of writing. If this
ever changes and websites become more standards compliant,
please do remove it just like I tried.
Many websites use PHP which sets Content-Type: text/html by
default. There is no error or warning if you don't provide own
one, so most websites don't bother to change it and serve
JSON API responses, private keys and binary data like images
with this default Content-Type, which we would happily try to
parse and transforms. This not only hurts performance, but also
easily breaks response data itself whenever some sequence inside
it happens to look like a valid HTML tag that we are interested
in. It gets even worse when JSON contains valid HTML inside of it
and we treat it as such, and append random scripts to the end
breaking APIs critical for popular web apps.
This hack attempts to mitigate the risk by ensuring that the
first significant character (ignoring whitespaces and BOM)
is actually `<` - which increases the chances that it's indeed HTML.
That way we can potentially skip some responses that otherwise
could be rendered by a browser as part of AJAX response, but this
is still better than the opposite situation.
*/

The reader might think that it’s a rare edge case, however, our observations show that almost 25% of the traffic served through Cloudflare with the “text/html” content type is unlikely to be HTML.

A History of HTML Parsing at Cloudflare: Part 1

The trouble doesn’t end there: it turns out that there is a considerable amount of XML content served with the “text/html” content type which can’t be always processed correctly when treated as HTML.

Over time bailouts for binary data, JSON, AMP and correctly identifying HTML fragments leads to the content sniffing logic which can be described by the following diagram:

A History of HTML Parsing at Cloudflare: Part 1

This is a good example of divergence between formal specifications and reality.

Tag name comparison optimisation

But just having fast parsing is not enough – we have functionality that consumes the output of the parser, rewrites it and feeds it back for the serialization. And all the memory and time constraints that we have for the parser are applicable for this code as well, as it is a part of the same content processing pipeline.

It’s a common requirement to compare parsed HTML tag names, e.g. to determine if the current tag should be rewritten or not. A naive implementation will use regular per-byte comparison which can require traversing the whole tag name. We were able to narrow this operation to a single integer comparison instruction in the majority of cases by using specially designed hashing algorithm.

The tag names of all standard HTML elements contain only alphabetical ASCII characters and digits from 1 to 6 (in numbered header tags, i.e. <h1> – <h6>). Comparison of tag names is case-insensitive, so we only need 26 characters to represent alphabetical characters. Using the same basic idea as arithmetic coding, we can represent each of the possible 32 characters of a  tag name using just 5 bits and, thus, fit up to floor(64 / 5) = 12 characters in a 64-bit integer which is enough for all the standard tag names and any other tag names that satisfy the same requirements! The great part is that we don’t even need to additionally traverse a tag name to hash it – we can do that as we parse the tag name consuming the input byte by byte.

However, there is one problem with this hashing algorithm and the culprit is not so obvious: to fit all 32 characters in 5 bits we need to use all possible bit combinations including 00000. This means that if the leading character of the tag name is represented with 00000 then we will not be able to differentiate between a varying number of consequent repetitions of this character.

For example, considering that ‘a’ is encoded as 00000 and ‘b’ as 00001 :

Tag nameBit representationEncoded value
ab00000 000011
aab00000 00000 000011

Luckily, we know that HTML grammar doesn’t allow the first character of a tag name to be anything except an ASCII alphabetical character, so reserving numbers from 0 to 5 (00000b-00101b) for digits and numbers from 6 to 31 (00110b – 11111b) for ASCII alphabetical characters solves the problem.

LazyHTML

After taking everything  mentioned  above into  consideration the LazyHTML (https://github.com/cloudflare/lazyhtml) library was created. It is a fast streaming HTML parser and serializer with a token based C-API derived from the HTML5 lexer written in Ragel. It provides a pluggable transformation pipeline to allow multiple transformation handlers to be chained together.

An example of a function that transforms `href` property of links:

// define static string to be used for replacements
static const lhtml_string_t REPLACEMENT = {
   .data = "[REPLACED]",
   .length = sizeof("[REPLACED]") - 1
};

static void token_handler(lhtml_token_t *token, void *extra /* this can be your state */) {
  if (token->type == LHTML_TOKEN_START_TAG) { // we're interested only in start tags
    const lhtml_token_starttag_t *tag = &token->start_tag;
    if (tag->type == LHTML_TAG_A) { // check whether tag is of type <a>
      const size_t n_attrs = tag->attributes.count;
      const lhtml_attribute_t *attrs = tag->attributes.items;
      for (size_t i = 0; i < n_attrs; i++) { // iterate over attributes
        const lhtml_attribute_t *attr = &attrs[i];
        if (lhtml_name_equals(attr->name, "href")) { // match the attribute name
          attr->value = REPLACEMENT; // set the attribute value
        }
      }
    }
  }
  lhtml_emit(token, extra); // pass transformed token(s) to next handler(s)
}

So, is it correct and how fast is  it?

It is HTML5 compliant as tested against the official test suites. As part of the work several contributions were sent to the specification itself for clarification / simplification of the spec language.

Unlike the previous parser(s), it didn’t bail out on any of the 2,382,625 documents from HTTP Archive, although 0.2% of documents exceeded expected bufferization limits as they were in fact JavaScript or RSS or other types of content incorrectly served with Content-Type: text/html, and since anything is valid HTML5, the parser tried to parse e.g. a<b; x=3; y=4 as incomplete tag with attributes. This is very rare (and goes to even lower amount of 0.03% when two error-prone advertisement networks are excluded from those results), but still needs to be accounted for and is a valid case for bailing out.

As for the benchmarks, In September 2016 using an example which transforms the HTML spec itself (7.9 MB HTML file) by replacing every <a href> (only that property only in those tags) to a static value. It was compared against the few existing and popular HTML parsers (only tokenization mode was used for the fair comparison, so that they don’t need to build AST and so on), and timings in milliseconds for 100 iterations are the following (lazy mode means that we’re using raw strings whenever possible, the other one serializes each token just for comparison):

A History of HTML Parsing at Cloudflare: Part 1

The results show that LazyHTML parser speeds are around an order of magnitude faster.

That concludes the first post in our series on HTML rewriters at Cloudflare. The next post describes how we built  a new streaming rewriter on top of the ideas of LazyHTML. The major update was to provide an easier to use CSS selector API.  It provides the back-end for the Cloudflare workers HTMLRewriter JavaScript API.

Introducing the HTMLRewriter API to Cloudflare Workers

Post Syndicated from Rita Kozlov original https://blog.cloudflare.com/introducing-htmlrewriter/

Introducing the HTMLRewriter API to Cloudflare Workers

Introducing the HTMLRewriter API to Cloudflare Workers

We are excited to announce that the HTMLRewriter API for Cloudflare Workers is now GA! You can get started today by checking out our documentation, or trying out our tutorial for localizing your site with the HTMLRewriter.

Want to know how it works under the hood? We are excited to tell you everything you wanted to know but were afraid to ask, about building a streaming HTML parser on the edge; read about it in part 1 (and stay tuned for part two coming tomorrow!).

Faster, more scalable applications at the edge

The HTMLRewriter can help solve two big problems web developers face today: making changes to the HTML, when they are hard to make at the server level, and making it possible for HTML to live on the edge, closer to the user — without sacrificing dynamic functionality.

Since the introduction of Workers, Workers have helped customers regain control where control either wasn’t provided, or very hard to obtain at the origin level. Just like Workers can help you set CORS headers at the middleware layer, between your users and the origin, the HTMLRewriter can assist with things like URL rewrites (see the example below!).

Back in January, we introduced the Cache API, giving you more control than ever over what you could store in our edge caches, and how. By helping users have closer control of the cache, users could push experiences that were previously only able to live at the origin server to the network edge. A great example of this is the ability to cache POST requests, where previously only GET requests were able to benefit from living on the edge.

Later this year, during Birthday Week, we announced Workers Sites, taking the idea of content on the edge one step further, and allowing you to fully deploy your static sites to the edge.

We were not the first to realize that the web could benefit from more content being accessible from the CDN level. The motivation behind the resurgence of static sites is to have a canonical version of HTML that can easily be cached by a CDN.

This has been great progress for static content on the web, but what about so much of the content that’s almost static, but not quite? Imagine an e-commerce website: the items being offered, and the promotions are all the same — except that pesky shopping cart on the top right. That small difference means the HTML can no longer be cached, and sent to the edge. That tiny little number of items in the cart requires you to travel all the way to your origin, and make a call to the database, for every user that visits your site. This time of year, around Black Friday, and Cyber Monday, this means you have to make sure that your origin, and database are able to scale to the maximum load of all the excited shoppers eager to buy presents for their loved ones.

Edge-side JAMstack with HTMLRewriter

One way to solve this problem of reducing origin load while retaining dynamic functionality is to move more of the logic to the client — this approach of making the HTML content static, leveraging CDN for caching, and relying on client-side API calls for dynamic content is known as the JAMstack (JavaScript, APIs, Markdown). The JAMstack is certainly a move in the right direction, trying to maximize content on the edge. We’re excited to embrace that approach even further, by allowing dynamic logic of the site to live on the edge as well.

In September, we introduced the HTMLRewriter API in beta to the Cloudflare Workers runtime — a streaming parser API to allow developers to develop fully dynamic applications on the edge. HTMLRewriter is a jQuery-like experience inside your Workers application, allowing you to manipulate the DOM using selectors and handlers. The same way jQuery revolutionized front-end development, we’re excited for the HTMLRewriter to change how we think about architecting dynamic applications.

There are a few advantages to rewriting HTML on the edge, rather than on the client. First, updating content client-side introduces a tradeoff between the performance and consistency of the application — that is, if you want to, update a page to a logged in version client side, the user will either be presented with the static version first, and witness the page update (resulting in an inconsistency known as a flicker), or the rendering will be blocked until the customization can be fully rendered. Having the DOM modifications made edge-side provides the benefits of a scalable application, without the degradation of user experience. The second problem is the inevitable client-side bloat. Most of the world today is connected to the internet on a mobile phone with significantly less powerful CPU, and on shaky last-mile connections where each additional API call risks never making it back to the client. With the HTMLRewriter within Cloudflare Workers, the API calls for dynamic content, (or even calls directly to Workers KV for user-specific data) can be made from the Worker instead, on a much more powerful machine, over much more resilient backbone connections.

Here’s what the developers at Happy Cog have to say about HTMLRewriter:

"At Happy Cog, we’re excited about the JAMstack and static site generators for our clients because of their speed and security, but in the past have often relied on additional browser-based JavaScript and internet-exposed backend services to customize user experiences based on a cookied logged-in state. With Cloudflare’s HTMLRewriter, that’s now changed. HTMLRewriter is a fundamentally new powerful capability for static sites. It lets us parse the request and customize each user’s experience without exposing any backend services, while continuing to use a JAMstack architecture. And it’s fast. Because HTMLRewriter streams responses directly to users, our code can make changes on-the-fly without incurring a giant TTFB speed penalty. HTMLRewriter is going to allow us to make architectural decisions that benefit our clients and their users — while avoiding a lot of the tradeoffs we’d usually have to consider."

Matt Weinberg, – President of Development and Technology, Happy Cog

HTMLRewriter in Action

Since most web developers are familiar with jQuery, we chose to keep the HTMLRewriter very similar, using selectors and handlers to modify content.

Below, we have a simple example of how to use the HTMLRewriter to rewrite the URLs in your application from myolddomain.com to mynewdomain.com:

async function handleRequest(req) {
  const res = await fetch(req)
  return rewriter.transform(res)
}
 
const rewriter = new HTMLRewriter()
  .on('a', new AttributeRewriter('href'))
  .on('img', new AttributeRewriter('src'))
 
class AttributeRewriter {
  constructor(attributeName) {
    this.attributeName = attributeName
  }
 
  element(element) {
    const attribute = element.getAttribute(this.attributeName)
    if (attribute) {
      element.setAttribute(
        this.attributeName,
        attribute.replace('myolddomain.com', 'mynewdomain.com')
      )
    }
  }
}
 
addEventListener('fetch', event => {
  event.respondWith(handleRequest(event.request))
})

The HTMLRewriter, which is instantiated once per Worker, is used here to select the a and img elements.

An element handler responds to any incoming element, when attached using the .on function of the HTMLRewriter instance.

Here, our AttributeRewriter will receive every element parsed by the HTMLRewriter instance. We will then use it to rewrite each attribute, href for the a element for example, to update the URL in the HTML.

Getting started

You can get started with the HTMLRewriter by signing up for Workers today. For a full guide on how to use the HTMLRewriter API, we suggest checking out our documentation.

How the the HTMLRewriter work?

How does the HTMLRewriter work under the hood? We’re excited to tell you about it, and have spared no details. Check out the first part of our two-blog post series to learn all about the not-so-brief history of rewriter HTML at Cloudflare, and stay tuned for part two tomorrow!

How to enable encryption in a browser with the AWS Encryption SDK for JavaScript and Node.js

Post Syndicated from Spencer Janyk original https://aws.amazon.com/blogs/security/how-to-enable-encryption-browser-aws-encryption-sdk-javascript-node-js/

In this post, we’ll show you how to use the AWS Encryption SDK (“ESDK”) for JavaScript to handle an in-browser encryption workload for a hypothetical application. First, we’ll review some of the security and privacy properties of encryption, including the names AWS uses for the different components of a typical application. Then, we’ll discuss some of the reasons you might want to encrypt each of those components, with a focus on in-browser encryption, and we’ll describe how to perform that encryption using the ESDK. Lastly, we’ll talk about some of the security properties to be mindful of when designing an application, and where to find additional resources.

An overview of the security and privacy properties of encryption

Encryption is a technique that can restrict access to sensitive data by making it unreadable without a key. An encryption process takes data that is plainly readable or processable (“plaintext”) and uses principles of mathematics to obscure the contents so that it can’t be read without the use of a secret key. To preserve user privacy and prevent unauthorized disclosure of sensitive business data, developers need ways to protect sensitive data during the entire data lifecycle. Data needs to be protected from risks associated with unintentional disclosure as data flows between collection, storage, processing, and sharing components of an application. In this context, encryption is typically divided into two separate techniques: encryption at rest for storing data; and encryption in transit for moving data between entities or systems.

Many applications use encryption in transit to secure connections between their users and the services they provide, and then encrypt the data before it’s stored. However, as applications become more complex and data must be moved between more nodes and stored in more diverse places, there are more opportunities for data to be accidentally leaked or unintentionally disclosed. When a user enters their data in a browser, Transport Layer Security (TLS) can protect that data in transit between the user’s browser and a service endpoint. But in a distributed system, intermediary services between that endpoint and the service that processes that sensitive data might log or cache the data before transporting it. Encrypting sensitive data at the point of collection in the browser is a form of encryption at rest that minimizes the risk of unauthorized access and protects the data if it’s lost, stolen, or accidentally exposed. Encrypting data in the browser means that even if it’s completely exposed elsewhere, it’s unreadable and worthless to anyone without access to the key.

A typical web application

A typical web application will accept some data as input, process it, and then store it. When the user needs to access stored data, the data often follows the same path used when it was input. In our example there are three primary components to the path:

Figure 1: A hypothetical web application where the application is composed of an end-user interacting with a browser front-end, a third party which processes data received from the browser, processing is performed in Amazon EC2, and storage happens in Amazon S3

Figure 1: A hypothetical web application where the application is composed of an end-user interacting with a browser front-end, a third party which processes data received from the browser, processing is performed in Amazon EC2, and storage happens in Amazon S3

  1. An end-user interacts with the application using an interface in the browser.
  2. As data is sent to Amazon EC2, it passes through the infrastructure of a third party which could be an Internet Service Provider, an appliance in the user’s environment, or an application running in the cloud.
  3. The application on Amazon EC2 processes the data once it has been received.
  4. Once the application is done processing data, it is stored in Amazon S3 until it is needed again.

As data moves between components, TLS is used to prevent inadvertent disclosure. But what if one or more of these components is a third-party service that doesn’t need access to sensitive data? That’s where encryption at rest comes in.

Encryption at rest is available as a server-side, client-side, and client-side in-browser protection. Server-side encryption (SSE) is the most commonly used form of encryption with AWS customers, and for good reason: it’s easy to use because it’s natively supported by many services, such as Amazon S3. When SSE is used, the service that’s storing data will encrypt each piece of data with a key (a “data key”) when it’s received, and then decrypt it transparently when it’s requested by an authorized user. This has the benefit of being seamless for application developers because they only need to check a box in Amazon S3 to enable encryption, and it also adds an additional level of access control by having separate permissions to download an object and perform a decryption operation. However, there is a security/convenience tradeoff to consider, because the service will allow any role with the appropriate permissions to perform a decryption. For additional control, many AWS services—including S3—support the use of customer-managed AWS Key Management Service (AWS KMS) customer master keys (CMKs) that allow you to specify key policies or use grants or AWS Identity and Access Management (IAM) policies to control which roles or users have access to decryption, and when. Configuring permission to decrypt using customer-managed CMKs is often sufficient to satisfy compliance regimes that require “application-level encryption.”

Some threat models or compliance regimes may require client-side encryption (CSE), which can add a powerful additional level of access control at the expense of additional complexity. As noted above, services perform server-side encryption on data after it has left the boundary of your application. TLS is used to secure the data in transit to the service, but some customers might want to only manage encrypt/decrypt operations within their application on EC2 or in the browser. Applications can use the AWS Encryption SDK to encrypt data within the application trust boundary before it’s sent to a storage service.

But what about a use case where customers don’t even want plaintext data to leave the browser? Or what if end-users input data that is passed through or logged by intermediate systems that belong to a third-party? It’s possible to create a separate application that only manages encryption to ensure that your environment is segregated, but using the AWS Encryption SDK for JavaScript allows you to encrypt data in an end-user browser before it’s ever sent to your application, so only your end-user will be able to view their plaintext data. As you can see in Figure 2 below, in-browser encryption can allow data to be safely handled by untrusted intermediate systems while ensuring its confidentiality and integrity.

Figure 2: A hypothetical web application with encryption where the application is composed of an end-user interacting with a browser front-end, a third party which processes data received from the browser, processing is performed in Amazon EC2, and storage happens in Amazon S3

Figure 2: A hypothetical web application with encryption where the application is composed of an end-user interacting with a browser front-end, a third party which processes data received from the browser, processing is performed in Amazon EC2, and storage happens in Amazon S3

  1. The application in the browser requests a data key to encrypt sensitive data entered by the user before it is passed to a third party.
  2. Because the sensitive data has been encrypted, the third party cannot read it. The third party may be an Internet Service Provider, an appliance in the user’s environment, an application running in the cloud, or a variety of other actors.
  3. The application on Amazon EC2 can make a request to KMS to decrypt the data key so the data can be decrypted, processed, and re-encrypted.
  4. The encrypted object is stored in S3 where a second encryption request is made so the object can be encrypted when it is stored server side.

How to encrypt in the browser

The first step of in-browser encryption is including a copy of the AWS Encryption SDK for JavaScript with the scripts you’re already sending to the user when they access your application. Once it’s present in the end-user environment, it’s available for your application to make calls. To perform the encryption, the ESDK will request a data key from the cryptographic materials provider that is used to encrypt, and an encrypted copy of the data key that will be stored with the object being encrypted. After a piece of data is encrypted within the browser, the ciphertext can be uploaded to your application backend for processing or storage. When a user needs to retrieve the plaintext, the ESDK can read the metadata attached to the ciphertext to determine the appropriate method to decrypt the data key, and if they have access to the CMK decrypt the data key and then use it to decrypt the data.

Important considerations

One common issue with browser-based applications is inconsistent feature support across different browser vendors and versions. For example, how will the application respond to browsers that lack native support for the strongest recommended cryptographic algorithm suites? Or, will there be a message or alternative mode if a user accesses the application using a browser that has JavaScript disabled? The ESDK for JavaScript natively supports a fallback mode, but it may not be appropriate for all use cases. Be sure to understand what kind of browser environments you will need to support to determine whether in-browser encryption is appropriate, and include support for graceful degradation if you expect limited browser support. Developers should also consider the ways that unauthorized users might monitor user actions via a browser extension, make unauthorized browser requests without user knowledge, or request a “downgraded” (less mathematically intensive) cryptographic operation.

It’s always a good idea to have your application designs reviewed by security professionals. If you have an AWS Account Manager or Technical Account Manager, you can ask them to connect you with a Solutions Architect to review your design. If you’re an AWS customer but don’t have an account manager, consider visiting an AWS Loft to participate in our “Ask an Expert” plan.

Where to learn more

If you have questions about this post, let us know in the Comments section below, or consult the AWS Encryption SDK Developer Forum. Because the Encryption SDK is open-source, you can always contribute, open an issue, or ask questions in Github.

The AWS Encryption SDK for JavaScript is available at: https://github.com/awslabs/aws-encryption-sdk-javascript
Documentation is available at: https://docs.aws.amazon.com/encryption-sdk/latest/developer-guide/javascript.html

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Janyk author photo

Spencer Janyk

Spencer is a Senior Product Manager at Amazon Web Services working on data encryption and privacy. He has previously worked on vulnerability management and monitoring for enterprises and applying machine learning to challenges in ad tech, social media, diversity in recruiting, and talent management. Spencer holds a Master of Arts in Performance Studies from New York University and a Bachelor of Arts in Gender Studies from Whitman College.

Gray author photo

Amanda Gray

Amanda is a Senior Security Engineer at Amazon Web Services on the Crypto Tools team. Previously, Amanda worked on application security and privacy by design, and she continues to promote these goals every day. Amanda holds Bachelors’ degrees in Physics and Computer Science from the University of Washington and Smith College respectively, and a Master’s degree in Physical Oceanography from the University of Washington.

Workers Sites: deploy your website directly to our network

Post Syndicated from Rita Kozlov original https://blog.cloudflare.com/workers-sites/

Workers Sites: deploy your website directly to our network

Workers Sites: deploy your website directly to our network

Performance on the web has always been a battle against the speed of light — accessing a site from London that is served from Seattle, WA means every single asset request has to travel over seven thousand miles. The first breakthrough in the web performance battle was HTTP/1.1 connection keep-alive and browsers opening multiple connections. The next breakthrough was the CDN, bringing your static assets closer to your end users by caching them in data centers closer to them. Today, with Workers Sites, we’re excited to announce the next big breakthrough — entire sites distributed directly onto the edge of the Internet.

Deploying to the edge of the network

Why isn’t just caching assets sufficient? Yes, caching improves performance, but significant performance improvement comes with a series of headaches. The CDN can make a guess at which assets it should cache, but that is just a guess. Configuring your site for maximum performance has always been an error-prone process, requiring a wide collection of esoteric rules and headers. Even when perfectly configured, almost nothing is cached forever, precious requests still often need to travel all the way to your origin (wherever it may be). Cache invalidation is, after all, one of the hardest problems in computer science.

This begs the question: rather than moving bytes from the origin to the edge bit by bit clumsily, why not push the whole origin to the edge?

Workers Sites: Extending the Workers platform

Two years ago for Birthday Week, we announced Cloudflare Workers, a way for developers to write and run JavaScript and WebAssembly on our network in 194 cities around the world. A year later, we released Workers KV, our distributed key-value store that gave developers the ability to store state at the edge in those same cities.

Workers Sites leverages the power of Workers and Workers KV by allowing developers to upload their sites directly to the edge, and closer to the end users. Born on the edge, Workers Sites is what we think modern development on the web should look like, natively secure, fast, and massively scalable. Less of your time is spent on configuration, and more of your time is spent on your code, and content itself.

How it works

Workers Sites are deployed with a few terminal commands, and can serve a site generated by any static site generator, such as Hugo, Gatsby or Jekyll. Using Wrangler (our CLI), you can upload your site’s assets directly into KV. When a request hits your Workers Site, the Cloudflare Worker generated by Wrangler, will read and serve the asset from KV, with the appropriate headers (no need to worry about Content-Type, and Cache-Control; we’ve got you covered).

Workers Sites can be used to deploy any static site such as a blog, marketing sites, or portfolio.  If you ever decide your site needs to become a little less static, your Worker is just code, edit and extend it until you have a dynamic site running all around the world.

Getting started

To get started with Workers Sites, you first need to sign up for Workers. After selecting your workers.dev subdomain, choose the Workers Unlimited plan (starting at $5 / month) to get access to Workers KV and the ability to deploy Workers Sites.

After signing up for Workers Unlimited you’ll need to install the CLI for Workers, Wrangler. Wrangler can be installed either from NPM or Cargo:

# NPM Installation
npm i @cloudflare/wrangler -g
# Cargo Installation
cargo install wrangler

Once you install Wrangler, you are ready to deploy your static site, with the following steps:

  1. Run wrangler init --site in the directory that contains your static site’s built assets
  2. Fill in the newly created wrangler.toml file with your account and project details
  3. Publish your site with wrangler publish

You can also check out our Workers Sites reference documentation or follow the full tutorial for create-react-app in the docs.

If you’d prefer to get started by watching a video, we’ve got you covered! This video will walk you through creating and deploying your first Workers Site.


Blazing fast: from Atlanta to Zagreb

In addition to improving the developer experience, we did a lot of work behind the scenes making sure that both deploys and the sites themselves are blazing fast — we’re excited to share the how with you in our technical blog post.

To test the performance of Workers Sites we took one of our personal sites and deployed it to run some benchmarks. This test was for our site but your results may vary.

One common way to benchmark the performance of your site it using Google Lighthouse, which you can do directly from the Audits tab of your Chrome browser.

Workers Sites: deploy your website directly to our network

So we passed the first test with flying colors — 100! However, running a benchmark from your own computer introduces a bias: your users are not necessarily where you are. In fact, your users are increasingly not where you are.

Where you’re benchmarking from is really important: running tests from different locations will yield different results. Benchmarking from Seattle and hitting a server on the West coast says very little about your global performance.

We decided to use a tool called Catchpoint to run benchmarks from cities around the world. To see how we compare, we deployed the site to three different static site deployment platforms including Workers Sites.

Since providers offer data center regions on the coasts of the United States, or central Europe, it’s common to see good performance in regions such as North America, and we’ve got you covered here:

Workers Sites: deploy your website directly to our network

But what about your users in the rest of the world? Performance is even more critical in those regions: the first users are not going to be connecting to your site on a MacBook Pro, on a blazing fast connection. Workers Sites allows you to reach those regions without any additional effort on your part — every time our map grows, your global presence grows with it.

We’ve done the work of running some benchmarks from different parts of the world for you, and we’re pleased to share the results:

Workers Sites: deploy your website directly to our network

One last thing…

Deploying your next site with Workers Sites is easy and leads to great performance, so we thought it was only right that we deploy with Workers Sites ourselves. With this announcement, we are also open sourcing the Cloudflare Workers docs! And, they are now served from a Cloudflare data center near you using Workers Sites.

We can’t wait to see what you deploy with Workers Sites!


Have you built something interesting with Workers or Workers Sites? Let us know @CloudflareDev!

Building a GraphQL server on the edge with Cloudflare Workers

Post Syndicated from Kristian Freeman original https://blog.cloudflare.com/building-a-graphql-server-on-the-edge-with-cloudflare-workers/

Building a GraphQL server on the edge with Cloudflare Workers

Building a GraphQL server on the edge with Cloudflare Workers

Today, we’re open-sourcing an exciting project that showcases the strengths of our Cloudflare Workers platform: workers-graphql-server is a batteries-included Apollo GraphQL server, designed to get you up and running quickly with GraphQL.

Building a GraphQL server on the edge with Cloudflare Workers
Testing GraphQL queries in the GraphQL Playground

As a full-stack developer, I’m really excited about GraphQL. I love building user interfaces with React, but as a project gets more complex, it can become really difficult to manage how your data is managed inside of an application. GraphQL makes that really easy – instead of having to recall the REST URL structure of your backend API, or remember when your backend server doesn’t quite follow REST conventions – you just tell GraphQL what data you want, and it takes care of the rest.

Cloudflare Workers is uniquely suited as a platform to being an incredible place to host a GraphQL server. Because your code is running on Cloudflare’s servers around the world, the average latency for your requests is extremely low, and by using Wrangler, our open-source command line tool for building and managing Workers projects, you can deploy new versions of your GraphQL server around the world within seconds.

If you’d like to try the GraphQL server, check out a demo GraphQL playground, deployed on Workers.dev. This optional add-on to the GraphQL server allows you to experiment with GraphQL queries and mutations, giving you a super powerful way to understand how to interface with your data, without having to hop into a codebase.

If you’re ready to get started building your own GraphQL server with our new open-source project, we’ve added a new tutorial to our Workers documentation to help you get up and running – check it out here!

Finally, if you’re interested in how the project works, or want to help contribute – it’s open-source! We’d love to hear your feedback and see your contributions. Check out the project on GitHub.

Join Cloudflare & Moz at our next meetup, Serverless in Seattle!

Post Syndicated from Giuliana DeAngelis original https://blog.cloudflare.com/join-cloudflare-moz-at-our-next-meetup-serverless-in-seattle/

Join Cloudflare & Moz at our next meetup, Serverless in Seattle!
Photo by oakie / Unsplash

Join Cloudflare & Moz at our next meetup, Serverless in Seattle!

Cloudflare is organizing a meetup in Seattle on Tuesday, June 25th and we hope you can join. We’ll be bringing together members of the developers community and Cloudflare users for an evening of discussion about serverless compute and the infinite number of use cases for deploying code at the edge.

To kick things off, our guest speaker Devin Ellis will share how Moz uses Cloudflare Workers to reduce time to first byte 30-70% by caching dynamic content at the edge. Kirk Schwenkler, Solutions Engineering Lead at Cloudflare, will facilitate this discussion and share his perspective on how to grow and secure businesses at scale.

Next up, Developer Advocate Kristian Freeman will take you through a live demo of Workers and highlight new features of the platform. This will be an interactive session where you can try out Workers for free and develop your own applications using our new command-line tool.

Food and drinks will be served til close so grab your laptop and a friend and come on by!

View Event Details & Register Here

Agenda:

  • 5:00 pm Doors open, food and drinks
  • 5:30 pm Customer use case by Devin and Kirk
  • 6:00 pm Workers deep dive with Kristian
  • 6:30 – 8:30 pm Networking, food and drinks

Join Cloudflare & Moz at our next meetup, Serverless in Seattle!

Post Syndicated from Giuliana DeAngelis original https://blog.cloudflare.com/join-cloudflare-moz-at-our-next-meetup-serverless-in-seattle/

Join Cloudflare & Moz at our next meetup, Serverless in Seattle!
Photo by oakie / Unsplash

Join Cloudflare & Moz at our next meetup, Serverless in Seattle!

Cloudflare is organizing a meetup in Seattle on Tuesday, June 25th and we hope you can join. We’ll be bringing together members of the developers community and Cloudflare users for an evening of discussion about serverless compute and the infinite number of use cases for deploying code at the edge.

To kick things off, our guest speaker Devin Ellis will share how Moz uses Cloudflare Workers to reduce time to first byte 30-70% by caching dynamic content at the edge. Kirk Schwenkler, Solutions Engineering Lead at Cloudflare, will facilitate this discussion and share his perspective on how to grow and secure businesses at scale.

Next up, Developer Advocate Kristian Freeman will take you through a live demo of Workers and highlight new features of the platform. This will be an interactive session where you can try out Workers for free and develop your own applications using our new command-line tool.

Food and drinks will be served til close so grab your laptop and a friend and come on by!

View Event Details & Register Here

Agenda:

  • 5:00 pm Doors open, food and drinks
  • 5:30 pm Customer use case by Devin and Kirk
  • 6:00 pm Workers deep dive with Kristian
  • 6:30 – 8:30 pm Networking, food and drinks

Enhancing the Optimizely Experimentation Platform with Cloudflare Workers

Post Syndicated from Remy Guercio original https://blog.cloudflare.com/enhancing-optimizely-with-cloudflare-workers/

Enhancing the Optimizely Experimentation Platform with Cloudflare Workers

Enhancing the Optimizely Experimentation Platform with Cloudflare Workers

This is a joint post by Whelan Boyd, Senior Product Manager at Optimizely and Remy Guercio, Product Marketing Manager for Cloudflare Workers.

Experimentation is an important ingredient in driving business growth: whether you’re iterating on a product or testing new messaging, there’s no substitute for the data and insights gathered from conducting rigorous experiments in the wild.

Optimizely is the world’s leading experimentation platform, with thousands of customers worldwide running tests for over 140 million visitors daily. If Optimizely were a website, it would be the third most trafficked in the US.  And when it came time to experiment with reinvigorating their own platform, Optimizely chose Cloudflare Workers.

Improving Performance and Agility with Cloudflare Workers

Cloudflare Workers is a globally distributed serverless compute platform that runs across Cloudflare’s network of 180 locations worldwide. Workers are designed for flexibility, with many different use cases ranging from customizing configuration of Cloudflare services and features to building full, independent applications.

In this post, we’re going to focus on how Workers can be used to improve performance and increase agility for more complex applications. One of the key benefits of Workers is that they allow developers to move decision logic and data into a highly efficient runtime operating in close proximity to end users — resulting in significant performance benefits and flexibility. Which brings us to Optimizely…

How Optimizely Works

Every week Optimizely delivers billions of experiences to help teams A/B test new products, de-risk new feature launches, and validate alternative designs. Optimizely lets companies test client-side changes like layouts and copy, as well as server-side changes like algorithms and feature rollouts.

Let’s explore how both have challenges that can be overcome with Workers, starting with Optimizely’s client-side A/B testing, or Optimizely Web, product.

Use Case: Optimizely Web

The main benefit of Optimizely Web — Optimizely’s client-side testing framework — is that it supports A/B testing via straightforward insertion of a JavaScript tag on the web page. The test is designed via the Optimizely WYSIWYG editor, and is live within minutes. Common use cases include style updates, image swaps, headlines and other text changes. You can also write any custom JavaScript or CSS you want.

With client-side A/B testing, the browser downloads JavaScript that modifies the page as it’s loading.  To avoid “flash-of-unstyled-content” (FOUC), developers need to implement this JavaScript synchronously in their <head> tag.  This constraint, though, can lead to page performance issues, especially on slower connections and devices.  Downloading and executing JavaScript in the browser has a cost, and this cost increases if the amount of JavaScript is large.  With a normal Optimizely Web implementation, all experiments are included in the JavaScript loaded on every page.

Enhancing the Optimizely Experimentation Platform with Cloudflare Workers
A traditional Optimizely implementation

With Workers, Optimizely can support many of these same use cases, but hoists critical logic to the edge to avoid much of the performance cost. Here’s how it works:

Enhancing the Optimizely Experimentation Platform with Cloudflare Workers
Implementing tests with Optimizely and Cloudflare Workers

This diagram shows how Optimizely customers can execute experiments created in the point-and-click UI through a Cloudflare Worker.  Rather than the browser downloading a large JavaScript file, your Worker handling HTTP/S requests calls out to Optimizely’s Worker.  Optimizely’s Worker determines which A/B tests should be active on this page and returns a small amount of JavaScript back to your Worker.  In fact, it is the JavaScript required to execute A/B test variations on just that specific page load.  Your Worker inlines the code in the page and returns it to the visitor’s browser.  

Not only does this avoid a browser bottleneck downloading a lot of data, the amount of code to execute is a fraction of a normal client-side implementation.  Since the experiments are set up inside the Optimizely interface just like any other Web experiment, you can run as many as you want without waiting for code deploy cycles.  Better yet, your non-technical (e.g. marketing) teams can still run these without depending on developers for each test.  It’s a one-time implementation.

Use Case: Going Further with Feature Rollouts

Optimizely Full Stack is Optimizely’s server-side experimentation and feature flagging platform for websites, mobile apps, chatbots, APIs, smart devices, and anything else with a network connection.  You can deploy code behind feature flags, experiment with A/B tests, and roll out or roll back features immediately.  Optimizely Rollouts is a free version of Full Stack that supports key feature rollout capabilities.

Full Stack SDKs are often implemented and instantiated directly in application code.

Enhancing the Optimizely Experimentation Platform with Cloudflare Workers
An Optimizely full stack experimentation setup

The main blocker to high velocity server-side testing is that experiments and feature rollouts must go through the code-deploy cycle — and to further add to the headache, many sites cache content on CDNs, so experiments or rollouts running at the origin never execute.  

In this example, we’ll consider a new feature you’d like to roll out gradually, exposing more and more users over time between code deploys. With Workers, you can implement feature rollouts by running the Optimizely JavaScript SDK at the edge.  The Worker is effectively a decision service.  Instead of installing the JS SDK inside each application service where you might need to gate or roll out features, centralize instantiation in a Worker.

From your application, simply hit the Worker and the response will tell you whether a feature is enabled for that particular user.  In the example below, we supply via query parameters a userId, feature, and account-specific SDK key and the Worker responds with its decision in result.  Below is a sample Cloudflare Worker:

import { createManager } from '../index'

/// <reference lib="es2015" />
/// <reference lib="webworker" />

addEventListener('fetch', (event: any) => {
  event.respondWith(handleRequest(event.request))
})

/**
 * Fetch and log a request
 * @param {Request} request
 */
async function handleRequest(request: Request): Promise<Response> {
  const url = new URL(request.url)
  const key = url.searchParams.get('key')
  const userId = url.searchParams.get('userId')
  const feature = url.searchParams.get('feature')
  if (!feature || !key || !userId) {
    throw new Error('must supply "feature", "userId" and "key"')
  }

  try {
    const manager = createManager({
      sdkKey: key,
    })

    await manager.onReady().catch(err => {
      return new Response(JSON.stringify({ status: 'error' }))
    })
    const client = manager.getClient()

    const result = await client.feature({
      key: feature,
      userId,
    })

    return new Response(JSON.stringify(result))
  } catch (e) {
    return new Response(JSON.stringify({ status: 'error' }))
  }
}

This kind of setup is common for React applications, which may update store values based on decisions returned by the Worker. No need to force a request all the way back to origin.

All in all, using Workers as a centralized decision service can reduce the complexity of your Full Stack implementation and support applications that rely on heavy caching.

How to Improve Your Experimentation Setup

Both of the examples above demonstrate how Workers can provide speed and flexibility to experimentation and feature flagging.  But this is just the tip of the iceberg!  There are plenty of other ways you can use these two technologies together. We’d love to hear from you and explore them together!

Are you a developer looking for a feature flagging or server-side testing solution? The Optimizely Rollouts product is free and ready for you to sign up!

Or does your marketing team need a high performance A/B testing solution? The Optimizely Web use case is in developer preview.

  • Cloudflare Enterprise Customers: Reach out to your dedicated Cloudflare account manager learn more and start the process.
  • Optimizely Customers and Cloudflare Customers (who aren’t on an enterprise plan): Reach out to your Optimizely contact to learn more and start the process.

You can sign up for and learn more about using Cloudflare Workers here!

Just Write Code: Improving Developer Experience for Cloudflare Workers

Post Syndicated from Rita Kozlov original https://blog.cloudflare.com/just-write-code-improving-developer-experience-for-cloudflare-workers/

Just Write Code: Improving Developer Experience for Cloudflare Workers

Just Write Code: Improving Developer Experience for Cloudflare Workers

We’re excited to announce that starting today, Cloudflare Workers® gets a CLI, new and improved docs, multiple scripts for everyone, the ability to run applications on workers.dev without bringing your own domain, and a free tier to make experimentation easier than ever. We are building the serverless platform of the future, and want you to build your application on it, today. In this post, we’ll elaborate on what a serverless platform of the future looks like, how it changes today’s paradigms, and our commitment to making building on it a great experience.

Three years ago, I was interviewing with Cloudflare for a Solutions Engineering role. As a part of an interview assignment, I had to set up an origin behind Cloudflare on my own  domain. I spent my weekend, frustrated and lost in configurations, trying to figure out how to set up an EC2 instance, connect to it over IPv6, and install NGINX on Ubuntu 16.4 just so I could end up with a static site with a picture of my cat on it. I have a computer science degree, and spent my career up until that point as a software engineer — building this simple app was a horrible experience. A weekend spent coding, without worrying about servers, would have yielded a much richer application.

And this is just one rung in the ladder — the first one. While the primitives have moved up the stack, the fact is, developing an application, putting it on the Internet, and growing it from MVP to a scalable, performant product all still remain distinct steps in the development process.

Just Write Code: Improving Developer Experience for Cloudflare Workers

This is what “serverless” has promised to solve. Abstract away the servers at all stages of the process, and allow developers to do what they do best: develop, without having to worry about infrastructure.

And yet, with many serverless offerings today, the first thing they do is the thing that they promised you they wouldn’t — they make you think about servers. “What region would you like?” (the first question that comes to my mind: why are you forcing me to think about which customers I care more about: East Coast, or West Coast? Why can’t you solve this for me?). Or: “how much memory do you think you’ll need?” (again: why are you making this my problem?! You figure it out!).

We don’t think it should work like this.

I often think back to that problem I was facing myself three years ago, and that I know developers all around the world face every day. Developers should be able to just focus on the code. Someone else should deal with everything else from setting up infrastructure through making that infrastructure fast and scalable. While we’ve made some architectural decisions in building Workers that enable us to do this better than anyone else, today isn’t about expounding on them (though if you’d like to read more, here’s a great blog post detailing some of them). What today is about is really honing Workers in on the needs of developers.

We want Workers to bring the dream of serverless to life —  of letting developers only worry about bugs in their code. Today marks the start of a sustained push that Cloudflare is making towards building a great developer experience with Workers. We have some exciting things to announce today — but this is just the beginning.

Wrangler: the official Workers CLI

Wrangler, originally open sourced as the Rust CLI for Workers, has graduated into being the official Workers CLI, supporting all your Workers deployment needs.

Get started now by installing Wrangler

npm install -g @cloudflare/wrangler

Generate your first project from our template gallery

wrangler generate <name> <template> --type=["webpack", "javascript", "rust"]

Just Write Code: Improving Developer Experience for Cloudflare Workers

Wrangler will take care of webpacking your project, compiling to WebAssembly, and uploading your project to Workers, all in one simple step:

wrangler publish

Just Write Code: Improving Developer Experience for Cloudflare Workers

A few of the other goodies we’re excited for you to use Wrangler for:

  • Compile Rust, C, and C++ to WebAssembly
  • Create single or multi-file JavaScript applications
  • Install NPM dependencies (we take care of webpack for you)
  • Add KV namespaces and bindings
  • Get started with pre-made templates

New and Improved Docs

We’ve updated our docs (and used Wrangler to do so!) to make it easier than ever for you to get started and deploy your first application with Workers.

Check out our new tutorials:

Multiscript for All

You asked, we listened. When we introduced Workers, we wanted to keep things as simple as possible. As a developer, you want to break up your code into logical components. Rather than having a single monolithic script, we want to allow you to deploy your code in a way that makes sense to you.

no-domain-required.workers.dev

Writing software is a creative process: a new project means creating something out of nothing. You may not entirely know what exactly it’s going to be yet, let alone what to name it.

We are changing the way you get started on Workers, by allowing you to deploy to a-subdomain-of-your-choice.workers.dev.

You may have heard about this announcement back in February, and we’re excited to deliver. For those of you who pre-registered, your subdomains will be waiting for you upon signing up and clicking into Workers.

A Free Tier to Experiment

Great products don’t always come from great ideas, they often come from freedom to tinker. When tinkering comes at a price, even if it’s $5, we realized we were limiting peoples’ ability to experiment.

Starting today, we are announcing a free tier for Workers.

The free tier will allow you to use Workers at up to 100,000 requests per day, on your own domain or workers.dev. You can learn more about the limits here.

New and improved UI

We have packaged this up into a clean, and easy experience that allows you to go from sign up to a deployed Worker in less than 2 minutes:

Just Write Code: Improving Developer Experience for Cloudflare Workers

Our commitment

We have a long way to go. This is not about crossing developer experience off our list, rather, about emphasizing our commitment to it. As our co-founder, Michelle likes to say, “we’re only getting started”.

There’s a lot here, and there’s a lot more to come. Join us over at workers.cloudflare.com to find out more, and if you’re ready to give it a spin, you can sign up there.

We’re excited to see what you build!

The Serverlist Newsletter: Connecting the Serverless Ecosystem

Post Syndicated from Connor Peshek original https://blog.cloudflare.com/the-serverlist-newsletter-5/

The Serverlist Newsletter: Connecting the Serverless Ecosystem

Check out our fifth edition of The Serverlist below. Get the latest scoop on the serverless space, get your hands dirty with new developer tutorials, engage in conversations with other serverless developers, and find upcoming meetups and conferences to attend.

Sign up below to have The Serverlist sent directly to your mailbox.



Building a To-Do List with Workers and KV

Post Syndicated from Kristian Freeman original https://blog.cloudflare.com/building-a-to-do-list-with-workers-and-kv/

Building a To-Do List with Workers and KV

Building a To-Do List with Workers and KV

In this tutorial, we’ll build a todo list application in HTML, CSS and JavaScript, with a twist: all the data should be stored inside of the newly-launched Workers KV, and the application itself should be served directly from Cloudflare’s edge network, using Cloudflare Workers.

To start, let’s break this project down into a couple different discrete steps. In particular, it can help to focus on the constraint of working with Workers KV, as handling data is generally the most complex part of building an application:

  1. Build a todos data structure
  2. Write the todos into Workers KV
  3. Retrieve the todos from Workers KV
  4. Return an HTML page to the client, including the todos (if they exist)
  5. Allow creation of new todos in the UI
  6. Allow completion of todos in the UI
  7. Handle todo updates

This task order is pretty convenient, because it’s almost perfectly split into two parts: first, understanding the Cloudflare/API-level things we need to know about Workers and KV, and second, actually building up a user interface to work with the data.

Understanding Workers

In terms of implementation, a great deal of this project is centered around KV – although that may be the case, it’s useful to break down what Workers are exactly.

Service Workers are background scripts that run in your browser, alongside your application. Cloudflare Workers are the same concept, but super-powered: your Worker scripts run on Cloudflare’s edge network, in-between your application and the client’s browser. This opens up a huge amount of opportunity for interesting integrations, especially considering the network’s massive scale around the world. Here’s some of the use-cases that I think are the most interesting:

  1. Custom security/filter rules to block bad actors before they ever reach the origin
  2. Replacing/augmenting your website’s content based on the request content (i.e. user agents and other headers)
  3. Caching requests to improve performance, or using Cloudflare KV to optimize high-read tasks in your application
  4. Building an application directly on the edge, removing the dependence on origin servers entirely

For this project, we’ll lean heavily towards the latter end of that list, building an application that clients communicate with, served on Cloudflare’s edge network. This means that it’ll be globally available, with low-latency, while still allowing the ease-of-use in building applications directly in JavaScript.

Setting up a canvas

To start, I wanted to approach this project from the bare minimum: no frameworks, JS utilities, or anything like that. In particular, I was most interested in writing a project from scratch and serving it directly from the edge. Normally, I would deploy a site to something like GitHub Pages, but avoiding the need for an origin server altogether seems like a really powerful (and performant idea) – let’s try it!

I also considered using TodoMVC as the blueprint for building the functionality for the application, but even the Vanilla JS version is a pretty impressive amount of code, including a number of Node packages – it wasn’t exactly a concise chunk of code to just dump into the Worker itself.

Instead, I decided to approach the beginnings of this project by building a simple, blank HTML page, and including it inside of the Worker. To start, we’ll sketch something out locally, like this:

<!DOCTYPE html>
<html>
  <head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width,initial-scale=1">
    <title>Todos</title>
  </head>
  <body>
    <h1>Todos</h1>
  </body>
</html>

Hold on to this code – we’ll add it later, inside of the Workers script. For the purposes of the tutorial, I’ll be serving up this project at todo.kristianfreeman.com,. My personal website was already hosted on Cloudflare, and since I’ll be serving , it was time to create my first Worker.

Creating a worker

Inside of my Cloudflare account, I hopped into the Workers tab and launched the Workers editor.

This is one of my favorite features of the editor – working with your actual website, understanding how the worker will interface with your existing project.

Building a To-Do List with Workers and KV

The process of writing a Worker should be familiar to anyone who’s used the fetch library before. In short, the default code for a Worker hooks into the fetch event, passing the request of that event into a custom function, handleRequest:

addEventListener('fetch', event => {
  event.respondWith(handleRequest(event.request))
})

Within handleRequest, we make the actual request, using fetch, and return the response to the client. In short, we have a place to intercept the response body, but by default, we let it pass-through:

async function handleRequest(request) {
  console.log('Got request', request)
  const response = await fetch(request)
  console.log('Got response', response)
  return response
}

So, given this, where do we begin actually doing stuff with our worker?

Unlike the default code given to you in the Workers interface, we want to skip fetching the incoming request: instead, we’ll construct a new Response, and serve it directly from the edge:

async function handleRequest(request) {
  const response = new Response("Hello!")
  return response
}

Given that very small functionality we’ve added to the worker, let’s deploy it. Moving into the “Routes” tab of the Worker editor, I added the route https://todo.kristianfreeman.com/* and attached it to the cloudflare-worker-todos script.

Building a To-Do List with Workers and KV

Once attached, I deployed the worker, and voila! Visiting todo.kristianfreeman.com in-browser gives me my simple “Hello!” response back.

Building a To-Do List with Workers and KV

Writing data to KV

The next step is to populate our todo list with actual data. To do this, we’ll make use of Cloudflare’s Workers KV – it’s a simple key-value store that you can access inside of your Worker script to read (and write, although it’s less common) data.

To get started with KV, we need to set up a “namespace”. All of our cached data will be stored inside that namespace, and given just a bit of configuration, we can access that namespace inside the script with a predefined variable.

I’ll create a new namespace called KRISTIAN_TODOS, and in the Worker editor, I’ll expose the namespace by binding it to the variable KRISTIAN_TODOS.

Building a To-Do List with Workers and KV

Given the presence of KRISTIAN_TODOS in my script, it’s time to understand the KV API. At time of writing, a KV namespace has three primary methods you can use to interface with your cache: get, put, and delete. Pretty straightforward!

Let’s start storing data by defining an initial set of data, which we’ll put inside of the cache using the put method. I’ve opted to define an object, defaultData, instead of a simple array of todos: we may want to store metadata and other information inside of this cache object later on. Given that data object, I’ll use JSON.stringify to put a simple string into the cache:

async function handleRequest(request) {
  // ...previous code
  
  const defaultData = { 
    todos: [
      {
        id: 1,
        name: 'Finish the Cloudflare Workers blog post',
          completed: false
      }
    ] 
  }
  KRISTIAN_TODOS.put("data", JSON.stringify(defaultData))
}

The Worker KV data store is eventually consistent: writing to the cache means that it will become available eventually, but it’s possible to attempt to read a value back from the cache immediately after writing it, only to find that the cache hasn’t been updated yet.

Given the presence of data in the cache, and the assumption that our cache is eventually consistent, we should adjust this code slightly: first, we should actually read from the cache, parsing the value back out, and using it as the data source if exists. If it doesn’t, we’ll refer to defaultData, setting it as the data source for now (remember, it should be set in the future… eventually), while also setting it in the cache for future use. After breaking out the code into a few functions for simplicity, the result looks like this:

const defaultData = { 
  todos: [
    {
      id: 1,
      name: 'Finish the Cloudflare Workers blog post',
      completed: false
    }
  ] 
}

const setCache = data => KRISTIAN_TODOS.put("data", data)
const getCache = () => KRISTIAN_TODOS.get("data")

async function getTodos(request) {
  // ... previous code
  
  let data;
  const cache = await getCache()
  if (!cache) {
    await setCache(JSON.stringify(defaultData))
    data = defaultData
  } else {
    data = JSON.parse(cache)
  }
}

Rendering data from KV

Given the presence of data in our code, which is the cached data object for our application, we should actually take this data and make it available on screen.

In our Workers script, we’ll make a new variable, html, and use it to build up a static HTML template that we can serve to the client. In handleRequest, we can construct a new Response (with a Content-Type header of text/html), and serve it to the client:

const html = `
<!DOCTYPE html>
<html>
  <head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width,initial-scale=1">
    <title>Todos</title>
  </head>
  <body>
    <h1>Todos</h1>
  </body>
</html>
`

async function handleRequest(request) {
  const response = new Response(html, {
    headers: { 'Content-Type': 'text/html' }
  })
  return response
}

Building a To-Do List with Workers and KV

We have a static HTML site being rendered, and now we can begin populating it with data! In the body, we’ll add a ul tag with an id of todos:

<body>
  <h1>Todos</h1>
  <ul id="todos"></ul>
</body>

Given that body, we can also add a script after the body that takes a todos array, loops through it, and for each todo in the array, creates a li element and appends it to the todos list:

<script>
  window.todos = [];
  var todoContainer = document.querySelector("#todos");
  window.todos.forEach(todo => {
    var el = document.createElement("li");
    el.innerText = todo.name;
    todoContainer.appendChild(el);
  });
</script>

Our static page can take in window.todos, and render HTML based on it, but we haven’t actually passed in any data from KV. To do this, we’ll need to make a couple changes.

First, our html variable will change to a function. The function will take in an argument, todos, which will populate the window.todos variable in the above code sample:

const html = todos => `
<!doctype html>
<html>
  <!-- ... -->
  <script>
    window.todos = ${todos || []}
    var todoContainer = document.querySelector("#todos");
    // ...
  <script>
</html>
`

In handleRequest, we can use the retrieved KV data to call the html function, and generate a Response based on it:

async function handleRequest(request) {
  let data;
  
  // Set data using cache or defaultData from previous section...
  
  const body = html(JSON.stringify(data.todos))
  const response = new Response(body, {
    headers: { 'Content-Type': 'text/html' }
  })
  return response
}

The finished product looks something like this:

Building a To-Do List with Workers and KV

Adding todos from the UI

At this point, we’ve built a Cloudflare Worker that takes data from Cloudflare KV and renders a static page based on it. That static page reads the data, and generates a todo list based on that data. Of course, the piece we’re missing is creating todos, from inside the UI. We know that we can add todos using the KV API – we could simply update the cache by saying KRISTIAN_TODOS.put(newData), but how do we update it from inside the UI?

It’s worth noting here that Cloudflare’s Workers documentation suggests that any writes to your KV namespace happen via their API – that is, at its simplest form, a cURL statement:

curl "<https://api.cloudflare.com/client/v4/accounts/$ACCOUNT_ID/storage/kv/namespaces/$NAMESPACE_ID/values/first-key>" \
  -X PUT \
  -H "X-Auth-Email: $CLOUDFLARE_EMAIL" \
  -H "X-Auth-Key: $CLOUDFLARE_AUTH_KEY" \
  --data 'My first value!'

We’ll implement something similar by handling a second route in our worker, designed to watch for PUT requests to /. When a body is received at that URL, the worker will send the new todo data to our KV store.

I’ll add this new functionality to my worker, and in handleRequest, if the request method is a PUT, it will take the request body and update the cache:

addEventListener('fetch', event => {
  event.respondWith(handleRequest(event.request))
})

const putInCache = body => {
  const accountId = "$accountId"
  const namespaceId = "$namespaceId"
  const key = "data"
  return fetch(
    `https://api.cloudflare.com/client/v4/accounts/${accountId}/storage/kv/namespaces/${namespaceId}/values/${key}`,
    { 
      method: "PUT",
      body,
      headers: { 
        'X-Auth-Email': '$accountEmail',
        'X-Auth-Key': "$authKey"
      }
    }
  )
}

async function updateTodos(request) {
  const body = await request.text()
  const ip = request.headers.get("CF-Connecting-IP")
  const cacheKey = `data-${ip}`;
  try {
    JSON.parse(body)
    await putInCache(cacheKey, body)
    return new Response(body, { status: 200 })
  } catch (err) {
    return new Response(err, { status: 500 })
  }
}

async function handleRequest(request) {
  if (request.method === "PUT") {
    return updateTodos(request);
  } else {
    // Defined in previous code block
    return getTodos(request);
  }
}

The script is pretty straightforward – we check that the request is a PUT, and wrap the remainder of the code in a try/catch block. First, we parse the body of the request coming in, ensuring that it is JSON, before we update the cache with the new data, and return it to the user. If anything goes wrong, we simply return a 500. If the route is hit with an HTTP method other than PUT – that is, GET, DELETE, or anything else – we return a 404.

With this script, we can now add some “dynamic” functionality to our HTML page to actually hit this route.

First, we’ll create an input for our todo “name”, and a button for “submitting” the todo.

<div>
  <input type="text" name="name" placeholder="A new todo"></input>
  <button id="create">Create</button>
</div>

Given that input and button, we can add a corresponding JavaScript function to watch for clicks on the button – once the button is clicked, the browser will PUT to / and submit the todo.

var createTodo = function() {
  var input = document.querySelector("input[name=name]");
  if (input.value.length) {
    fetch("/", { 
      method: 'PUT', 
      body: JSON.stringify({ todos: todos }) 
    });
  }
};

document.querySelector("#create")
  .addEventListener('click', createTodo);

This code updates the cache, but what about our local UI? Remember that the KV cache is eventually consistent – even if we were to update our worker to read from the cache and return it, we have no guarantees it’ll actually be up-to-date. Instead, let’s just update the list of todos locally, by taking our original code for rendering the todo list, making it a re-usable function called populateTodos, and calling it when the page loads and when the cache request has finished:

var populateTodos = function() {
  var todoContainer = document.querySelector("#todos");
  todoContainer.innerHTML = null;
  window.todos.forEach(todo => {
    var el = document.createElement("li");
    el.innerText = todo.name;
    todoContainer.appendChild(el);
  });
};

populateTodos();

var createTodo = function() {
  var input = document.querySelector("input[name=name]");
  if (input.value.length) {
    todos = [].concat(todos, { 
      id: todos.length + 1, 
      name: input.value,
      completed: false,
    });
    fetch("/", { 
      method: 'PUT', 
      body: JSON.stringify({ todos: todos }) 
    });
    populateTodos();
    input.value = "";
  }
};

document.querySelector("#create")
  .addEventListener('click', createTodo);

With the client-side code in place, deploying the new Worker should put all these pieces together. The result is an actual dynamic todo list!

Building a To-Do List with Workers and KV

Updating todos from the UI

For the final piece of our (very) basic todo list, we need to be able to update todos – specifically, marking them as completed.

Luckily, a great deal of the infrastructure for this work is already in place. We can currently update the todo list data in our cache, as evidenced by our createTodo function. Performing updates on a todo, in fact, is much more of a client-side task than a Worker-side one!

To start, let’s update the client-side code for generating a todo. Instead of a ul-based list, we’ll migrate the todo container and the todos themselves into using divs:

<!-- <ul id="todos"></ul> becomes... -->
<div id="todos"></div>

The populateTodos function can be updated to generate a div for each todo. In addition, we’ll move the name of the todo into a child element of that div:

var populateTodos = function() {
  var todoContainer = document.querySelector("#todos");
  todoContainer.innerHTML = null;
  window.todos.forEach(todo => {
    var el = document.createElement("div");
    var name = document.createElement("span");
    name.innerText = todo.name;
    el.appendChild(name);
    todoContainer.appendChild(el);
  });
}

So far, we’ve designed the client-side part of this code to take an array of todos in, and given that array, render out a list of simple HTML elements. There’s a number of things that we’ve been doing that we haven’t quite had a use for, yet: specifically, the inclusion of IDs, and updating the completed value on a todo. Luckily, these things work well together, in order to support actually updating todos in the UI.

To start, it would be useful to signify the ID of each todo in the HTML. By doing this, we can then refer to the element later, in order to correspond it to the todo in the JavaScript part of our code. Data attributes, and the corresponding dataset method in JavaScript, are a perfect way to implement this. When we generate our div element for each todo, we can simply attach a data attribute called todo to each div:

window.todos.forEach(todo => {
  var el = document.createElement("div");
  el.dataset.todo = todo.id
  // ... more setup

  todoContainer.appendChild(el);
});

Inside our HTML, each div for a todo now has an attached data attribute, which looks like:

<div data-todo="1"></div>
<div data-todo="2"></div>

Now we can generate a checkbox for each todo element. This checkbox will default to unchecked for new todos, of course, but we can mark it as checked as the element is rendered in the window:

window.todos.forEach(todo => {
  var el = document.createElement("div");
  el.dataset.todo = todo.id
  
  var name = document.createElement("span");
  name.innerText = todo.name;
  
  var checkbox = document.createElement("input")
  checkbox.type = "checkbox"
  checkbox.checked = todo.completed ? 1 : 0;

  el.appendChild(checkbox);
  el.appendChild(name);
  todoContainer.appendChild(el);
})

The checkbox is set up to correctly reflect the value of completed on each todo, but it doesn’t yet update when we actually check the box! To do this, we’ll add an event listener on the click event, calling completeTodo. Inside the function, we’ll inspect the checkbox element, finding its parent (the todo div), and using the todo data attribute on it to find the corresponding todo in our data. Given that todo, we can toggle the value of completed, update our data, and re-render the UI:

var completeTodo = function(evt) {
  var checkbox = evt.target;
  var todoElement = checkbox.parentNode;
  
  var newTodoSet = [].concat(window.todos)
  var todo = newTodoSet.find(t => 
    t.id == todoElement.dataset.todo
  );
  todo.completed = !todo.completed;
  todos = newTodoSet;
  updateTodos()
}

The final result of our code is a system that simply checks the todos variable, updates our Cloudflare KV cache with that value, and then does a straightforward re-render of the UI based on the data it has locally.

Building a To-Do List with Workers and KV

Conclusions and next steps

With this, we’ve created a pretty remarkable project: an almost entirely static HTML/JS application, transparently powered by Cloudflare KV and Workers, served at the edge. There’s a number of additions to be made to this application, whether you want to implement a better design (I’ll leave this as an exercise for readers to implement – you can see my version at todo.kristianfreeman.com), security, speed, etc.

Building a To-Do List with Workers and KV

One interesting and fairly trivial addition is implementing per-user caching. Of course, right now, the cache key is simply “data”: anyone visiting the site will share a todo list with any other user. Because we have the request information inside of our worker, it’s easy to make this data user-specific. For instance, implementing per-user caching by generating the cache key based on the requesting IP:

const ip = request.headers.get("CF-Connecting-IP")
const cacheKey = `data-${ip}`;
const getCache = key => KRISTIAN_TODOS.get(key)
getCache(cacheKey)

One more deploy of our Workers project, and we have a full todo list application, with per-user functionality, served at the edge!

The final version of our Workers script looks like this:

const html = todos => `
<!DOCTYPE html>
<html>
  <head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width,initial-scale=1">
    <title>Todos</title>
    <link href="https://cdn.jsdelivr.net/npm/tailwindcss/dist/tailwind.min.css" rel="stylesheet"></link>
  </head>

  <body class="bg-blue-100">
    <div class="w-full h-full flex content-center justify-center mt-8">
      <div class="bg-white shadow-md rounded px-8 pt-6 py-8 mb-4">
        <h1 class="block text-grey-800 text-md font-bold mb-2">Todos</h1>
        <div class="flex">
          <input class="shadow appearance-none border rounded w-full py-2 px-3 text-grey-800 leading-tight focus:outline-none focus:shadow-outline" type="text" name="name" placeholder="A new todo"></input>
          <button class="bg-blue-500 hover:bg-blue-800 text-white font-bold ml-2 py-2 px-4 rounded focus:outline-none focus:shadow-outline" id="create" type="submit">Create</button>
        </div>
        <div class="mt-4" id="todos"></div>
      </div>
    </div>
  </body>

  <script>
    window.todos = ${todos || []}

    var updateTodos = function() {
      fetch("/", { method: 'PUT', body: JSON.stringify({ todos: window.todos }) })
      populateTodos()
    }

    var completeTodo = function(evt) {
      var checkbox = evt.target
      var todoElement = checkbox.parentNode
      var newTodoSet = [].concat(window.todos)
      var todo = newTodoSet.find(t => t.id == todoElement.dataset.todo)
      todo.completed = !todo.completed
      window.todos = newTodoSet
      updateTodos()
    }

    var populateTodos = function() {
      var todoContainer = document.querySelector("#todos")
      todoContainer.innerHTML = null

      window.todos.forEach(todo => {
        var el = document.createElement("div")
        el.className = "border-t py-4"
        el.dataset.todo = todo.id

        var name = document.createElement("span")
        name.className = todo.completed ? "line-through" : ""
        name.innerText = todo.name

        var checkbox = document.createElement("input")
        checkbox.className = "mx-4"
        checkbox.type = "checkbox"
        checkbox.checked = todo.completed ? 1 : 0
        checkbox.addEventListener('click', completeTodo)

        el.appendChild(checkbox)
        el.appendChild(name)
        todoContainer.appendChild(el)
      })
    }

    populateTodos()

    var createTodo = function() {
      var input = document.querySelector("input[name=name]")
      if (input.value.length) {
        window.todos = [].concat(todos, { id: window.todos.length + 1, name: input.value, completed: false })
        input.value = ""
        updateTodos()
      }
    }

    document.querySelector("#create").addEventListener('click', createTodo)
  </script>
</html>
`

const defaultData = { todos: [] }

const setCache = (key, data) => KRISTIAN_TODOS.put(key, data)
const getCache = key => KRISTIAN_TODOS.get(key)

async function getTodos(request) {
  const ip = request.headers.get('CF-Connecting-IP')
  const cacheKey = `data-${ip}`
  let data
  const cache = await getCache(cacheKey)
  if (!cache) {
    await setCache(cacheKey, JSON.stringify(defaultData))
    data = defaultData
  } else {
    data = JSON.parse(cache)
  }
  const body = html(JSON.stringify(data.todos || []))
  return new Response(body, {
    headers: { 'Content-Type': 'text/html' },
  })
}

const putInCache = (cacheKey, body) => {
  const accountId = '$accountId'
  const namespaceId = '$namespaceId'
  return fetch(
    `https://api.cloudflare.com/client/v4/accounts/${accountId}/storage/kv/namespaces/${namespaceId}/values/${cacheKey}`,
    {
      method: 'PUT',
      body,
      headers: {
        'X-Auth-Email': '$cloudflareEmail',
        'X-Auth-Key': '$cloudflareApiKey',
      },
    },
  )
}

async function updateTodos(request) {
  const body = await request.text()
  const ip = request.headers.get('CF-Connecting-IP')
  const cacheKey = `data-${ip}`
  try {
    JSON.parse(body)
    await putInCache(cacheKey, body)
    return new Response(body, { status: 200 })
  } catch (err) {
    return new Response(err, { status: 500 })
  }
}

async function handleRequest(request) {
  if (request.method === 'PUT') {
    return updateTodos(request)
  } else {
    return getTodos(request)
  }
}

addEventListener('fetch', event => {
  event.respondWith(handleRequest(event.request))
})

You can find the source code for this project, as well as a README with deployment instructions, on GitHub.

Get ready to write — Workers KV is now in GA!

Post Syndicated from Ashcon Partovi original https://blog.cloudflare.com/workers-kv-is-ga/

Get ready to write — Workers KV is now in GA!

Today, we’re excited to announce Workers KV is entering general availability and is ready for production use!

Get ready to write — Workers KV is now in GA!

What is Workers KV?

Workers KV is a highly distributed, eventually consistent, key-value store that spans Cloudflare’s global edge. It allows you to store billions of key-value pairs and read them with ultra-low latency anywhere in the world. Now you can build entire applications with the performance of a CDN static cache.

Why did we build it?

Workers is a platform that lets you run JavaScript on Cloudflare’s global edge of 175+ data centers. With only a few lines of code, you can route HTTP requests, modify responses, or even create new responses without an origin server.

// A Worker that handles a single redirect,
// such a humble beginning...
addEventListener("fetch", event => {
  event.respondWith(handleOneRedirect(event.request))
})

async function handleOneRedirect(request) {
  let url = new URL(request.url)
  let device = request.headers.get("CF-Device-Type")
  // If the device is mobile, add a prefix to the hostname.
  // (eg. example.com becomes mobile.example.com)
  if (device === "mobile") {
    url.hostname = "mobile." + url.hostname
    return Response.redirect(url, 302)
  }
  // Otherwise, send request to the original hostname.
  return await fetch(request)
}

Customers quickly came to us with use cases that required a way to store persistent data. Following our example above, it’s easy to handle a single redirect, but what if you want to handle billions of them? You would have to hard-code them into your Workers script, fit it all in under 1 MB, and re-deploy it every time you wanted to make a change — yikes! That’s why we built Workers KV.

// A Worker that can handle billions of redirects,
// now that's more like it!
addEventListener("fetch", event => {
  event.respondWith(handleBillionsOfRedirects(event.request))
})

async function handleBillionsOfRedirects(request) {
  let prefix = "/redirect"
  let url = new URL(request.url)
  // Check if the URL is a special redirect.
  // (eg. example.com/redirect/<random-hash>)
  if (url.pathname.startsWith(prefix)) {
    // REDIRECTS is a custom variable that you define,
    // it binds to a Workers KV "namespace." (aka. a storage bucket)
    let redirect = await REDIRECTS.get(url.pathname.replace(prefix, ""))
    if (redirect) {
      url.pathname = redirect
      return Response.redirect(url, 302)
    }
  }
  // Otherwise, send request to the original path.
  return await fetch(request)
}

With only a few changes from our previous example, we scaled from one redirect to billions − that’s just a taste of what you can build with Workers KV.

How does it work?

Distributed data stores are often modeled using the CAP Theorem, which states that distributed systems can only pick between 2 out of the 3 following guarantees:

  • Consistency – is my data the same everywhere?
  • Availability – is my data accessible all the time?
  • Partition tolerance – is my data stored in multiple locations?

Get ready to write — Workers KV is now in GA!
Diagram of the choices and tradeoffs of the CAP Theorem.

Workers KV chooses to guarantee Availability and Partition tolerance. This combination is known as eventual consistency, which presents Workers KV with two unique competitive advantages:

  • Reads are ultra fast (median of 12 ms) since its powered by our caching technology.
  • Data is available across 175+ edge data centers and resilient to regional outages.

Although, there are tradeoffs to eventual consistency. If two clients write different values to the same key at the same time, the last client to write eventually “wins” and its value becomes globally consistent. This also means that if a client writes to a key and that same client reads that same key, the values may be inconsistent for a short amount of time.

To help visualize this scenario, here’s a real-life example amongst three friends:

  • Suppose Matthew, Michelle, and Lee are planning their weekly lunch.
  • Matthew decides they’re going out for sushi.
  • Matthew tells Michelle their sushi plans, Michelle agrees.
  • Lee, not knowing the plans, tells Michelle they’re actually having pizza.

An hour later, Michelle and Lee are waiting at the pizza parlor while Matthew is sitting alone at the sushi restaurant — what went wrong? We can chalk this up to eventual consistency, because after waiting for a few minutes, Matthew looks at his updated calendar and eventually finds the new truth, they’re going out for pizza instead.

While it may take minutes in real-life, Workers KV is much faster. It can achieve global consistency in less than 60 seconds. Additionally, when a Worker writes to a key, then immediately reads that same key, it can expect the values to be consistent if both operations came from the same location.

When should I use it?

Now that you understand the benefits and tradeoffs of using eventual consistency, how do you determine if it’s the right storage solution for your application? Simply put, if you want global availability with ultra-fast reads, Workers KV is right for you.

However, if your application is frequently writing to the same key, there is an additional consideration. We call it “the Matthew question”: Are you okay with the Matthews of the world occasionally going to the wrong restaurant?

You can imagine use cases (like our redirect Worker example) where this doesn’t make any material difference. But if you decide to keep track of a user’s bank account balance, you would not want the possibility of two balances existing at once, since they could purchase something with money they’ve already spent.

What can I build with it?

Here are a few examples of applications that have been built with KV:

  • Mass redirects – handle billions of HTTP redirects.
  • User authentication – validate user requests to your API.
  • Translation keys – dynamically localize your web pages.
  • Configuration data – manage who can access your origin.
  • Step functions – sync state data between multiple APIs functions.
  • Edge file store – host large amounts of small files.

We’ve highlighted several of those use cases in our previous blog post. We also have some more in-depth code walkthroughs, including a recently published blog post on how to build an online To-do list with Workers KV.

Get ready to write — Workers KV is now in GA!

What’s new since beta?

By far, our most common request was to make it easier to write data to Workers KV. That’s why we’re releasing three new ways to make that experience even better:

1. Bulk Writes

If you want to import your existing data into Workers KV, you don’t want to go through the hassle of sending an HTTP request for every key-value pair. That’s why we added a bulk endpoint to the Cloudflare API. Now you can upload up to 10,000 pairs (up to 100 MB of data) in a single PUT request.

curl "https://api.cloudflare.com/client/v4/accounts/ \
     $ACCOUNT_ID/storage/kv/namespaces/$NAMESPACE_ID/bulk" \
  -X PUT \
  -H "X-Auth-Key: $CLOUDFLARE_AUTH_KEY" \
  -H "X-Auth-Email: $CLOUDFLARE_AUTH_EMAIL" \
  -d '[
    {"key": "built_by",    value: "kyle, alex, charlie, andrew, and brett"},
    {"key": "reviewed_by", value: "joaquin"},
    {"key": "approved_by", value: "steve"}
  ]'

Let’s walk through an example use case: you want to off-load your website translation to Workers. Since you’re reading translation keys frequently and only occasionally updating them, this application works well with the eventual consistency model of Workers KV.

In this example, we hook into Crowdin, a popular platform to manage translation data. This Worker responds to a /translate endpoint, downloads all your translation keys, and bulk writes them to Workers KV so you can read it later on our edge:

addEventListener("fetch", event => {
  if (event.request.url.pathname === "/translate") {
    event.respondWith(uploadTranslations())
  }
})

async function uploadTranslations() {
  // Ask crowdin for all of our translations.
  var response = await fetch(
    "https://api.crowdin.com/api/project" +
    "/:ci_project_id/download/all.zip?key=:ci_secret_key")
  // If crowdin is responding, parse the response into
  // a single json with all of our translations.
  if (response.ok) {
    var translations = await zipToJson(response)
    return await bulkWrite(translations)
  }
  // Return the errored response from crowdin.
  return response
}

async function bulkWrite(keyValuePairs) {
  return fetch(
    "https://api.cloudflare.com/client/v4/accounts" +
    "/:cf_account_id/storage/kv/namespaces/:cf_namespace_id/bulk",
    {
      method: "PUT",
      headers: {
        "Content-Type": "application/json",
        "X-Auth-Key": ":cf_auth_key",
        "X-Auth-Email": ":cf_email"
      },
      body: JSON.stringify(keyValuePairs)
    }
  )
}

async function zipToJson(response) {
  // ... omitted for brevity ...
  // (eg. https://stuk.github.io/jszip)
  return [
    {key: "hello.EN", value: "Hello World"},
    {key: "hello.ES", value: "Hola Mundo"}
  ]
}

Now, when you want to translate a page, all you have to do is read from Workers KV:

async function translate(keys, lang) {
  // You bind your translations namespace to the TRANSLATIONS variable.
  return Promise.all(keys.map(key => TRANSLATIONS.get(key + "." + lang)))
}

2. Expiring Keys

By default, key-value pairs stored in Workers KV last forever. However, sometimes you want your data to auto-delete after a certain amount of time. That’s why we’re introducing the expiration and expirationTtloptions for write operations.

// Key expires 60 seconds from now.
NAMESPACE.put("myKey", "myValue", {expirationTtl: 60})

// Key expires if the UNIX epoch is in the past.
NAMESPACE.put("myKey", "myValue", {expiration: 1247788800})

# You can also set keys to expire from the Cloudflare API.
curl "https://api.cloudflare.com/client/v4/accounts/ \
     $ACCOUNT_ID/storage/kv/namespaces/$NAMESPACE_ID/ \
     values/$KEY?expiration_ttl=$EXPIRATION_IN_SECONDS"
  -X PUT \
  -H "X-Auth-Key: $CLOUDFLARE_AUTH_KEY" \
  -H "X-Auth-Email: $CLOUDFLARE_AUTH_EMAIL" \
  -d "$VALUE"

Let’s say you want to block users that have been flagged as inappropriate from your website, but only for a week. With an expiring key, you can set the expire time and not have to worry about deleting it later.

In this example, we assume users and IP addresses are one of the same. If your application has authentication, you could use access tokens as the key identifier.

addEventListener("fetch", event => {
  var url = new URL(event.request.url)
  // An internal API that blocks a new user IP.
  // (eg. example.com/block/1.2.3.4)
  if (url.pathname.startsWith("/block")) {
    var ip = url.pathname.split("/").pop()
    event.respondWith(blockIp(ip))
  } else {
    // Other requests check if the IP is blocked.
   event.respondWith(handleRequest(event.request))
  }
})

async function blockIp(ip) {
  // Values are allowed to be empty in KV,
  // we don't need to store any extra information anyway.
  await BLOCKED.put(ip, "", {expirationTtl: 60*60*24*7})
  return new Response("ok")
}

async function handleRequest(request) {
  var ip = request.headers.get("CF-Connecting-IP")
  if (ip) {
    var blocked = await BLOCKED.get(ip)
    // If we detect an IP and its blocked, respond with a 403 error.
    if (blocked) {
      return new Response({status: 403, statusText: "You are blocked!"})
    }
  }
  // Otherwise, passthrough the original request.
  return fetch(request)
}

3. Larger Values

We’ve increased our size limit on values from 64 kB to 2 MB. This is quite useful if you need to store buffer-based or file data in Workers KV.

Get ready to write — Workers KV is now in GA!

Consider this scenario: you want to let your users upload their favorite GIF to their profile without having to store these GIFs as binaries in your database or managing another cloud storage bucket.

Workers KV is a great fit for this use case! You can create a Workers KV namespace for your users’ GIFs that is fast and reliable wherever your customers are located.

In this example, users upload a link to their favorite GIF, then a Worker downloads it and stores it to Workers KV.

addEventListener("fetch", event => {
  var url = event.request.url
  var arg = request.url.split("/").pop()
  // User sends a URI encoded link to the GIF they wish to upload.
  // (eg. example.com/api/upload_gif/<encoded-uri>)
  if (url.pathname.startsWith("/api/upload_gif")) {
    event.respondWith(uploadGif(arg))
    // Profile contains link to view the GIF.
    // (eg. example.com/api/view_gif/<username>)
  } else if (url.pathname.startsWith("/api/view_gif")) {
    event.respondWith(getGif(arg))
  }
})

async function uploadGif(url) {
  // Fetch the GIF from the Internet.
  var gif = await fetch(decodeURIComponent(url))
  var buffer = await gif.arrayBuffer()
  // Upload the GIF as a buffer to Workers KV.
  await GIFS.put(user.name, buffer)
  return gif
}

async function getGif(username) {
  var gif = await GIFS.get(username, "arrayBuffer")
  // If the user has set one, respond with the GIF.
  if (gif) {
    return new Response(gif, {headers: {"Content-Type": "image/gif"}})
  } else {
    return new Response({status: 404, statusText: "User has no GIF!"})
  }
}

Lastly, we want to thank all of our beta customers. It was your valuable feedback that led us to develop these changes to Workers KV. Make sure to stay in touch with us, we’re always looking ahead for what’s next and we love hearing from you!

Pricing

We’re also ready to announce our GA pricing. If you’re one of our Enterprise customers, your pricing obviously remains unchanged.

  • $0.50 / GB of data stored, 1 GB included
  • $0.50 / million reads, 10 million included
  • $5 / million write, list, and delete operations, 1 million included

During the beta period, we learned customers don’t want to just read values at our edge, they want to write values from our edge too. Since there is high demand for these edge operations, which are more costly, we have started charging non-read operations per month.

Limits

As mentioned earlier, we increased our value size limit from 64 kB to 2 MB. We’ve also removed our cap on the number of keys per namespace — it’s now unlimited. Here are our GA limits:

  • Up to 20 namespaces per account, each with unlimited keys
  • Keys of up to 512 bytes and values of up to 2 MB
  • Unlimited writes per second for different keys
  • One write per second for the same key
  • Unlimited reads per second per key

Try it out now!

Now open to all customers, you can start using Workers KV today from your Cloudflare dashboard under the Workers tab. You can also look at our updated documentation.

We’re really excited to see what you all can build with Workers KV!

Faster script loading with BinaryAST?

Post Syndicated from Ingvar Stepanyan original https://blog.cloudflare.com/binary-ast/

Faster script loading with BinaryAST?

JavaScript Cold starts

Faster script loading with BinaryAST?

The performance of applications on the web platform is becoming increasingly bottlenecked by the startup (load) time. Large amounts of JavaScript code are required to create rich web experiences that we’ve become used to. When we look at the total size of JavaScript requested on mobile devices from HTTPArchive, we see that an average page loads 350KB of JavaScript, while 10% of pages go over the 1MB threshold. The rise of more complex applications can push these numbers even higher.

While caching helps, popular websites regularly release new code, which makes cold start (first load) times particularly important. With browsers moving to separate caches for different domains to prevent cross-site leaks, the importance of cold starts is growing even for popular subresources served from CDNs, as they can no longer be safely shared.

Usually, when talking about the cold start performance, the primary factor considered is a raw download speed. However, on modern interactive pages one of the other big contributors to cold starts is JavaScript parsing time. This might seem surprising at first, but makes sense – before starting to execute the code, the engine has to first parse the fetched JavaScript, make sure it doesn’t contain any syntax errors and then compile it to the initial bytecode. As networks become faster, parsing and compilation of JavaScript could become the dominant factor.

Faster script loading with BinaryAST?

Faster script loading with BinaryAST?

The device capability (CPU or memory performance) is the most important factor in the variance of JavaScript parsing times and correspondingly the time to application start. A 1MB JavaScript file will take an order of a 100 ms to parse on a modern desktop or high-end mobile device but can take over a second on an average phone  (Moto G4).

A more detailed post on the overall cost of parsing, compiling and execution of JavaScript shows how the JavaScript boot time can vary on different mobile devices. For example, in the case of news.google.com, it can range from 4s on a Pixel 2 to 28s on a low-end device.

While engines continuously improve raw parsing performance, with V8 in particular doubling it over the past year, as well as moving more things off the main thread, parsers still have to do lots of potentially unnecessary work that consumes memory, battery and might delay the processing of the useful resources.

The “BinaryAST” Proposal

This is where BinaryAST comes in. BinaryAST is a new over-the-wire format for JavaScript proposed and actively developed by Mozilla that aims to speed up parsing while keeping the semantics of the original JavaScript intact. It does so by using an efficient binary representation for code and data structures, as well as by storing and providing extra information to guide the parser ahead of time.

The name comes from the fact that the format stores the JavaScript source as an AST encoded into a binary file. The specification lives at tc39.github.io/proposal-binary-ast and is being worked on by engineers from Mozilla, Facebook, Bloomberg and Cloudflare.

“Making sure that web applications start quickly is one of the most important, but also one of the most challenging parts of web development. We know that BinaryAST can radically reduce startup time, but we need to collect real-world data to demonstrate its impact. Cloudflare’s work on enabling use of BinaryAST with Cloudflare Workers is an important step towards gathering this data at scale.”

Till Schneidereit, Senior Engineering Manager, Developer Technologies
Mozilla

Parsing JavaScript

For regular JavaScript code to execute in a browser the source is parsed into an intermediate representation known as an AST that describes the syntactic structure of the code. This representation can then be compiled into a byte code or a native machine code for execution.

Faster script loading with BinaryAST?

A simple example of adding two numbers can be represented in an AST as:

Faster script loading with BinaryAST?

Parsing JavaScript is not an easy task; no matter which optimisations you apply, it still requires reading the entire text file char by char, while tracking extra context for syntactic analysis.

The goal of the BinaryAST is to reduce the complexity and the amount of work the browser parser has to do overall by providing an additional information and context by the time and place where the parser needs it.

To execute JavaScript delivered as BinaryAST the only steps required are:

Faster script loading with BinaryAST?

Another benefit of BinaryAST is that it makes possible to only parse the critical code necessary for start-up, completely skipping over the unused bits. This can dramatically improve the initial loading time.

Faster script loading with BinaryAST?

Faster script loading with BinaryAST?

Faster script loading with BinaryAST?

Faster script loading with BinaryAST?

This post will now describe some of the challenges of parsing JavaScript in more detail, explain how the proposed format addressed them, and how we made it possible to run its encoder in Workers.

Hoisting

JavaScript relies on hoisting for all declarations – variables, functions, classes. Hoisting is a property of the language that allows you to declare items after the point they’re syntactically used.

Let’s take the following example:

function f() {
	return g();
}

function g() {
	return 42;
}

Here, when the parser is looking at the body of f, it doesn’t know yet what g is referring to – it could be an already existing global function or something declared further in the same file – so it can’t finalise parsing of the original function and start the actual compilation.

BinaryAST fixes this by storing all the scope information and making it available upfront before the actual expressions.

Faster script loading with BinaryAST?

As shown by the difference between the initial AST and the enhanced AST in a JSON representation:

Faster script loading with BinaryAST?

Lazy parsing

One common technique used by modern engines to improve parsing times is lazy parsing. It utilises the fact that lots of websites include more JavaScript than they actually need, especially for the start-up.

Working around this involves a set of heuristics that try to guess when any given function body in the code can be safely skipped by the parser initially and delayed for later. A common example of such heuristic is immediately running the full parser for any function that is wrapped into parentheses:

(function(...

Such prefix usually indicates that a following function is going to be an IIFE (immediately-invoked function expression), and so the parser can assume that it will be compiled and executed ASAP, and wouldn’t benefit from being skipped over and delayed for later.

(function() {
	…
})();

These heuristics significantly improve the performance of the initial parsing and cold starts, but they’re not completely reliable or trivial to implement.

One of the reasons is the same as in the previous section – even with lazy parsing, you still need to read the contents, analyse them and store an additional scope information for the declarations.

Another reason is that the JavaScript specification requires reporting any syntax errors immediately during load time, and not when the code is actually executed. A class of these errors, called early errors, is checking for mistakes like usage of the reserved words in invalid contexts, strict mode violations, variable name clashes and more. All of these checks require not only lexing JavaScript source, but also tracking extra state even during the lazy parsing.

Having to do such extra work means you need to be careful about marking functions as lazy too eagerly, especially if they actually end up being executed during the page load. Otherwise you’re making cold start costs even worse, as now every function that is erroneously marked as lazy, needs to be parsed twice – once by the lazy parser and then again by the full one.

Because BinaryAST is meant to be an output format of other tools such as Babel, TypeScript and bundlers such as Webpack, the browser parser can rely on the JavaScript being already analysed and verified by the initial parser. This allows it to skip function bodies completely, making lazy parsing essentially free.

It reduces the cost of a completely unused code – while including it is still a problem in terms of the network bandwidth (don’t do this!), at least it’s not affecting parsing times anymore. These benefits apply equally to the code that is used later in the page lifecycle (for example, invoked in response to user actions), but is not required during the startup.

Last but not least important benefit of such approach is that BinaryAST encodes lazy annotations as part of the format, giving tools and developers direct and full control over the heuristics. For example, a tool targeting the Web platform or a framework CLI can use its domain-specific knowledge to mark some event handlers as lazy or eager depending on the context and the event type.

Avoiding ambiguity in parsing

Using a text format for a programming language is great for readability and debugging, but it’s not the most efficient representation for parsing and execution.

For example, parsing low-level types like numbers, booleans and even strings from text requires extra analysis and computation, which is unnecessary when you can just store and read them as native binary-encoded values in the first place and read directly on the other side.

Another problem is an ambiguity in the grammar itself. It was already an issue in the ES5 world, but could usually be resolved with some extra bookkeeping based on the previously seen tokens. However, in ES6+ there are productions that can be ambiguous all the way through until they’re parsed completely.

For example, a token sequence like:

(a, {b: c, d}, [e = 1])...

can start either a parenthesized comma expression with nested object and array literals and an assignment:

(a, {b: c, d}, [e = 1]); // it was an expression

or a parameter list of an arrow expression function with nested object and array patterns and a default value:

(a, {b: c, d}, [e = 1]) => … // it was a parameter list

Both representations are perfectly valid, but have completely different semantics, and you can’t know which one you’re dealing with until you see the final token.

To work around this, parsers usually have to either backtrack, which can easily get exponentially slow, or to parse contents into intermediate node types that are capable of holding both expressions and patterns, with following conversion. The latter approach preserves linear performance, but makes the implementation more complicated and requires preserving more state.

In the BinaryAST format this issue doesn’t exist in the first place because the parser sees the type of each node before it even starts parsing its contents.

Cloudflare Implementation

Currently, the format is still in flux, but the very first version of the client-side implementation was released under a flag in Firefox Nightly several months ago. Keep in mind this is only an initial unoptimised prototype, and there are already several experiments changing the format to provide improvements to both size and parsing performance.

On the producer side, the reference implementation lives at github.com/binast/binjs-ref. Our goal was to take this reference implementation and consider how we would deploy it at Cloudflare scale.

If you dig into the codebase, you will notice that it currently consists of two parts.

Faster script loading with BinaryAST?

One is the encoder itself, which is responsible for taking a parsed AST, annotating it with scope and other relevant information, and writing out the result in one of the currently supported formats. This part is written in Rust and is fully native.

Another part is what produces that initial AST – the parser. Interestingly, unlike the encoder, it’s implemented in JavaScript.

Unfortunately, there is currently no battle-tested native JavaScript parser with an open API, let alone implemented in Rust. There have been a few attempts, but, given the complexity of JavaScript grammar, it’s better to wait a bit and make sure they’re well-tested before incorporating it into the production encoder.

On the other hand, over the last few years the JavaScript ecosystem grew to extensively rely on developer tools implemented in JavaScript itself. In particular, this gave a push to rigorous parser development and testing. There are several JavaScript parser implementations that have been proven to work on thousands of real-world projects.

With that in mind, it makes sense that the BinaryAST implementation chose to use one of them – in particular, Shift – and integrated it with the Rust encoder, instead of attempting to use a native parser.

Connecting Rust and JavaScript

Integration is where things get interesting.

Rust is a native language that can compile to an executable binary, but JavaScript requires a separate engine to be executed. To connect them, we need some way to transfer data between the two without sharing the memory.

Initially, the reference implementation generated JavaScript code with an embedded input on the fly, passed it to Node.js and then read the output when the process had finished. That code contained a call to the Shift parser with an inlined input string and produced the AST back in a JSON format.

This doesn’t scale well when parsing lots of JavaScript files, so the first thing we did is transformed the Node.js side into a long-living daemon. Now Rust could spawn a required Node.js process just once and keep passing inputs into it and getting responses back as individual messages.

Faster script loading with BinaryAST?

Running in the cloud

While the Node.js solution worked fairly well after these optimisations, shipping both a Node.js instance and a native bundle to production requires some effort. It’s also potentially risky and requires manual sandboxing of both processes to make sure we don’t accidentally start executing malicious code.

On the other hand, the only thing we needed from Node.js is the ability to run the JavaScript parser code. And we already have an isolated JavaScript engine running in the cloud – Cloudflare Workers! By additionally compiling the native Rust encoder to Wasm (which is quite easy with the native toolchain and wasm-bindgen), we can even run both parts of the code in the same process, making cold starts and communication much faster than in a previous model.

Faster script loading with BinaryAST?

Optimising data transfer

The next logical step is to reduce the overhead of data transfer. JSON worked fine for communication between separate processes, but with a single process we should be able to retrieve the required bits directly from the JavaScript-based AST.

To attempt this, first of all, we needed to move away from the direct JSON usage to something more generic that would allow us to support various import formats. The Rust ecosystem already has an amazing serialisation framework for that – Serde.

Aside from allowing us to be more flexible in regard to the inputs, rewriting to Serde helped an existing native use case too. Now, instead of parsing JSON into an intermediate representation and then walking through it, all the native typed AST structures can be deserialized directly from the stdout pipe of the Node.js process in a streaming manner. This significantly improved both the CPU usage and memory pressure.

But there is one more thing we can do: instead of serializing and deserializing from an intermediate format (let alone, a text format like JSON), we should be able to operate [almost] directly on JavaScript values, saving memory and repetitive work.

How is this possible? wasm-bindgen provides a type called JsValue that stores a handle to an arbitrary value on the JavaScript side. This handle internally contains an index into a predefined array.

Each time a JavaScript value is passed to the Rust side as a result of a function call or a property access, it’s stored in this array and an index is sent to Rust. The next time Rust wants to do something with that value, it passes the index back and the JavaScript side retrieves the original value from the array and performs the required operation.

By reusing this mechanism, we could implement a Serde deserializer that requests only the required values from the JS side and immediately converts them to their native representation. It’s now open-sourced under https://github.com/cloudflare/serde-wasm-bindgen.

Faster script loading with BinaryAST?

At first, we got a much worse performance out of this due to the overhead of more frequent calls between 1) Wasm and JavaScript – SpiderMonkey has improved these recently, but other engines still lag behind and 2) JavaScript and C++, which also can’t be optimised well in most engines.

The JavaScript <-> C++ overhead comes from the usage of TextEncoder to pass strings between JavaScript and Wasm in wasm-bindgen, and, indeed, it showed up as the highest in the benchmark profiles. This wasn’t surprising – after all, strings can appear not only in the value payloads, but also in property names, which have to be serialized and sent between JavaScript and Wasm over and over when using a generic JSON-like structure.

Luckily, because our deserializer doesn’t have to be compatible with JSON anymore, we can use our knowledge of Rust types and cache all the serialized property names as JavaScript value handles just once, and then keep reusing them for further property accesses.

This, combined with some changes to wasm-bindgen which we have upstreamed, allows our deserializer to be up to 3.5x faster in benchmarks than the original Serde support in wasm-bindgen, while saving ~33% off the resulting code size. Note that for string-heavy data structures it might still be slower than the current JSON-based integration, but situation is expected to improve over time when reference types proposal lands natively in Wasm.

After implementing and integrating this deserializer, we used the wasm-pack plugin for Webpack to build a Worker with both Rust and JavaScript parts combined and shipped it to some test zones.

Show me the numbers

Keep in mind that this proposal is in very early stages, and current benchmarks and demos are not representative of the final outcome (which should improve numbers much further).

As mentioned earlier, BinaryAST can mark functions that should be parsed lazily ahead of time. By using different levels of lazification in the encoder (https://github.com/binast/binjs-ref/blob/b72aff7dac7c692a604e91f166028af957cdcda5/crates/binjs_es6/src/lazy.rs#L43) and running tests against some popular JavaScript libraries, we found following speed-ups.

Level 0 (no functions are lazified)

With lazy parsing disabled in both parsers we got a raw parsing speed improvement of between 3 and 10%.

NameSource size (kb)JavaScript Parse time (average ms)BinaryAST parse time (average ms)Diff (%)
React200.4030.385-4.56
D3 (v5)24011.17810.525-6.018
Angular1806.9856.331-9.822
Babel78021.25520.599-3.135
Backbone320.7750.699-10.312
wabtjs172064.83659.556-8.489
Fuzzball (1.2)723.1652.768-13.383

Level 3 (functions up to 3 levels deep are lazified)

But with the lazification set to skip nested functions of up to 3 levels we see much more dramatic improvements in parsing time between 90 and 97%. As mentioned earlier in the post, BinaryAST makes lazy parsing essentially free by completely skipping over the marked functions.

NameSource size (kb)Parse time (average ms)BinaryAST parse time (average ms)Diff (%)
React200.4070.032-92.138
D3 (v5)24011.6230.224-98.073
Angular1807.0930.680-90.413
Babel78021.1000.895-95.758
Backbone320.8980.045-94.989
wabtjs172059.8021.601-97.323
Fuzzball (1.2)722.9370.089-96.970

All the numbers are from manual tests on a Linux x64 Intel i7 with 16Gb of ram.

While these synthetic benchmarks are impressive, they are not representative of real-world scenarios. Normally you will use at least some of the loaded JavaScript during the startup. To check this scenario, we decided to test some realistic pages and demos on desktop and mobile Firefox and found speed-ups in page loads too.

For a sample application (https://github.com/cloudflare/binjs-demo, https://serve-binjs.that-test.site/) which weighed in at around 1.2 MB of JavaScript we got the following numbers for initial script execution:

DeviceJavaScriptBinaryAST
Desktop338ms314ms
Mobile (HTC One M8)2019ms1455ms

Here is a video that will give you an idea of the improvement as seen by a user on mobile Firefox (in this case showing the entire page startup time):

Faster script loading with BinaryAST?

Next step is to start gathering data on real-world websites, while improving the underlying format.

How do I test BinaryAST on my website?

We’ve open-sourced our Worker so that it could be installed on any Cloudflare zone: https://github.com/binast/binjs-ref/tree/cf-wasm.

One thing to be currently wary of is that, even though the result gets stored in the cache, the initial encoding is still an expensive process, and might easily hit CPU limits on any non-trivial JavaScript files and fall back to the unencoded variant. We are working to improve this situation by releasing BinaryAST encoder as a separate feature with more relaxed limits in the following few days.

Meanwhile, if you want to play with BinaryAST on larger real-world scripts, an alternative option is to use a static binjs_encode tool from https://github.com/binast/binjs-ref to pre-encode JavaScript files ahead of time. Then, you can use a Worker from https://github.com/cloudflare/binast-cf-worker to serve the resulting BinaryAST assets when supported and requested by the browser.

On the client side, you’ll currently need to download Firefox Nightly, go to about:config and enable unrestricted BinaryAST support via the following options:

Faster script loading with BinaryAST?

Now, when opening a website with either of the Workers installed, Firefox will get BinaryAST instead of JavaScript automatically.

Summary

The amount of JavaScript in modern apps is presenting performance challenges for all consumers. Engine vendors are experimenting with different ways to improve the situation – some are focusing on raw decoding performance, some on parallelizing operations to reduce overall latency, some are researching new optimised formats for data representation, and some are inventing and improving protocols for the network delivery.

No matter which one it is, we all have a shared goal of making the Web better and faster. On Cloudflare’s side, we’re always excited about collaborating with all the vendors and combining various approaches to make that goal closer with every step.

Unit Testing Workers, in Cloudflare Workers

Post Syndicated from Tim Obezuk original https://blog.cloudflare.com/unit-testing-workers-in-cloudflare-workers/

Unit Testing Workers, in Cloudflare Workers

Unit Testing Workers, in Cloudflare Workers

We recently wrote about unit testing Cloudflare Workers within a mock environment using CloudWorker (a Node.js based mock Cloudflare Worker environment created by Dollar Shave Club’s engineering team). See Unit Testing Worker Functions.

Even though Cloudflare Workers deploy globally within seconds, software developers often choose to use local mock environments to have the fastest possible feedback loop while developing on their local machines. CloudWorker is perfect for this use case but as it is still a mock environment it does not guarantee an identical runtime or environment with all Cloudflare Worker APIs and features. This gap can make developers uneasy as they do not have 100% certainty that their tests will succeed in the production environment.

In this post, we’re going to demonstrate how to generate a Cloudflare Worker compatible test harness which can execute mocha unit tests directly in the production Cloudflare environment.

Directory Setup

Create a new folder for your project, change it to your working directory and run npm init to initialise the package.json file.

Run mkdir -p src && mkdir -p test/lib && mkdir dist to create folders used by the next steps. Your folder should look like this:

.
./dist
./src/worker.js
./test
./test/lib
./package.json

npm install --save-dev mocha exports-loader webpack webpack-cli

This will install Mocha (the unit testing framework), Webpack (a tool used to package the code into a single Worker script) and Exports Loader (a tool used by Webpack to import the Worker script into the Worker based Mocha environment.

npm install --save-dev git+https://github.com/obezuk/mocha-loader.git

This will install a modified version of Webpack’s mocha loader. It has been modified to support the Web Worker environment type. We are excited to see Web Worker support merged into Mocha Loader so please vote for our pull request here: https://github.com/webpack-contrib/mocha-loader/pull/77

Example Script

Create your Worker script in ./src/worker.js:

addEventListener('fetch', event => {
 event.respondWith(handleRequest(event.request))
})

async function addition(a, b) {
  return a + b
}

async function handleRequest(request) {
  const added = await addition(1,3)
  return new Response(`The Sum is ${added}!`)
}

Add Tests

Create your unit tests in ./test/test.test.js:

const assert = require('assert')

describe('Worker Test', function() {

    it('returns a body that says The Sum is 4', async function () {
        let url = new URL('https://worker.example.com')
        let req = new Request(url)
        let res = await handleRequest(req)
        let body = await res.text()
        assert.equal(body, 'The Sum is 4!')
    })

    it('does addition properly', async function() {
        let res = await addition(1, 1)
        assert.equal(res, 2)
    })

})

Mocha in Worker Test Harness

In order to execute mocha and unit tests within Cloudflare Workers we are going to build a Test Harness. The Test Harness script looks a lot like a normal Worker script but integrates your ./src/worker.js and ./test/test.test.js into a script which is capable of executing the Mocha unit tests within the Cloudflare Worker runtime.

Create the below script in ./test/lib/serviceworker-mocha-harness.js.

import 'mocha';

import 'mocha-loader!../test.test.js';

var testResults;

async function mochaRun() {
    return new Promise(function (accept, reject) {
        var runner = mocha.run(function () {
            testResults = runner.testResults;
            accept();
        });
    });
}

addEventListener('fetch', event => {
    event.respondWith(handleMochaRequest(event.request))
});

async function handleMochaRequest(request) {

    if (!testResults) {
        await mochaRun();
    }

    var headers = new Headers({
        "content-type": "application/json"
    })

    var statusCode = 200;

    if (testResults.failures != 0) {
        statusCode = 500;
    }

    return new Response(JSON.stringify(testResults), {
        "status": statusCode,
        "headers": headers
    });

}

Object.assign(global, require('exports-loader?handleRequest,addition!../../src/worker.js'));

Mocha Webpack Configuration

Create a new file in the project root directory called: ./webpack.mocha.config.js. This file is used by Webpack to bundle the test harness, worker script and unit tests into a single script that can be deployed to Cloudflare.

module.exports = {
  target: 'webworker',
  entry: "./test/lib/serviceworker-mocha-harness.js",
  mode: "development",
  optimization: {
    minimize: false
  },
  performance: {
    hints: false
  },
  node: {
    fs: 'empty'
  },
  module: {
    exprContextCritical: false
  },
  output: {
    path: __dirname + "/dist",
    publicPath: "dist",
    filename: "worker-mocha-harness.js"
  }
};

Your file structure should look like (excluding node_modules):

.
./dist
./src/worker.js
./test/test.test.js
./test/lib/serviceworker-mocha-harness.js
./package.json
./package-lock.json
./webpack.mocha.config.js

Customising the test harness.

If you wish to extend the test harness to support your own test files you will need to add additional test imports to the top of the script:

import 'mocha-loader!/* TEST FILE NAME HERE */'

If you wish to import additional functions from your Worker script into the test harness environment you will need to add them comma separated into the last line:

Object.assign(global, require('exports-loader?/* COMMA SEPARATED FUNCTION NAMES HERE */!../../src/worker.js'));

Running the test harness

Deploying and running the test harness is identical to deploying any other Worker script with Webpack.

Modify the scripts section of package.json to include the build-harness command.

"scripts": {
  "build-harness": "webpack --config webpack.mocha.config.js -p --progress --colors"
}

In the project root directory run the command npm run build-harness to generate and bundle your Worker script, Mocha and your unit tests into ./dist/worker-mocha-harness.js.

Upload this script to a test Cloudflare workers route and run curl --fail https://test.example.org. If the unit tests are successful it will return a 200 response, and if the unit tests fail a 500 response.

Integrating into an existing CI/CD pipeline

You can integrate Cloudflare Workers and the test harness into your existing CI/CD pipeline by using our API: https://developers.cloudflare.com/workers/api/.

The test harness returns detailed test reports in JSON format:

Example Success Response

{
  "stats": {
    "suites": 1,
    "tests": 2,
    "passes": 2,
    "pending": 0,
    "failures": 0,
    "start": "2019-04-23T06:24:33.492Z",
    "end": "2019-04-23T06:24:33.590Z",
    "duration": 98
  },
  "tests": [
    {
      "title": "returns a body that says The Sum is 4",
      "fullTitle": "Worker Test returns a body that says The Sum is 4",
      "duration": 0,
      "currentRetry": 0,
      "err": {}
    },
    {
      "title": "does addition properly",
      "fullTitle": "Worker Test does addition properly",
      "duration": 0,
      "currentRetry": 0,
      "err": {}
    }
  ],
  "pending": [],
  "failures": [],
  "passes": [
    {
      "title": "returns a body that says The Sum is 4",
      "fullTitle": "Worker Test returns a body that says The Sum is 4",
      "duration": 0,
      "currentRetry": 0,
      "err": {}
    },
    {
      "title": "does addition properly",
      "fullTitle": "Worker Test does addition properly",
      "duration": 0,
      "currentRetry": 0,
      "err": {}
    }
  ]
}

Example Failure Response

{
  "stats": {
    "suites": 1,
    "tests": 2,
    "passes": 0,
    "pending": 0,
    "failures": 2,
    "start": "2019-04-23T06:25:52.100Z",
    "end": "2019-04-23T06:25:52.170Z",
    "duration": 70
  },
  "tests": [
    {
      "title": "returns a body that says The Sum is 4",
      "fullTitle": "Worker Test returns a body that says The Sum is 4",
      "duration": 0,
      "currentRetry": 0,
      "err": {
        "name": "AssertionError",
        "actual": "The Sum is 5!",
        "expected": "The Sum is 4!",
        "operator": "==",
        "message": "'The Sum is 5!' == 'The Sum is 4!'",
        "generatedMessage": true,
        "stack": "AssertionError: 'The Sum is 5!' == 'The Sum is 4!'\n    at Context.<anonymous> (worker.js:19152:16)"
      }
    },
    {
      "title": "does addition properly",
      "fullTitle": "Worker Test does addition properly",
      "duration": 0,
      "currentRetry": 0,
      "err": {
        "name": "AssertionError",
        "actual": "3",
        "expected": "2",
        "operator": "==",
        "message": "3 == 2",
        "generatedMessage": true,
        "stack": "AssertionError: 3 == 2\n    at Context.<anonymous> (worker.js:19157:16)"
      }
    }
  ],
  "pending": [],
  "failures": [
    {
      "title": "returns a body that says The Sum is 4",
      "fullTitle": "Worker Test returns a body that says The Sum is 4",
      "duration": 0,
      "currentRetry": 0,
      "err": {
        "name": "AssertionError",
        "actual": "The Sum is 5!",
        "expected": "The Sum is 4!",
        "operator": "==",
        "message": "'The Sum is 5!' == 'The Sum is 4!'",
        "generatedMessage": true,
        "stack": "AssertionError: 'The Sum is 5!' == 'The Sum is 4!'\n    at Context.<anonymous> (worker.js:19152:16)"
      }
    },
    {
      "title": "does addition properly",
      "fullTitle": "Worker Test does addition properly",
      "duration": 0,
      "currentRetry": 0,
      "err": {
        "name": "AssertionError",
        "actual": "3",
        "expected": "2",
        "operator": "==",
        "message": "3 == 2",
        "generatedMessage": true,
        "stack": "AssertionError: 3 == 2\n    at Context.<anonymous> (worker.js:19157:16)"
      }
    }
  ],
  "passes": []
}

This is really powerful and can allow you to execute your unit tests directly in the Cloudflare runtime, giving you more confidence before releasing your code into production. We hope this was useful and welcome any feedback.

Rapid Development of Serverless Chatbots with Cloudflare Workers and Workers KV

Post Syndicated from Steven Pack original https://blog.cloudflare.com/rapid-development-of-serverless-chatbots-with-cloudflare-workers-and-workers-kv/

Rapid Development of Serverless Chatbots with Cloudflare Workers and Workers KV

Rapid Development of Serverless Chatbots with Cloudflare Workers and Workers KV

I’m the Product Manager for the Application Services team here at Cloudflare. We recently identified a need for a new tool around service ownership. As a fast growing engineering organization, ownership of services changes fairly frequently. Many cycles get burned in chat with questions like "Who owns service x now?

Whilst it’s easy to see how a tool like this saves a few seconds per day for the asker and askee, and saves on some mental context switches, the time saved is unlikely to add up to the cost of development and maintenance.

= 5 minutes per day
x 260 work days 
= 1300 mins 
/ 60 mins 
= 20 person hours per year

So a 20 hour investment in that tool would pay itself back in a year valuing everyone’s time the same. While we’ve made great strides in improving the efficiency of building tools at Cloudflare, 20 hours is a stretch for an end-to-end build, deploy and operation of a new tool.

Enter Cloudflare Workers + Workers KV

The more I use Serverless and Workers, the more I’m struck with the benefits of:

1. Reduced operational overhead

When I upload a Worker, it’s automatically distributed to 175+ data centers. I don’t have to be worried about uptime – it will be up, and it will be fast.

2. Reduced dev time

With operational overhead largely removed, I’m able to focus purely on code. A constrained problem space like this lends itself really well to Workers. I reckon we can knock this out in well under 20 hours.

Requirements

At Cloudflare, people ask these questions in Chat, so that’s a natural interface to service ownership. Here’s the spec:

Use CaseInputOutput
Add@ownerbot add Jira IT http://chat.google.com/room/ABC123Service added
Delete@ownerbot delete JiraService deleted
Question@ownerbot KibanaSRE Core owns Kibana. The room is: http://chat.google.com/ABC123
Export@ownerbot export[{name: "Kibana", owner: "SRE Core"...}]

Hello @ownerbot

Following the Hangouts Chat API Guide, let’s start with a hello world bot.

  1. To configure the bot, go to the Publish page and scroll down to the Enable The API button:

  2. Enter the bot name

  3. Download the private key json file

  4. Go to the API Console

  5. Search for the Hangouts Chat API (Note: not the Google+ Hangouts API)

    Rapid Development of Serverless Chatbots with Cloudflare Workers and Workers KV

  6. Click Configuration onthe left menu

  7. Fill out the form as per below [1]

    • Use a hard to guess URL. I generate a guid and use that in the url.
    • The URL will be the route you associate with your Worker in the Dashboard
      Rapid Development of Serverless Chatbots with Cloudflare Workers and Workers KV
  8. Click Save

So Google Chat should know about our bot now. Back in Google Chat, click in the "Find people, rooms, bots" textbox and choose "Message a Bot". Your bot should show up in the search:

Rapid Development of Serverless Chatbots with Cloudflare Workers and Workers KV

It won’t be too useful just yet, as we need to create our Worker to receive the messages and respond!

The Worker

In the Workers dashboard, create a script and associate with the route you defined in step #7 (the one with the guid). It should look something like below. [2]

Rapid Development of Serverless Chatbots with Cloudflare Workers and Workers KV

The Google Chatbot interface is pretty simple, but weirdly obfuscated in the Hangouts API guide IMHO. You have to reverse engineer the python example.

Basically, if we message our bot like @ownerbot-blog Kibana, we’ll get a message like this:

  {
    "type": "MESSAGE",
    "message": {
      "argumentText": "Kibana"
    }
  }

To respond, we need to respond with 200 OK and JSON body like this:

content-length: 27
content-type: application/json

{"text":"Hello chat world"}

So, the minimum Chatbot Worker looks something like this:

addEventListener('fetch', event => { event.respondWith(process(event.request)) });

function process(request) {
  let body = {
	text: "Hello chat world"
  }
  return new Response(JSON.stringify(body), {
    status: 200,
    headers: {
        "Content-Type": "application/json",
        "Cache-Control": "no-cache"
    }
  });
}

Save and deploy that and we should be able message our bot:

Rapid Development of Serverless Chatbots with Cloudflare Workers and Workers KV

Success!

Implementation

OK, on to the meat of the code. Based on the requirements, I see a need for an AddCommand, QueryCommand, DeleteCommand and HelpCommand. I also see some sort of ServiceDirectory that knows how to add, delete and retrieve services.

I created a CommandFactory which accepts a ServiceDirectory, as well as an implementation of a KV store, which will be Workers KV in production, but I’ll mock out in tests.

class CommandFactory {
    constructor(serviceDirectory, kv) {
        this.serviceDirectory = serviceDirectory;
        this.kv = kv;
    }

    create(argumentText) {
        let parts = argumentText.split(' ');
        let primary = parts[0];       
        
        switch (primary) {
            case "add":
                return new AddCommand(argumentText, this.serviceDirectory, this.kv);
            case "delete":
                return new DeleteCommand(argumentText, this.serviceDirectory, this.kv);
            case "help":
                return new HelpCommand(argumentText, this.serviceDirectory, this.kv);
            default:
                return new QueryCommand(argumentText, this.serviceDirectory, this.kv);
        }
    }
}

OK, so if we receive a message like @ownerbot add, we’ll interpret it as an AddCommand, but if it’s not something we recognize, we’ll assume it’s a QueryCommand like @ownerbot Kibana which makes it easy to parse commands.

OK, our commands need a service directory, which will look something like this:

class ServiceDirectory {     
    get(serviceName) {...}
    async add(service) {...}
    async delete(serviceName) {...}
    find(serviceName) {...}
    getNames() {...}
}

Let’s build some commands. Oh, and my chatbot is going to be Ultima IV themed, because… reasons.

class AddCommand extends Command {

    async respond() {
        let cmdParts = this.commandParts;
        if (cmdParts.length !== 6) {
            return new OwnerbotResponse("Adding a service requireth Name, Owner, Room Name and Google Chat Room Url.", false);
        }
        let name = this.commandParts[1];
        let owner = this.commandParts[2];
        let room = this.commandParts[3];
        let url = this.commandParts[4];
        let aliasesPart = this.commandParts[5];
        let aliases = aliasesPart.split(' ');
        let service = {
            name: name,
            owner: owner,
            room: room,
            url: url,
            aliases: aliases
        }
        await this.serviceDirectory.add(service);
        return new OwnerbotResponse(`My codex of knowledge has expanded to contain knowledge of ${name}. Congratulations virtuous Paladin.`);
    }
}

The nice thing about the Command pattern for chatbots, is you can encapsulate the logic of each command for testing, as well as compose series of commands together to test out conversations. Later, we could extend it to support undo. Let’s test the AddCommand

  it('requires all args', async function() {
            let addCmd = new AddCommand("add AdminPanel 'Internal Tools' 'Internal Tools'", dir, kv); //missing url            
            let res = await addCmd.respond();
            console.log(res.text);
            assert.equal(res.success, false, "Adding with missing args should fail");            
        });

        it('returns success for all args', async function() {
            let addCmd = new AddCommand("add AdminPanel 'Internal Tools' 'Internal Tools Room' 'http://chat.google.com/roomXYZ'", dir, kv);            
            let res = await addCmd.respond();
            console.debug(res.text);
            assert.equal(res.success, true, "Should have succeeded with all args");            
        });
$ mocha -g "AddCommand"
  AddCommand
    add
      ✓ requires all args
      ✓ returns success for all args
  2 passing (19ms)

So far so good. But adding commands to our ownerbot isn’t going to be so useful unless we can query them.

class QueryCommand extends Command {
    async respond() {
        let service = this.serviceDirectory.get(this.argumentText);
        if (service) {
            return new OwnerbotResponse(`${service.owner} owns ${service.name}. Seeketh thee room ${service.room} - ${service.url})`);
        }
        let serviceNames = this.serviceDirectory.getNames().join(", ");
        return new OwnerbotResponse(`I knoweth not of that service. Thou mightst asketh me of: ${serviceNames}`);
    }
}

Let’s write a test that runs an AddCommand followed by a QueryCommand

describe ('QueryCommand', function() {
    let kv = new MockKeyValueStore();
    let dir = new ServiceDirectory(kv);
    await dir.init();

    it('Returns added services', async function() {    
        let addCmd = new AddCommand("add AdminPanel 'Internal Tools' 'Internal Tools Room' url 'alias' abc123", dir, kv);            
        await addCmd.respond();

        let queryCmd = new QueryCommand("AdminPanel", dir, kv);
        let res = await queryCmd.respond();
        assert.equal(res.success, true, "Should have succeeded");
        assert(res.text.indexOf('Internal Tools') > -1, "Should have returned the team name in the query response");
    })
})

Demo

A lot of the code as been elided for brevity, but you can view the full source on Github. Let’s take it for a spin!

Rapid Development of Serverless Chatbots with Cloudflare Workers and Workers KV

Learnings

Some of the things I learned during the development of @ownerbot were:

  • Chatbots are an awesome use case for Serverless. You can deploy and not worry again about the infrastructure
  • Workers KV means extends the range of useful chat bots to include stateful bots like @ownerbot
  • The Command pattern provides a useful way to encapsulate the parsing and responding to commands in a chat bot.

In Part 2 we’ll add authentication to ensure we’re only responding to requests from our instance of Google Chat


  1. For simplicity, I’m going to use a static shared key, but Google have recently rolled out a more secure method for verifying the caller’s authenticity, which we’ll expand on in Part 2. ↩︎

  2. This UI is the multiscript version available to Enterprise customers. You can still implement the bot with a single Worker, you’ll just need to recognize and route requests to your chatbot code. ↩︎

🤠 The Wrangler CLI: Deploying Rust with WASM on Cloudflare Workers

Post Syndicated from Ashley Williams original https://blog.cloudflare.com/introducing-wrangler-cli/

🤠 The Wrangler CLI: Deploying Rust with WASM on Cloudflare Workers
Wrangler is a CLI tool for building Rust WebAssembly Workers

🤠 The Wrangler CLI: Deploying Rust with WASM on Cloudflare Workers

Today, we’re open sourcing and announcing wrangler, a CLI tool for building, previewing, and publishing Rust and WebAssembly Cloudflare Workers.

If that sounds like some word salad to you, that’s a reasonable reaction. All three of the technologies involved are relatively new and upcoming: WebAssembly, Rust, and Cloudflare Workers.

Why WebAssembly?

Cloudflare’s mission is to help build a better Internet. We see Workers as an extension of the already incredibly powerful Web Platform, where JavaScript has allowed users to go from building small bits of interactivity, to building full applications. Node.js first extended this from the client to the server- unifying web application development around a single language – JavaScript. By choosing to use V8 isolates (the technology that powers both Node.js and the most popular browser, Chrome), we sought to make its Workers product a fully compatible, new platform for the Web, eliding the distinction between server and client. By leveraging its large global network of servers, Workers allows users to run code as close as possible to end users, eliminating the latency associated server-side logic or large client-side bundles.

But not everyone wants to write JavaScript, and JavaScript is not well suited to express every application. WebAssembly emerged in 2017 as a way to further extend the Web Platform to applications, such as games and other high resource intensive programs, that were previously excluded by the limitations of JavaScript.

V8 isolates give us both JavaScript and WebAssembly. This means that you can leverage the prototyping power and extensive ecosystem of JavaScript, alongside the power of WebAssembly, which in addition to fast, predictable, performance, also opens up the wealth of libraries written in languages that can target WebAssembly, like C, C++, and Rust.

WebAssembly on Workers eliminates trade-offs that were originally considered irresolvable: low-latency, high-performing, and Web Platform compatible- pick three.

Why Rust?

Rust is a relatively new programming language with the goal of “empowering everyone to build reliable and efficient software”. It’s a systems level language that offers its users a high amount of control, while still seeking to offer an ergonomic, friendly and modern development experience.

The Rust-WebAssembly Working Group made incredible efforts last year to build out a suite of developer tools for WebAssembly. At Cloudflare, we’re excited to support those efforts with paid developer hours and leverage those efforts to empower our users to start harnessing the power of WebAssembly on Workers now.

There are several other toolchains including Emscripten (C, C++) and AssemblyScript (TypeScript) that we’re eager to support in the future. Rust is just the beginning (but we think it’s a pretty great place to start!).

Why now?

When developing new, highly technical, products, it’s easy to get caught up in the promise and vision- often to the detriment of getting the technology into the hands of the folks who will be using it every day in the future.

We want to broaden the community that has access at the early stages of this technology- who can bring their valuable perspectives and experience, and help us shape the future of these tools.

The first step to accomplishing that is building the tools that can enable folks to engage with the new platform. wrangler is a that enabling tool. It’s just enough to unblock users who were previously unable to interact with the platform because there was no paved path.

We don’t plan to stop here. Folks will rightly note that there are some critical developer workflow steps that are missing from wrangler: linting, testing, benchmarking, and size profiling are a few that come to mind. We’ve got some big plans and we’re excited to build out more, but we’re eager to release this now to enable more folks to participate in the process. The best way to know what developers need is to ask and listen- by creating and open sourcing wrangler in such an early phase we’re hoping to shorten the feedback cycle between product and user- and build the right thing, faster.

You can install wrangler using cargo:

cargo install wrangler

To get started, head on over to the Cloudflare docs and follow the tutorial. You’ll build and preview a Cloudflare Worker that uses Rust compiled to WebAssembly to parse Markdown.

Don’t stop there though. Please check out the repo, file some issues, build project templates, and write about your experience- we want to hear from you.

We’re really excited to see what y’all build!