Tag Archives: programming

Let’s Annotate Our Methods With The Features They Implement

Post Syndicated from Bozho original https://techblog.bozho.net/lets-annotate-our-methods-with-the-features-they-implement/

Writing software consists of very little actual “writing”, and much more thinking, designing, reading, “digging”, analyzing, debugging, refactoring, aligning and meeting others.

The reading and digging part is where you try to understand what has been implemented before, why it has been implemented, and how it works. In larger projects it becomes increasingly hard to find what is happening and why – there are so many classes that interfere, and so many methods participate in implementing a particular feature.

That’s probably because there is a mismatch between the programming units (classes, methods) and the business logic units (features). Product owners want a “password reset” feature, and they don’t care if it’s done using framework configuration, custom code split in three classes, or one monolithic controller method that does that job.

This mismatch is partially addressed by the so called BDD (behaviour driven development), as business people can define scenarios in a formalized language (although they rarely do, it’s still up to the QAs or developers to write the tests). But having your tests organized around features and behaviours doesn’t mean the code is, and BDD doesn’t help in making your way through the codebase in search of why and how something is implemented.

Another issue is linking a piece of code to the issue tracking system. Source control conventions and hooks allow for setting the issue tracker number as part of the commit, and then when browsing the code, you can annotate the file and see the issue number. However, due to the many changes, even a very strict team will end up with methods that are related to multiple issues, and you can’t easily tell which is the relevant one.

Yet another issue with the lack of a “feature” unit in programming languages is that you can’t trivially reuse existing projects to start a new one. We’ve all been there – you have a similar project and you want to get a skeleton to get things running faster. And while there are many tools to help with that (Spring Boot, Spring Roo, and other scaffolding utilities), they can rarely deliver what you need – you always have to tweak something, delete something, customize some configuration, as defaults are almost never practical.

And I have a simple proposal that will help with the issues above. As with any complex problem, simple ideas don’t solve everything, but are at least a step forward.

The proposal is in the title – let’s annotate our methods with the features they implement. Let’s have @Feature(name = "Forgotten password", issueTrackerCode="PROJ-123"). A method can implement multiple features, but that is generally discouraged by best practices (e.g. the single responsibility principle). The granularity of “feature” is something that has to be determined by each team and is the tricky part – sometimes an epic describes a feature, sometimes individual stories or even subtasks do. A definition of a feature should be agreed upon and every new team member should be told what to do and how to interpret it.
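To make this concrete, here is a minimal sketch of what such an annotation could look like (the runtime retention and the method/class targets are assumptions on my part – they are simply what analysis tools would need in order to read the annotation via reflection):

import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Minimal sketch of the proposed annotation, kept in the project itself
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.METHOD, ElementType.TYPE})
public @interface Feature {
    String name();
    String issueTrackerCode() default "";
}

A method implementing the forgotten-password example would then simply be annotated like this:

public class PasswordResetService {

    @Feature(name = "Forgotten password", issueTrackerCode = "PROJ-123")
    public void resetPassword(String email) {
        // generate a reset token, store it, send the email...
    }
}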

There is of course a lot of complexity, e.g. for generic methods like DAO methods, utility methods, or methods that are reused in too many places. But they also represent features, it’s just that these features are horizontal. “Data access layer” is a feature – a more technical one indeed, but it counts, and maybe deserves a story in the issue tracker.

Your features can actually be listed in one or several enums, grouped by type – business, horizontal, performance, etc. That way you can even compose features – e.g. account creation contains the logic itself, database access, a security layer.
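A sketch of such an enum catalogue (the constants and issue codes below are made up purely for illustration):

public enum ProjectFeature {
    // business features
    FORGOTTEN_PASSWORD("PROJ-123"),
    ACCOUNT_CREATION("PROJ-101"),
    // horizontal features
    DATA_ACCESS_LAYER("PROJ-40"),
    SECURITY_LAYER("PROJ-55");

    private final String issueTrackerCode;

    ProjectFeature(String issueTrackerCode) {
        this.issueTrackerCode = issueTrackerCode;
    }

    public String issueTrackerCode() {
        return issueTrackerCode;
    }
}

An annotation variant that accepts an array of ProjectFeature values would then let an account creation method declare {ACCOUNT_CREATION, DATA_ACCESS_LAYER, SECURITY_LAYER} in one place.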

How does such a proposal help?

  • Raises awareness of the single responsibility of methods and of the need for readable code
  • Provides a rationale for the existence of each method. Even if a proper comment is missing, the annotation will put a method (or a class) in context
  • Helps navigating code and fixing issues (if you can see all places where a feature is implemented, you are more likely to spot an issue)
  • Allows tools to analyze your features – amount, complexity, how chaotically a feature is spread across the code base, test coverage per feature, etc. (a minimal scanner sketch follows this list)
  • Allows tools to use existing projects as scaffolding for new ones – you specify the features you want to have, and they are automatically copied
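To illustrate the tooling idea, here is a minimal sketch of a report generator built on plain reflection, reusing the PasswordResetService sketch from earlier. A real tool would discover classes via a classpath scan rather than an explicit list:

import java.lang.reflect.Method;

public class FeatureReport {

    public static void main(String[] args) {
        // A real tool would scan the classpath; the explicit array keeps the sketch self-contained
        Class<?>[] classes = { PasswordResetService.class };

        for (Class<?> clazz : classes) {
            for (Method method : clazz.getDeclaredMethods()) {
                Feature feature = method.getAnnotation(Feature.class);
                if (feature != null) {
                    System.out.printf("%s -> %s.%s (%s)%n",
                            feature.name(), clazz.getSimpleName(),
                            method.getName(), feature.issueTrackerCode());
                }
            }
        }
    }
}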

At this point I’m supposed to give a link to a GitHub project for a feature annotation library. But it doesn’t make sense to have a single-annotation project. It could easily be part of Guava or something similar, or it can be created manually in each project. The complex part – the tools that will do the scanning and analysis – deserves separate projects, but unfortunately I don’t have time to write one.

But even without the tools, the concept of annotating methods with their high-level features is, I think, a useful one. Instead of trying to deduce why a method is here and what requirements it has to implement (and whether all the necessary tests were written at the time), such an annotation can come in handy.

The post Let’s Annotate Our Methods With The Features They Implement appeared first on Bozho's tech blog.

Re-Architecting the Video Gatekeeper

Post Syndicated from Netflix Technology Blog original https://medium.com/netflix-techblog/re-architecting-the-video-gatekeeper-f7b0ac2f6b00?source=rss----2615bd06b42e---4

By Drew Koszewnik

This is the story about how the Content Setup Engineering team used Hollow, a Netflix OSS technology, to re-architect and simplify an essential component in our content pipeline — delivering a large amount of business value in the process.

The Context

Each movie and show on the Netflix service is carefully curated to ensure an optimal viewing experience. The team responsible for this curation is Title Operations. Title Operations will confirm, among other things:

  • We are in compliance with the contracts — date ranges and places where we can show a video are set up correctly for each title
  • Captions, subtitles, and secondary audio “dub” assets are sourced, translated, and made available to the right populations around the world
  • Title name and synopsis are available and translated
  • The appropriate maturity ratings are available for each country

When a title meets all of the minimum requirements above, it is allowed to go live on the service. Gatekeeper is the system at Netflix responsible for evaluating the “liveness” of videos and assets on the site. A title doesn’t become visible to members until Gatekeeper approves it — and if it can’t validate the setup, then it will assist Title Operations by pointing out what’s missing from the baseline customer experience.

Gatekeeper accomplishes its prescribed task by aggregating data from multiple upstream systems, applying some business logic, then producing an output detailing the status of each video in each country.

The Tech

Hollow, an OSS technology we released a few years ago, has been best described as a total high-density near cache:

  • Total: The entire dataset is cached on each node — there is no eviction policy, and there are no cache misses.
  • High-Density: encoding, bit-packing, and deduplication techniques are employed to optimize the memory footprint of the dataset.
  • Near: the cache exists in RAM on any instance which requires access to the dataset.

One exciting thing about the total nature of this technology — because we don’t have to worry about swapping records in-and-out of memory, we can make assumptions and do some precomputation of the in-memory representation of the dataset which would not otherwise be possible. The net result is, for many datasets, vastly more efficient use of RAM. Whereas with a traditional partial-cache solution you may wonder whether you can get away with caching only 5% of the dataset, or if you need to reserve enough space for 10% in order to get an acceptable hit/miss ratio — with the same amount of memory Hollow may be able to cache 100% of your dataset and achieve a 100% hit rate.

And obviously, if you get a 100% hit rate, you eliminate all I/O required to access your data — and can achieve orders of magnitude more efficient data access, which opens up many possibilities.
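As a rough illustration of the “total” property (this is not the Hollow API, and Hollow’s actual bit-packed encoding is far more compact than a plain map), the access pattern looks something like this:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: the whole dataset lives in RAM, so every read is a
// local lookup with no I/O, no eviction, and no miss path.
public class TotalNearCache<K, V> {

    private final Map<K, V> data = new ConcurrentHashMap<>();

    // Replace the in-memory dataset with a freshly published snapshot
    public void loadSnapshot(Map<K, V> snapshot) {
        data.clear();
        data.putAll(snapshot);
    }

    // Always answered from local memory
    public V get(K key) {
        return data.get(key);
    }
}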

The Status-Quo

Until very recently, Gatekeeper was a completely event-driven system. When a change for a video occurred in any one of its upstream systems, that system would send an event to Gatekeeper. Gatekeeper would react to that event by reaching into each of its upstream services, gathering the necessary input data to evaluate the liveness of the video and its associated assets. It would then produce a single-record output detailing the status of that single video.

Old Gatekeeper Architecture

This model had several problems associated with it:

  • This process was completely I/O bound and put a lot of load on upstream systems.
  • Consequently, these events would queue up throughout the day and cause processing delays, which meant that titles may not actually go live on time.
  • Worse, events would occasionally get missed, meaning titles wouldn’t go live at all until someone from Title Operations realized there was a problem.

The mitigation for these issues was to “sweep” the catalog so Videos matching specific criteria (e.g., scheduled to launch next week) would get events automatically injected into the processing queue. Unfortunately, this mitigation added many more events into the queue, which exacerbated the problem.

Clearly, a change in direction was necessary.

The Idea

We decided to employ a total high-density near cache (i.e., Hollow) to eliminate our I/O bottlenecks. For each of our upstream systems, we would create a Hollow dataset which encompasses all of the data necessary for Gatekeeper to perform its evaluation. Each upstream system would now be responsible for keeping its cache updated.

New Gatekeeper Architecture

With this model, liveness evaluation is conceptually separated from the data retrieval from upstream systems. Instead of reacting to events, Gatekeeper would continuously process liveness for all assets in all videos across all countries in a repeating cycle. The cycle iterates over every video available at Netflix, calculating liveness details for each of them. At the end of each cycle, it produces a complete output (also a Hollow dataset) representing the liveness status details of all videos in all countries.
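Conceptually, each cycle looks something like the sketch below. The types are simplified stand-ins rather than the actual Gatekeeper code – the real inputs are Hollow datasets, and the real output is a Hollow dataset as well:

import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

public class GatekeeperCycleSketch {

    private final List<String> videoIds;
    private final List<String> countries;
    // Stand-in for the business logic: (videoId, country) -> liveness detail
    private final BiFunction<String, String, String> evaluateLiveness;

    public GatekeeperCycleSketch(List<String> videoIds,
                                 List<String> countries,
                                 BiFunction<String, String, String> evaluateLiveness) {
        this.videoIds = videoIds;
        this.countries = countries;
        this.evaluateLiveness = evaluateLiveness;
    }

    // One full cycle: evaluate everything and return a complete output snapshot
    public Map<String, String> runCycle() {
        Map<String, String> output = new HashMap<>();
        for (String videoId : videoIds) {
            for (String country : countries) {
                output.put(videoId + ":" + country, evaluateLiveness.apply(videoId, country));
            }
        }
        return output;
    }
}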

We expected that this continuous processing model was possible because a complete removal of our I/O bottlenecks would mean that we should be able to operate orders of magnitude more efficiently. We also expected that by moving to this model, we would realize many positive effects for the business.

  • A definitive solution for the excess load on upstream systems generated by Gatekeeper
  • A complete elimination of liveness processing delays and missed go-live dates.
  • A reduction in the time the Content Setup Engineering team spends on performance-related issues.
  • Improved debuggability and visibility into liveness processing.

The Problem

Hollow can also be thought of like a time machine. As a dataset changes over time, it communicates those changes to consumers by breaking the timeline down into a series of discrete data states. Each data state represents a snapshot of the entire dataset at a specific moment in time.

Hollow is like a time machine

Usually, consumers of a Hollow dataset are loading the latest data state and keeping their cache updated as new states are produced. However, they may instead point to a prior state — which will revert their view of the entire dataset to a point in the past.

The traditional method of producing data states is to maintain a single producer which runs a repeating cycle. During that cycle, the producer iterates over all records from the source of truth. As it iterates, it adds each record to the Hollow library. Hollow then calculates the differences between the data added during this cycle and the data added during the last cycle, then publishes the state to a location known to consumers.

Traditional Hollow usage

The problem with this total-source-of-truth iteration model is that it can take a long time. In the case of some of our upstream systems, this could take hours. This data-propagation latency was unacceptable — we can’t wait hours for liveness processing if, for example, Title Operations adds a rating to a movie that needs to go live imminently.

The Improvement

What we needed was a faster time machine — one which could produce states with a more frequent cadence, so that changes could be more quickly realized by consumers.

Incremental Hollow is like a faster time machine

To achieve this, we created an incremental Hollow infrastructure for Netflix, leveraging work which had been done in the Hollow library earlier, and pioneered in production usage by the Streaming Platform Team at Target (and is now a public non-beta API).

With this infrastructure, each time a change is detected in a source application, the updated record is encoded and emitted to a Kafka topic. A new component that is not part of the source application, the Hollow Incremental Producer service, performs a repeating cycle at a predefined cadence. During each cycle, it reads all messages which have been added to the topic since the last cycle and mutates the Hollow state engine to reflect the new state of the updated records.

If a message from the Kafka topic contains the exact same data as already reflected in the Hollow dataset, no action is taken.
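The loop below is a simplified sketch of that cycle – the queue stands in for the Kafka topic and the map stands in for the Hollow state engine, so none of this is the real API, but the shape is the same: drain everything that arrived since the last cycle, skip no-op updates, and publish a new state only if something changed.

import java.util.AbstractMap;
import java.util.Map;
import java.util.Objects;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

public class IncrementalProducerSketch {

    // Stand-in for the Kafka topic: key/value update messages
    private final Queue<Map.Entry<String, String>> topic = new ConcurrentLinkedQueue<>();
    // Stand-in for the Hollow state engine
    private final Map<String, String> stateEngine = new ConcurrentHashMap<>();

    // Source applications emit updated records as they detect changes
    public void emit(String key, String value) {
        topic.add(new AbstractMap.SimpleEntry<>(key, value));
    }

    // One short cycle (e.g. every 30 seconds): apply everything received since the last cycle
    public void runCycle() {
        boolean changed = false;
        Map.Entry<String, String> update;
        while ((update = topic.poll()) != null) {
            // If the message carries exactly the data already in the state, take no action
            if (!Objects.equals(stateEngine.get(update.getKey()), update.getValue())) {
                stateEngine.put(update.getKey(), update.getValue());
                changed = true;
            }
        }
        if (changed) {
            publishNewState(); // in the real system: write, version, and announce a new Hollow state
        }
    }

    private void publishNewState() {
        // placeholder for the publish step
    }
}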

Hollow Incremental Producer Service

To mitigate issues arising from missed events, we implement a sweep mechanism that periodically iterates over an entire source dataset. As it iterates, it emits the content of each record to the Kafka topic. In this way, any updates which may have been missed will eventually be reflected in the Hollow dataset. Additionally, because this is not the primary mechanism by which updates are propagated to the Hollow dataset, this does not have to be run as quickly or frequently as a cycle must iterate the source in traditional Hollow usage.

The Hollow Incremental Producer is capable of reading a great many messages from the Kafka topic and mutating its Hollow state internally very quickly — so we can configure its cycle times to be very short (we are currently defaulting this to 30 seconds).

This is how we built a faster time machine. Now, if Title Operations adds a maturity rating to a movie, within 30 seconds, that data is available in the corresponding Hollow dataset.

The Tangible Result

With the data propagation latency issue solved, we were able to re-implement the Gatekeeper system to eliminate all I/O boundaries. With the prior implementation of Gatekeeper, re-evaluating all assets for all videos in all countries would have been unthinkable — it would tie up the entire content pipeline for more than a week (and we would then still be behind by a week since nothing else could be processed in the meantime). Now we re-evaluate everything in about 30 seconds — and we do that every minute.

There is no such thing as a missed or delayed liveness evaluation any longer, and the disablement of the prior Gatekeeper system reduced the load on our upstream systems — in some cases by up to 80%.

Load reduction on one upstream system

In addition to these performance benefits, we also get a resiliency benefit. In the prior Gatekeeper system, if one of the upstream services went down, we were unable to evaluate liveness at all, because we were unable to retrieve any data from that system. In the new implementation, if one of the upstream systems goes down, it stops publishing updates — but we still gate using the stale data for its corresponding dataset while all the others make progress. So, for example, if the translated synopsis system goes down, we can still bring a movie on-site in a region if it was being held back for, and then receives, the correct subtitles.

The Intangible Result

Perhaps even more beneficial than the performance gains has been the improvement in our development velocity in this system. We can now develop, validate, and release changes in minutes which might have before taken days or weeks — and we can do so with significantly increased release quality.

The time-machine aspect of Hollow means that every deterministic process which uses Hollow exclusively as input data is 100% reproducible. For Gatekeeper, this means that an exact replay of what happened at time X can be accomplished by reverting all of our input states to time X, then re-evaluating everything again.

We use this fact to iterate quickly on changes to the Gatekeeper business logic. We maintain a PREPROD Gatekeeper instance which “follows” our PROD Gatekeeper instance. PREPROD is also continuously evaluating liveness for the entire catalog, but publishing its output to a different Hollow dataset. At the beginning of each cycle, the PREPROD environment will gather the latest produced state from PROD, and set each of its input datasets to the exact same versions which were used to produce the PROD output.

The PREPROD Gatekeeper instance “follows” the PROD instance

When we want to make a change to the Gatekeeper business logic, we do so and then publish it to our PREPROD cluster. The subsequent output state from PREPROD can be diffed with its corresponding output state from PROD to view the precise effect that the logic change will cause. In this way, at a glance, we can validate that our changes have precisely the intended effect, and zero unintended consequences.

A Hollow diff shows exactly what changes

This, coupled with some iteration on the deployment process, has resulted in the ability for our team to code, validate, and deploy impactful changes to Gatekeeper in literally minutes — at least an order of magnitude faster than in the prior system — and we can do so with a higher level of safety than was possible in the previous architecture.

Conclusion

This new implementation of the Gatekeeper system opens up opportunities to capture additional business value, which we plan to pursue over the coming quarters. Additionally, this is a pattern that can be replicated to other systems within the Content Engineering space and elsewhere at Netflix — already a couple of follow-up projects have been launched to formalize and capitalize on the benefits of this n-hollow-input, one-hollow-output architecture.

Content Setup Engineering is an exciting space right now, especially as we scale up our pipeline to produce more content with each passing quarter. We have many opportunities to solve real problems and provide massive value to the business — and to do so with a deep focus on computer science, using and often pioneering leading-edge technologies. If this kind of work sounds appealing to you, reach out to Ivan to get the ball rolling.


Re-Architecting the Video Gatekeeper was originally published in Netflix TechBlog on Medium.

Join Cloudflare & Moz at our next meetup, Serverless in Seattle!

Post Syndicated from Giuliana DeAngelis original https://blog.cloudflare.com/join-cloudflare-moz-at-our-next-meetup-serverless-in-seattle/


Cloudflare is organizing a meetup in Seattle on Tuesday, June 25th and we hope you can join. We’ll be bringing together members of the developer community and Cloudflare users for an evening of discussion about serverless compute and the infinite number of use cases for deploying code at the edge.

To kick things off, our guest speaker Devin Ellis will share how Moz uses Cloudflare Workers to reduce time to first byte 30-70% by caching dynamic content at the edge. Kirk Schwenkler, Solutions Engineering Lead at Cloudflare, will facilitate this discussion and share his perspective on how to grow and secure businesses at scale.

Next up, Developer Advocate Kristian Freeman will take you through a live demo of Workers and highlight new features of the platform. This will be an interactive session where you can try out Workers for free and develop your own applications using our new command-line tool.

Food and drinks will be served til close so grab your laptop and a friend and come on by!

View Event Details & Register Here

Agenda:

  • 5:00 pm Doors open, food and drinks
  • 5:30 pm Customer use case by Devin and Kirk
  • 6:00 pm Workers deep dive with Kristian
  • 6:30 – 8:30 pm Networking, food and drinks


Inside the Entropy

Post Syndicated from Alex Davidson original https://blog.cloudflare.com/inside-the-entropy/


Randomness, randomness everywhere;
Nor any verifiable entropy.

Generating random outcomes is an essential part of everyday life; from lottery drawings and conducting competitions, to performing deep cryptographic computations. To use randomness, we must have some way to ‘sample’ it. This requires interpreting some natural phenomenon (such as a fair dice roll) as an event that generates some random output. From a computing perspective, we interpret random outputs as bytes that we can then use in algorithms (such as drawing a lottery) to achieve the functionality that we want.

The sampling of randomness securely and efficiently is a critical component of all modern computing systems. For example, nearly all public-key cryptography relies on the fact that algorithms can be seeded with bytes generated from genuinely random outcomes.

In scientific experiments, a random sampling of results is necessary to ensure that data collection measurements are not skewed. Until now, generating random outputs in a way that lets us verify they are indeed random has been very difficult, typically involving a variety of statistical measurements.


During Crypto week, Cloudflare is releasing a new public randomness beacon as part of the launch of the League of Entropy. The League of Entropy is a network of beacons that produces distributed, publicly verifiable random outputs for use in applications where the nature of the randomness must be publicly audited. The underlying cryptographic architecture is based on the drand project.

Verifiable randomness is essential for ensuring trust in various institutional decision-making processes such as elections and lotteries. There are also cryptographic applications that require verifiable randomness. In the land of decentralized consensus mechanisms, the DFINITY approach uses random seeds to decide the outcome of leadership elections. In this setting, it is essential that the randomness is publicly verifiable so that the outcome of the leadership election is trustworthy. Such a situation arises more generally in Sortitions: an election where leaders are selected as a random individual (or subset of individuals) from a larger set.

In this blog post, we will give a technical overview of the cryptography behind the distributed randomness beacon, and how it can be used to generate publicly verifiable randomness. We believe that distributed randomness beacons have a huge amount of utility in realizing the Internet of the Future, where we will be able to rely on distributed, decentralized solutions to problems of a global scale.

Randomness & entropy

A source of randomness is measured in terms of the amount of entropy it provides. Think about the entropy provided by a random output as a score to indicate how “random” the output actually is. The notion of information entropy was concretised by the famous scientist Claude Shannon in his paper A Mathematical Theory of Communication, and is sometimes known as Shannon Entropy.

A common way to think about random outputs is: a sequence of bits derived from some random outcome. For the sake of argument, consider a fair 8-sided dice roll with sides marked 0-7. The outputs of the dice can be written as the bit-strings 000,001,010,...,111. Since the dice is fair, any of these outputs is equally likely. This means that each of the bits is equally likely to be 0 or 1. Consequently, interpreting the output of the dice roll as a random output derives randomness with 3 bits of entropy.

More generally, if a perfect source of randomness guarantees strings with n bits of entropy, then it generates bit-strings where each bit is equally likely to be 0 or 1. This allows us to predict the value of any bit with maximum probability 1/2. If the outputs are sampled from such a perfect source, we consider them uniformly distributed. If we sample the outputs from a source where one bit is predictable with higher probability, then the string has n-1 bits of entropy. To go back to the dice analogy, rolling a 6-sided dice provides less than 3 bits of entropy because the possible outputs are 000,001,010,011,100,101, and so the first two bits are more likely to be set to 0 than to 1.
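As a quick sanity check of those numbers, here is a small sketch that computes Shannon entropy (the sum of −p·log2(p) over all outcomes) for the two dice:

import java.util.Collections;
import java.util.List;

public class EntropyExample {

    // Shannon entropy in bits: H = -sum(p_i * log2(p_i)) over all outcomes
    static double entropyBits(List<Double> probabilities) {
        double h = 0.0;
        for (double p : probabilities) {
            if (p > 0) {
                h -= p * (Math.log(p) / Math.log(2));
            }
        }
        return h;
    }

    public static void main(String[] args) {
        // Fair 8-sided dice: 3.0 bits of entropy
        System.out.println(entropyBits(Collections.nCopies(8, 1.0 / 8)));
        // Fair 6-sided dice: about 2.585 bits, i.e. less than 3, as described above
        System.out.println(entropyBits(Collections.nCopies(6, 1.0 / 6)));
    }
}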

It is possible to mix entropy sources using specially designed mixing functions to obtain an output with even greater entropy. The maximum resulting entropy is the sum of the entropy provided by the individual sources used as input.


Sampling randomness

To sample randomness, let’s first identify the appropriate sources. There are many natural phenomena that one can use:

  • atmospheric noise;
  • radioactive decay;
  • turbulent motion, like that generated in Cloudflare’s wall of lava lamps(!).


Unfortunately, these phenomena require very specific measuring tools, which are prohibitively expensive to install in mainstream consumer electronics. As such, most personal computing devices use external usage characteristics to seed generator functions that output randomness as and when the system requires it. These characteristics include keyboard typing patterns, typing speed, and mouse movement – since such usage patterns are based on the human user, it is assumed they provide sufficient entropy as a randomness source. An example of a random number generator that gathers entropy from characteristics like these is the Linux kernel’s generator behind /dev/random and /dev/urandom.

Naturally, it is difficult to tell whether a system is actually returning random outputs by only inspecting the outputs. There are statistical tests that detect whether a series of outputs is not uniformly distributed, but these tests cannot ensure that they are unpredictable. This means that it is hard to detect if a given system has had its randomness generation compromised.

Distributed randomness

It’s clear we need alternative methods for sampling randomness so that we can provide guarantees that trusted mechanisms, such as elections and lotteries, take place in secure tamper-resistant environments. The drand project was started by researchers at EPFL to address this problem. The drand charter is to provide an easily configurable randomness beacon running at geographically distributed locations around the world. The intention is for each of these beacons to generate portions of randomness that can be combined into a single random string that is publicly verifiable.

This functionality is achieved using threshold cryptography. Threshold cryptography seeks to derive solutions for standard cryptographic problems by combining information from multiple distributed entities. The notion of the threshold means that if there are n entities, then any t of the entities can combine to construct some cryptographic object (like a ciphertext, or a digital signature). These threshold systems are characterised by a setup phase, where each entity learns a share of data. They will later use this share of data to create a combined cryptographic object with a subset of the other entities.

Threshold randomness

In the case of a distributed randomness protocol, there are n randomness beacons that broadcast random values sampled from their initial data share, and the current state of the system. This data share is created during a trusted setup phase, and also takes in some internal random value that is generated by the beacon itself.

When a user needs randomness, they send requests to some number t of beacons, where t < n, and combine these values using a specific procedure. The result is a random value that can be verified and used for public auditing mechanisms.


Consider what happens if some proportion c/n of the randomness beacons are corrupted at any one time. The nature of a threshold cryptographic system is that, as long as c < t, then the end result still remains random.


If c exceeds t then the random values produced by the system become predictable and the notion of randomness is lost. In summary, the distributed randomness procedure provides verifiably random outputs with sufficient entropy only when c < t.

By distributing the beacons independent of each other and in geographically disparate locations, the probability that t locations can be corrupted at any one time is extremely low. The minimum choice of t is equal to n/2.

How does it actually work?

What we described above sounds a bit like magic™. Even if c = t-1, we can still ensure that the output is indeed random and unpredictable! To make it clearer how this works, let’s dive a bit deeper into the underlying cryptography.

Two core components of drand are: a distributed key generation (DKG) procedure, and a threshold signature scheme. These core components are used in setup and randomness generation procedures, respectively. In just a bit, we’ll outline how drand uses these components (without navigating too deeply into the onerous mathematics).

Distributed key generation

At a high-level, the DKG procedure creates a distributed secret key that is formed of n different key pairs (vk_i, sk_i), each one being held by the entity i in the system. These key pairs will eventually be used to instantiate a (t,n)-threshold signature scheme (we will discuss this more later). In essence, t of the entities will be able to combine to construct a valid signature on any message.

To think about how this might work, consider a distributed key generation scheme that creates n distributed keys that are going to be represented by pizzas. Each pizza is split into n slices and one slice from each is secretly passed to one of the participants. Each entity receives one slice from each of the different pizzas (n in total) and combines these slices to form their own pizza. Each combined pizza is unique and secret for each entity, representing their own key pair.


Mathematical intuition

Mathematically speaking, and rather than thinking about pizzas, we can describe the underlying phenomenon by reconstructing lines or curves on a graph. We can take two coordinates on a (x,y) plane and immediately (and uniquely) define a line with the equation y = ax+b. For example, the points (2,3) and (4,7) immediately define a line with gradient (7-3)/(4-2) = 2, so a=2. You can then derive the b coefficient as -1 by evaluating either of the coordinates in the equation y = 2x + b. By uniquely, we mean that only the line y = 2x - 1 satisfies the two coordinates that were chosen; no other choice of a or b fits.


The curve ax+b has degree 1, where the degree of the equation refers to the highest order multiplication of unknown variables in the equation. That might seem like mathematical jargon, but the equation above contains only one term ax, which depends on the unknown variable x. In this specific term, the  exponent (or power) of x is 1, and so the degree of the entire equation is also 1.

Likewise, by taking three sets of coordinates pairs in the same plane, we uniquely define a quadratic curve with an equation that approximately takes the form y = ax^2 + bx + c with the coefficients a,b,c uniquely defined by the chosen coordinates. The process is a bit more involved than the above case, but it essentially starts in the same way using three coordinate pairs (x_1, y_1), (x_2, y_2) and (x_3, y_3).


By a quadratic curve, we mean a curve of degree 2. We can see that this curve has degree 2 because it contains two terms ax^2 and bx that depend on x. The highest order term is the ax^2 term with an exponent of 2, so this curve has degree 2 (ignore the term bx which has a smaller power).

What we are ultimately trying to show is that this approach scales for curves of degree n (of the form y = a_n x^n + … a_1 x + a_0). So, if we take n+1 coordinates on the (x,y) plane, then we can uniquely reconstruct the curve of this form entirely. Such degree n equations are also known as polynomials of degree n.

In order to generalise the approach to arbitrary degrees we need some kind of formula. This formula should take n+1 pairs of coordinates and return a polynomial of degree n. Fortunately, such a formula exists without us having to derive it ourselves: it is known as the Lagrange interpolation polynomial. Using this formula, we can reconstruct any degree-n polynomial from n+1 unique pairs of coordinates.
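As a small sketch, the Lagrange form can be evaluated directly at any point x; plugging in the two points from the line example above recovers the b coefficient, -1, at x = 0:

public class LagrangeExample {

    // Evaluate the unique degree-(n-1) polynomial through the given points at x
    static double interpolateAt(double[] xs, double[] ys, double x) {
        double result = 0.0;
        for (int i = 0; i < xs.length; i++) {
            double term = ys[i];
            for (int j = 0; j < xs.length; j++) {
                if (j != i) {
                    term *= (x - xs[j]) / (xs[i] - xs[j]);
                }
            }
            result += term;
        }
        return result;
    }

    public static void main(String[] args) {
        // The two points from the example above lie on y = 2x - 1,
        // so evaluating at x = 0 recovers the coefficient b = -1
        double[] xs = {2, 4};
        double[] ys = {3, 7};
        System.out.println(interpolateAt(xs, ys, 0)); // -1.0
    }
}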

Going back to pizzas temporarily, it will become clear in the next section how this Lagrange interpolation procedure essentially describes the dissemination of one slice (corresponding to (x,y) coordinates) taken from a single pizza (the entire degree t-1 polynomial) among n participants. Running this procedure n times in parallel allows each entity to construct their entire pizza (or the eventual key pair).

Back to key generation

Intuitively, in the DKG procedure we want to distribute n key pairs among n participants. This effectively means running n parallel instances of a t-out-of-n Shamir Secret Sharing scheme. This secret sharing scheme is built entirely upon the polynomial interpolation technique that we described above.

In a single instance, we take the secret key to be the first coefficient of a polynomial of degree t-1 and the public key is a published value that depends on this secret key, but does not reveal the actual coefficient. Think of RSA, where we have a number N = pq for secret large prime numbers p,q, where N is public but does not reveal the actual factorisation. Notice that if the polynomial is reconstructed using the interpolation technique above, then we immediately learn the secret key, because the first coefficient will be made explicit.

Each secret sharing scheme publishes shares, where each share is a different evaluation of the polynomial (dependent on the entity i receiving the key share). These evaluations are essentially coordinates on the (x,y) plane.
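Putting the pieces together, here is a toy sketch of a t-out-of-n secret sharing scheme over a small prime field. Real schemes (and the DKG) work over much larger groups and add commitments, but the share/reconstruct mechanics are the same polynomial evaluation and Lagrange interpolation described above:

import java.math.BigInteger;
import java.security.SecureRandom;
import java.util.HashMap;
import java.util.Map;

public class ShamirSketch {

    // Toy prime modulus; real schemes work in much larger groups
    static final BigInteger P = BigInteger.valueOf(2147483647L);

    // Split `secret` into n shares so that any t of them reconstruct it
    static Map<Integer, BigInteger> share(BigInteger secret, int t, int n) {
        SecureRandom random = new SecureRandom();
        // Random polynomial of degree t-1 with the secret as the first coefficient
        BigInteger[] coeffs = new BigInteger[t];
        coeffs[0] = secret;
        for (int i = 1; i < t; i++) {
            coeffs[i] = new BigInteger(P.bitLength() - 1, random);
        }
        Map<Integer, BigInteger> shares = new HashMap<>();
        for (int x = 1; x <= n; x++) {
            BigInteger y = BigInteger.ZERO;
            for (int i = t - 1; i >= 0; i--) { // Horner evaluation mod P
                y = y.multiply(BigInteger.valueOf(x)).add(coeffs[i]).mod(P);
            }
            shares.put(x, y); // share i is the coordinate (x, f(x))
        }
        return shares;
    }

    // Lagrange interpolation at x = 0 over the field recovers the secret coefficient
    static BigInteger reconstruct(Map<Integer, BigInteger> shares) {
        BigInteger secret = BigInteger.ZERO;
        for (Map.Entry<Integer, BigInteger> i : shares.entrySet()) {
            BigInteger num = BigInteger.ONE, den = BigInteger.ONE;
            for (Integer j : shares.keySet()) {
                if (!j.equals(i.getKey())) {
                    num = num.multiply(BigInteger.valueOf(-j)).mod(P);
                    den = den.multiply(BigInteger.valueOf(i.getKey() - j)).mod(P);
                }
            }
            secret = secret.add(i.getValue().multiply(num).multiply(den.modInverse(P))).mod(P);
        }
        return secret;
    }

    public static void main(String[] args) {
        Map<Integer, BigInteger> shares = share(BigInteger.valueOf(123456789), 3, 5);
        shares.remove(2);
        shares.remove(4); // any 3 of the 5 shares suffice
        System.out.println(reconstruct(shares)); // 123456789
    }
}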


By running n parallel instances of the secret sharing scheme, each entity receives n shares and then combines all of these to form their overall key pair (vk_i, sk_i).

The DKG procedure uses n parallel secret sharing procedures along with Pedersen commitments to distribute the key pairs. We explain in the next section how this procedure is part of the procedure for provisioning randomness beacons.

In summary, it is important to remember that each party in the DKG protocol generates a random secret key from the n shares that they receive, and they compute the corresponding public key from this. We will now explain how each entity uses this key pair to perform the cryptographic procedure that is used by the drand protocol.

Threshold signature scheme

Remember: a standard signature scheme considers a key-pair (vk,sk), where vk is a public verification key and sk is a private signing key. So, messages m signed with sk can be verified with vk. The security of the scheme ensures that it is difficult for anybody who does not hold sk to compute a valid signature for any message m.

A threshold signature scheme allows a set of users holding distributed key-pairs (vk_i,sk_i) to compute intermediate signatures u_i on a given message m.

Given knowledge of some number t of intermediate signatures u_i, a valid signature u on the message m can be reconstructed under the combined secret key sk. The public key vk can also be inferred using knowledge of the public keys vk_i, and then this public key can be used to verify u.

Again, think back to reconstructing the degree t-1 curves on graphs with t known coordinates. In this case, the coordinates correspond to the intermediate signatures u_i, and the signature u corresponds to the entire curve. For the actual signature schemes, the mathematics is much more involved than in the DKG procedure, but the principle is the same.


drand protocol

The n beacons that will take part in the drand project are identified. In the trusted setup phase, the DKG protocol from above is run, and each beacon effectively creates a key pair (vk_i, sk_i) for a threshold signature scheme. In other words, this key pair will be able to generate intermediate signatures that can be combined to create an entire signature for the system.


For each round (occurring once a minute, for example), the beacons agree on a signature u evaluated over a message containing the previous round’s signature and the current round’s number. This signature u is the result of combining the intermediate signatures u_i over the same message. Each intermediate signature u_i is created by each of the beacons using their secret sk_i.

Once this aggregation completes, each beacon displays the signature for the current round, along with the previous signature and round number. This allows any client to publicly verify the signature over this data to verify that the beacons honestly aggregate. This provides a chain of verifiable signatures, extending back to the first round of output. In addition, there are threshold signature schemes that output signatures that are indistinguishable from random sequences of bytes. Therefore, these signatures can be used directly as verifiable randomness for the applications we discussed previously.

What does drand use?

To instantiate the required threshold signature scheme, drand uses the (t,n)-BLS signature scheme of Boneh, Lynn and Shacham. In particular, we can instantiate this scheme in the elliptic curve setting using Barreto-Naehrig curves. Moreover, the BLS signature scheme outputs sufficiently large signatures that are randomly distributed, giving them enough entropy to serve as sources of randomness. Specifically, the signatures are randomly distributed over 64 bytes.

BLS signatures use a specific form of mathematical operation known as a cryptographic pairing. Pairings can be computed in certain elliptic curves, including the Barreto-Naehrig curve configurations. A detailed description of pairing operations is beyond the scope of this blog post; though it is important to remember that these operations are integral to how BLS signatures work.

Concretely speaking, all drand cryptographic operations are carried out using a library built on top of Cloudflare’s implementation of the bn256 curve. The Pedersen DKG protocol follows the design of Gennaro et al.

How does it work?

The randomness beacons are synchronised in rounds. At each round, a beacon produces a new signature u_i using its private key sk_i on the previous signature generated and the round ID. These signatures are usually broadcast on the URL drand.<host>.com/api/public. These signatures can be verified using the keys vk_i and over the same data that is signed. By signing the previous signature and the current round identifier, this establishes a chain of trust for the randomness beacon that can be traced back to the original signature value.

The randomness can be retrieved by combining the signatures from each of the beacons using the threshold property of the scheme. This reconstruction of the signature u from each intermediate signature u_i is done internally by the League of Entropy nodes. Each beacon broadcasts the entire signature u, that can be accessed over the HTTP endpoint above.

Cloudflare’s drand beacon

As we mentioned at the start of this blog post, Cloudflare has launched our distributed randomness beacon. This beacon is part of a network of beacons from different institutions around the globe that form the League of  Entropy.

The Cloudflare beacon uses LavaRand as its internal source of randomness for the DKG. Other League of Entropy drand beacons have their own sources of randomness.

Give me randomness!

The Cloudflare randomness beacon allows you to retrieve the latest random value from the League of Entropy using a simple HTTP request:

curl https://drand.cloudflare.com/api/public

The response is a JSON blob of the form:

{
    "round": 7,
    "previous": <hex-encoded-previous-signature>,
    "randomness": {
        "gid": 21,
        "point": <hex-encoded-new-signature>
    }
}

where randomness.point is the signature u aggregated among the entire set of beacons.

The signature is computed over a message composed of the previous round’s signature, previous, and the current round number, round, using the aggregated secret key of the system. This signature can be verified using the entire public key vk of the Cloudflare beacon, learned using another HTTP request:

curl https://drand.cloudflare.com/api/public

There are eight collaborators in the League of Entropy. You can learn the current round of randomness (or the system’s public key) by querying these beacons on the HTTP endpoints listed above.
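For completeness, the same request can be made from code. Below is a minimal sketch using Java 11’s built-in HttpClient – the string slicing that pulls out the point field is purely illustrative, and a real client would parse the JSON properly and verify the BLS signature before trusting the value:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class DrandFetch {

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://drand.cloudflare.com/api/public"))
                .GET()
                .build();

        // Fetch the latest round from the Cloudflare beacon
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        String json = response.body();
        System.out.println(json);

        // Crude extraction of the "point" field for illustration only;
        // a real client would parse the JSON and verify the signature
        int idx = json.indexOf("\"point\"");
        if (idx >= 0) {
            int start = json.indexOf('"', idx + 7) + 1;
            int end = json.indexOf('"', start);
            System.out.println("randomness: " + json.substring(start, end));
        }
    }
}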

Randomness & the future

Cloudflare will continue to take an active role in the drand project, both as a contributor and by running a randomness beacon with the League of Entropy. The League of Entropy is a worldwide joint effort of individuals and academic institutions. We at Cloudflare believe it can help us realize the mission of helping Build a Better Internet. For more information on Cloudflare’s participation in the League of Entropy, visit https://cloudflare.com/drand or read Dina’s blog post.

Cloudflare would like to thank all of their collaborators in the League of Entropy; from EPFL, UChile, Kudelski Security and Protocol Labs. This work would not have been possible without the work of those who contributed to the open-source drand project. We would also like to thank and appreciate the work of Gabbi Fisher, Brendan McMillion, and Mahrud Sayrafi in launching the Cloudflare randomness beacon.

Building a To-Do List with Workers and KV

Post Syndicated from Kristian Freeman original https://blog.cloudflare.com/building-a-to-do-list-with-workers-and-kv/


In this tutorial, we’ll build a todo list application in HTML, CSS and JavaScript, with a twist: all the data should be stored inside of the newly-launched Workers KV, and the application itself should be served directly from Cloudflare’s edge network, using Cloudflare Workers.

To start, let’s break this project down into a couple different discrete steps. In particular, it can help to focus on the constraint of working with Workers KV, as handling data is generally the most complex part of building an application:

  1. Build a todos data structure
  2. Write the todos into Workers KV
  3. Retrieve the todos from Workers KV
  4. Return an HTML page to the client, including the todos (if they exist)
  5. Allow creation of new todos in the UI
  6. Allow completion of todos in the UI
  7. Handle todo updates

This task order is pretty convenient, because it’s almost perfectly split into two parts: first, understanding the Cloudflare/API-level things we need to know about Workers and KV, and second, actually building up a user interface to work with the data.

Understanding Workers

In terms of implementation, a great deal of this project is centered around KV – although that may be the case, it’s useful to break down what Workers are exactly.

Service Workers are background scripts that run in your browser, alongside your application. Cloudflare Workers are the same concept, but super-powered: your Worker scripts run on Cloudflare’s edge network, in between your application and the client’s browser. This opens up a huge amount of opportunity for interesting integrations, especially considering the network’s massive scale around the world. Here are some of the use cases that I think are the most interesting:

  1. Custom security/filter rules to block bad actors before they ever reach the origin
  2. Replacing/augmenting your website’s content based on the request content (i.e. user agents and other headers)
  3. Caching requests to improve performance, or using Cloudflare KV to optimize high-read tasks in your application
  4. Building an application directly on the edge, removing the dependence on origin servers entirely

For this project, we’ll lean heavily towards the latter end of that list, building an application that clients communicate with, served on Cloudflare’s edge network. This means that it’ll be globally available with low latency, while still offering the ease of building applications directly in JavaScript.

Setting up a canvas

To start, I wanted to approach this project from the bare minimum: no frameworks, JS utilities, or anything like that. In particular, I was most interested in writing a project from scratch and serving it directly from the edge. Normally, I would deploy a site to something like GitHub Pages, but avoiding the need for an origin server altogether seems like a really powerful (and performant) idea – let’s try it!

I also considered using TodoMVC as the blueprint for building the functionality for the application, but even the Vanilla JS version is a pretty impressive amount of code, including a number of Node packages – it wasn’t exactly a concise chunk of code to just dump into the Worker itself.

Instead, I decided to approach the beginnings of this project by building a simple, blank HTML page, and including it inside of the Worker. To start, we’ll sketch something out locally, like this:

<!DOCTYPE html>
<html>
  <head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width,initial-scale=1">
    <title>Todos</title>
  </head>
  <body>
    <h1>Todos</h1>
  </body>
</html>

Hold on to this code – we’ll add it later, inside of the Workers script. For the purposes of the tutorial, I’ll be serving up this project at todo.kristianfreeman.com. My personal website was already hosted on Cloudflare, and since I’ll be serving this project from a subdomain of that site, it was time to create my first Worker.

Creating a worker

Inside of my Cloudflare account, I hopped into the Workers tab and launched the Workers editor.

This is one of my favorite features of the editor – working with your actual website, understanding how the worker will interface with your existing project.


The process of writing a Worker should be familiar to anyone who’s used the fetch library before. In short, the default code for a Worker hooks into the fetch event, passing the request of that event into a custom function, handleRequest:

addEventListener('fetch', event => {
  event.respondWith(handleRequest(event.request))
})

Within handleRequest, we make the actual request, using fetch, and return the response to the client. In short, we have a place to intercept the response body, but by default, we let it pass-through:

async function handleRequest(request) {
  console.log('Got request', request)
  const response = await fetch(request)
  console.log('Got response', response)
  return response
}

So, given this, where do we begin actually doing stuff with our worker?

Unlike the default code given to you in the Workers interface, we want to skip fetching the incoming request: instead, we’ll construct a new Response, and serve it directly from the edge:

async function handleRequest(request) {
  const response = new Response("Hello!")
  return response
}

Given that very small functionality we’ve added to the worker, let’s deploy it. Moving into the “Routes” tab of the Worker editor, I added the route https://todo.kristianfreeman.com/* and attached it to the cloudflare-worker-todos script.


Once attached, I deployed the worker, and voila! Visiting todo.kristianfreeman.com in-browser gives me my simple “Hello!” response back.


Writing data to KV

The next step is to populate our todo list with actual data. To do this, we’ll make use of Cloudflare’s Workers KV – it’s a simple key-value store that you can access inside of your Worker script to read (and write, although it’s less common) data.

To get started with KV, we need to set up a “namespace”. All of our cached data will be stored inside that namespace, and given just a bit of configuration, we can access that namespace inside the script with a predefined variable.

I’ll create a new namespace called KRISTIAN_TODOS, and in the Worker editor, I’ll expose the namespace by binding it to the variable KRISTIAN_TODOS.


Given the presence of KRISTIAN_TODOS in my script, it’s time to understand the KV API. At time of writing, a KV namespace has three primary methods you can use to interface with your cache: get, put, and delete. Pretty straightforward!

Let’s start storing data by defining an initial set of data, which we’ll put inside of the cache using the put method. I’ve opted to define an object, defaultData, instead of a simple array of todos: we may want to store metadata and other information inside of this cache object later on. Given that data object, I’ll use JSON.stringify to put a simple string into the cache:

async function handleRequest(request) {
  // ...previous code
  
  const defaultData = { 
    todos: [
      {
        id: 1,
        name: 'Finish the Cloudflare Workers blog post',
        completed: false
      }
    ] 
  }
  KRISTIAN_TODOS.put("data", JSON.stringify(defaultData))
}

The Worker KV data store is eventually consistent: writing to the cache means that it will become available eventually, but it’s possible to attempt to read a value back from the cache immediately after writing it, only to find that the cache hasn’t been updated yet.

Given the presence of data in the cache, and the assumption that our cache is eventually consistent, we should adjust this code slightly: first, we should actually read from the cache, parsing the value back out, and using it as the data source if it exists. If it doesn’t, we’ll refer to defaultData, setting it as the data source for now (remember, it should be set in the future… eventually), while also setting it in the cache for future use. After breaking out the code into a few functions for simplicity, the result looks like this:

const defaultData = { 
  todos: [
    {
      id: 1,
      name: 'Finish the Cloudflare Workers blog post',
      completed: false
    }
  ] 
}

const setCache = data => KRISTIAN_TODOS.put("data", data)
const getCache = () => KRISTIAN_TODOS.get("data")

async function getTodos(request) {
  // ... previous code
  
  let data;
  const cache = await getCache()
  if (!cache) {
    await setCache(JSON.stringify(defaultData))
    data = defaultData
  } else {
    data = JSON.parse(cache)
  }
}

Rendering data from KV

Given the presence of data in our code, which is the cached data object for our application, we should actually take this data and make it available on screen.

In our Workers script, we’ll make a new variable, html, and use it to build up a static HTML template that we can serve to the client. In handleRequest, we can construct a new Response (with a Content-Type header of text/html), and serve it to the client:

const html = `
<!DOCTYPE html>
<html>
  <head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width,initial-scale=1">
    <title>Todos</title>
  </head>
  <body>
    <h1>Todos</h1>
  </body>
</html>
`

async function handleRequest(request) {
  const response = new Response(html, {
    headers: { 'Content-Type': 'text/html' }
  })
  return response
}


We have a static HTML site being rendered, and now we can begin populating it with data! In the body, we’ll add a ul tag with an id of todos:

<body>
  <h1>Todos</h1>
  <ul id="todos"></ul>
</body>

Given that body, we can also add a script after the body that takes a todos array, loops through it, and for each todo in the array, creates a li element and appends it to the todos list:

<script>
  window.todos = [];
  var todoContainer = document.querySelector("#todos");
  window.todos.forEach(todo => {
    var el = document.createElement("li");
    el.innerText = todo.name;
    todoContainer.appendChild(el);
  });
</script>

Our static page can take in window.todos, and render HTML based on it, but we haven’t actually passed in any data from KV. To do this, we’ll need to make a couple changes.

First, our html variable will change to a function. The function will take in an argument, todos, which will populate the window.todos variable in the above code sample:

const html = todos => `
<!doctype html>
<html>
  <!-- ... -->
  <script>
    window.todos = ${todos || []}
    var todoContainer = document.querySelector("#todos");
    // ...
  </script>
</html>
`

In handleRequest, we can use the retrieved KV data to call the html function, and generate a Response based on it:

async function handleRequest(request) {
  let data;
  
  // Set data using cache or defaultData from previous section...
  
  const body = html(JSON.stringify(data.todos))
  const response = new Response(body, {
    headers: { 'Content-Type': 'text/html' }
  })
  return response
}

The finished product looks something like this:

Building a To-Do List with Workers and KV

Adding todos from the UI

At this point, we’ve built a Cloudflare Worker that takes data from Cloudflare KV and renders a static page based on it. That static page reads the data, and generates a todo list based on that data. Of course, the piece we’re missing is creating todos, from inside the UI. We know that we can add todos using the KV API – we could simply update the cache by calling KRISTIAN_TODOS.put("data", newData) – but how do we update it from inside the UI?

It’s worth noting here that Cloudflare’s Workers documentation suggests that any writes to your KV namespace happen via their API – that is, in its simplest form, a cURL request:

curl "<https://api.cloudflare.com/client/v4/accounts/$ACCOUNT_ID/storage/kv/namespaces/$NAMESPACE_ID/values/first-key>" \
  -X PUT \
  -H "X-Auth-Email: $CLOUDFLARE_EMAIL" \
  -H "X-Auth-Key: $CLOUDFLARE_AUTH_KEY" \
  --data 'My first value!'

We’ll implement something similar by handling a second route in our worker, designed to watch for PUT requests to /. When a body is received at that URL, the worker will send the new todo data to our KV store.

I’ll add this new functionality to my worker, and in handleRequest, if the request method is a PUT, it will take the request body and update the cache:

addEventListener('fetch', event => {
  event.respondWith(handleRequest(event.request))
})

const putInCache = body => {
  const accountId = "$accountId"
  const namespaceId = "$namespaceId"
  const key = "data"
  return fetch(
    `https://api.cloudflare.com/client/v4/accounts/${accountId}/storage/kv/namespaces/${namespaceId}/values/${key}`,
    { 
      method: "PUT",
      body,
      headers: { 
        'X-Auth-Email': '$accountEmail',
        'X-Auth-Key': "$authKey"
      }
    }
  )
}

async function updateTodos(request) {
  const body = await request.text()
  try {
    JSON.parse(body)
    await putInCache(body)
    return new Response(body, { status: 200 })
  } catch (err) {
    return new Response(err, { status: 500 })
  }
}

async function handleRequest(request) {
  if (request.method === "PUT") {
    return updateTodos(request);
  } else {
    // Defined in previous code block
    return getTodos(request);
  }
}

The script is pretty straightforward – we check whether the request is a PUT, and wrap the remainder of the code in a try/catch block. First, we parse the body of the request coming in, ensuring that it is JSON, before we update the cache with the new data and return it to the user. If anything goes wrong, we simply return a 500. If the route is hit with an HTTP method other than PUT – that is, GET, DELETE, or anything else – we fall through to getTodos and return the rendered HTML page.

With this script, we can now add some “dynamic” functionality to our HTML page to actually hit this route.

First, we’ll create an input for our todo “name”, and a button for “submitting” the todo.

<div>
  <input type="text" name="name" placeholder="A new todo" />
  <button id="create">Create</button>
</div>

Given that input and button, we can add a corresponding JavaScript function to watch for clicks on the button – once the button is clicked, the browser will PUT the current list of todos to / (we’ll add the new todo to that list in a moment).

var createTodo = function() {
  var input = document.querySelector("input[name=name]");
  if (input.value.length) {
    fetch("/", { 
      method: 'PUT', 
      body: JSON.stringify({ todos: todos }) 
    });
  }
};

document.querySelector("#create")
  .addEventListener('click', createTodo);

This code updates the cache, but what about our local UI? Remember that the KV cache is eventually consistent – even if we were to update our worker to read from the cache and return it, we have no guarantees it’ll actually be up-to-date. Instead, let’s just update the list of todos locally, by taking our original code for rendering the todo list, making it a re-usable function called populateTodos, and calling it when the page loads and again whenever the list changes locally:

var populateTodos = function() {
  var todoContainer = document.querySelector("#todos");
  todoContainer.innerHTML = null;
  window.todos.forEach(todo => {
    var el = document.createElement("li");
    el.innerText = todo.name;
    todoContainer.appendChild(el);
  });
};

populateTodos();

var createTodo = function() {
  var input = document.querySelector("input[name=name]");
  if (input.value.length) {
    todos = [].concat(todos, { 
      id: todos.length + 1, 
      name: input.value,
      completed: false,
    });
    fetch("/", { 
      method: 'PUT', 
      body: JSON.stringify({ todos: todos }) 
    });
    populateTodos();
    input.value = "";
  }
};

document.querySelector("#create")
  .addEventListener('click', createTodo);

With the client-side code in place, deploying the new Worker should put all these pieces together. The result is an actual dynamic todo list!


Updating todos from the UI

For the final piece of our (very) basic todo list, we need to be able to update todos – specifically, marking them as completed.

Luckily, a great deal of the infrastructure for this work is already in place. We can currently update the todo list data in our cache, as evidenced by our createTodo function. Performing updates on a todo, in fact, is much more of a client-side task than a Worker-side one!

To start, let’s update the client-side code for generating a todo. Instead of a ul-based list, we’ll migrate the todo container and the todos themselves into using divs:

<!-- <ul id="todos"></ul> becomes... -->
<div id="todos"></div>

The populateTodos function can be updated to generate a div for each todo. In addition, we’ll move the name of the todo into a child element of that div:

var populateTodos = function() {
  var todoContainer = document.querySelector("#todos");
  todoContainer.innerHTML = null;
  window.todos.forEach(todo => {
    var el = document.createElement("div");
    var name = document.createElement("span");
    name.innerText = todo.name;
    el.appendChild(name);
    todoContainer.appendChild(el);
  });
}

So far, we’ve designed the client-side part of this code to take an array of todos in, and given that array, render out a list of simple HTML elements. There’s a number of things that we’ve been doing that we haven’t quite had a use for, yet: specifically, the inclusion of IDs, and updating the completed value on a todo. Luckily, these things work well together, in order to support actually updating todos in the UI.

To start, it would be useful to signify the ID of each todo in the HTML. By doing this, we can then refer to the element later, in order to correspond it to the todo in the JavaScript part of our code. Data attributes, and the corresponding dataset method in JavaScript, are a perfect way to implement this. When we generate our div element for each todo, we can simply attach a data attribute called todo to each div:

window.todos.forEach(todo => {
  var el = document.createElement("div");
  el.dataset.todo = todo.id
  // ... more setup

  todoContainer.appendChild(el);
});

Inside our HTML, each div for a todo now has an attached data attribute, which looks like:

<div data-todo="1"></div>
<div data-todo="2"></div>

Now we can generate a checkbox for each todo element. This checkbox will default to unchecked for new todos, of course, but we can mark it as checked as the element is rendered in the window:

window.todos.forEach(todo => {
  var el = document.createElement("div");
  el.dataset.todo = todo.id
  
  var name = document.createElement("span");
  name.innerText = todo.name;
  
  var checkbox = document.createElement("input")
  checkbox.type = "checkbox"
  checkbox.checked = todo.completed ? 1 : 0;

  el.appendChild(checkbox);
  el.appendChild(name);
  todoContainer.appendChild(el);
})

The checkbox is set up to correctly reflect the value of completed on each todo, but it doesn’t yet update when we actually check the box! To do this, we’ll add an event listener on the click event, calling completeTodo. Inside the function, we’ll inspect the checkbox element, finding its parent (the todo div), and using the todo data attribute on it to find the corresponding todo in our data. Given that todo, we can toggle the value of completed, update our data, and re-render the UI:

var completeTodo = function(evt) {
  var checkbox = evt.target;
  var todoElement = checkbox.parentNode;
  
  var newTodoSet = [].concat(window.todos)
  var todo = newTodoSet.find(t => 
    t.id == todoElement.dataset.todo
  );
  todo.completed = !todo.completed;
  todos = newTodoSet;
  updateTodos()
}
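
The updateTodos call at the end refers to a small client-side helper that appears in the final script below; it simply PUTs window.todos back to / and re-renders the list:

var updateTodos = function() {
  fetch("/", { method: 'PUT', body: JSON.stringify({ todos: window.todos }) })
  populateTodos()
}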

The final result of our code is a system that simply checks the todos variable, updates our Cloudflare KV cache with that value, and then does a straightforward re-render of the UI based on the data it has locally.


Conclusions and next steps

With this, we’ve created a pretty remarkable project: an almost entirely static HTML/JS application, transparently powered by Cloudflare KV and Workers, served at the edge. There are a number of additions that could be made to this application, such as a better design (I’ll leave this as an exercise for readers – you can see my version at todo.kristianfreeman.com), security, speed, and so on.


One interesting and fairly trivial addition is implementing per-user caching. Of course, right now, the cache key is simply “data”: anyone visiting the site will share a todo list with any other user. Because we have the request information inside of our worker, it’s easy to make this data user-specific. For instance, we can implement per-user caching by generating the cache key based on the requesting IP:

const ip = request.headers.get("CF-Connecting-IP")
const cacheKey = `data-${ip}`;
const getCache = key => KRISTIAN_TODOS.get(key)
getCache(cacheKey)
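
The write path uses the same per-user key. In the final script below, updateTodos derives the cache key from the same header before calling putInCache (error handling omitted in this sketch):

async function updateTodos(request) {
  const body = await request.text()
  const ip = request.headers.get("CF-Connecting-IP")
  const cacheKey = `data-${ip}`
  await putInCache(cacheKey, body)
  return new Response(body, { status: 200 })
}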

One more deploy of our Workers project, and we have a full todo list application, with per-user functionality, served at the edge!

The final version of our Workers script looks like this:

const html = todos => `
<!DOCTYPE html>
<html>
  <head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width,initial-scale=1">
    <title>Todos</title>
    <link href="https://cdn.jsdelivr.net/npm/tailwindcss/dist/tailwind.min.css" rel="stylesheet" />
  </head>

  <body class="bg-blue-100">
    <div class="w-full h-full flex content-center justify-center mt-8">
      <div class="bg-white shadow-md rounded px-8 pt-6 py-8 mb-4">
        <h1 class="block text-grey-800 text-md font-bold mb-2">Todos</h1>
        <div class="flex">
          <input class="shadow appearance-none border rounded w-full py-2 px-3 text-grey-800 leading-tight focus:outline-none focus:shadow-outline" type="text" name="name" placeholder="A new todo" />
          <button class="bg-blue-500 hover:bg-blue-800 text-white font-bold ml-2 py-2 px-4 rounded focus:outline-none focus:shadow-outline" id="create" type="submit">Create</button>
        </div>
        <div class="mt-4" id="todos"></div>
      </div>
    </div>
  </body>

  <script>
    window.todos = ${todos || []}

    var updateTodos = function() {
      fetch("/", { method: 'PUT', body: JSON.stringify({ todos: window.todos }) })
      populateTodos()
    }

    var completeTodo = function(evt) {
      var checkbox = evt.target
      var todoElement = checkbox.parentNode
      var newTodoSet = [].concat(window.todos)
      var todo = newTodoSet.find(t => t.id == todoElement.dataset.todo)
      todo.completed = !todo.completed
      window.todos = newTodoSet
      updateTodos()
    }

    var populateTodos = function() {
      var todoContainer = document.querySelector("#todos")
      todoContainer.innerHTML = null

      window.todos.forEach(todo => {
        var el = document.createElement("div")
        el.className = "border-t py-4"
        el.dataset.todo = todo.id

        var name = document.createElement("span")
        name.className = todo.completed ? "line-through" : ""
        name.innerText = todo.name

        var checkbox = document.createElement("input")
        checkbox.className = "mx-4"
        checkbox.type = "checkbox"
        checkbox.checked = todo.completed ? 1 : 0
        checkbox.addEventListener('click', completeTodo)

        el.appendChild(checkbox)
        el.appendChild(name)
        todoContainer.appendChild(el)
      })
    }

    populateTodos()

    var createTodo = function() {
      var input = document.querySelector("input[name=name]")
      if (input.value.length) {
        window.todos = [].concat(todos, { id: window.todos.length + 1, name: input.value, completed: false })
        input.value = ""
        updateTodos()
      }
    }

    document.querySelector("#create").addEventListener('click', createTodo)
  </script>
</html>
`

const defaultData = { todos: [] }

const setCache = (key, data) => KRISTIAN_TODOS.put(key, data)
const getCache = key => KRISTIAN_TODOS.get(key)

async function getTodos(request) {
  const ip = request.headers.get('CF-Connecting-IP')
  const cacheKey = `data-${ip}`
  let data
  const cache = await getCache(cacheKey)
  if (!cache) {
    await setCache(cacheKey, JSON.stringify(defaultData))
    data = defaultData
  } else {
    data = JSON.parse(cache)
  }
  const body = html(JSON.stringify(data.todos || []))
  return new Response(body, {
    headers: { 'Content-Type': 'text/html' },
  })
}

const putInCache = (cacheKey, body) => {
  const accountId = '$accountId'
  const namespaceId = '$namespaceId'
  return fetch(
    `https://api.cloudflare.com/client/v4/accounts/${accountId}/storage/kv/namespaces/${namespaceId}/values/${cacheKey}`,
    {
      method: 'PUT',
      body,
      headers: {
        'X-Auth-Email': '$cloudflareEmail',
        'X-Auth-Key': '$cloudflareApiKey',
      },
    },
  )
}

async function updateTodos(request) {
  const body = await request.text()
  const ip = request.headers.get('CF-Connecting-IP')
  const cacheKey = `data-${ip}`
  try {
    JSON.parse(body)
    await putInCache(cacheKey, body)
    return new Response(body, { status: 200 })
  } catch (err) {
    return new Response(err, { status: 500 })
  }
}

async function handleRequest(request) {
  if (request.method === 'PUT') {
    return updateTodos(request)
  } else {
    return getTodos(request)
  }
}

addEventListener('fetch', event => {
  event.respondWith(handleRequest(event.request))
})

You can find the source code for this project, as well as a README with deployment instructions, on GitHub.

Get ready to write — Workers KV is now in GA!

Post Syndicated from Ashcon Partovi original https://blog.cloudflare.com/workers-kv-is-ga/


Today, we’re excited to announce Workers KV is entering general availability and is ready for production use!


What is Workers KV?

Workers KV is a highly distributed, eventually consistent, key-value store that spans Cloudflare’s global edge. It allows you to store billions of key-value pairs and read them with ultra-low latency anywhere in the world. Now you can build entire applications with the performance of a CDN static cache.

Why did we build it?

Workers is a platform that lets you run JavaScript on Cloudflare’s global edge of 175+ data centers. With only a few lines of code, you can route HTTP requests, modify responses, or even create new responses without an origin server.

// A Worker that handles a single redirect,
// such a humble beginning...
addEventListener("fetch", event => {
  event.respondWith(handleOneRedirect(event.request))
})

async function handleOneRedirect(request) {
  let url = new URL(request.url)
  let device = request.headers.get("CF-Device-Type")
  // If the device is mobile, add a prefix to the hostname.
  // (eg. example.com becomes mobile.example.com)
  if (device === "mobile") {
    url.hostname = "mobile." + url.hostname
    return Response.redirect(url, 302)
  }
  // Otherwise, send request to the original hostname.
  return await fetch(request)
}

Customers quickly came to us with use cases that required a way to store persistent data. Following our example above, it’s easy to handle a single redirect, but what if you want to handle billions of them? You would have to hard-code them into your Workers script, fit it all in under 1 MB, and re-deploy it every time you wanted to make a change — yikes! That’s why we built Workers KV.

// A Worker that can handle billions of redirects,
// now that's more like it!
addEventListener("fetch", event => {
  event.respondWith(handleBillionsOfRedirects(event.request))
})

async function handleBillionsOfRedirects(request) {
  let prefix = "/redirect"
  let url = new URL(request.url)
  // Check if the URL is a special redirect.
  // (eg. example.com/redirect/<random-hash>)
  if (url.pathname.startsWith(prefix)) {
    // REDIRECTS is a custom variable that you define,
    // it binds to a Workers KV "namespace." (aka. a storage bucket)
    let redirect = await REDIRECTS.get(url.pathname.replace(prefix, ""))
    if (redirect) {
      url.pathname = redirect
      return Response.redirect(url, 302)
    }
  }
  // Otherwise, send request to the original path.
  return await fetch(request)
}

With only a few changes from our previous example, we scaled from one redirect to billions − that’s just a taste of what you can build with Workers KV.

How does it work?

Distributed data stores are often modeled using the CAP Theorem, which states that a distributed system can provide only two of the following three guarantees:

  • Consistency – is my data the same everywhere?
  • Availability – is my data accessible all the time?
  • Partition tolerance – is my data stored in multiple locations?

Diagram of the choices and tradeoffs of the CAP Theorem.

Workers KV chooses to guarantee Availability and Partition tolerance. This combination is known as eventual consistency, which presents Workers KV with two unique competitive advantages:

  • Reads are ultra fast (median of 12 ms) since they are powered by our caching technology.
  • Data is available across 175+ edge data centers and resilient to regional outages.

There are, however, tradeoffs to eventual consistency. If two clients write different values to the same key at the same time, the last client to write eventually “wins” and its value becomes globally consistent. This also means that if a client writes to a key and that same client reads that same key, the values may be inconsistent for a short amount of time.

To help visualize this scenario, here’s a real-life example amongst three friends:

  • Suppose Matthew, Michelle, and Lee are planning their weekly lunch.
  • Matthew decides they’re going out for sushi.
  • Matthew tells Michelle their sushi plans, Michelle agrees.
  • Lee, not knowing the plans, tells Michelle they’re actually having pizza.

An hour later, Michelle and Lee are waiting at the pizza parlor while Matthew is sitting alone at the sushi restaurant — what went wrong? We can chalk this up to eventual consistency, because after waiting for a few minutes, Matthew looks at his updated calendar and eventually finds the new truth: they’re going out for pizza instead.

While it may take minutes in real-life, Workers KV is much faster. It can achieve global consistency in less than 60 seconds. Additionally, when a Worker writes to a key, then immediately reads that same key, it can expect the values to be consistent if both operations came from the same location.
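
To make that read-after-write behaviour concrete, here is a minimal sketch of a Worker (the namespace binding SETTINGS and the key name are hypothetical):

async function handleRequest(request) {
  // Write a value, then read it back from the same location.
  await SETTINGS.put("lunch", "pizza")
  const plan = await SETTINGS.get("lunch") // "pizza": same-location read-after-write is consistent
  // Other edge locations may still return the previous value
  // until the write propagates globally (typically under 60 seconds).
  return new Response(plan)
}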

When should I use it?

Now that you understand the benefits and tradeoffs of using eventual consistency, how do you determine if it’s the right storage solution for your application? Simply put, if you want global availability with ultra-fast reads, Workers KV is right for you.

However, if your application is frequently writing to the same key, there is an additional consideration. We call it “the Matthew question”: Are you okay with the Matthews of the world occasionally going to the wrong restaurant?

You can imagine use cases (like our redirect Worker example) where this doesn’t make any material difference. But if you decide to keep track of a user’s bank account balance, you would not want the possibility of two balances existing at once, since they could purchase something with money they’ve already spent.

What can I build with it?

Here are a few examples of applications that have been built with KV:

  • Mass redirects – handle billions of HTTP redirects.
  • User authentication – validate user requests to your API.
  • Translation keys – dynamically localize your web pages.
  • Configuration data – manage who can access your origin.
  • Step functions – sync state data between multiple API functions.
  • Edge file store – host large amounts of small files.

We’ve highlighted several of those use cases in our previous blog post. We also have some more in-depth code walkthroughs, including a recently published blog post on how to build an online To-do list with Workers KV.


What’s new since beta?

By far, our most common request was to make it easier to write data to Workers KV. That’s why we’re releasing three new ways to make that experience even better:

1. Bulk Writes

If you want to import your existing data into Workers KV, you don’t want to go through the hassle of sending an HTTP request for every key-value pair. That’s why we added a bulk endpoint to the Cloudflare API. Now you can upload up to 10,000 pairs (up to 100 MB of data) in a single PUT request.

curl "https://api.cloudflare.com/client/v4/accounts/$ACCOUNT_ID/storage/kv/namespaces/$NAMESPACE_ID/bulk" \
  -X PUT \
  -H "X-Auth-Key: $CLOUDFLARE_AUTH_KEY" \
  -H "X-Auth-Email: $CLOUDFLARE_AUTH_EMAIL" \
  -d '[
    {"key": "built_by",    "value": "kyle, alex, charlie, andrew, and brett"},
    {"key": "reviewed_by", "value": "joaquin"},
    {"key": "approved_by", "value": "steve"}
  ]'

Let’s walk through an example use case: you want to off-load your website translation to Workers. Since you’re reading translation keys frequently and only occasionally updating them, this application works well with the eventual consistency model of Workers KV.

In this example, we hook into Crowdin, a popular platform to manage translation data. This Worker responds to a /translate endpoint, downloads all your translation keys, and bulk writes them to Workers KV so you can read it later on our edge:

addEventListener("fetch", event => {
  if (event.request.url.pathname === "/translate") {
    event.respondWith(uploadTranslations())
  }
})

async function uploadTranslations() {
  // Ask crowdin for all of our translations.
  var response = await fetch(
    "https://api.crowdin.com/api/project" +
    "/:ci_project_id/download/all.zip?key=:ci_secret_key")
  // If crowdin is responding, parse the response into
  // a single json with all of our translations.
  if (response.ok) {
    var translations = await zipToJson(response)
    return await bulkWrite(translations)
  }
  // Return the errored response from crowdin.
  return response
}

async function bulkWrite(keyValuePairs) {
  return fetch(
    "https://api.cloudflare.com/client/v4/accounts" +
    "/:cf_account_id/storage/kv/namespaces/:cf_namespace_id/bulk",
    {
      method: "PUT",
      headers: {
        "Content-Type": "application/json",
        "X-Auth-Key": ":cf_auth_key",
        "X-Auth-Email": ":cf_email"
      },
      body: JSON.stringify(keyValuePairs)
    }
  )
}

async function zipToJson(response) {
  // ... omitted for brevity ...
  // (eg. https://stuk.github.io/jszip)
  return [
    {key: "hello.EN", value: "Hello World"},
    {key: "hello.ES", value: "Hola Mundo"}
  ]
}

Now, when you want to translate a page, all you have to do is read from Workers KV:

async function translate(keys, lang) {
  // You bind your translations namespace to the TRANSLATIONS variable.
  return Promise.all(keys.map(key => TRANSLATIONS.get(key + "." + lang)))
}
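
A hypothetical call site, using the key and language names from the zipToJson example above:

// Translate the "hello" key into Spanish for the current page.
const [hola] = await translate(["hello"], "ES")
// hola === "Hola Mundo", once the bulk upload above has propagated.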

2. Expiring Keys

By default, key-value pairs stored in Workers KV last forever. However, sometimes you want your data to auto-delete after a certain amount of time. That’s why we’re introducing the expiration and expirationTtl options for write operations.

// Key expires 60 seconds from now.
NAMESPACE.put("myKey", "myValue", {expirationTtl: 60})

// Key expires once the given UNIX epoch timestamp (in seconds) has passed.
NAMESPACE.put("myKey", "myValue", {expiration: 1247788800})

# You can also set keys to expire from the Cloudflare API.
curl "https://api.cloudflare.com/client/v4/accounts/$ACCOUNT_ID/storage/kv/namespaces/$NAMESPACE_ID/values/$KEY?expiration_ttl=$EXPIRATION_IN_SECONDS" \
  -X PUT \
  -H "X-Auth-Key: $CLOUDFLARE_AUTH_KEY" \
  -H "X-Auth-Email: $CLOUDFLARE_AUTH_EMAIL" \
  -d "$VALUE"

Let’s say you want to block users that have been flagged as inappropriate from your website, but only for a week. With an expiring key, you can set the expire time and not have to worry about deleting it later.

In this example, we assume users and IP addresses are one and the same. If your application has authentication, you could use access tokens as the key identifier.

addEventListener("fetch", event => {
  var url = new URL(event.request.url)
  // An internal API that blocks a new user IP.
  // (eg. example.com/block/1.2.3.4)
  if (url.pathname.startsWith("/block")) {
    var ip = url.pathname.split("/").pop()
    event.respondWith(blockIp(ip))
  } else {
    // Other requests check if the IP is blocked.
    event.respondWith(handleRequest(event.request))
  }
})

async function blockIp(ip) {
  // Values are allowed to be empty in KV,
  // we don't need to store any extra information anyway.
  await BLOCKED.put(ip, "", {expirationTtl: 60*60*24*7})
  return new Response("ok")
}

async function handleRequest(request) {
  var ip = request.headers.get("CF-Connecting-IP")
  if (ip) {
    var blocked = await BLOCKED.get(ip)
    // If we detect an IP and it's blocked, respond with a 403 error.
    if (blocked) {
      return new Response("You are blocked!", { status: 403 })
    }
  }
  // Otherwise, passthrough the original request.
  return fetch(request)
}

3. Larger Values

We’ve increased our size limit on values from 64 kB to 2 MB. This is quite useful if you need to store buffer-based or file data in Workers KV.


Consider this scenario: you want to let your users upload their favorite GIF to their profile without having to store these GIFs as binaries in your database or managing another cloud storage bucket.

Workers KV is a great fit for this use case! You can create a Workers KV namespace for your users’ GIFs that is fast and reliable wherever your customers are located.

In this example, users upload a link to their favorite GIF, then a Worker downloads it and stores it to Workers KV.

addEventListener("fetch", event => {
  var url = new URL(event.request.url)
  var arg = url.pathname.split("/").pop()
  // User sends a URI encoded link to the GIF they wish to upload.
  // (eg. example.com/api/upload_gif/<encoded-uri>)
  if (url.pathname.startsWith("/api/upload_gif")) {
    event.respondWith(uploadGif(arg))
    // Profile contains link to view the GIF.
    // (eg. example.com/api/view_gif/<username>)
  } else if (url.pathname.startsWith("/api/view_gif")) {
    event.respondWith(getGif(arg))
  }
})

async function uploadGif(url) {
  // Fetch the GIF from the Internet.
  var gif = await fetch(decodeURIComponent(url))
  var buffer = await gif.arrayBuffer()
  // Upload the GIF as a buffer to Workers KV, keyed by username.
  // (`user` is assumed to come from your authentication layer; it isn't defined in this snippet.)
  await GIFS.put(user.name, buffer)
  return gif
}

async function getGif(username) {
  var gif = await GIFS.get(username, "arrayBuffer")
  // If the user has set one, respond with the GIF.
  if (gif) {
    return new Response(gif, {headers: {"Content-Type": "image/gif"}})
  } else {
    return new Response("User has no GIF!", { status: 404 })
  }
}

Lastly, we want to thank all of our beta customers. It was your valuable feedback that led us to develop these changes to Workers KV. Make sure to stay in touch with us, we’re always looking ahead for what’s next and we love hearing from you!

Pricing

We’re also ready to announce our GA pricing. If you’re one of our Enterprise customers, your pricing obviously remains unchanged.

  • $0.50 / GB of data stored, 1 GB included
  • $0.50 / million reads, 10 million included
  • $5 / million write, list, and delete operations, 1 million included

During the beta period, we learned that customers don’t just want to read values at our edge – they want to write values from our edge too. Since these edge operations are in high demand and are more costly, we now charge for non-read operations (writes, lists, and deletes) per month.
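
As an illustrative example (assuming the included amounts apply per month): an application storing 5 GB of data and performing 50 million reads and 2 million write operations in a month would be billed (5 - 1) x $0.50 + (50 - 10) x $0.50 + (2 - 1) x $5 = $2 + $20 + $5 = $27 for that month.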

Limits

As mentioned earlier, we increased our value size limit from 64 kB to 2 MB. We’ve also removed our cap on the number of keys per namespace — it’s now unlimited. Here are our GA limits:

  • Up to 20 namespaces per account, each with unlimited keys
  • Keys of up to 512 bytes and values of up to 2 MB
  • Unlimited writes per second for different keys
  • One write per second for the same key
  • Unlimited reads per second per key

Try it out now!

Now open to all customers, you can start using Workers KV today from your Cloudflare dashboard under the Workers tab. You can also look at our updated documentation.

We’re really excited to see what you all can build with Workers KV!

Cloudflare architecture and how BPF eats the world

Post Syndicated from Marek Majkowski original https://blog.cloudflare.com/cloudflare-architecture-and-how-bpf-eats-the-world/


Recently at Netdev 0x13, the Conference on Linux Networking in Prague, I gave a short talk titled “Linux at Cloudflare”. The talk ended up being mostly about BPF. It seems, no matter the question – BPF is the answer.

Here is a transcript of a slightly adjusted version of that talk.



At Cloudflare we run Linux on our servers. We operate two categories of data centers: large “Core” data centers, processing logs, analyzing attacks, computing analytics, and the “Edge” server fleet, delivering customer content from 180 locations across the world.

In this talk, we will focus on the “Edge” servers. It’s here where we use the newest Linux features, optimize for performance and care deeply about DoS resilience.



Our edge service is special due to our network configuration – we are extensively using anycast routing. Anycast means that the same set of IP addresses is announced by all of our data centers.

This design has great advantages. First, it guarantees the optimal speed for end users. No matter where you are located, you will always reach the closest data center. Then, anycast helps us to spread out DoS traffic. During attacks each of the locations receives a small fraction of the total traffic, making it easier to ingest and filter out unwanted traffic.



Anycast allows us to keep the networking setup uniform across all edge data centers. We applied the same design inside our data centers – our software stack is uniform across the edge servers. All software pieces are running on all the servers.

In principle, every machine can handle every task – and we run many diverse and demanding tasks. We have a full HTTP stack, the magical Cloudflare Workers, two sets of DNS servers – authoritative and resolver, and many other publicly facing applications like Spectrum and Warp.

Even though every server has all the software running, requests typically cross many machines on their journey through the stack. For example, an HTTP request might be handled by a different machine during each of the 5 stages of the processing.



Let me walk you through the early stages of inbound packet processing:

(1) First, the packets hit our router. The router does ECMP, and forwards packets onto our Linux servers. We use ECMP to spread each target IP across many, at least 16, machines. This is used as a rudimentary load balancing technique.

(2) On the servers we ingest packets with XDP eBPF. In XDP we perform two stages. First, we run volumetric DoS mitigations, dropping packets belonging to very large layer 3 attacks.

(3) Then, still in XDP, we perform layer 4 load balancing. All the non-attack packets are redirected across the machines. This is used to work around the ECMP problems, gives us fine-granularity load balancing and allows us to gracefully take servers out of service.

(4) Following the redirection the packets reach a designated machine. At this point they are ingested by the normal Linux networking stack, go through the usual iptables firewall, and are dispatched to an appropriate network socket.

(5) Finally packets are received by an application. For example HTTP connections are handled by a “protocol” server, responsible for performing TLS encryption and processing HTTP, HTTP/2 and QUIC protocols.

It’s in these early phases of request processing where we use the coolest new Linux features. We can group useful modern functionalities into three categories:

  • DoS handling
  • Load balancing
  • Socket dispatch



Let’s discuss DoS handling in more detail. As mentioned earlier, the first step after ECMP routing is Linux’s XDP stack where, among other things, we run DoS mitigations.

Historically our mitigations for volumetric attacks were expressed in classic BPF and iptables-style grammar. Recently we adapted them to execute in the XDP eBPF context, which turned out to be surprisingly hard. Read on about our adventures:

During this project we encountered a number of eBPF/XDP limitations. One of them was the lack of concurrency primitives. It was very hard to implement things like race-free token buckets. Later we found that Facebook engineer Julia Kartseva had run into the same issues. In February this problem was addressed with the introduction of the bpf_spin_lock helper.



While our modern volumetric DoS defenses are done in the XDP layer, we still rely on iptables for application layer 7 mitigations. Here, higher-level firewall features are useful: connlimit, hashlimits, and ipsets. We also use the xt_bpf iptables module to run cBPF in iptables to match on packet payloads. We talked about this in the past:



After XDP and iptables, we have one final kernel side DoS defense layer.

Consider a situation where our UDP mitigations fail. In such a case we might be left with a flood of packets hitting our application UDP socket. This might overflow the socket, causing packet loss. This is problematic – both good and bad packets will be dropped indiscriminately. For applications like DNS it’s catastrophic. In the past, to reduce the harm, we ran one UDP socket per IP address. An unmitigated flood was bad, but at least it didn’t affect the traffic to other server IP addresses.

Nowadays that architecture is no longer suitable. We are running more than 30,000 DNS IPs, and running that number of UDP sockets is not optimal. Our modern solution is to run a single UDP socket with a complex eBPF socket filter on it – using the SO_ATTACH_BPF socket option. We talked about running eBPF on network sockets in past blog posts:

The mentioned eBPF program rate-limits the packets. It keeps the state – packet counts – in an eBPF map. We can be sure that a single flooded IP won’t affect other traffic. This works well, though during work on this project we found a rather worrying bug in the eBPF verifier:

I guess running eBPF on a UDP socket is not a common thing to do.



Apart from the DoS, in XDP we also run a layer 4 load balancer layer. This is a new project, and we haven’t talked much about it yet. Without getting into many details: in certain situations we need to perform a socket lookup from XDP.

The problem is relatively simple – our code needs to look up the “socket” kernel structure for a 5-tuple extracted from a packet. This is generally easy – there is a bpf_sk_lookup helper available for this. Unsurprisingly, there were some complications. One problem was the inability to verify if a received ACK packet was a valid part of a three-way handshake when SYN-cookies are enabled. My colleague Lorenz Bauer is working on adding support for this corner case.



After DoS and the load balancing layers, the packets are passed onto the usual Linux TCP / UDP stack. Here we do a socket dispatch – for example packets going to port 53 are passed onto a socket belonging to our DNS server.

We do our best to use vanilla Linux features, but things get complex when you use thousands of IP addresses on the servers.

Convincing Linux to route packets correctly is relatively easy with the “AnyIP” trick. Ensuring packets are dispatched to the right application is another matter. Unfortunately, standard Linux socket dispatch logic is not flexible enough for our needs. For popular ports like TCP/80 we want to share the port between multiple applications, each handling it on a different IP range. Linux doesn’t support this out of the box. You can call bind() either on a specific IP address or on all IPs (with 0.0.0.0).



In order to fix this, we developed a custom kernel patch which adds a SO_BINDTOPREFIX socket option. As the name suggests – it allows us to call bind() on a selected IP prefix. This solves the problem of multiple applications sharing popular ports like 53 or 80.

Then we run into another problem. For our Spectrum product we need to listen on all 65535 ports. Running so many listen sockets is not a good idea (see our old war story blog), so we had to find another way. After some experiments we learned to utilize an obscure iptables module – TPROXY – for this purpose. Read about it here:

This setup is working, but we don’t like the extra firewall rules. We are working on solving this problem correctly – actually extending the socket dispatch logic. You guessed it – we want to extend socket dispatch logic by utilizing eBPF. Expect some patches from us.



Then there is a way to use eBPF to improve applications. Recently we got excited about doing TCP splicing with SOCKMAP:

This technique has a great potential for improving tail latency across many pieces of our software stack. The current SOCKMAP implementation is not quite ready for prime time yet, but the potential is vast.

Similarly, the new TCP-BPF aka BPF_SOCK_OPS hooks provide a great way of inspecting performance parameters of TCP flows. This functionality is super useful for our performance team.



Some Linux features didn’t age well and we need to work around them. For example, we are hitting limitations of networking metrics. Don’t get me wrong – the networking metrics are awesome, but sadly they are not granular enough. Things like TcpExtListenDrops and TcpExtListenOverflows are reported as global counters, while we need to know it on a per-application basis.

Our solution is to use eBPF probes to extract the numbers directly from the kernel. My colleague Ivan Babrou wrote a Prometheus metrics exporter called “ebpf_exporter” to facilitate this. Read on:

With “ebpf_exporter” we can generate all manner of detailed metrics. It is very powerful and saved us on many occasions.



In this talk we discussed 6 layers of BPFs running on our edge servers:

  • Volumetric DoS mitigations are running on XDP eBPF
  • Iptables xt_bpf cBPF for application-layer attacks
  • SO_ATTACH_BPF for rate limits on UDP sockets
  • Load balancer, running on XDP
  • eBPFs running application helpers like SOCKMAP for TCP socket splicing, and TCP-BPF for TCP measurements
  • “ebpf_exporter” for granular metrics

And we’re just getting started! Soon we will be doing more with eBPF-based socket dispatch, eBPF running on the Linux TC (Traffic Control) layer, and more integration with cgroup eBPF hooks. Meanwhile, our SRE team maintains an ever-growing list of BCC scripts useful for debugging.

It feels like Linux has stopped developing new APIs, and all the new features are implemented as eBPF hooks and helpers. This is fine and it has strong advantages. It’s easier and safer to upgrade an eBPF program than to recompile a kernel module. Some things, like TCP-BPF exposing high-volume performance tracing data, would probably be impossible without eBPF.

Some say “software is eating the world”; I would say: “BPF is eating the software”.

eBPF can’t count?!

Post Syndicated from Jakub Sitnicki original https://blog.cloudflare.com/ebpf-cant-count/

Grant mechanical calculating machine, public domain image

It is unlikely we can tell you anything new about the extended Berkeley Packet Filter, eBPF for short, if you’ve read all the great man pages, docs, guides, and some of our blogs out there.

But we can tell you a war story, and who doesn’t like those? This one is about how eBPF lost its ability to count for a while1.

They say in our Austin, Texas office that all good stories start with a "y’all ain’t gonna believe this…" tale. This one, though, starts with a post to the Linux netdev mailing list from Marek Majkowski after what I heard was a long night:

Screenshot: Marek’s post to the netdev mailing list.

Marek’s findings were quite shocking – if you subtract two 64-bit timestamps in eBPF, the result is garbage. But only when running as an unprivileged user. From root all works fine. Huh.

If you’ve seen Marek’s presentation from the Netdev 0x13 conference, you know that we are using BPF socket filters as one of the defenses against simple, volumetric DoS attacks. So potentially getting your packet count wrong could be a Bad Thing™, and affect legitimate traffic.

Let’s try to reproduce this bug with a simplified eBPF socket filter that subtracts two 64-bit unsigned integers passed to it from user-space through a BPF map. The input for our BPF program comes from a BPF array map, so that the values we operate on are not known at build time. This allows for easy experimentation and prevents the compiler from optimizing out the operations.

Starting small, eBPF, what is 2 – 1? View the code on our GitHub.

$ ./run-bpf 2 1
arg0                    2 0x0000000000000002
arg1                    1 0x0000000000000001
diff                    1 0x0000000000000001

OK, eBPF, what is 2^32 – 1?

$ ./run-bpf $[2**32] 1
arg0           4294967296 0x0000000100000000
arg1                    1 0x0000000000000001
diff 18446744073709551615 0xffffffffffffffff

Wrong! But if we ask nicely with sudo:

$ sudo ./run-bpf $[2**32] 1
[sudo] password for jkbs:
arg0           4294967296 0x0000000100000000
arg1                    1 0x0000000000000001
diff           4294967295 0x00000000ffffffff

Who is messing with my eBPF?

When computers stop subtracting, you know something big is up. We called for reinforcements.

Our colleague Arthur Fabre quickly noticed something is off when you examine the eBPF code loaded into the kernel. It turns out the kernel doesn’t actually run the eBPF it’s supplied – it sometimes rewrites it first.

Any sane programmer would expect 64-bit subtraction to be expressed as a single eBPF instruction

$ llvm-objdump -S -no-show-raw-insn -section=socket1 bpf/filter.o
…
      20:       1f 76 00 00 00 00 00 00         r6 -= r7
…

However, that’s not what the kernel actually runs. Apparently after the rewrite the subtraction becomes a complex, multi-step operation.

To see what the kernel is actually running we can use the little-known bpftool utility. First, we need to load our BPF program:

$ ./run-bpf --stop-after-load 2 1
[2]+  Stopped                 ./run-bpf 2 1

Then list all BPF programs loaded into the kernel with bpftool prog list

$ sudo bpftool prog list
…
5951: socket_filter  name filter_alu64  tag 11186be60c0d0c0f  gpl
        loaded_at 2019-04-05T13:01:24+0200  uid 1000
        xlated 424B  jited 262B  memlock 4096B  map_ids 28786

The most recently loaded socket_filter must be our program (filter_alu64). Now we know its id is 5951 and we can list its bytecode with:

$ sudo bpftool prog dump xlated id 5951
…
  33: (79) r7 = *(u64 *)(r0 +0)
  34: (b4) (u32) r11 = (u32) -1
  35: (1f) r11 -= r6
  36: (4f) r11 |= r6
  37: (87) r11 = -r11
  38: (c7) r11 s>>= 63
  39: (5f) r6 &= r11
  40: (1f) r6 -= r7
  41: (7b) *(u64 *)(r10 -16) = r6
…

bpftool can also display the JITed code with: bpftool prog dump jited id 5951.

As you see, subtraction is replaced with a series of opcodes. That is, unless you are root. When running from root all is good:

$ sudo ./run-bpf --stop-after-load 0 0
[1]+  Stopped                 sudo ./run-bpf --stop-after-load 0 0
$ sudo bpftool prog list | grep socket_filter
659: socket_filter  name filter_alu64  tag 9e7ffb08218476f3  gpl
$ sudo bpftool prog dump xlated id 659
…
  31: (79) r7 = *(u64 *)(r0 +0)
  32: (1f) r6 -= r7
  33: (7b) *(u64 *)(r10 -16) = r6
…

If you’ve spent any time using eBPF, you must have experienced first hand the dreaded eBPF verifier. It’s a merciless judge of all eBPF code that will reject any programs that it deems not worthy of running in kernel-space.

What perhaps nobody has told you before, and what might come as a surprise, is that the very same verifier will actually also rewrite and patch up your eBPF code as needed to make it safe.

The problems with subtraction were introduced by an inconspicuous security fix to the verifier. The patch in question first landed in Linux 5.0 and was backported to the 4.20.6 stable and 4.19.19 LTS kernels. The commit message, over 2,000 words long, doesn’t spare you any details on the attack vector it targets.

The mitigation stems from CVE-2019-7308 vulnerability discovered by Jann Horn at Project Zero, which exploits pointer arithmetic, i.e. adding a scalar value to a pointer, to trigger speculative memory loads from out-of-bounds addresses. Such speculative loads change the CPU cache state and can be used to mount a Spectre variant 1 attack.

To mitigate it, the eBPF verifier rewrites any arithmetic operations on pointer values in such a way that the result is always a memory location within bounds. The patch demonstrates how arithmetic operations on pointers get rewritten, and we can spot a familiar pattern there:

Excerpt from the patch showing how pointer arithmetic gets rewritten.

Wait a minute… What pointer arithmetic? We are just trying to subtract two scalar values. How come the mitigation kicks in?

It shouldn’t. It’s a bug. The eBPF verifier keeps track of what kind of values the ALU is operating on, and in this corner case the state was ignored.

Why is running BPF as root fine, you ask? If your program has CAP_SYS_ADMIN privileges, side-channel mitigations don’t apply. As root you already have access to the kernel address space, so nothing new can leak through BPF.

After our report, the fix quickly landed in the v5.0 kernel and was backported to stable kernels 4.20.15 and 4.19.28. Kudos to Daniel Borkmann for getting the fix out fast. However, kernel upgrades are hard and in the meantime we were left with code running in production that was not doing what it was supposed to.

32-bit ALU to the rescue

As one of the eBPF maintainers has pointed out, 32-bit arithmetic operations are not affected by the verifier bug. This opens the door for a workaround.

eBPF registers, r0..r10, are 64 bits wide, but you can also access just the lower 32 bits, which are exposed as subregisters w0..w10. You can operate on the 32-bit subregisters using the BPF ALU32 instruction subset. LLVM 7+ can generate eBPF code that uses this instruction subset. Of course, you need to ask it nicely with the -Xclang -target-feature -Xclang +alu32 toggle:

$ cat sub32.c
#include "common.h"

u32 sub32(u32 x, u32 y)
{
        return x - y;
}
$ clang -O2 -target bpf -Xclang -target-feature -Xclang +alu32 -c sub32.c
$ llvm-objdump -S -no-show-raw-insn sub32.o
…
sub32:
       0:       bc 10 00 00 00 00 00 00         w0 = w1
       1:       1c 20 00 00 00 00 00 00         w0 -= w2
       2:       95 00 00 00 00 00 00 00         exit

The 0x1c opcode of the instruction #1, which can be broken down as BPF_ALU | BPF_X | BPF_SUB (read more in the kernel docs), is the 32-bit subtraction between registers we are looking for, as opposed to regular 64-bit subtract operation 0x1f = BPF_ALU64 | BPF_X | BPF_SUB, which will get rewritten.

Armed with this knowledge we can borrow a page from bignum arithmetic and subtract 64-bit numbers using just 32-bit ops:

u64 sub64(u64 x, u64 y)
{
        u32 xh, xl, yh, yl;
        u32 hi, lo;

        xl = x;
        yl = y;
        lo = xl - yl;

        xh = x >> 32;
        yh = y >> 32;
        hi = xh - yh - (lo > xl); /* underflow? */

        return ((u64)hi << 32) | (u64)lo;
}

This code compiles as expected on normal architectures, like x86-64 or ARM64, but BPF Clang target plays by its own rules:

$ clang -O2 -target bpf -Xclang -target-feature -Xclang +alu32 -c sub64.c -o - \
  | llvm-objdump -S -
…  
      13:       1f 40 00 00 00 00 00 00         r0 -= r4
      14:       1f 30 00 00 00 00 00 00         r0 -= r3
      15:       1f 21 00 00 00 00 00 00         r1 -= r2
      16:       67 00 00 00 20 00 00 00         r0 <<= 32
      17:       67 01 00 00 20 00 00 00         r1 <<= 32
      18:       77 01 00 00 20 00 00 00         r1 >>= 32
      19:       4f 10 00 00 00 00 00 00         r0 |= r1
      20:       95 00 00 00 00 00 00 00         exit

Apparently the compiler decided it was better to operate on 64-bit registers and discard the upper 32 bits. Thus we weren’t able to get rid of the problematic 0x1f opcode. Annoying, back to square one.

Surely a bit of IR will do?

The problem was in the Clang frontend – compiling C to IR. We know that the BPF "assembly" backend for LLVM can generate bytecode that uses ALU32 instructions. Maybe if we tweak the Clang compiler’s output just a little we can achieve what we want. This means we have to get our hands dirty with the LLVM Intermediate Representation (IR).

If you haven’t heard of LLVM IR before, now is a good time to do some reading2. In short, LLVM IR is what Clang produces and the LLVM BPF backend consumes.

Time to write IR by hand! Here’s a hand-tweaked IR variant of our sub64() function:

define dso_local i64 @sub64_ir(i64, i64) local_unnamed_addr #0 {
  %3 = trunc i64 %0 to i32      ; xl = (u32) x;
  %4 = trunc i64 %1 to i32      ; yl = (u32) y;
  %5 = sub i32 %3, %4           ; lo = xl - yl;
  %6 = zext i32 %5 to i64
  %7 = lshr i64 %0, 32          ; tmp1 = x >> 32;
  %8 = lshr i64 %1, 32          ; tmp2 = y >> 32;
  %9 = trunc i64 %7 to i32      ; xh = (u32) tmp1;
  %10 = trunc i64 %8 to i32     ; yh = (u32) tmp2;
  %11 = sub i32 %9, %10         ; hi = xh - yh
  %12 = icmp ult i32 %3, %5     ; tmp3 = xl < lo
  %13 = zext i1 %12 to i32
  %14 = sub i32 %11, %13        ; hi -= tmp3
  %15 = zext i32 %14 to i64
  %16 = shl i64 %15, 32         ; tmp2 = hi << 32
  %17 = or i64 %16, %6          ; res = tmp2 | (u64)lo
  ret i64 %17
}

It may not be pretty but it does produce the desired BPF code when compiled3. You will likely find the LLVM IR reference helpful when deciphering it.

And voila! First working solution that produces correct results:

$ ./run-bpf -filter ir $[2**32] 1
arg0           4294967296 0x0000000100000000
arg1                    1 0x0000000000000001
diff           4294967295 0x00000000ffffffff

Actually using this hand-written IR function from C is tricky. See our code on GitHub.


The final trick

Hand-written IR does the job. The downside is that linking IR modules to your C modules is hard. Fortunately there is a better way. You can persuade Clang to stick to 32-bit ALU ops in generated IR.

We’ve already seen the problem. To recap, if we ask Clang to subtract 32-bit integers, it will operate on 64-bit values and throw away the top 32-bits. Putting C, IR, and eBPF side-by-side helps visualize this:

eBPF can't count?!

The trick to get around it is to declare the 32-bit variable that holds the result as volatile. You might already know the volatile keyword if you’ve written Unix signal handlers. It basically tells the compiler that the value of the variable may change under its feet, so it should refrain from reorganizing loads (reads) from it. It also tells it that stores (writes) to it might have side-effects, so changing their order or eliminating them by skipping the write to memory is not allowed either.

Using volatile makes Clang emit special loads and/or stores at the IR level, which then, at the eBPF level, translates to writing/reading the value from memory (the stack) on every access. While this doesn’t sound related to the problem at hand, there is a surprising side-effect to it:

Side-by-side view: with a volatile store, the subtraction stays 32-bit.

With volatile access, the compiler doesn’t promote the subtraction to 64 bits! Don’t ask me why, although I would love to hear an explanation. For now, consider this a hack. One that does not come for free – there is the overhead of going through the stack on each read/write.

However, if we play our cards right we just might reduce it a little. We don’t actually need the volatile load or store to happen; we just want the side effect. So instead of declaring the value as volatile, which implies that both reads and writes are volatile, let’s try to make only the writes volatile with the help of a macro:

/* Emits a "store volatile" in LLVM IR */
#define ST_V(rhs, lhs) (*(volatile typeof(rhs) *) &(rhs) = (lhs))

If this macro looks strangely familiar, it’s because it does the same thing as WRITE_ONCE() macro in the Linux kernel. Applying it to our example:

Side-by-side view of the example rewritten with the ST_V() macro.

That’s another hacky but working solution. Pick your poison.


So there you have it – from C, to IR, and back to C to hack around a bug in the eBPF verifier and be able to subtract 64-bit integers again. Usually you won’t have to dive into LLVM IR or assembly to make use of eBPF. But it does help to know a little about it when things don’t work as expected.

Did I mention that 64-bit addition is also broken? Have fun fixing it!


1 Okay, it was more like 3 months time until the bug was discovered and fixed.

2 Some even think that it is better than assembly.

3 How do we know? The litmus test is to look for statements matching r[0-9] [-+]= r[0-9] in BPF assembly.

Python at Netflix

Post Syndicated from Netflix Technology Blog original https://medium.com/netflix-techblog/python-at-netflix-bba45dae649e?source=rss----2615bd06b42e---4

By Pythonistas at Netflix, coordinated by Amjith Ramanujam and edited by Ellen Livengood

As many of us prepare to go to PyCon, we wanted to share a sampling of how Python is used at Netflix. We use Python through the full content lifecycle, from deciding which content to fund all the way to operating the CDN that serves the final video to 148 million members. We use and contribute to many open-source Python packages, some of which are mentioned below. If any of this interests you, check out the jobs site or find us at PyCon. We have donated a few Netflix Originals posters to the PyLadies Auction and look forward to seeing you all there.

Open Connect

Open Connect is Netflix’s content delivery network (CDN). An easy, though imprecise, way of thinking about Netflix infrastructure is that everything that happens before you press Play on your remote control (e.g., are you logged in? what plan do you have? what have you watched so we can recommend new titles to you? what do you want to watch?) takes place in Amazon Web Services (AWS), whereas everything that happens afterwards (i.e., video streaming) takes place in the Open Connect network. Content is placed on the network of servers in the Open Connect CDN as close to the end user as possible, improving the streaming experience for our customers and reducing costs for both Netflix and our Internet Service Provider (ISP) partners.

Various software systems are needed to design, build, and operate this CDN infrastructure, and a significant number of them are written in Python. The network devices that underlie a large portion of the CDN are mostly managed by Python applications. Such applications track the inventory of our network gear: what devices, of which models, with which hardware components, located in which sites. The configuration of these devices is controlled by several other systems including source of truth, application of configurations to devices, and back up. Device interaction for the collection of health and other operational data is yet another Python application. Python has long been a popular programming language in the networking space because it’s an intuitive language that allows engineers to quickly solve networking problems. Subsequently, many useful libraries get developed, making the language even more desirable to learn and use.

Demand Engineering

Demand Engineering is responsible for Regional Failovers, Traffic Distribution, Capacity Operations, and Fleet Efficiency of the Netflix cloud. We are proud to say that our team’s tools are built primarily in Python. The service that orchestrates failover uses numpy and scipy to perform numerical analysis, boto3 to make changes to our AWS infrastructure, rq to run asynchronous workloads and we wrap it all up in a thin layer of Flask APIs. The ability to drop into a bpython shell and improvise has saved the day more than once.

We are heavy users of Jupyter Notebooks and nteract to analyze operational data and prototype visualization tools that help us detect capacity regressions.

CORE

The CORE team uses Python in our alerting and statistical analytical work. We lean on many of the statistical and mathematical libraries (numpy, scipy, ruptures, pandas) to help automate the analysis of thousands of related signals when our alerting systems indicate problems. We’ve developed a time series correlation system used both inside and outside the team, as well as a distributed worker system to parallelize large amounts of analytical work to deliver results quickly.

Python is also a tool we typically use for automation tasks, data exploration and cleaning, and as a convenient source for visualization work.

Monitoring, alerting and auto-remediation

The Insight Engineering team is responsible for building and operating the tools for operational insight, alerting, diagnostics, and auto-remediation. With the increased popularity of Python, the team now supports Python clients for most of their services. One example is the Spectator Python client library, a library for instrumenting code to record dimensional time series metrics. We build Python libraries to interact with other Netflix platform level services. In addition to libraries, the Winston and Bolt products are also built using Python frameworks (Gunicorn + Flask + Flask-RESTPlus).

Information Security

The information security team uses Python to accomplish a number of high leverage goals for Netflix: security automation, risk classification, auto-remediation, and vulnerability identification, to name a few. We’ve open sourced a number of successful Python projects, including Security Monkey (our team’s most active open source project). We leverage Python to protect our SSH resources using Bless. Our Infrastructure Security team leverages Python to help with IAM permission tuning using Repokid. We use Python to help generate TLS certificates using Lemur.

One of our more recent projects is Prism: a batch framework that helps security engineers measure paved road adoption and risk factors, and identify vulnerabilities in source code. We currently provide Python and Ruby libraries for Prism. The Diffy forensics triage tool is written entirely in Python. We also use Python to detect sensitive data using Lanius.

Personalization Algorithms

We use Python extensively within our broader Personalization Machine Learning Infrastructure to train some of the Machine Learning models for key aspects of the Netflix experience: from our recommendation algorithms to artwork personalization to marketing algorithms. For example, some algorithms use TensorFlow, Keras, and PyTorch to learn Deep Neural Networks, XGBoost and LightGBM to learn Gradient Boosted Decision Trees or the broader scientific stack in Python (e.g. numpy, scipy, sklearn, matplotlib, pandas, cvxpy). Because we’re constantly trying out new approaches, we use Jupyter Notebooks to drive many of our experiments. We have also developed a number of higher-level libraries to help integrate these with the rest of our ecosystem (e.g. data access, fact logging and feature extraction, model evaluation, and publishing).

Machine Learning Infrastructure

Besides personalization, Netflix applies machine learning to hundreds of use cases across the company. Many of these applications are powered by Metaflow, a Python framework that makes it easy to execute ML projects from the prototype stage to production.

Metaflow pushes the limits of Python: We leverage well parallelized and optimized Python code to fetch data at 10Gbps, handle hundreds of millions of data points in memory, and orchestrate computation over tens of thousands of CPU cores.

Notebooks

We are avid users of Jupyter notebooks at Netflix, and we’ve written about the reasons and nature of this investment before.

But Python plays a huge role in how we provide those services. Python is a primary language when we need to develop, debug, explore, and prototype different interactions with the Jupyter ecosystem. We use Python to build custom extensions to the Jupyter server that allow us to manage tasks like logging, archiving, publishing, and cloning notebooks on behalf of our users.
We provide many flavors of Python to our users via different Jupyter kernels, and manage the deployment of those kernel specifications using Python.

Orchestration

The Big Data Orchestration team is responsible for providing all of the services and tooling to schedule and execute ETL and Adhoc pipelines.

Many of the components of the orchestration service are written in Python. This starts with our scheduler, which uses Jupyter Notebooks with papermill to provide templatized job types (Spark, Presto, …). This gives our users a standardized and easy way to express work that needs to be executed. You can see some deeper details on the subject here. We have been using notebooks as real runbooks for situations where human intervention is required — for example: to restart everything that has failed in the last hour.

Internally, we also built an event-driven platform that is fully written in Python. We have created streams of events from a number of systems that get unified into a single tool. This allows us to define conditions to filter events, and actions to react to or route them. As a result, we have been able to decouple microservices and gain visibility into everything that happens on the data platform.

Our team also built the pygenie client which interfaces with Genie, a federated job execution service. Internally, we have additional extensions to this library that apply business conventions and integrate with the Netflix platform. These libraries are the primary way users interface programmatically with work in the Big Data platform.

Finally, our team is committed to contributing to the papermill and scrapbook open source projects. Our work there serves both our own and external use cases. These efforts have been gaining a lot of traction in the open source community, and we’re glad to be able to contribute to these shared projects.

Experimentation Platform

The scientific computing team for experimentation is creating a platform for scientists and engineers to analyze AB tests and other experiments. Scientists and engineers can contribute new innovations on three fronts: data, statistics, and visualizations.

The Metrics Repo is a Python framework based on PyPika that allows contributors to write reusable parameterized SQL queries. It serves as an entry point into any new analysis.

The Causal Models library is a Python & R framework for scientists to contribute new models for causal inference. It leverages PyArrow and RPy2 so that statistics can be calculated seamlessly in either language.

The Visualizations library is based on Plotly. Since Plotly is a widely adopted visualization spec, there are a variety of tools that allow contributors to produce an output that is consumable by our platforms.

Partner Ecosystem

The Partner Ecosystem group is expanding its use of Python for testing Netflix applications on devices. Python is forming the core of a new CI infrastructure, including controlling our orchestration servers, controlling Spinnaker, test case querying and filtering, and scheduling test runs on devices and containers. Additional post-run analysis is being done in Python using TensorFlow to determine which tests are most likely to show problems on which devices.

Video Encoding and Media Cloud Engineering

Our team takes care of encoding (and re-encoding) the Netflix catalog, as well as leveraging machine learning for insights into that catalog.
We use Python in ~50 projects such as vmaf and mezzfs, build computer vision solutions using a media map-reduce platform called Archer, and rely on Python for many internal projects.
We have also open sourced a few tools to ease development/distribution of Python projects, like setupmeta and pickley.

Netflix Animation and NVFX

Python is the industry standard for all of the major applications we use to create Animated and VFX content, so it goes without saying that we are using it very heavily. All of our integrations with Maya and Nuke are in Python, and the bulk of our Shotgun tools are also in Python. We’re just getting started on getting our tooling in the cloud, and anticipate deploying many of our own custom Python AMIs/containers.

Content Machine Learning, Science & Analytics

The Content Machine Learning team uses Python extensively for the development of machine learning models that are the core of forecasting audience size, viewership, and other demand metrics for all content.


Python at Netflix was originally published in Netflix TechBlog on Medium, where people are continuing the conversation by highlighting and responding to this story.

xdpcap: XDP Packet Capture

Post Syndicated from Arthur Fabre original https://blog.cloudflare.com/xdpcap/

xdpcap: XDP Packet Capture

Our servers process a lot of network packets, be it legitimate traffic or large denial of service attacks. To do so efficiently, we’ve embraced eXpress Data Path (XDP), a Linux kernel technology that provides a high performance mechanism for low level packet processing. We’re using it to drop DoS attack packets with L4Drop, and also in our new layer 4 load balancer. But there’s a downside to XDP: because it processes packets before the normal Linux network stack sees them, packets redirected or dropped are invisible to regular debugging tools such as tcpdump.

To address this, we built a tcpdump replacement for XDP, xdpcap. We are open sourcing this tool: the code and documentation are available on GitHub.

xdpcap is built on cbpfc, our compiler from classic BPF (cBPF) to eBPF or C, which we are also open sourcing: the code and documentation are available on GitHub.

xdpcap: XDP Packet Capture
CC BY 4.0 image by Christoph Müller

Tcpdump provides an easy way to dump specific packets of interest. For example, to capture all IPv4 DNS packets, one could:

$ tcpdump ip and udp port 53

xdpcap reuses the same syntax! xdpcap can write packets to a pcap file:

$ xdpcap /path/to/hook capture.pcap "ip and udp port 53"
XDPAborted: 0/0   XDPDrop: 0/0   XDPPass: 254/0   XDPTx: 0/0   (received/matched packets)
XDPAborted: 0/0   XDPDrop: 0/0   XDPPass: 995/1   XDPTx: 0/0   (received/matched packets)

Or write the pcap to stdout, and decode the packets with tcpdump:

$ xdpcap /path/to/hook - "ip and udp port 53" | sudo tcpdump -r -
reading from file -, link-type EN10MB (Ethernet)
16:18:37.911670 IP 1.1.1.1 > 1.2.3.4.21563: 26445$ 1/0/1 A 93.184.216.34 (56)

The remainder of this post explains how we built xdpcap, including how /path/to/hook/ is used to attach to XDP programs.

tcpdump

To replicate tcpdump, we first need to understand its inner workings. Marek Majkowski has previously written a detailed post on the subject. Tcpdump exposes a high level filter language, pcap-filter, to specify which packets are of interest. Reusing our earlier example, the following filter expression captures all IPv4 UDP packets to or from port 53, likely DNS traffic:

ip and udp port 53

Internally, tcpdump uses libpcap to compile the filter to classic BPF (cBPF). cBPF is a simple bytecode language to represent programs that inspect the contents of a packet. A program returns non-zero to indicate that a packet matched the filter, and zero otherwise. The virtual machine that executes cBPF programs is very simple, featuring only two registers, a and x. There is no way of checking the length of the input packet[1]; instead, any out-of-bounds packet access will terminate the cBPF program, returning 0 (no match). The full set of opcodes is listed in the Linux documentation. Returning to our example filter, ip and udp port 53 compiles to the following cBPF program, expressed as an annotated flowchart:

Example cBPF filter flowchart

Tcpdump attaches the generated cBPF filter to a raw packet socket using a setsockopt system call with SO_ATTACH_FILTER. The kernel runs the filter on every packet destined for the socket, but only delivers matching packets. Tcpdump displays the delivered packets, or writes them to a pcap capture file for later analysis.
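
For a rough illustration of both the bytecode and the attach mechanism, here is a minimal C sketch (not tcpdump’s or libpcap’s actual code) that hand-writes a tiny cBPF program matching only IPv4 packets and attaches it to a raw packet socket. A real filter such as ip and udp port 53 would simply be a longer instruction array generated by libpcap.

#include <stdio.h>
#include <arpa/inet.h>
#include <sys/socket.h>
#include <linux/filter.h>
#include <linux/if_ether.h>

int main(void) {
    /* cBPF bytecode, written with the kernel's helper macros:
       load the EtherType and return "match" only for IPv4. */
    struct sock_filter ipv4_only[] = {
        BPF_STMT(BPF_LD | BPF_H | BPF_ABS, 12),              /* a = EtherType */
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, ETH_P_IP, 0, 1), /* if a != IPv4, skip to "no match" */
        BPF_STMT(BPF_RET | BPF_K, 0xFFFFFFFF),               /* match: keep the packet */
        BPF_STMT(BPF_RET | BPF_K, 0),                        /* no match */
    };
    struct sock_fprog prog = {
        .len    = sizeof(ipv4_only) / sizeof(ipv4_only[0]),
        .filter = ipv4_only,
    };

    /* Raw packet socket; needs CAP_NET_RAW (run as root). */
    int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
    if (fd < 0) { perror("socket"); return 1; }

    /* The kernel now runs the filter on every packet destined for this
       socket and only delivers the ones for which it returns non-zero. */
    if (setsockopt(fd, SOL_SOCKET, SO_ATTACH_FILTER, &prog, sizeof(prog)) < 0) {
        perror("setsockopt(SO_ATTACH_FILTER)");
        return 1;
    }
    return 0;
}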

xdpcap

In the context of XDP, our tcpdump replacement should:

  • Accept filters in the same filter language as tcpdump
  • Dynamically instrument XDP programs of interest
  • Expose matching packets to userspace

XDP

XDP uses an extended version of the cBPF instruction set, eBPF, to allow arbitrary programs to run for each packet received by a network card, potentially modifying the packets. A stringent kernel verifier statically analyzes eBPF programs, ensuring that memory bounds are checked for every packet load.

eBPF programs can return:

  • XDP_DROP: Drop the packet
  • XDP_TX: Transmit the packet back out the network interface
  • XDP_PASS: Pass the packet up the network stack

eBPF introduces several new features, notably helper function calls, enabling programs to call functions exposed by the kernel. This includes retrieving or setting values in maps, key-value data structures that can also be accessed from userspace.
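
As a hedged sketch of what that looks like from eBPF C (using the bpf_helpers.h conventions shipped with the kernel sources; the program and map names here are made up for illustration), an XDP program that counts packets in a map shared with userspace might be:

#include <linux/bpf.h>
#include "bpf_helpers.h"   /* SEC(), bpf_map_lookup_elem(), struct bpf_map_def */

/* A key-value map: userspace can read the same counter via the bpf() syscall. */
struct bpf_map_def SEC("maps") packet_count = {
    .type        = BPF_MAP_TYPE_ARRAY,
    .key_size    = sizeof(__u32),
    .value_size  = sizeof(__u64),
    .max_entries = 1,
};

SEC("xdp")
int count_packets(struct xdp_md *ctx) {
    __u32 key = 0;
    /* Helper call into the kernel: look up the value stored for key 0. */
    __u64 *value = bpf_map_lookup_elem(&packet_count, &key);
    if (value)                          /* the verifier insists on the NULL check */
        __sync_fetch_and_add(value, 1);
    return XDP_PASS;
}

char _license[] SEC("license") = "GPL";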

Filter

A key feature of tcpdump is the ability to efficiently pick out packets of interest; packets are filtered before reaching userspace. To achieve this in XDP, the desired filter must be converted to eBPF.

cBPF is already used in our XDP based DoS mitigation pipeline: cBPF filters are first converted to C by cbpfc, and the result compiled with Clang to eBPF. Reusing this mechanism allows us to fully support libpcap filter expressions:

Pipeline to convert pcap-filter expressions to eBPF via C using cbpfc

To remove the Clang runtime dependency, our cBPF compiler, cbpfc, was extended to directly generate eBPF:

Pipeline to convert pcap-filter expressions directly to eBPF using cbpfc

Converted to eBPF using cbpfc, ip and udp port 53 yields:

Example cBPF filter converted to eBPF with cbpfc flowchart

The emitted eBPF requires a prologue, which is responsible for loading pointers to the beginning and end of the input packet into registers r6 and r7 respectively[2].

The generated code follows a very similar structure to the original cBPF filter, but with:

  • Bswap instructions to convert big endian packet data to little endian.
  • Guards to check the length of the packet before we load data from it. These are required by the kernel verifier.

The epilogue can use the result of the filter to perform different actions on the input packet.
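
Those guards and the prologue correspond to what one writes by hand in an XDP program in C. A minimal hand-written equivalent of the ip and udp port 53 filter might look like the sketch below (illustrative only, assuming the bpf_helpers.h / bpf_endian.h conventions from the kernel sources and ignoring IP options for brevity; this is not cbpfc’s generated output):

#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/in.h>
#include <linux/ip.h>
#include <linux/udp.h>
#include "bpf_helpers.h"
#include "bpf_endian.h"

SEC("xdp")
int match_udp_port_53(struct xdp_md *ctx) {
    /* The "prologue": pointers to the start and end of the packet data. */
    void *data     = (void *)(long)ctx->data;
    void *data_end = (void *)(long)ctx->data_end;

    struct ethhdr *eth = data;
    if ((void *)(eth + 1) > data_end)           /* guard required by the verifier */
        return XDP_PASS;
    if (eth->h_proto != bpf_htons(ETH_P_IP))    /* packet data is big endian */
        return XDP_PASS;

    struct iphdr *ip = (void *)(eth + 1);
    if ((void *)(ip + 1) > data_end)
        return XDP_PASS;
    if (ip->protocol != IPPROTO_UDP)
        return XDP_PASS;

    struct udphdr *udp = (void *)(ip + 1);      /* ignores IP options */
    if ((void *)(udp + 1) > data_end)
        return XDP_PASS;

    if (udp->source == bpf_htons(53) || udp->dest == bpf_htons(53)) {
        /* matched: the "epilogue" decides what action to take */
    }
    return XDP_PASS;
}

char _license[] SEC("license") = "GPL";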

As mentioned earlier, we’re open sourcing cbpfc; the code and documentation are available on GitHub. It can be used to compile cBPF to C, or directly to eBPF, and the generated code is accepted by the kernel verifier.

Instrument

Tcpdump can start and stop capturing packets at any time, without requiring coordination from applications. This rules out modifying existing XDP programs to directly run the generated eBPF filter; the programs would have to be modified each time xdpcap is run. Instead, programs should expose a hook that can be used by xdpcap to attach filters at runtime.

xdpcap’s hook support is built around eBPF tail-calls. XDP programs can yield control to other programs using the tail-call helper. Control is never handed back to the calling program; the return code of the subsequent program is used. For example, consider two XDP programs, foo and bar, with foo attached to the network interface. Foo can tail-call into bar:

Flow of XDP program foo tail-calling into program bar

The program to tail-call into is configured at runtime, using a special eBPF program array map. eBPF programs tail-call into a specific index of the map, the value of which is set by userspace. From our example above, foo’s tail-call map holds a single entry:

  index    program
  0        bar

A tail-call into an empty index does nothing, so XDP programs always need to return an action themselves after a tail-call, in case it fails. Once again, this is enforced by the kernel verifier. In the case of program foo:

int foo(struct xdp_md *ctx) {
    // tail-call into index 0 - program bar
    tail_call(ctx, &map, 0);

    // tail-call failed, pass the packet
    return XDP_PASS;
}
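
In actual eBPF C the helper is bpf_tail_call and the map is a dedicated program array type; a sketch of what that declaration might look like (illustrative only, not xdpcap’s real source) is:

#include <linux/bpf.h>
#include "bpf_helpers.h"

/* Program array: userspace installs program "bar" at index 0, or leaves it empty. */
struct bpf_map_def SEC("maps") hook_map = {
    .type        = BPF_MAP_TYPE_PROG_ARRAY,
    .key_size    = sizeof(int),
    .value_size  = sizeof(int),   /* values are program fds, set from userspace */
    .max_entries = 1,
};

SEC("xdp")
int foo(struct xdp_md *ctx) {
    /* Jump to whatever program is at index 0; on success, control never returns here. */
    bpf_tail_call(ctx, &hook_map, 0);

    /* Empty index or failed tail-call: return an action ourselves. */
    return XDP_PASS;
}

char _license[] SEC("license") = "GPL";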

To leverage this as a hook point, the instrumented programs are modified to always tail-call, using a map that is exposed to xdpcap by pinning it to a bpffs. To attach a filter, xdpcap can set it in the map. If no filter is attached, the instrumented program returns the correct action itself.

With a filter attached to program foo, we have:

Flow of XDP program foo tail-calling into an xdpcap filter

The filter must return the original action taken by the instrumented program to ensure the packet is processed correctly. To achieve this, xdpcap generates one filter program per possible XDP action, each one hardcoded to return that specific action. All the programs are set in the map:

  index              program
  0 (XDP_ABORTED)    filter XDP_ABORTED
  1 (XDP_DROP)       filter XDP_DROP
  2 (XDP_PASS)       filter XDP_PASS
  3 (XDP_TX)         filter XDP_TX

By tail-calling into the correct index, the instrumented program determines the final action:

Flow of XDP program foo tail-calling into xdpcap filters, one for each action

xdpcap provides a helper function that attempts a tail-call for the given action. Should it fail, the action is returned instead:

enum xdp_action xdpcap_exit(struct xdp_md *ctx, enum xdp_action action) {
    // tail-call into the filter using the action as an index
    tail_call((void *)ctx, &xdpcap_hook, action);

    // tail-call failed, return the action
    return action;
}

This allows an XDP program to simply:

int foo(struct xdp_md *ctx) {
    return xdpcap_exit(ctx, XDP_PASS);
}

Expose

Matching packets, as well as the original action taken for them, need to be exposed to userspace. Once again, such a mechanism is already part of our XDP based DoS mitigation pipeline!

Another eBPF helper, perf_event_output, allows an XDP program to generate a perf event containing, amongst some metadata, the packet. As xdpcap generates one filter per XDP action, the filter program can include the action taken in the metadata. A userspace program can create a perf event ring buffer to receive events into, obtaining both the action and the packet.
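
A hedged sketch of that pattern, loosely modelled on the kernel’s XDP perf-event sample rather than on xdpcap’s actual source (the map and struct names are made up): the upper 32 bits of the flags argument tell the helper how many bytes of the packet to append to the event.

#include <linux/bpf.h>
#include "bpf_helpers.h"

/* Per-CPU perf ring buffer, read by the userspace side of the tool. */
struct bpf_map_def SEC("maps") events = {
    .type        = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
    .key_size    = sizeof(int),
    .value_size  = sizeof(__u32),
    .max_entries = 128,   /* usually sized to the number of possible CPUs */
};

struct event_metadata {
    __u16 action;   /* the XDP action this filter is hardcoded to return */
    __u16 pkt_len;
};

SEC("xdp")
int filter_xdp_pass(struct xdp_md *ctx) {
    /* ...the compiled pcap filter would run here; on a match: */
    struct event_metadata meta = {
        .action  = XDP_PASS,
        .pkt_len = (__u16)(ctx->data_end - ctx->data),
    };
    /* Upper 32 bits of flags = number of packet bytes to append to the event. */
    __u64 flags = BPF_F_CURRENT_CPU | ((__u64)meta.pkt_len << 32);
    bpf_perf_event_output(ctx, &events, flags, &meta, sizeof(meta));

    return XDP_PASS;   /* always return the action this filter represents */
}

char _license[] SEC("license") = "GPL";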


  1. This is true of the original cBPF, but Linux implements a number of extensions, one of which allows the length of the input packet to be retrieved. ↩︎

  2. This example uses registers r6 and r7, but cbpfc can be configured to use any registers. ↩︎

An Introduction to C & GUI Programming – the new book from Raspberry Pi Press

Post Syndicated from Simon Long original https://www.raspberrypi.org/blog/an-introduction-to-c-gui-programming-the-new-book-from-raspberry-pi-press/

The latest book from Raspberry Pi Press, An Introduction to C & GUI Programming, is now available. Author Simon Long explains how it came to be written…

An Introduction to C and GUI programming by Simon Long

Learning C

I remember my first day in a ‘proper’ job very well. I’d just left university, and was delighted to have been taken on by a world-renowned consultancy firm as a software engineer. I was told that most of my work would be in C, which I had never used, so the first order of business was to learn it.

My manager handed me a copy of Kernighan & Ritchie’s The C Programming Language, pointed to a terminal in the corner, said ‘That’s got a compiler. Off you go!’, and left me to it. So, I started reading the book, which is affectionately known to most software engineers as ‘K&R‘.

I didn’t get very far. K&R is basically the specification of the C language. Dennis Ritchie, the eponymous ‘R’, invented C, and while the book he helped write is an excellent reference guide, it is not a great introduction for a beginner. Like most people who know their subject inside out, the authors tend to assume that you know more than you do, so reading the book when you don’t know anything about the language at all is a little frustrating. I do know people who have learned C from K&R, and they have my undying respect!

I ended up learning C on the job as I went along; I looked at other people’s code, hacked stuff together, worked out why things didn’t work, asked for help from my colleagues, made a lot of mistakes, and gradually got the hang of it. I found only one book that was helpful for a beginner: it was called C For Yourself, and was actually one of the manuals for the long-extinct Microsoft QuickC compiler. That book is now impossible to find, so I’ve always had to tell people that the best book for learning C as a beginner is ‘C For Yourself, but you won’t be able to find a copy!’

Writing An Introduction to C & GUI Programming

When I embarked on this project, the editor of The MagPi and I were discussing possible series for the magazine, and we thought about creating a guide to writing GUI applications in C — that’s what I do in my day job at Raspberry Pi, so it seemed a logical place to start. We realised that the reader would need to know C to benefit from the series, and they wouldn’t be able to find a copy of C For Yourself. We decided that I ought to solve that problem first, so I wrote the original beginners’ guide to C series for The MagPi.

(At this point, I should stress that the series is aimed at absolute beginners. I freely admit that I have simplified parts of the language so that the reader does not have to absorb as much in one go. So yes, I do know about returning a success/fail code from a program, but beginners really don’t need to learn about that in the first chapter — especially when many will never need to write a program which does it. That’s why it isn’t explained until Chapter 9.)

An Introduction to C and GUI programming by Simon Long published by Raspberry Pi Press

So, the beginners’ guide to C came first, and I have now got round to writing the second part, which was what I’d planned to write all along. The section on GUIs describes how to write applications using the GTK toolkit, which is used for most of the Raspberry Pi Desktop and its associated applications. GTK is very powerful, and allows you to write rich graphical user interfaces with relatively few lines of code, but it’s not the most intuitive for beginners. (Much like C itself!) The book walks you through the basics of creating a window, putting widgets on it, and making the widgets do useful things, and gets you to the point where you know enough to be able to write an application like the ones I have written for the Raspberry Pi Desktop.
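
To give a flavour of where the GUI section starts, here is a minimal sketch of the kind of program the book builds up from (not an excerpt from the book; GTK 3 assumed):

/* Build with: gcc hello_gtk.c -o hello_gtk $(pkg-config --cflags --libs gtk+-3.0) */
#include <gtk/gtk.h>

static void on_button_clicked(GtkWidget *widget, gpointer data) {
    g_print("Hello from the button!\n");
}

int main(int argc, char *argv[]) {
    gtk_init(&argc, &argv);

    /* A top-level window that ends the main loop when closed... */
    GtkWidget *window = gtk_window_new(GTK_WINDOW_TOPLEVEL);
    gtk_window_set_title(GTK_WINDOW(window), "Hello");
    g_signal_connect(window, "destroy", G_CALLBACK(gtk_main_quit), NULL);

    /* ...containing a single widget that does something useful. */
    GtkWidget *button = gtk_button_new_with_label("Click me");
    g_signal_connect(button, "clicked", G_CALLBACK(on_button_clicked), NULL);
    gtk_container_add(GTK_CONTAINER(window), button);

    gtk_widget_show_all(window);
    gtk_main();
    return 0;
}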

An Introduction to C and GUI programming by Simon Long published by Raspberry Pi Press

It then seemed logical to bring the two parts together in a single volume, so that someone with no experience of C has enough information to go from a standing start to writing useful desktop applications.

I hope that I’ve achieved that and if nothing else, I hope that I’ve written a book which is a bit more approachable for beginners than K&R!

Get An Introduction to C & GUI Programming today!

An Introduction to C & GUI Programming is available today from the Raspberry Pi Press online store, or as a free download here. You can also pick up a copy from the Raspberry Pi Store in Cambridge, or ask your local bookstore if they have it in stock or can order it in for you.

Alex interjects to state the obvious: Basically, what we’re saying here is that there’s no reason for you not to read Simon’s book. Oh, and it feels really nice too.

The post An Introduction to C & GUI Programming – the new book from Raspberry Pi Press appeared first on Raspberry Pi.

Eating Dogfood at Scale: How We Build Serverless Apps with Workers

Post Syndicated from Jonathan Spies original https://blog.cloudflare.com/building-serverless-apps-with-workers/

Eating Dogfood at Scale: How We Build Serverless Apps with Workers


You’ve had a chance to build a Cloudflare Worker. You’ve tried KV Storage and have a great use case for your Worker. You’ve even demonstrated the usefulness to your product or organization. Now you need to go from writing a single file in the Cloudflare Dashboard UI Editor to source controlled code with multiple environments deployed using your favorite CI tool.

Fortunately, we have a powerful and flexible API for managing your workers. You can customize your deployment to your heart’s content. Our blog has already featured many things made possible by that API.

These tools make deployments easier to configure, but it still takes time to manage. The Serverless Framework Cloudflare Workers plugin removes that deployment overhead so you can spend more time working on your application and less on your deployment.

Focus on your application

Here at Cloudflare, we’ve been working to rebuild our Access product to run entirely on Workers. The move will allow Access to take advantage of the resiliency, performance, and flexibility of Workers. We’ll publish a more detailed post about that migration once complete, but the experience required that we retool some of our processes to match our existing development experience as much as possible.

To us this meant:

  • Git
  • Easily deploy
  • Different environments
  • Unit Testing
  • CI Integration
  • Typescript/Multiple Files
  • Everything Must Be Automated

The Cloudflare Access team looked at three options for automating all of these tools in our pipeline. All of the options will work and could be right for you, but custom scripting can be a chore to maintain and Terraform lacked some extensibility.

  1. Custom Scripting
  2. Terraform
  3. Serverless Framework

We decided on the Serverless Framework. Serverless Framework provided a tool to mirror our existing process as closely as possible without too much DevOps overhead. Serverless is extremely simple and doesn’t interfere with the application code. You can get a project set up and deployed in seconds. It’s obviously less work than writing your own custom management scripts. But it also requires less boilerplate than Terraform because the Serverless Framework is designed for the “serverless” niche. However, if you are already using Terraform to manage other Cloudflare products, Terraform might be the best fit.

Walkthrough

Everything for the project happens in a YAML file called serverless.yml. Let’s go through the features of the configuration file.

To get started, we need to install serverless from npm and generate a new project.

npm install serverless -g
serverless create --template cloudflare-workers --path myproject
cd myproject
npm install

If you are an enterprise client, you want to use the cloudflare-workers-enterprise template as it will set up more than one worker (but don’t worry, you can add more to any template). Also, I’ll touch on this later, but if you want to write your workers in Rust, use the cloudflare-workers-rust template.

You should now have a project that feels familiar, ready to be added to your favorite source control. In the project should be a serverless.yml file like the following.

service:
    name: hello-world

provider:
  name: cloudflare
  config:
    accountId: CLOUDFLARE_ACCOUNT_ID
    zoneId: CLOUDFLARE_ZONE_ID

plugins:
  - serverless-cloudflare-workers

functions:
  hello:
    name: hello
    script: helloWorld  # there must be a file called helloWorld.js
    events:
      - http:
          url: example.com/hello/*
          method: GET
          headers:
            foo: bar
            x-client-data: value

The service block simply contains the name of your service. This will be used in your Worker script names if you do not overwrite them.

Under provider, name must be ‘cloudflare’  and you need to add your account and zone IDs. You can find them in the Cloudflare Dashboard.

The plugins section adds the Cloudflare specific code.

Now for the good part: functions. Each block under functions is a Worker script.

name: (optional) If left blank it will be STAGE-service.name-script.identifier. If I removed name from this file and deployed in production stage, the script would be named production-hello-world-hello.

script: the relative path to the javascript file with the worker script. I like to organize mine in a folder called handlers.

events: Currently Workers only support http events. We call these routes. The example provided says that GET https://example.com/hello/<anything here> will cause this worker to execute. The headers block is for testing invocations.

At this point you can deploy your worker!

CLOUDFLARE_AUTH_EMAIL=<your Cloudflare login email> CLOUDFLARE_AUTH_KEY=XXXXXXXX serverless deploy

This is very easy to deploy, but it doesn’t address our requirements. Luckily, there’s just a few simple modifications to make.

Maturing our YAML File

Here’s a more complex YAML file.

service:
  name: hello-world

package:
  exclude:
    - node_modules/**
  excludeDevDependencies: false

custom:
  defaultStage: development
  deployVars: ${file(./config/deploy.${self:provider.stage}.yml)}

kv: &kv
  - variable: MYUSERS
    namespace: users

provider:
  name: cloudflare
  stage: ${opt:stage, self:custom.defaultStage}
  config:
    accountId: ${env:CLOUDFLARE_ACCOUNT_ID}
    zoneId: ${env:CLOUDFLARE_ZONE_ID}

plugins:
  - serverless-cloudflare-workers

functions:
  hello:
    name: ${self:provider.stage}-hello
    script: handlers/hello
    webpack: true
    environment:
      MY_ENV_VAR: ${self:custom.deployVars.env_var_value}
      SENTRY_KEY: ${self:custom.deployVars.sentry_key}
    resources: 
      kv: *kv
    events:
      - http:
          url: "${self:custom.deployVars.SUBDOMAIN}.mydomain.com/hello"
          method: GET
      - http:
          url: "${self:custom.deployVars.SUBDOMAIN}.mydomain.com/alsohello*"
          method: GET

We can add a custom section where we can put custom variables to use later in the file.

defaultStage: We set this to development so that forgetting to pass a stage doesn’t trigger a production deploy. Combined with the stage option under provider we can set the stage for deployment.

deployVars: We use this custom variable to load another YAML file dependent on the stage. This lets us have different values for different stages. In development, this line loads the file ./config/deploy.development.yml. Here’s an example file:

env_var_value: true
sentry_key: XXXXX
SUBDOMAIN: dev

kv: Here we are showing off a feature of YAML. If you assign a name to a block using the ‘&’, you can use it later as a YAML variable. This is very handy in a multi script account. We could have named this variable anything, but we are naming it kv since it holds our Workers Key Value storage settings to be used in our function below.

Inside of the kv block we’re creating a namespace and binding it to a variable available in your Worker. It will ensure that the namespace “users” exists and is bound to MYUSERS.

kv: &kv
  - variable: MYUSERS
    namespace: users

provider: The only new part of the provider block is stage.

stage: ${opt:stage, self:custom.defaultStage}

This line sets stage to either the command line option or custom.defaultStage if opt:stage is blank. When we deploy, we pass --stage=production to serverless deploy.

Now under our function we have added webpack, resources, and environment.

webpack: If set to true, will simply bundle each handler into a single file for deployment. It will also take a string representing a path to a webpack config file so you can customize it. This is how we add Typescript support to our projects.

resources: This block is used to automate resource creation. In resources we’re linking back to the kv block we created earlier.

Side note: If you would like to include WASM bindings in your project, it can be done in a very similar way to how we included Workers KV. For more information on WASM, see the documentation.

environment: This is the butter for the bread that is managing configuration for different stages. Here we can specify values to bind to variables to use in worker scripts. Combined with YAML magic, we can store our values in the aforementioned config files so that we deploy different values in different stages. With environments, we can easily tie into our CI tool. The CI tool has our deploy.production.yml. We simply run the following command from within our CI.

sls deploy --stage=production

Finally, I added a route to demonstrate that a script can be executed on multiple routes.

At this point I’ve covered (or hinted) at everything on our original list except Unit Testing. There are a few ways to do this.

We have a previous blog post about Unit Testing that covers using cloudworker, a great tool built by Dollar Shave Club.

My team opted to use the classic node frameworks mocha and sinon. Because we are using Typescript, we can build for node or build for v8. You can also make mocha work for non-typescript projects if you use an experimental feature that adds import/export support to node.

--experimental-modules

We’re excited about moving more and more of our services to Cloudflare Workers, and the Serverless Framework makes that easier to do. If you’d like to learn even more or get involved with the project, see us on github.com. For additional information on using Serverless Framework with Cloudflare Workers, check out our documentation on the Serverless Framework.

Programmers Who Don’t Understand Security Are Poor at Security

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2019/03/programmers_who.html

A university study confirmed the obvious: if you pay a random bunch of freelance programmers a small amount of money to write security software, they’re not going to do a very good job at it.

In an experiment that involved 43 programmers hired via the Freelancer.com platform, University of Bonn academics have discovered that developers tend to take the easy way out and write code that stores user passwords in an unsafe manner.

For their study, the German academics asked a group of 260 Java programmers to write a user registration system for a fake social network.

Of the 260 developers, only 43 took up the job, which involved using technologies such as Java, JSF, Hibernate, and PostgreSQL to create the user registration component.

Of the 43, academics paid half of the group with €100, and the other half with €200, to determine if higher pay made a difference in the implementation of password security features.

Further, they divided the developer group a second time, prompting half of the developers to store passwords in a secure manner, and leaving the other half to store passwords in their preferred method — hence forming four quarters of developers paid €100 and prompted to use a secure password storage method (P100), developers paid €200 and prompted to use a secure password storage method (P200), devs paid €100 but not prompted for password security (N100), and those paid €200 but not prompted for password security (N200).

I don’t know why anyone would expect this group of people to implement a good secure password system. Look at how they were hired. Look at the scope of the project. Look at what they were paid. I’m sure they grabbed the first thing they found on GitHub that did the job.

I’m not very impressed with the study or its conclusions.

Microsoft acquires GitHub

Post Syndicated from corbet original https://lwn.net/Articles/756443/rss

Here’s the press release announcing Microsoft’s agreement to acquire GitHub for a mere $7.5 billion. “GitHub will retain its developer-first ethos and will operate independently to provide an open platform for all developers in all industries. Developers will continue to be able to use the programming languages, tools and operating systems of their choice for their projects — and will still be able to deploy their code to any operating system, any cloud and any device.”

Build your own weather station with our new guide!

Post Syndicated from Richard Hayler original https://www.raspberrypi.org/blog/build-your-own-weather-station/

One of the most common enquiries I receive at Pi Towers is “How can I get my hands on a Raspberry Pi Oracle Weather Station?” Now the answer is: “Why not build your own version using our guide?”

Build Your Own weather station kit assembled

Tadaaaa! The BYO weather station fully assembled.

Our Oracle Weather Station

In 2016 we sent out nearly 1000 Raspberry Pi Oracle Weather Station kits to schools from around the world who had applied to be part of our weather station programme. In the original kit was a special HAT that allows the Pi to collect weather data with a set of sensors.

The original Raspberry Pi Oracle Weather Station HAT – Build Your Own Raspberry Pi weather station

The original Raspberry Pi Oracle Weather Station HAT

We designed the HAT to enable students to create their own weather stations and mount them at their schools. As part of the programme, we also provide an ever-growing range of supporting resources. We’ve seen Oracle Weather Stations in great locations with huge differences in climate, and they’ve even recorded the effects of a solar eclipse.

Our new BYO weather station guide

We only had a single batch of HATs made, and unfortunately we’ve given nearly* all the Weather Station kits away. Not only are the kits really popular, we also receive lots of questions about how to add extra sensors or how to take more precise measurements of a particular weather phenomenon. So today, to satisfy your demand for a hackable weather station, we’re launching our Build your own weather station guide!

Build Your Own Raspberry Pi weather station

Fun with meteorological experiments!

Our guide suggests the use of many of the sensors from the Oracle Weather Station kit, so you can build a station that’s as close as possible to the original. As you know, the Raspberry Pi is incredibly versatile, and we’ve made it easy to hack the design in case you want to use different sensors.

Many other tutorials for Pi-powered weather stations don’t explain how the various sensors work or how to store your data. Ours goes into more detail. It shows you how to put together a breadboard prototype, it describes how to write Python code to take readings in different ways, and it guides you through recording these readings in a database.

Build Your Own Raspberry Pi weather station on a breadboard

There’s also a section on how to make your station weatherproof. And in case you want to move past the breadboard stage, we also help you with that. The guide shows you how to solder together all the components, similar to the original Oracle Weather Station HAT.

Who should try this build

We think this is a great project to tackle at home, at a STEM club, Scout group, or CoderDojo, and we’re sure that many of you will be chomping at the bit to get started. Before you do, please note that we’ve designed the build to be as straightforward as possible, but it’s still fairly advanced both in terms of electronics and programming. You should read through the whole guide before purchasing any components.

Build Your Own Raspberry Pi weather station – components

The sensors and components we’re suggesting balance cost, accuracy, and ease of use. Depending on what you want to use your station for, you may wish to use different components. Similarly, the final soldered design in the guide may not be the most elegant, but we think it is achievable for someone with modest soldering experience and basic equipment.

You can build a functioning weather station without soldering with our guide, but the build will be more durable if you do solder it. If you’ve never tried soldering before, that’s OK: we have a Getting started with soldering resource plus video tutorial that will walk you through how it works step by step.

Prototyping HAT for Raspberry Pi weather station sensors

For those of you who are more experienced makers, there are plenty of different ways to put the final build together. We always like to hear about alternative builds, so please post your designs in the Weather Station forum.

Our plans for the guide

Our next step is publishing supplementary guides for adding extra functionality to your weather station. We’d love to hear which enhancements you would most like to see! Our current ideas under development include adding a webcam, making a tweeting weather station, adding a light/UV meter, and incorporating a lightning sensor. Let us know which of these is your favourite, or suggest your own amazing ideas in the comments!

*We do have a very small number of kits reserved for interesting projects or locations: a particularly cool experiment, a novel idea for how the Oracle Weather Station could be used, or places with specific weather phenomena. If you have such a project in mind, please send a brief outline to [email protected], and we’ll consider how we might be able to help you.

The post Build your own weather station with our new guide! appeared first on Raspberry Pi.

Fully-Loaded Kodi Box Sellers Receive Hefty Jail Sentences

Post Syndicated from Andy original https://torrentfreak.com/fully-loaded-kodi-box-sellers-receive-hefty-jail-sentences-180524/

While users of older peer-to-peer based file-sharing systems have to work relatively hard to obtain content, users of the Kodi media player have things an awful lot easier.

As standard, Kodi is perfectly legal. However, when augmented with third-party add-ons it becomes a media discovery powerhouse, providing most of the content anyone could desire. A system like this can be set up by the user but for many, buying a so-called “fully-loaded” box from a seller is the easier option.

As a result, hundreds – probably thousands – of cottage industries have sprung up to service this hungry market in the UK, with regular people making a business out of setting up and selling such devices. Until three years ago, that’s what Michael Jarman and Natalie Forber of Colwyn Bay, Wales, found themselves doing.

According to reports in local media, Jarman was arrested in January 2015 when police were called to a disturbance at Jarman and Forber’s home. A large number of devices were spotted and an investigation was launched by Trading Standards officers. The pair were later arrested and charged with fraud offenses.

While 37-year-old Jarman pleaded guilty, 36-year-old Forber initially denied the charges and was due to stand trial. However, she later changed her mind and like Jarman, pleaded guilty to participating in a fraudulent business. Forber also pleaded guilty to transferring criminal property by shifting cash from the scheme through various bank accounts.

The pair attended a sentencing hearing before Judge Niclas Parry at Caernarfon Crown Court yesterday. According to local reporter Eryl Crump, the Court heard that the couple had run their business for about two years, selling around 1,000 fully-loaded Kodi-enabled devices for £100 each via social media.

According to David Birrell for the prosecution, the operation wasn’t particularly sophisticated but it involved Forber programming the devices as well as handling customer service. Forber claimed she was forced into the scheme by Jarman but that claim was rejected by the prosecution.

Between February 2013 and January 2015 the pair banked £105,000 from the business, money that was transferred between bank accounts in an effort to launder the takings.

Reporting from Court via Twitter, Crump said that Jarman’s defense lawyer accepted that a prison sentence was inevitable for his client but asked for the most lenient sentence possible.

Forber’s lawyer pointed out she had no previous convictions. The mother-of-two broke up with Jarman following her arrest and is now back in work and studying at college.

Sentencing the pair, Judge Niclas Parry described the offenses as a “relatively sophisticated fraud” carried out over a significant period. He jailed Jarman for 21 months and Forber for 16 months, suspended for two years. She must also carry out 200 hours of unpaid work.

The pair will also face a Proceeds of Crime investigation which could see them paying large sums to the state, should any assets be recoverable.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN reviews, discounts, offers and coupons.

Join us at the Education Summit at PyCon UK 2018

Post Syndicated from Ben Nuttall original https://www.raspberrypi.org/blog/pycon-uk-2018/

PyCon UK 2018 will take place on Saturday 15 September to Wednesday 19 September in the splendid Cardiff City Hall, just a few miles from the Sony Technology Centre where the vast majority of Raspberry Pis are made. We’re pleased to announce that we’re curating this year’s Education Summit at the conference, where we’ll offer opportunities for young people to learn programming skills, and for educators to undertake professional development!

PyCon UK Education Summit logo

PyCon UK 2018 is your chance to be welcomed into the wonderful Python community. At the Education Summit, we’ll put on a young coders’ day on the Saturday, and an educators’ day on the Sunday.

Saturday — young coders’ day

On Saturday we’ll be running a CoderDojo full of workshops on Raspberry Pi and micro:bits for young people aged 7 to 17. If they wish, participants will get to make a project and present it to the conference on the main stage, and everyone will be given a free micro:bit to take home!

Kids’ tickets at just £6 will be available here soon.

Kids on a stage at PyCon UK

Kids presenting their projects to the conference

Sunday — educators’ day

PyCon UK has been bringing developers and educators together ever since it first started its education track in 2011. This year’s Sunday will be a day of professional development: we’ll give teachers, educators, parents, and coding club leaders the chance to learn from us and from each other to build their programming, computing, and digital making skills.

Educator workshop at PyCon UK

Professional development for educators

Educators get a special entrance rate for the conference, starting at £48 — get your tickets now. Financial assistance is also available.

Call for proposals

We invite you to send in your proposal for a talk and workshop at the Education Summit! We’re looking for:

  • 25-minute talks for the educators’ day
  • 50-minute workshops for either the young coders’ or the educators’ day

If you have something you’d like to share, such as a professional development session for educators, advice on best practice for teaching programming, a workshop for up-skilling in Python, or a fun physical computing activity for the CoderDojo, then we’d love to hear about it! Please submit your proposal by 15 June.




After the Education Summit, the conference will continue for two days of talks and a final day of development sprints. Feel free to submit your education-related talk to the main conference too if you want to share it with a wider audience! Check out the PyCon UK 2018 website for more information.

We’re looking forward to seeing you in September!

The post Join us at the Education Summit at PyCon UK 2018 appeared first on Raspberry Pi.

Raspberry Jam Cameroon #PiParty

Post Syndicated from Ben Nuttall original https://www.raspberrypi.org/blog/raspberry-jam-cameroon-piparty/

Earlier this year on 3 and 4 March, communities around the world held Raspberry Jam events to celebrate Raspberry Pi’s sixth birthday. We sent out special birthday kits to participating Jams — it was amazing to know the kits would end up in the hands of people in parts of the world very far from Raspberry Pi HQ in Cambridge, UK.

The Raspberry Jam Camer team: Damien Doumer, Eyong Etta, Loïc Dessap and Lionel Sichom, aka Lionel Tellem

Preparing for the #PiParty

One birthday kit went to Yaoundé, the capital of Cameroon. There, a team of four students in their twenties — Lionel Sichom (aka Lionel Tellem), Eyong Etta, Loïc Dessap, and Damien Doumer — were organising Yaoundé’s first Jam, called Raspberry Jam Camer, as part of the Raspberry Jam Big Birthday Weekend. The team knew one another through their shared interests and skills in electronics, robotics, and programming. Damien explains in his blog post about the Jam that they planned ahead for several activities for the Jam based on their own projects, so they could be confident of having a few things that would definitely be successful for attendees to do and see.

Show-and-tell at Raspberry Jam Cameroon

Loïc presented a Raspberry Pi–based, Android app–controlled robot arm that he had built, and Lionel coded a small video game using Scratch on Raspberry Pi while the audience watched. Damien demonstrated the possibilities of Windows 10 IoT Core on Raspberry Pi, showing how to install it, how to use it remotely, and what you can do with it, including building a simple application.

Loïc Dessap, wearing a Raspberry Jam Big Birthday Weekend T-shirt, sits at a table with a robot arm, a laptop with a Pi sticker and other components. He is making an adjustment to his set-up.

Loïc showcases the prototype robot arm he built

There was lots more too, with others discussing their own Pi projects and talking about the possibilities Raspberry Pi offers, including a Pi-controlled drone and car. Cake was a prevailing theme of the Raspberry Jam Big Birthday Weekend around the world, and Raspberry Jam Camer made sure they didn’t miss out.

A round pink-iced cake decorated with the words "Happy Birthday RBP" and six candles, on a table beside Raspberry Pi stickers, Raspberry Jam stickers and Raspberry Jam fliers

Yay, birthday cake!!

A big success

Most visitors to the Jam were secondary school students, while others were university students and graduates. The majority were unfamiliar with Raspberry Pi, but all wanted to learn about Raspberry Pi and what they could do with it. Damien comments that the fact most people were new to Raspberry Pi made the event more interactive rather than creating any challenges, because the visitors were all interested in finding out about the little computer. The Jam was an all-round success, and the team was pleased with how it went:

What I liked the most was that we sensitized several people about the Raspberry Pi and what one can be capable of with such a small but powerful device. — Damien Doumer

The Jam team rounded off the event by announcing that this was the start of a Raspberry Pi community in Yaoundé. They hope that they and others will be able to organise more Jams and similar events in the area to spread the word about what people can do with Raspberry Pi, and to help them realise their ideas.

The Raspberry Jam Camer team, wearing Raspberry Jam Big Birthday Weekend T-shirts, pose with young Jam attendees outside their venue

Raspberry Jam Camer gets the thumbs-up

The Raspberry Pi community in Cameroon

In a French-language interview about their Jam, the team behind Raspberry Jam Camer said they’d like programming to become the third official language of Cameroon, after French and English; their aim is to popularise programming and digital making across Cameroonian society. Neither of these fields is very familiar to most people in Cameroon, but both are very well aligned with the country’s ambitions for development. The team is conscious of the difficulties around the emergence of information and communication technologies in the Cameroonian context; in response, they are seizing the opportunities Raspberry Pi offers to give children and young people access to modern and constantly evolving technology at low cost.

Thanks to Lionel, Eyong, Damien, and Loïc, and to everyone who helped put on a Jam for the Big Birthday Weekend! Remember, anyone can start a Jam at any time — and we provide plenty of resources to get you started. Check out the Guidebook, the Jam branding pack, our specially-made Jam activities online (in multiple languages), printable worksheets, and more.

The post Raspberry Jam Cameroon #PiParty appeared first on Raspberry Pi.