SVT-AV1 is an open-source AV1 codec implementation hosted on GitHub https://github.com/OpenVisualCloud/SVT-AV1/ under a BSD + patent license. As mentioned in our earlier blog post, Intel and Netflix have been collaborating on the SVT-AV1 encoder and decoder framework since August 2018. The teams have been working closely on SVT-AV1 development, discussing architectural decisions, implementing new tools, and improving compression efficiency. Since open-sourcing the project, other partner companies and the open-source community have contributed to SVT-AV1. In this tech blog, we will report the current status of the SVT-AV1 project, as well as the characteristics and performance of the encoder and decoder.
SVT-AV1 codebase status
The SVT-AV1 repository includes both an AV1 encoder and decoder, which share a significant amount of the code. The SVT-AV1 decoder is fully functional and compliant with the AV1 specification for all three profiles (Main, High, and Professional).
The SVT-AV1 encoder supports all AV1 tools which contribute to compression efficiency. Compared to the most recent master version of libaom (AV1 reference software), SVT-AV1 is similar in compression efficiency and at the same time achieves significantly lower encoding latency on multi-core platforms when using its inherent parallelization capabilities.
SVT-AV1 is written in C and can be compiled on major platforms, such as Windows, Linux, and macOS. In addition to the pure C function implementations, which allows for more flexible experimentation, the codec features extensive assembly and intrinsic optimizations for the x86 platform. See the next section for an outline of the main SVT-AV1 features that allow high performance at competitive compression efficiency. SVT-AV1 also includes extensive documentation on the encoder design targeted to facilitate the onboarding process for new developers.
One of Intel’s goals for SVT-AV1 development was to create an AV1 encoder that could offer performance and scalability. SVT-AV1 uses parallelization at several stages of the encoding process, which allows it to adapt to the number of available cores, including the newest servers with significant core count. This makes it possible for SVT-AV1 to decrease encoding time while still maintaining compression efficiency.
The SVT-AV1 encoder uses multi-dimensional (process-, picture/tile-, and segment-based) parallelism, multi-stage partitioning decisions, block-based multi-stage and multi-class mode decisions, and RD-optimized classification to achieve attractive trade-offs between compression and performance. Another feature of the SVT architecture is open-loop hierarchical motion estimation, which makes it possible to decouple the first stage of motion estimation from the rest of the encoding process.
Compression efficiency and performance
SVT-AV1 reaches similar compression efficiency as libaom at the slowest speed settings. During the codec development, we have been tracking the compression and encoding results at the https://videocodectracker.dev/ site. The plot below shows the improvements in the compression efficiency of SVT-AV1 compared to the libaom encoder over time. Note that the libaom compression has also been improving over time, and the plot below represents SVT-AV1 catching up with the moving target. In the plot, the Y-axis shows the additional bitrate in percent needed to achieve similar quality as libaom encoder according to three metrics. The plot shows the results of the 2-pass encoding mode in both codecs. SVT-AV1 uses 4-thread mode, whereas libaom operates in a single-thread mode. The SVT-AV1 results for the 1-pass fixed-QP encoding mode, commonly used in research, are even more competitive, as detailed below.
The comparison results of the SVT-AV1 against libaom on objective-1-fast test set are presented in the table below. For estimating encoding times, we used Intel(R) Xeon(R) Platinum 8170 CPU @ 2.10GHz machine with 52 physical cores and 96 GB of RAM, with 60 jobs running in parallel. Both codecs use bi-directional hierarchical prediction structure of 16 pictures. The results are presented for 1-pass mode with fixed frame-level QP offsets. A single-threaded compression mode is used. Below, we compute the BD-rates for the various quality metrics: PSNR on all three color planes, VMAF, and MS-SSIM. A negative BD-Rate indicates that the SVT-AV1 encodes produce the same quality with the indicated relative reduction in bitrate. As seen below, SVT-AV1 demonstrates 16.5% decrease in encoding time compared to libaom while being slightly more efficient in compression ability. Note that the encoding times ratio may vary depending on the instruction sets supported by the platform. The results have been obtained on SVT-AV1 cs2 branch (a development branch that is currently being merged into the master, git hash 3a19f29) against the libaom master branch (git hash fe72512). The QP values used to calculate the BD-rates are: 20, 32, 43, 55, 63.
*The overall encoding CPU time difference is calculated as change in total CPU time for all sequences and QPs of the test compared to that of the anchor. It is not equal to the average of per sequence values. Per each sequence, the encoding CPU time difference is calculated as change in total CPU time for all QPs for this sequence.
Since all sequences in the objective-1-fast test set have 60 frames, both codecs use one key frame. The following command line parameters have been used to compare the codecs.
The results above demonstrate the excellent objective performance of SVT-AV1. In addition, SVT-AV1 includes implementations of some subjective quality tools, which can be used if the codec is configured for the subjective quality.
On the objective-1-fast test set, the SVT-AV1 decoder is slightly faster than the libaom in the 1-thread mode, with larger improvements in the 4-thread mode. We observe even larger speed gains over libaom decoder when decoding bitstreams with multiple tiles using the 4-thread mode. The testing has been performed on Windows, Linux, and macOS platforms. We believe the performance is satisfactory for a research decoder, where the trade-offs favor easier experimentation over further optimizations necessary for a production decoder.
To help ensure codec conformance, especially for new code contributions, the code has been comprehensively covered with unit tests and end-to-end tests. The unit tests are built on the Google Test framework. The unit and end-to-end tests are triggered automatically for each pull request to the repository, which is supported by GitHub actions. The tests support sharding, and they run in parallel to speed-up the turn-around time on pull requests.
Over the last several months, SVT-AV1 has matured to become a complete encoder/decoder package providing competitive compression efficiency and performance trade-offs. The project is bolstered with extensive unit test coverage and documentation.
Our hope is that the SVT-AV1 codebase helps further adoption of AV1 and encourages more research and development on top of the current AV1 tools. We believe that the demonstrated advantages of SVT-AV1 make it a good platform for experimentation and research. We invite colleagues from industry and academia to check out the project on Github, reach out to the codebase maintainers for questions and comments or join one of the SVT-AV1 Open Dev meetings. We welcome more contributors to the project.
The legal online movie and TV show streaming landscape has expanded rapidly in recent years but has now developed its own issues.
From having no platforms just a few years ago, there are now many with each attempting to create space for their own exclusive offerings. As a result, consumers must now subscribe to them all at great expense to get everything they desire, something that most wallets are averse to.
At least ostensibly, this is the problem being addressed by AllTheStreams.fm, a self-proclaimed ‘pirate radio’ video site. At the time of writing it is streaming shows from Hulu, Disney+, Netflix, HBO Now, Prime Video and Showtime, all for free.
When we tuned in we were greeted by The Office from Netflix and The Mandalorian from Disney+.
Of course, the latest Disney shows being streamed online for free isn’t something one should ever expect. The AllTheStreams ‘manifesto’ is curious too.
“Netflix has all the Netflix stuff, Disney has all the Disney stuff, and never the twain shall meet. Let’s change that, however briefly,” it reads.
“Whenever media becomes inaccessible, piracy thrives again – from the 1960’s BBC 1-hour limit on pop music, to the iTunes store mp3 tyranny of the 00s. Today, All The Streams comes in response to the fragmentation and walled-garden paradigm that has risen to prominence for online video streaming services.”
The people behind AllTheStreams say they don’t care about several things, including but perhaps not limited to user-utility, scalability and terms of service. “All The Streams is made to revel in platform independence, and to demonstrate how even the most lo-fi hacks can be the equal of giants. We’re going to play anything and everything we feel like,” the manifesto continues.
“We’re going to make a frankensteinian playlist of media that none of these streaming platforms could ever recommend to you because it would cost them the profits of their exclusively-owned content. Sit back and enjoy the ride: like all pirate media offerings, we’re doing this for you.”
Given the tone, one could be forgiven for thinking that AllTheStreams.fm had just redirected to the PR department of The Pirate Bay, which is bizarrely half the way there. While The Pirate Bay has nothing to do with this site, the people behind it are a PR department of sorts.
In the bottom right-hand corner of the site are five fairly familiar letters – MSCHF. Said quickly we hear ‘mischief’, something that the company of the same name has caused plenty of.
MSCHF is a marketing company that claims to run on “structured chaos“. It previously developed a browser add-on that disguised Netflix viewing as a conference call, launched an Internet restaurant that transformed corporate perks into a tool to attack corporations, and sold Air Max 97s with ‘Holy Water’ in the sole.
So what’s this latest campaign all about? TorrentFreak spoke with CEO Gabriel Whaley who didn’t give away too much but pretty much admitted that the campaign is infringing copyrights.
“We don’t have any permissions to be running this whatsoever. But once one network shuts us down, five stand in their place!” he said.
That sounds a little bit like the pirate mantra (shut down one site, another five will appear) completely turned on its head, which is another curiosity. We put it to Whaley that if AllTheStreams doesn’t have permission to stream The Mandalorian online to the masses for free, then it’s possible that the Alliance For Creativity and Entertainment could be paying a visit quite soon.
We received no response to that thought but given MSCHF’s record, of engaging in over-the-top stunts to gain exposure, the legal aspect must have at least crossed the company’s mind. We asked if a calculation had been done to balance the potential marketing exposure against potential legal fees and what the endgame might be, but that didn’t help much either.
“Haha,” Whaley added. “MSCHF’s only endgame is the endgame itself.”
It’s almost impossible to work out what MSCHF is trying to achieve here. Maybe they have some kind of plan to bring all streaming services under one roof at a fair price and if so, that would be impressive, if brutally obvious. Given their history, however, it’s more likely they’re hoping to sell out a new line of exclusive sports clothing for cats made entirely of cheese.
If the company is looking for exposure, which it most certainly is, it has the right recipe here. Perhaps Disney will supply the crackers. Or the lawyers.
Drom: TF, for the latest news on copyright battles, torrent sites and more. We also have an annual VPN review.
Netflix is pleased to announce the open-source release of our crisis management orchestration framework: Dispatch!
Okay, but what is Dispatch? Put simply, Dispatch is:
All of the ad-hoc things you’re doing to manage incidents today, done for you, and a bunch of other things you should’ve been doing, but have not had the time!
Dispatch helps us effectively manage security incidents by deeply integrating with existing tools used throughout an organization (Slack, GSuite, Jira, etc.,) Dispatch leverages the existing familiarity of these tools to provide orchestration instead of introducing another tool.
This means you can let Dispatch focus on creating resources, assembling participants, sending out notifications, tracking tasks, and assisting with post-incident reviews; allowing you to focus on actually fixing the issue! Sounds interesting? Continue reading!
The Challenge of Crisis Management
Managing incidents is a stressful job. You are dealing with many questions all at once: What’s the scope? Who can help me? Who do I need to engage? How do I manage all of this?
In general, every incident is unique and extraordinary, if the same incidents are happening over and over you’re firefighting.
There are four main components to Crisis Management that we are attempting to address:
Resource Management — The management of not only data collected about the incident itself but all of the metadata about the response.
Individual Engagement — Understanding the best way to engage individuals and teams, and doing so based on incident context.
Life Cycle Management — Providing the Incident Commander (IC) tools to easily manage the life cycle of the incident.
Incident Learning — Building on past incidents to speed up the resolution of future incidents.
We will use the following terminology throughout the rest of the discussion:
Incident Commanders are individuals that are responsible for driving the incident to resolution.
Incident Participants are individuals that are Subject Matter Experts (SMEs) that have been engaged to help resolve the incident.
Resources are documents, screenshots, logs or any other piece of digital information that is used during an incident.
For an average incident, there are quite a few steps to managing an incident and much of it is typically handled on an ad-hoc basis by a human. Let’s enumerate them:
Declare an Incident — There are many different entry points to a potential incident: automated alerts, an internal notification, or an external notification.
Determine Incident Commander — Determining the sole individual responsible for driving a particular incident to resolution based on the incident source, type, and priority.
Create Communication Channels — Communication during incidents is key. Establishing dedicated and standardized channels for communication prevents the creation of communication silos.
Create Incident Document — The central document responsible for containing up-to-date incident information, including a description of the incident, links to resources, rough notes from in-person meetings, open questions, action items, and timeline information.
Engage Individual Resources — An incident commander will not be able to resolve an incident by themselves, they must identify and engage additional resources within the organization to help them.
Orient Individual Resources — Engaging additional resources is not enough, the Incident Commander needs to orient these resources to the situation at hand.
Notify Key Stakeholders — For any given incident, key stakeholders not directly involved in resolving the incident need to be made aware of the incident.
Drive Incident to Resolution — The actual resolution of the incident, creating tasks, asking questions, and tracking answers. Making note of key learnings to be addressed after resolution.
Perform Post Incident Review (PIR) — Review how the incident process was performed, tracking actions to be performed after the incident, and driving learning through structuring informal knowledge.
Each of these steps has the incident commander and incident participants moving through various systems and interfaces. Each context switch adds to the cognitive load on the responder and distracting them from resolving the incident itself.
Toward Better Crisis Management
Crisis management is not a new challenge, tools like Jira, PagerDuty, VictorOps are all helping organizations manage and respond to incidents. When setting out to automate our incident management process we had two main goals:
Re-use existing tools users were already familiar with; reducing the learning curve to contributing to incidents.
Catalog, store and analyze our incident data to speed up resolution.
Dispatch is a crisis management orchestration framework that manages incident metadata and resources. It uses tools already in use throughout an organization, providing incident participants a comprehensive crisis management toolset, allowing them to focus on resolving the incident.
Unlike many of our tools Dispatch is not tightly bound to AWS, Dispatch does not use any AWS APIs at all! While Dispatch doesn’t use AWS APIs, it leverages multiple APIs that are deeply embedded into the organization (e.g. Slack, GSuite, PagerDuty, etc.,). In addition to all of the built-in integrations, Dispatch provides multiple integration points that allow it to fit into just about any existing environment.
Although developed as a tool to help Netflix manage security incidents, nothing about Dispatch is specific to a security use-case. At its core, Dispatch aims to manage the entire lifecycle of an incident, focusing on engaging individuals and providing them the context they need to drive the incident to resolution.
Let’s take a look at what an incident commander’s new workflow would look like using Dispatch:
Some key benefits of the new workflow are:
The incident commander no longer needs to manage access to resources or multiple data streams.
Communications are standardized (both in style and interval) across incidents.
Incident participants are automatically engaged based on the type, priority, and description of the incident.
Incident tasks are tracked and owners are reminded if they’re not completed on time.
All incident data is centrally tracked.
A common API is provided for internal users and tools.
We want to make reporting incidents as frictionless as possible, giving users a straightforward path to engage the resources they need in a time of crisis.
Jumping between different tools, ensuring data is correct and in sync is a low-value exercise for an incident commander. Instead, we centralized on two common tools to manage the entire lifecycle. Slack for managing incident metadata (e.g. status, title, description, priority, etc,.) and Google Doc and Google Drive for managing data itself.
When teams need to look across many incidents, Dispatch provides an Admin UI. This interface is also where incident knowledge is managed. From common terms and their definitions, individuals, teams, and services. The Admin UI is how we manage incident knowledge for use in future incidents.
Dispatch makes use of the following components:
Python 3.8 with FastAPI (including helper packages)
We’re shipping Dispatch with built-in plugins that allow you to create and manage resources with GSuite (Docs, Drive, Sheets, Calendar, Groups), Jira, PagerDuty, and Slack. But the plugin architecture allows for integrations with whatever tools your organization is already using.
Feel free to reach out or submit pull requests if you have any suggestions. We’re looking forward to seeing what new plugins you create to make Dispatch work for you! We hope you’ll find Dispatch as useful as we do!
By Aditya Mavlankar, Jan De Cock¹, Cyril Concolato, Kyle Swanson, Anush Moorthy and Anne Aaron
We need an alternative to JPEG that a) is widely supported, b) has better compression efficiency and c) has a wider feature set. We believe AV1 Image File Format (AVIF) has the potential. Using the framework we have open sourced, AVIF compression efficiency can be seen at work and compared against a whole range of image codecs that came before it.
Image compression at Netflix
Netflix is enjoyed by its members on a variety of devices — smart TVs, phones, tablets, personal computers and streaming devices connected to TV screens. The user interface (UI), intended for browsing the catalog and serving up recommendations, is rich in images and graphics across all device categories. Shown below are screenshots of the Netflix app on iOS as an example.
Image assets might be based on still frames from the title, special on-set photography or a combination thereof. Assets could also stem from art generated during the production of the feature.
As seen above, image assets typically have gradients, text and graphics, for example the Netflix symbol or other title-specific symbols such as “The Witcher” insignia, composited on the image. Such special treatments lead to a variety of peculiarities which do not necessarily arise in natural images. Hard edges, including those with chroma differences on either side of the edge, are common and require good detail preservation, since they typically occur at salient locations and convey important information. Further, there is typically a character or a face in salient locations with a smooth, uncluttered background. Again, preservation of detail on the character’s face is of primary importance. In some cases, the background is textured and complex, exhibiting a wide range of frequencies.
After an image asset is ingested, the compression pipeline kicks in and prepares compressed image assets meant for delivering to devices. The goal is to have the compressed image look as close to the original as possible while reducing the number of bytes required. Given the image-heavy nature of the UI, compressing these images well is of primary importance. This involves picking, among other things, the right combination of color subsampling, codec, encoder parameters and encoding resolution.
Let us take color subsampling as an example. Choosing 420 subsampling, over the original 444 format, halves the number of samples (counting across all 3 color planes) that need to be encoded while relying on the fact that the human visual system is more sensitive to luma than chroma. However, 420 subsampling can introduce color bleeding and jaggies in locations with color transitions. Below we toggle between the original source in 444 and the source converted to 420 subsampling. The toggling shows loss introduced just by the color subsampling, even before the codec enters the picture.
Nevertheless, there are source images where the loss due to 420 subsampling is not obvious to human perception and in such cases it can be advantageous to use 420 subsampling. Ideally, a codec should be able to support both subsampling formats. However, there are a few codecs that only support 420 subsampling — webp, discussed below, is one such popular codec.
Brief overview of image coding formats
The JPEG format was introduced in 1992 and is widely popular. It supports various color subsamplings including 420, 422 and 444. JPEG can ingest RGB data and transform it to a luma-chroma representation before performing lossy compression. The discrete cosine transform (DCT) is employed as the decorrelating transform on 8×8 blocks of samples. This is followed by quantization and entropy coding. However, JPEG is restricted to 8-bit imagery and lacks support for alpha channel. The more recent JPEG-XT standard extends JPEG to higher bit-depths, support for alpha channel, lossless compression and more in a backwards compatible way.
The JPEG 2000 format, based on the discrete wavelet transform (DWT), was introduced as a successor to JPEG in the year 2000. It brought a whole range of additional features such as spatial scalability, region of interest coding, range of supported bit-depths, flexible number of color planes, lossless coding, etc. With the motion extension, it was accepted as the video coding standard for digital cinema in 2004.
The webp format was introduced by Google around 2010. Google added decoding support on Android devices and Chrome browser and also released libraries that developers could add to their apps on other platforms, for example iOS. Webp is based on intra-frame coding from the VP8 video coding format. Webp does not have all the flexibilities of JPEG 2000. It does, however, support lossless coding and also a lossless alpha channel, making it a more efficient and faster alternative to PNG in certain situations.
High-Efficiency Video Coding (HEVC) is the successor of H.264, a.k.a. Advanced Video Coding (AVC) format. HEVC intra-frame coding can be encapsulated in the High-Efficiency Image File Format (HEIF). This format is most notably used by Apple devices to store recorded imagery.
Similarly, AV1 Image File Format (AVIF) allows encapsulating AV1 intra-frame coded content, thus taking advantage of excellent compression gains achieved by AV1 over predecessors. We touch upon some appealing technical features of AVIF in the next section.
The JPEG committee is pursuing a coding format called JPEG XL which includes features aimed at helping the transition from legacy JPEG format. Existing JPEG files can be losslessly transcoded to JPEG XL while achieving file size reduction. Also included is a lightweight conversion process back to JPEG format in order to serve clients that only support legacy JPEG.
AVIF technical features
Although modern video codecs were developed with primarily video in mind, the intraframe coding tools in a video codec are not significantly different from image compression tooling. Given the huge compression gains of modern video codecs, they are compelling as image coding formats. There is a potential benefit in reusing the hardware in place for video compression/decompression. Image decoding in hardware may not be a primary motivator, given the peculiarities of OS dependent UI composition, and architectural implications of moving uncompressed image pixels around.
In the area of image coding formats, the Moving Picture Experts Group (MPEG) has standardized a codec-agnostic and generic image container format: ISO/IEC 23000–12 standard (a.k.a. HEIF). HEIF has been used to store most notably HEVC-encoded images (in its HEIC variant) but is also capable of storing AVC-encoded images or even JPEG-encoded images. The Alliance for Open Media (AOM) has recently extended this format to specify the storage of AV1-encoded images in its AVIF format. The base HEIF format offers typical features expected from an image format such as: support for any image codec, ability to use a lossy or a lossless mode for compression, support for varied subsampling and bit-depths, etc. Furthermore, the format also allows the storage of a series of animated frames (offering an efficient and long-awaited alternative to animated GIFs), and the ability to specify an alpha channel (which sees tremendous use in UIs). Further, since the HEIF format borrows learnings from next-generation video compression, the format allows for preserving metadata such as color gamut and high dynamic range (HDR) information.
Image compression comparison framework
We have open sourced a Docker based framework for comparing various image codecs. Salient features include:
Encode orchestration (with parallelization) and insights generation using Python 3
Easy reproducibility of results and
Easy control of target quality range(s).
Since the framework allows one to specify a target quality (using a certain metric) for target codec(s), and stores these results in a local database, one can easily utilize the Bjontegaard-Delta (BD) rate to compare across codecs since the target points can be restricted to a useful or meaningful quality range, instead of blindly sweeping across the encoder parameter range (such as a quality factor) with fixed parameter values and landing on arbitrary quality points.
An an example, below are the calls that would produce compressed images for the choice of codecs at the specified SSIM and VMAF values, with the desired tolerance in target quality:
For the various codecs and configurations involved in the ensuing comparison, the reader can view the actual command lines in the shared repository. We have attempted to get the best compression efficiency out of every codec / configuration compared here. The reader is free to experiment with changes to encoding commands within the framework. Furthermore, newer versions of respective software implementations might have been released compared to versions used at the time of gathering below results. For example, a newer software version of Kakadu demo apps is available compared to the one in the framework snapshot on github used at the time of gathering below results.
This is the section where we get to admire the work of the compression community over the last 3 decades by looking at visual examples comparing JPEG and the state-of-the-art.
The encoded images shown below are illustrative and meant to compare visual quality at various target bitrates. Please note that the quality of the illustrative encodes is not representative of the high quality bar that Netflix employs for streaming image assets on the actual service, and is meant to be purely educative in nature.
Shown below is one original source image from the Kodak dataset and the corresponding result with JPEG 444 @ 20,429 bytes and with AVIF 444 @ 19,788 bytes. The JPEG encode shows very obvious blocking artifacts in the sky, in the pond as well as on the roof. The AVIF encode is much better, with less blocking artifacts, although there is some blurriness and loss of texture on the roof. It is still a remarkable result, given the compression factor of around 59x (original image has dimensions 768×512, thus requiring 768x512x3 bytes compared to the 20k bytes of the compressed image).
For the same source, shown below is the comparison of JPEG 444 @ 40,276 bytes and AVIF 444 @ 39,819 bytes. The JPEG encode still has visible blocking artifacts in the sky, along with ringing around the roof edges and chroma bleeding in several locations. The AVIF image however, is now comparable to the original, with a compression factor of 29x.
Shown below is another original source image from the Kodak dataset and the corresponding result with JPEG 444 @ 13,939 bytes and with AVIF 444 @ 4,176 bytes. The JPEG encode shows blocking artifacts around most edges, particularly around the slanting edge as well as color distortions. The AVIF encode looks “cleaner” even though it is one-third the size of the JPEG encode. It is not a perfect rendition of the original, but with a compression factor of 282x, this is commendable.
Shown below are results for the same image with slightly higher bit-budget; JPEG 444 @ 19,787 bytes versus AVIF 444 @ 20,120 bytes. The JPEG encode still shows blocking artifacts around the slanting edge whereas the AVIF encode looks nearly identical to the source.
Shown below is an original image from the Netflix (internal) 1142×1600 resolution “boxshots-1” dataset. Followed by JPEG 444 @ 69,445 bytes and AVIF 444 @ 40,811 bytes. Severe banding and blocking artifacts along with color distortions are visible in the JPEG encode. Less so in the AVIF encode which is actually 29kB smaller.
Shown below are results for the same image with slightly increased bit-budget. JPEG 444 @ 80,101 bytes versus AVIF 444 @ 85,162 bytes. The banding and blocking is still visible in the JPEG encode whereas the AVIF encode looks very close to the original.
Shown below is another source image from the same boxshots-1 dataset along with JPEG 444 @ 81,745 bytes versus AVIF 444 @ 76,087 bytes. Blocking artifacts overall and mosquito artifacts around text can be seen in the JPEG encode.
Shown below is another source image from the boxshots-1 dataset along with JPEG 444 @ 80,562 bytes versus AVIF 444 @ 80,432 bytes. There is visible banding, blocking and mosquito artifacts in the JPEG encode whereas the AVIF encode looks very close to the original source.
Shown below are results over public datasets as well as Netflix-internal datasets. The reference codec used is JPEG from the JPEG-XT reference software, using the standard quantization matrix defined in Annex K of the JPEG standard. Following are the codecs and/or configurations tested and reported against the baseline in the form of BD rate.
The encoding resolution in these experiments is the same as the source resolution. For 420 subsampling encodes, the quality metrics were computed in 420 subsampling domain. Likewise, for 444 subsampling encodes, the quality metrics were computed in 444 subsampling domain. Along with BD rates associated with various quality metrics, such as SSIM, MS-SSIM, VIF and PSNR, we also show rate-quality plots using SSIM as the metric.
Kodak dataset; 24 images; 768×512 resolution
We have uploaded the source images in PNG format here for easy reference. We give the necessary attribution to Kodak as the source of this dataset.
Given a quality metric, for each image, we consider two separate rate-quality curves. One curve associated with the baseline (JPEG) and one curve associated with the target codec. We compare the two and compute the BD-rate which can be interpreted as the average percentage rate reduction for the same quality over the quality region being considered. A negative value implies rate reduction and hence is better compared to the baseline. As a last step, we report the arithmetic mean of BD rates over all images in the dataset. We also highlight the best performer in the tables below.
Billboard dataset (Netflix-internal); 223 images; 2048×1152 resolution
Billboard images generally occupy a larger canvas than the thumbnail-like boxshot images and are generally horizontal. There is room to overlay text or graphics on one of the sides, either left or right, with salient characters/scenery/art being located on the other side. An example can be seen below. The billboard source images are internal to Netflix and hence do not constitute a public dataset.
Unlike billboard images, boxshot images are vertical and typically boxshot images representing different titles are displayed side-by-side in the UI. Examples from this dataset are showcased in the section above on visual examples. The boxshots-1 source images are internal to Netflix and hence do not constitute a public dataset.
The boxshots-2 dataset also has vertical box art but of lower resolution. The boxshots-2 source images are internal to Netflix and hence do not constitute a public dataset.
At this point, it might be prudent to discuss the omission of VMAF as a quality metric here. In previous work we have shown that for JPEG-like distortions and datasets similar to “boxshots” and “billboards”, VMAF has high correlation with perceived quality. However, VMAF, as of today, is a metric trained and developed to judge encoded videos rather than static images. The range of distortions associated with the range of image codecs in our tests is broader than what was considered in the VMAF development process and to that end, it may not be an accurate measure of image quality for those codecs. Further, today’s VMAF model is not designed to capture chroma artifacts and hence would be unable to distinguish between 420 and 444 subsampling, for instance, apart from other chroma artifacts (this is also true of some other measures we’ve used, but given the lack of alternatives, we’ve leaned on the side of using the most well tested and documented image quality metrics). This is not to say that VMAF is grossly inaccurate for image quality, but to say that we would not use it in our evaluation of image compression algorithms with such a wide diversity of codecs at this time. We have some exciting upcoming work to improve the accuracy of VMAF for images, across a variety of codecs, and resolutions, including chroma channels in the score. Having said that, the code in the repository computes VMAF and the reader is encouraged to try it out and see that AVIF also shines judging by VMAF as is today.
PSNR does not have as high correlation with perceptual quality over a wide quality range. However, if encodes are made with a high PSNR target then one overspends bits but can rest assured that a high PSNR score implies closeness to the original. With perceptually driven metrics, we sometimes see failure manifest in rare cases where the score is undeservingly high but visual quality is lacking.
Interesting observation regarding subsampling
In addition to above quality calculations, we have the following observation which reveals an encouraging trend among modern codecs. After performing an encode with 420 subsampling, let’s assume we decode the image, up-convert it to 444 subsampling and then compute various metrics by comparing against the original source in 444 format. We call this configuration “444u” to distinguish from above cases where “encode-subsampling” and “quality-computation-subsampling” match. Among the chosen metrics, PSNR_AVG is one which takes all 3 channels (1 luma and 2 chroma) into account. With an older codec like JPEG, the bit-budget is spread thin over more samples while encoding 444 subsampling compared to encoding 420 subsampling. This shows as poorer PSNR_AVG for encoding JPEG with 444 subsampling compared to 420 subsampling, as shown below. However, given a rate target, with modern codecs like HEVC and AVIF, it is simply better to encode 444 subsampling over a wide range of bitrates.
We see that with modern codecs we yield a higher PSNR_AVG when encoding 444 subsampling than 420 subsampling over the entire region of “practical” rates, even for the other, more practical, datasets such as boxshots-1. Interestingly, with JPEG, we see a crossover; i.e., after crossing a certain rate, it starts being more efficient to encode 444 subsampling. Such crossovers are analogous to rate-quality curves crossing over when encoding over multiple spatial resolutions. Shown below are rate-quality curves for two different source images from the boxshots-1 dataset, comparing JPEG and AVIF in both 444u and 444 configurations.
AVIF support and next steps
Although AVIF provides superior compression efficiency, it is still at an early deployment stage. Various tools exist to produce and consume AVIF images. The Alliance for Open Media is notably developing an open-source library, called libavif, that can encode and decode AVIF images. The goal of this library is to ease the integration in software from the image community. Such integration has already started, for example, in various browsers, such as Google Chrome, and we expect to see broad support for AVIF images in the near future. Major efforts are also ongoing, in particular from the dav1d team, to make AVIF image decoding as fast as possible, including for 10-bit images. It is conceivable that we will soon test AVIF images on Android following on the heels of our recently announced AV1 video adoption efforts on Android.
The datasets used above have standard dynamic range (SDR) 8-bit imagery. At Netflix, we are also working on HDR images for the UI and are planning to use AVIF for encoding these HDR image assets. This is a continuation of our previous efforts where we experimented with JPEG 2000 as the compression format for HDR images and we are looking forward to the superior compression gains afforded by AVIF.
We would like to thank Marjan Parsa, Pierre Lemieux, Zhi Li, Christos Bampis, Andrey Norkin, Hunter Ford, Igor Okulist, Joe Drago, Benbuck Nason, Yuji Mano, Adam Rofer and Jeff Watts for all their contributions and collaborations.
¹as part of his work while he was affiliated with Netflix
When Netflix had just started offering online video content years ago, it didn’t consider piracy to be a major issue.
However, now that the company itself is one of the largest content producers, this outlook has changed drastically.
Like many other rightsholders, Netflix now keeps a close eye on pirate sites and services. The company has its own in-house anti-piracy team and also works with third-party companies, to issue takedown requests.
Increasingly, the streaming service is also teaming with other rightsholders to coordinate its enforcement efforts. Netflix is one of the founding members of the Alliance For Creativity and Entertainment (ACE) and earlier this year it joined the Motion Picture Association (MPA).
This week brings yet another expansion, one that takes it across borders to Italy. Local anti-piracy organization FAPAV just announced that Netflix and the Italian football league Serie A are its latest members.
“The addition of two new members, Lega Serie A and Netflix, will consolidate and further enrich the association’s membership base,” FAPAV announced at its end-of-year gathering.
The anti-piracy group has been very active in Europe over the past several years. A few months ago, it helped to bring down the pirate release group FREE/iNCOMiNG, for example, and it was also involved in the court case that brought down the famous torrent site TNTVillage.
FAPAV has also been a driving force behind local pirate site blockades. In 2019, more than 400 new sites were blocked in Italy, which is more than double compared to the year before.
With help from Netflix, FAPAV will keep its focus on various piracy sources in the coming year.
IPTV piracy will be one of its main priorities in 2020, the group says. In addition to enforcement actions, it will also continue its efforts to educate the public on the downsides of piracy through informational campaigns.
Hack Day at Netflix is an opportunity to build and show off a feature, tool, or quirky app. The goal is simple: experiment with new ideas/technologies, engage with colleagues across different disciplines, and have fun!
We know even the silliest idea can spur something more.
The most important value of our Hack Days is that they support a culture of innovation. We believe in this work, even if it never ships, and enjoy sharing the creativity and thought put into these ideas.
Below, you can find videos made by the hackers of some of our favorite hacks from this event.
Nostalgiflixis a chrome extension that transforms your Netflix web browser into an interactive TV time machine covering three decades (80’s, 90’s, and 00’s.) By dragging the UI slider around, you can view titles originally released within the selected year ( based on their historic box office and episode air dates.) More importantly you can also adjust the video filters in real-time to creatively downgrade the viewing experience, further enhancing the nostalgic effect. We think this feature could encourage our users to watch more of our older content while having fun reliving those moments of cinematic history.
This is a real time visualization of all contacts around the world. Each square on the map represent one of our global contact centers, spanning from Salt Lake City to Brazil, India, and Japan. The heatmap in the background is a historical trend of calls over the last hour, showing which countries are currently most active in contacting customer service. Every line you see is a live customer contact — starting at the customer’s country and ending at the contact center it was routed to. Four different types of contacts are represented in this visualization, white for regular phone calls, light blue for chats, green for calls that are initiated through our mobile apps on android and iOS, and red for contacts which are escalated from one representative to another.
Audio Descriptive tracks provide descriptive narration in addition to dialog, helping visually impaired and blind members enjoy our shows. For the Hack Day project, we explored using recent research¹ to automatically generate descriptions, then used our own internal authoring tools to refine the output. We then used synthetic audio and automated mixing techniques to deliver a final audio description track.
Despite a clear decrease in momentum in the UK in recent times, site-blocking remains a favored anti-piracy tool in many countries around the world.
Companies exploiting the Australian market seem convinced that the practice is good for business, as a brand new blocking application filed at the Federal Court shows. First reported by ComputerWorld, it features a broad coalition of movie, TV show, and anime companies, all of whom have previous blocking experience in Australia.
To keep the ‘feel’ of the application as local as possible, it’s no surprise that Roadshow Films is the lead applicant, despite having just one movie (The Lego Movie) listed in court documents. The remaining 11 include Disney, Paramount, Columbia, Universal, Warner and Netflix, plus Hong Kong-based broadcaster Television Broadcasts Limited and anime distributor Madman Anime.
With the companies involved having trod the blocking injunction path many times before, the application itself now takes a very familiar form. It demands that 50 local ISPs including Telstra, Optus, TPG and Vodafone block a wide range of ‘pirate’ sites. In terms of content, however, this is one of the broadest applications yet.
In Australia legal-speak, pirate sites of all kinds are referred to as “Target Online Locations” (TOL), of which there are 87 (identified by their domains) in the current application.
There are several categories of ‘TOL’ – streaming platforms, download platforms, linking sites (including torrent sites), sites that offer software that allows streaming or downloads, those that provide subtitles for copyright works, plus sites that offer proxy access to pirate sites.
Some notable inclusions are the community-resurrected KickassTorrents site operating from Katcr.co, plus some less than authentic Kickass clones operating from around half a dozen additional URLs.
The same goes for a range of domains trading on the SolarMovie, YIFY and YTS brands, without being connected to the original sites. In fact, many domains listed in the application follow this copycat theme, including those featuring 123movies, Primewire, CouchTuner, Putlocker, WatchFree, ProjectFreeTV, and YesMovies-style wording.
An interesting addition is that of getpopcorntime.is. This isn’t the original Popcorn Time app download site but does offer a variant of the software that can be used to gain access to movies and TV shows. However, the domain itself doesn’t offer any infringing content, or any links to the same.
Subtitle download sites, including TVSubtitles.net and MSubs.net, are included in the application. These types of platforms were previously the topic of debate in a previous application but the court eventually conceded they can indeed be blocked.
In a sign of how far the net is now being cast (most of the major pirate sites are already blocked in Australia), this application also features Russian torrent giant Rutor.info and China-focused btbtdy.me. Both of these sites have plenty of alternative domains so blocking just these two is unlikely to achieve much.
Finally, no blocking application would be complete without an effort to block all the ‘proxy’ sites that have the sole purpose of facilitating access to sites blocked as a result of previous injunctions. The problem in respect of these proxies seems to be considerable, with at least 13 of the 87 domains in this application falling into that category.
The full list of domains requested for blocking is as follows:
Infrastructure for Contextual Bandits and Reinforcement Learning — theme of the ML Platform meetup hosted at Netflix, Los Gatos on Sep 12, 2019.
Contextual and Multi-armed Bandits enable faster and adaptive alternatives to traditional A/B Testing. They enable rapid learning and better decision-making for product rollouts. Broadly speaking, these approaches can be seen as a stepping stone to full-on Reinforcement Learning (RL) with closed-loop, on-policy evaluation and model objectives tied to reward functions. At Netflix, we are running several such experiments. For example, one set of experiments is focussed on personalizing our artwork assets to quickly select and leverage the “winning” images for a title we recommend to our members.
As with other traditional machine learning and deep learning paths, a lot of what the core algorithms can do depends upon the support they get from the surrounding infrastructure and the tooling that the ML platform provides. Given the infrastructure space for RL approaches is still relatively nascent, we wanted to understand what others in the community are doing in this space.
This was the motivation for the meetup’s theme. It featured three relevant talks from LinkedIn, Netflix and Facebook, and a platform architecture overview talk from first time participant Dropbox.
After a brief introduction on the theme and motivation of its choice, the talks were kicked off by Kinjal Basu from LinkedIn who talked about Online Parameter Selection for Web-Based Ranking via Bayesian Optimization. In this talk, Kinjal used the example of the LinkedIn Feed, to demonstrate how they use bandit algorithms to solve for the optimal parameter selection problem efficiently.
He started by laying out some of the challenges around inefficiencies of engineering time when manually optimizing for weights/parameters in their business objective functions. The key insight was that by assuming a latent Gaussian Process (GP) prior on the key business metric actions like viral engagement, job applications, etc., they were able to reframe the problem as a straight-forward black-box optimization problem. This allowed them to use BayesOpt techniques to solve this problem.
The algorithm used to solve this reformulated optimization problem is a popular E/E technique known as Thompson Sampling. He talked about the infrastructure used to implement this. They have built an offline BayesOpt library, a parameter store to retrieve the right set of parameters, and an online serving layer to score the objective at serving time given the parameter distribution for a particular member.
He also described some practical considerations, like member-parameter stickiness, to avoid per session variance in a member’s experience. Their offline parameter distribution is recomputed hourly, so the member experience remains consistent within the hour. Some simulation results and some online A/B test results were shared, demonstrating substantial lifts in the primary business metrics, while keeping the secondary metrics above preset guardrails.
He concluded by stressing the efficiency their teams had achieved by doing online parameter exploration instead of the much slower human-in-the-loop manual explorations. In the future, they plan to explore adding new algorithms like UCB, considering formulating the problem as a grey-box optimization problem, and switching between the various business metrics to identify which is the optimal metric to optimize.
The second talk was by Netflix on our Bandit Infrastructure built for personalization use cases. Fernando Amat and Elliot Chow jointly gave this talk.
Fernando started the first part of the talk and described the core recommendation problem of identifying the top few titles in a large catalog that will maximize the probability of play. Using the example of evidence personalization — images, text, trailers, synopsis, all assets that come together to add meaning to a title — he described how the problem is essentially a slate recommendation task and is well suited to be solved using a Bandit framework.
If such a framework is to be generic, it must support different contexts, attributions and reward functions. He described a simple Policy API that models the Slate tasks. This API supports the selection of a state given a list of options using the appropriate algorithm and a way to quantify the propensities, so the data can be de-biased. Fernando ended his part by highlighting some of the Bandit Metrics they implemented for offline policy evaluation, like Inverse Propensity Scoring (IPS), Doubly Robust (DR), and Direct Method (DM).
For Bandits, where attribution is a critical part of the equation, it’s imperative to have a flexible and robust data infrastructure. Elliot started the second part of the talk by describing the real-time framework they have built to bring together all signals in one place making them accessible through a queryable API. These signals include member activity data (login, search, playback), intent-to-treat (what title/assets the system wants to impress to the member) and the treatment (impressions of images, trailers) that actually made it to the member’s device.
Elliot talked about what is involved in “Closing the loop”. First, the intent-to-treat needs to be joined with the treatment logging along the way, the policies in effect, the features used and the various propensities. Next, the reward function needs to be updated, in near real time, on every logged action (like a playback) for both short-term and long-term rewards. And finally each new observation needs to update the policy, compute offline policy evaluation metrics and then push the policy back to production so it can generate new intents to treat.
To be able to support this, the team had to standardize on several infrastructure components. Elliot talked about the three key components — a) Standardized Logging from the treatment services, b) Real-time stream processing over Apache Flink for member activity joins, and c) an Apache Spark client for attribution and reward computation. The team has also developed a few common attribution datasets as “out-of-the-box” entities to be used by the consuming teams.
Finally, Elliot ended by talking about some of the challenges in building this Bandit framework. In particular, he talked about the misattribution potential in a complex microservice architecture where often intermediary results are cached. He also talked about common pitfalls of stream-processed data like out of order processing.
This framework has been in production for almost a year now and has been used to support several A/B tests across different recommendation use cases at Netflix.
After a short break, the second session started with a talk from Facebook focussed on practical solutions to exploration problems. Sam Daulton described how the infrastructure and product use cases came along. He described how the adaptive experimentation efforts are aimed at enabling fast experimentation with a goal of adding varying degrees of automation for experts using the platform in an ad hoc fashion all the way to no-human-in-the-loop efforts.
He dived into a policy search problem they tried to solve: How many posts to load for a user depending upon their device’s connection quality. They modeled the problem as an infinite-arm bandit problem and used Gaussian Process (GP) regression. They used Bayesian Optimization to perform multi-metric optimization — e.g., jointly optimizing decrease in CPU utilization along with increase in user engagement. One of the challenges he described was how to efficiently choose a decision point, when the joint optimization search presented a Pareto frontier in the possible solution space. They used constraints on individual metrics in the face of noisy experiments to allow business decision makers to arrive at an optimal decision point.
Not all spaces can be efficiently explored online, so several research teams at Facebook use Simulations offline. For example, a ranking team would ingest live user traffic and subject it to a number of ranking configurations and simulate the event outcomes using predictive models running on canary rankers. The simulations were often biased and needed de-biasing (using multi-task GP regression) for them to be used alongside online results. They observed that by combining their online results with de-biased simulation results they were able to substantially improve their model fit.
To support these efforts, they developed and open sourced some tools along the way. Sam described Ax and BoTorch — Ax is a library for managing adaptive experiments and BoTorch is a library for Bayesian Optimization research. There are many applications already in production for these tools from both basic hyperparameter exploration to more involved AutoML use cases.
The final section of Sam’s talk focussed on Constrained Bayesian Contextual Bandits. They described the problem of video uploads to Facebook where the goal is to maximize the quality of the video without a decrease in reliability of the upload. They modeled it as a Thompson Sampling optimization problem using a Bayesian Linear model. To enforce the constraints, they used a modified algorithm, Constrained Thompson Sampling, to ensure a non-negative change in reliability. The reward function also similarly needed some shaping to align with the constrained objective. With this reward shaping optimization, Sam shared some results that showed how the Constrained Thompson Sampling algorithm surfaced many actions that satisfied the reliability constraints, where vanilla Thompson Sampling had failed.
The last talk of the event was a system architecture introduction by Dropbox’s Tsahi Glik. As a first time participant, their talk was more of an architecture overview of the ML Infra in place at Dropbox.
Tsahi started off by giving some ML usage examples at Dropbox like Smart Sync which predicts which file you will use on a particular device, so it’s preloaded. Some of the challenges he called out were the diversity and size of the disparate data sources that Dropbox has to manage. Data privacy is increasingly important and presents its own set of challenges. From an ML practice perspective, they also have to deal with a wide variety of development processes and ML frameworks, custom work for new use cases and challenges with reproducibility of training.
He shared a high level overview of their ML platform showing the various common stages of developing and deploying a model categorized by the online and offline components. He then dived into some individual components of the platform.
The first component he talked about was a user activity service to collect the input signals for the models. This service, Antenna, provides a way to query user activity events and summarizes the activity with various aggregations. The next component he dived deeper into was a content ingestion pipeline for OCR (optical character recognition). As an example, he explained how the image of a receipt is converted into contextual text. The pipeline takes the image through multiple models for various subtasks. The first classifies whether the image has some detectable text, the second does corner detection, the third does word box detection followed by deep LSTM neural net that does the core sequence based OCR. The final stage performs some lexicographical post processing.
He talked about the practical considerations of ingesting user content — they need to prevent malicious content from impacting the service. To enable this they have adopted a plugin based architecture and each task plugin runs in a sandbox jail environment.
Their offline data preparation ETLs run on Spark and they use Airflow as the orchestration layer. Their training infrastructure relies on a hybrid cloud approach. They have built a layer and command line tool called dxblearn that abstracts the training paths, allowing the researchers to train either locally or leverage AWS. dxblearn also allows them to fire off training jobs for hyperparameter tuning.
Published models are sent to a model store in S3 which are then picked up by their central model prediction service that does online inferencing for all use cases. Using a central inferencing service allows them to partition compute resources appropriately and having a standard API makes it easy to share and also run inferencing in the cloud.
They have also built a common “suggest backend” that is a generic predictive application that can be used by the various edge and production facing services that regularizes the data fetching, prediction and experiment configuration needed for a product prediction use case. This allows them to do live experimentation more easily.
The last part of Tsahi’s talk described a product use case leveraging their ML Platform. He used the example of a promotion campaign ranker, (eg “Try Dropbox business”) for up-selling. This is modeled as a multi-armed bandit problem, an example well in line with the meetup theme.
The biggest value of such meetups lies in the high bandwidth exchange of ideas from like-minded practitioners. In addition to some great questions after the talks, the 150+ attendees stayed well past 2 hours in the reception exchanging stories and lessons learnt solving similar problems at scale.
In the Personalization org at Netflix, we are always interested in exchanging ideas about this rapidly evolving ML space in general and the bandits and reinforcement learning space in particular. We are committed to sharing our learnings with the community and hope to discuss progress here, especially our work on Policy Evaluation and Bandit Metrics in future meetups. If you are interested in working on this exciting space, there are many open opportunities on both engineering and research endeavors.
In a microservice architecture such as Netflix’s, propagating datasets from a single source to multiple downstream destinations can be challenging. These datasets can represent anything from service configuration to the results of a batch job, are often needed in-memory to optimize access and must be updated as they change over time.
One example displaying the need for dataset propagation: at any given time Netflix runs a very large number of A/B tests. These tests span multiple services and teams, and the operators of the tests need to be able to tweak their configuration on the fly. There needs to be the ability to detect nodes that have failed to pick up the latest test configuration, and the ability to revert to older versions of configuration when things go wrong.
Another example of a dataset that needs to be disseminated is the result of a machine-learning model: the results of these models may be used by several teams, but the ML teams behind the model aren’t necessarily interested in maintaining high-availability services in the critical path. Rather than each team interested in consuming the model having to build in fallbacks to degrade gracefully, there is a lot of value in centralizing the work to allow multiple teams to leverage a single team’s effort.
Without infrastructure-level support, every team ends up building their own point solution to varying degrees of success. Datasets themselves are of varying size, from a few bytes to multiple gigabytes. It is important to build in observability and fault detection, and to provide tooling to allow operators to make quick changes without having to develop their own tools.
At Netflix we use an in-house dataset pub/sub system called Gutenberg. Gutenberg allows for propagating versioned datasets — consumers subscribe to data and are updated to the latest versions when they are published. Each version of the dataset is immutable and represents a complete view of the data — there is no dependency on previous versions of data. Gutenberg allows browsing older versions of data for use cases such as debugging, rapid mitigation of data related incidents, and re-training of machine-learning models. This post is a high level overview of the design and architecture of Gutenberg.
The top-level construct in Gutenberg is a “topic”. A publisher publishes to a topic and consumers consume from a topic. Publishing to a topic creates a new monotonically-increasing “version”. Topics have a retention policy that specifies a number of versions or a number of days of versions, depending on the use case. For example, you could configure a topic to retain 10 versions or 10 days of versions.
Each version contains metadata (keys and values) and a data pointer. You can think of a data pointer as special metadata that points to where the actual data you published is stored. Today, Gutenberg supports direct data pointers (where the payload is encoded in the data pointer value itself) and S3 data pointers (where the payload is stored in S3). Direct data pointers are generally used when the data is small (under 1MB) while S3 is used as a backing store when the data is large.
Gutenberg provides the ability to scope publishes to a particular set of consumers — for example by region, application, or cluster. This can be used to canary data changes with a single cluster, roll changes out incrementally, or constrain a dataset so that only a subset of applications can subscribe to it. Publishers decide the scope of a particular data version publish, and they can later add scopes to a previously published version. Note that this means that the concept of a latest version depends on the scope — two applications may see different versions of data as the latest depending on the publish scopes created by the publisher. The Gutenberg service matches the consuming application with the published scopes before deciding what to advertise as the latest version.
The most common use case of Gutenberg is to propagate varied sizes of data from a single publisher to multiple consumers. Often the data is held in memory by consumers and used as a “total cache”, where it is accessed at runtime by client code and atomically swapped out under the hood. Many of these use cases can be loosely grouped as “configuration” — for example Open Connect Appliance cache configuration, supported device type IDs, supported payment method metadata, and A/B test configuration. Gutenberg provides an abstraction between the publishing and consumption of this data — this allows publishers the freedom to iterate on their application without affecting downstream consumers. In some cases, publishing is done via a Gutenberg-managed UI, and teams do not need to manage their own publishing app at all.
Another use case for Gutenberg is as a versioned data store. This is common for machine-learning applications, where teams build and train models based on historical data, see how it performs over time, then tweak some parameters and run through the process again. More generally, batch-computation jobs commonly use Gutenberg to store and propagate the results of a computation as distinct versions of datasets. “Online” use cases subscribe to topics to serve real-time requests using the latest versions of topics’ data, while “offline” systems may instead use historical data from the same topics — for example to train machine-learned models.
An important point to note is that Gutenberg is not designed as an eventing system — it is meant purely for data versioning and propagation. In particular, rapid-fire publishes do not result in subscribed clients stepping through each version; when they ask for an update, they will be provided with the latest version, even if they are currently many versions behind. Traditional pub-sub or eventing systems are suited towards messages that are smaller in size and are consumed in sequence; consumers may build up a view of an entire dataset by consuming an entire (potentially compacted) feed of events. Gutenberg, however, is designed for publishing and consuming an entire immutable view of a dataset.
Design and architecture
Gutenberg consists of a service with gRPC and REST APIs as well as a Java client library that uses the gRPC API.
The Gutenberg client library handles tasks such as subscription management, S3 uploads/downloads, Atlas metrics, and knobs you can tweak using Archaius properties. It communicates with the Gutenberg service via gRPC, using Eureka for service discovery.
Publishers generally use high-level APIs to publish strings, files, or byte arrays. Depending on the data size, the data may be published as a direct data pointer or it may get uploaded to S3 and then published as an S3 data pointer. The client can upload a payload to S3 on the caller’s behalf or it can publish just the metadata for a payload that already exists in S3.
Direct data pointers are automatically replicated globally. Data that is published to S3 is uploaded to multiple regions by the publisher by default, although that can be configured by the caller.
The client library provides subscription management for consumers. This allows users to create subscriptions to particular topics, where the library retrieves data (eg from S3) before handing off to a user-provided listener. Subscriptions operate on a polling model — they ask the service for a new update every 30 seconds, providing the version with which they were last notified. Subscribed clients will never consume an older version of data than the one they are on unless they are pinned (see “Data resiliency” below). Retry logic is baked in and configurable — for instance, users can configure Gutenberg to try older versions of data if it fails to download or process the latest version of data on startup, often to deal with non-backwards-compatible data changes. Gutenberg also provides a pre-built subscription that holds on to the latest data and atomically swaps it out under the hood when a change comes in — this tackles a majority of subscription use cases, where callers only care about the current value at any given time. It allows callers to specify a default value — either for a topic that has never been published to (a good fit when the topic is used for configuration) or if there is an error consuming the topic (to avoid blocking service startup when there is a reasonable default).
Gutenberg also provides high-level client APIs that wrap the low-level gRPC APIs and provide additional functionality and observability. One example of this is to download data for a given topic and version — this is used extensively by components plugged into Netflix Hollow. Another example is a method to get the “latest” version of a topic at a particular time — a common use case when debugging and when training ML models.
Client resiliency and observability
Gutenberg was designed with a bias towards allowing consuming services to be able to start up successfully versus guaranteeing that they start with the freshest data. With this in mind, the client library was built with fallback logic for when it cannot communicate with the Gutenberg service. After HTTP request retries are exhausted, the client downloads a fallback cache of topic publish metadata from S3 and works based off of that. This cache contains all the information needed to decide whether an update needs to be applied, and from where data needs to be fetched (either from the publish metadata itself or from S3). This allows clients to fetch data (which is potentially stale, depending on how current that fallback cache is) without using the service.
Part of the benefit of providing a client library is the ability to expose metrics that can be used to alert on an infrastructure-wide issue or issues with specific applications. Today these metrics are used by the Gutenberg team to monitor our publish-propagation SLI and to alert in the event of widespread issues. Some clients also use these metrics to alert on app-specific errors, for example individual publish failures or a failure to consume a particular topic.
The Gutenberg service is a Governator/Tomcat application that exposes gRPC and REST endpoints. It uses a globally-replicated Cassandra cluster for persistence and to propagate publish metadata to every region. Instances handling consumer requests are scaled separately from those handling publish requests — there are approximately 1000 times more consumer requests than there are publish requests. In addition, this insulates publishing from consumption — a sudden spike in publishing will not affect consumption, and vice versa.
Each instance in the consumer request cluster maintains its own in-memory cache of “latest publishes”, refreshing it from Cassandra every few seconds. This is to handle the large volume of poll requests coming from subscribed clients without passing on the traffic to the Cassandra cluster. In addition, request-pooling low-ttl caches protect against large spikes in requests that could potentially burden Cassandra enough to affect entire region — we’ve had situations where transient errors coinciding with redeployments of large clusters have caused Gutenberg service degradation. Furthermore, we use an adaptive concurrency limiter bucketed by source application to throttle misbehaving applications without affecting others.
For cases where the data was published to S3 buckets in multiple regions, the server makes a decision on what bucket to send back to the client to download from based on where the client is. This also allows the service to provide the client with a bucket in the “closest” region, and to have clients fall back to another region if there is a region outage.
Before returning subscription data to consumers, the Gutenberg service first runs consistency checks on the data. If the checks fail and the polling client already has consumed some data the service returns nothing, which effectively means that there is no update available. If the polling client has not yet consumed any data (this usually means it has just started up), the service queries the history for the topic and returns the latest value that passes consistency checks. This is because we see sporadic replication delays at the Cassandra layer, where by the time a client polls for new data, the metadata associated with the most recently published version has only been partially replicated. This can result in incomplete data being returned to the client, which then manifests itself either as a data fetch failure or an obscure business-logic failure. Running these consistency checks on the server insulates consumers from the eventual-consistency caveats that come with the service’s choice of a data store.
Visibility on topic publishes and nodes that consume a topic’s data is important for auditing and to gather usage info. To collect this data, the service intercepts requests from publishers and consumers (both subscription poll requests and others) and indexes them in Elasticsearch by way of the Keystone data pipeline. This allows us to gain visibility into topic usage and decommission topics that are no longer in use. We expose deep-links into a Kibana dashboard from an internal UI to allow topic owners to get a handle on their consumers in a self-serve manner.
In addition to the clusters serving publisher and consumer requests, the Gutenberg service runs another cluster that runs periodic tasks. Specifically this runs two tasks:
Every few minutes, all the latest publishes and metadata are gathered up and sent to S3. This powers the fallback cache used by the client as detailed above.
A nightly janitor job purges topic versions which exceed their topic’s retention policy. This deletes the underlying data as well (e.g. S3 objects) and helps enforce a well-defined lifecycle for data.
In the world of application development bad deployments happen, and a common mitigation strategy there is to roll back the deployment. A data-driven architecture makes that tricky, since behavior is driven by data that changes over time.
Data propagated by Gutenberg influences — and in many cases drives — system behavior. This means that when things go wrong, we need a way to roll back to a last-known good version of data. To facilitate this, Gutenberg provides the ability to “pin” a topic to a particular version. Pins override the latest version of data and force clients to update to that version — this allows for quick mitigation rather than having an under-pressure operator attempt to figure out how to publish the last known good version. You can even apply a pin to a specific publish scope so that only consumers that match that scope are pinned. Pins also override data that is published while the pin is active, but when the pin is removed clients update to the latest version, which may be the latest version when the pin was applied or a version published while the pin was active.
When deploying new code, it’s often a good idea to canary new builds with a subset of traffic, roll it out incrementally, or otherwise de-risk a deployment by taking it slow. For cases where data drives behavior, a similar principle should be applied.
One feature Gutenberg provides is the ability to incrementally roll out data publishes via Spinnaker pipelines. For a particular topic, users configure what publish scopes they want their publish to go to and what the delay is between each one. Publishing to that topic then kicks off the pipeline, which publishes the same data version to each scope incrementally. Users are able to interact with the pipeline; for example they may choose to pause or cancel pipeline execution if their application starts misbehaving, or they may choose to fast-track a publish to get it out sooner. For example, for some topics we roll out a new dataset version one AWS region at a time.
Gutenberg has been at use at Netflix for the past three years. At present, Gutenberg stores low tens-of-thousands of topics in production, about a quarter of which have published at least once in the last six months. Topics are published at a variety of cadences — from tens of times a minute to once every few months — and on average we see around 1–2 publishes per second, with peaks and troughs about 12 hours apart.
In a given 24 hour period, the number of nodes that are subscribed to at least one topic is in the low six figures. The largest number of topics a single one of these nodes is subscribed to is north of 200, while the median is 7. In addition to subscribed applications, there are a large number of applications that request specific versions of specific topics, for example for ML and Hollow use cases. Currently the number of nodes that make a non-subscribe request for a topic is in the low hundreds of thousands, the largest number of topics requested is 60, and the median is 4.
Here’s a sample of work we have planned for Gutenberg:
Polyglot support: today Gutenberg only supports a Java client, but we’re seeing an increasing number of requests for Node.js and Python support. Some of these teams have cobbled together their own solutions built on top of the Gutenberg REST API or other systems. Rather than have different teams reinvent the wheel, we plan to provide first-class client libraries for Node.js and Python.
Encryption and access control: for sensitive data, Gutenberg publishers should be able to encrypt data and distribute decryption credentials to consumers out-of-band. Adding this feature opens Gutenberg up to another set of use-cases.
Better incremental rollout: the current implementation is in its pretty early days and needs a lot of work to support customization to fit a variety of use cases. For example, users should be able to customize the rollout pipeline to automatically accept or reject a data version based on their own tests.
Alert templates: the metrics exposed by the Gutenberg client are used by the Gutenberg team and a few teams that are power users. Instead, we plan to provide leverage to users by building and parameterizing templates they can use to set up alerts for themselves.
Topic cleanup: currently topics sit around forever unless they are explicitly deleted, even if no one is publishing to them or consuming from them. We plan on building an automated topic cleanup system based on the consumption trends indexed in Elasticsearch.
Data catalog integration: an ongoing issue at Netflix is the problem of cataloging data characteristics and lineage. There is an effort underway to centralize metadata around data sources and sinks, and once Gutenberg integrates with this, we can leverage the catalog to automate tools that message the owners of a dataset.
The effects that legal streaming services have on people’s motivation to pirate can be quite confusing.
On the one hand, legal services have been known to lower the piracy rates in some regions, but too many exclusive platforms could boost piracy again.
In South Africa, Internet providers are mostly noticing the first. Netflix first launched there in 2016, and since then, Netflix traffic has dwarfed BitTorrent traffic, ISPs say. While it’s not entirely clear to what degree torrent traffic decreased, if it did, the companies have all noticed a massive Netflix effect.
In a report published by Mybroadband, several ISPs were questioned about the topic. Without exception, they say that video services, Netflix in particular, have made BitTorrent traffic relatively insignificant.
RSAWEB, for example, noticed that Netflix traffic surged and that peak time data usage doubles every six months. This is in large part the result of a streaming boom. At the moment, the volume of Netflix traffic is 20x that of BitTorrent traffic.
“The ratios have significantly changed compared to a few years ago,” a RSAWEB spokesperson said. “The current ratio would be for every 50Mbps of aggregated torrent traffic we observe 1Gbps of aggregated Netflix streaming traffic.”
Webafrica noticed a similar trend, but perhaps even more pronounced. The company noted that during peak hours half of all traffic is generated by Netflix. BitTorrent traffic follows somewhere in the distance t the extent it’s not even worth tracking anymore.
“The growth of Netflix in recent years has been truly phenomenal, to the point where we no longer track torrent traffic separately,” Webafrica’s Greg Wright said. “Google (including Youtube) and Netflix are dominating the content currently,” he added.
Paul Butschi, co-founder of Internet provider Cool Ideas confirmed this trend. Netflix makes up roughly 30% of the company’s peak traffic and he believes that the increased popularity of online streaming had a pronounced impact on torrent traffic.
These opinions were largely shared by rival ISPs Cybersmart and Supersonic, with the latter noting that video streaming services have “completely overtaken the need for torrent sharing,” and that things will only get better if more competing services enter the market.
The last comment is something that’s up for debate, especially if new services all come with exclusive content. Looking at the relative traffic market share in North America over the past decade, a word of caution may be warranted.
Less than a decade ago nearly 20% of all traffic during peak hours was P2P related, mostly BitTorrent. As Netflix and other video streaming services grew, this relative share quickly dropped, but more recently it started to show signs of growth again.
It could be that people have started to pirate again because they can’t afford to pay for several paid streaming subscriptions.
This notion is supported by a recent survey which showed that piracy rates could potentially double again if the video streaming market continues to fragment. This can affect BitTorrent traffic, bus also pirate streaming sites and services, which the ISPs were asked about. These are the go-to piracy solution for most people nowadays.
Overall it’s safe to say that legal streaming services do indeed limit the demand for piracy, as the South African ISPs observe. This is particularly true if they’re convenient and affordable while providing access to a great content library.
At Netflix we prioritize innovation and velocity in pursuit of the best experience for our 150+ million global customers. This means that our microservices constantly evolve and change, but what doesn’t change is our responsibility to provide a highly available service that delivers 100+ million hours of daily streaming to our subscribers.
In order to achieve this level of availability, we leverage an N+1 architecture where we treat Amazon Web Services (AWS) regions as fault domains, allowing us to withstand single region failures. In the event of an isolated failure we first pre-scale microservices in the healthy regions after which we can shift traffic away from the failing one. This pre-scaling is necessary due to our use of autoscaling, which generally means that services are right-sized to handle their current demand, not the surge they would experience once we shift traffic.
Though this evacuation capability exists today, this level of resiliency wasn’t always the standard for the Netflix API. In 2013 we first developed our multi-regional availability strategy in response to a catalyst that led us to re-architect the way our service operates. Over the last 6 years Netflix has continued to grow and evolve along with our customer base, invalidating core assumptions built into the machinery that powers our ability to pre-scale microservices. Two such assumptions were that:
Regional demand for all microservices (i.e. requests, messages, connections, etc.) can be abstracted by our key performance indicator, stream starts per second (SPS).
Microservices within healthy regions can be scaled uniformly during an evacuation.
These assumptions simplified pre-scaling, allowing us to treat microservices uniformly, ignoring the uniqueness and regionality of demand. This approach worked well in 2013 due to the existence of monolithic services and a fairly uniform customer base, but became less effective as Netflix evolved.
Regional Microservice Demand
Most of our microservices are in some way related to serving a stream, so SPS seemed like a reasonable stand-in to simplify regional microservice demand. This was especially true for large monolithic services. For example, player logging, authorization, licensing, and bookmarks were initially handled by a single monolithic service whose demand correlated highly with SPS. However, in order to improve developer velocity, operability, and reliability, the monolith was decomposed into smaller, purpose-built services with dissimilar function-specific demand.
Our edge gateway (zuul) also sharded by function to achieve similar wins. The graph below captures the demand for each shard, the combined demand, and SPS. Looking at the combined demand and SPS lines, SPS roughly approximates combined demand for a majority of the day. Looking at individual shards however, the amount of error introduced by using SPS as a demand proxy varies widely.
Uniform Evacuation Scaling
Since we used SPS as a demand proxy, it also seemed reasonable to assume that we can uniformly pre-scale all microservices in the healthy regions. In order to illustrate the shortcomings of this approach, let’s look at playback licensing (DRM) & authorization.
DRM is closely aligned with device type, such that Consumer Electronics (CE), Android, & iOS use different DRM platforms. In addition, the ratio of CE to mobile streaming differs regionally; for example, mobile is more popular in South America. So, if we evacuate South American traffic to North America, demand for CE and Android DRM won’t grow uniformly.
On the other hand, playback authorization is a function used by all devices prior to requesting a license. While it does have some device specific behavior, demand during an evacuation is more a function of the overall change in regional demand.
Closing The Gap
In order to address the issues with our previous approach, we needed to better characterize microservice-specific demand and how it changes when we evacuate. The former requires that we capture regional demand for microservices versus relying on SPS. The latter necessitates a better understanding of microservice demand by device type as well as how regional device demand changes during an evacuation.
Microservice-Specific Regional Demand
Because of service decomposition, we understood that using a proxy demand metric like SPS wasn’t tenable and we needed to transition to microservice-specific demand. Unfortunately, due to the diversity of services, a mix of Java (Governator/Springboot with Ribbon/gRPC, etc.) and Node (NodeQuark), there wasn’t a single demand metric we could rely on to cover all use cases. To address this, we built a system that allows us to associate each microservice with metrics that represent their demand.
The microservice metrics are configuration-driven, self-service, and allows for scoping such that services can have different configurations across various shards and regions. Our system then queries Atlas, our time series telemetry platform, to gather the appropriate historical data.
Microservice Demand By Device Type
Since demand is impacted by regional device preferences, we needed to deconstruct microservice demand to expose the device-specific components. The approach we took was to partition a microservice’s regional demand by aggregated device types (CE, Android, PS4, etc.). Unfortunately, the existing metrics didn’t uniformly expose demand by device type, so we leveraged distributed tracing to expose the required details. Using this sampled trace data we can explain how a microservice’s regional device type demand changes over time. The graph below highlights how relative device demand can vary throughout the day for a microservice.
Device Type Demand
We can use historical device type traffic to understand how to scale the device-specific components of a service’s demand. For example, the graph below shows how CE traffic in us-east-1 changes when we evacuate us-west-2. The nominal and evacuation traffic lines are normalized such that 1 represents the max(nominal traffic) and the demand scaling ratio represents the relative change in demand during an evacuation (i.e. evacuation traffic/nominal traffic).
Microservice-Specific Demand Scaling Ratio
We can now combine microservice demand by device and device-specific evacuation scaling ratios to better represent the change in a microservice’s regional demand during an evacuation — i.e. the microservice’s device type weighted demand scaling ratio. To calculate this ratio (for a specific time of day) we take a service’s device type percentages, multiply by device type evacuation scaling ratios, producing each device type’s contribution to the service’s scaling ratio. Summing these components then yields a device type weighted evacuation scaling ratio for the microservice. To provide a concrete example, the table below shows the evacuation scaling ratio calculation for a fictional service.
The graph below highlights the impact of using a microservice-specific evacuation scaling ratio versus the simplified SPS-based approach used previously. In the case of Service A, the old approach would have done well in approximating the ratio, but in the case of Service B and Service C, it would have resulted in over and under predicting demand, respectively.
Understanding the uniqueness of demand across our microservices improved the quality of our predictions, leading to safer and more efficient evacuations at the cost of additional computational complexity. This new approach, however, is itself an approximation with its own set of assumptions. For example, it assumes all categories of traffic for a device type has similar shape, for example Android logging and playback traffic. As Netflix grows our assumptions will again be challenged and we will have to adapt to continue to provide our customers with the availability and reliability that they have come to expect.
If this article has piqued your interest and you have a passion for solving cross-discipline distributed systems problems, our small but growing team is hiring!
The rate at which ‘pirate’ sites are being blocked in various countries raises the question of how many more there are left to block.
The answer, it seems, is plenty more yet.
Back in May, yet another application filed in Australia’s Federal Court presented a unique feature – the inclusion of US-based streaming giant Netflix as one of the applicants.
This was the first time the company had appeared requesting a blocking application in the region, claiming infringement of its works Santa Clarita Diet and Stranger Things.
Netflix didn’t appear on its own. The application was headed by local movie giant Roadshow Films and supported by other prominent movie companies such as Disney Enterprises, Universal City Studios, Warner Bros., Television Broadcasts Limited, TVBO, and Madman Anime Group.
Together they demanded the blocking of over 130 domains related to close to 90 torrent, streaming, and similar sites by more than 50 local ISPs.
The claims were filed under Section 115a of Australia’s Copyright Act, which can grant injunctions to force local ISPs to prevent their subscribers from accessing overseas-based ‘infringing locations. It’s taken three months, but the content companies have now been successful.
This morning, Justice Thawley in the Federal Court ordered the respondents including Telstra, Optus, TPG, Vocus, and Vodafone, to take “reasonable steps to disable access to the Target Online Locations” within 15 business days. Each ISP will be handed AUS$50 per domain by the applicants to cover compliance costs.
In common with previous orders, the ISPs were given the option to utilize DNS, IP address, and/or URL blocking techniques (or any other technical means agreed in writing between them and the applications) to prevent access to the sites.
Of course, sites often decide to take countermeasures when orders such as this are handed down in order to circumvent blocking, so the order allows the studios to provide additional information so that these can be swiftly dealt with by the ISPs moving forward.
An updated/amended domain and URL list (there can be changes following an original application) is yet to appear in court records. However, the list of sites and domains in the original application can be viewed in our earlier report.
The order handed down this morning can be found here (pdf)
India is no stranger to blocking pirate sites. Just last week, a court ordered local Internet service providers to block more than 1,200 sites to prevent the spread of a single movie.
Now, however, it appears that there additional legal moves underway to ensure that sites are not only blocked temporarily but also on a more permanent basis.
Over the past several weeks the High Court in Delhi has been handling many separate applications for permanent injunction filed by US-based Warner Bros. Entertainment Inc.
In all cases, the company states that several of its copyrighted works – movies Aquaman, A Star is Born, Wonder Woman, plus TV show Arrow – were made available via a broad range of torrent, streaming, linking, and proxy-type sites.
The complaints also cite works by studios including Columbia, Paramount, Universal, and Netflix as further examples of content being infringed on the platforms.
In just one of the complaints the list of infringing domains runs to 124 and includes some very well known names including local giant Tamilrockers, TorrentDownload.ch, TorrentDownloads.me and EZTV, iStole.it, Zoink.it, Torrents.me, Torrents.io, Zooqle, MovieRulz, LimeTorrents, Bolly4u, KatMovie, Monova, and 9xMovies.
In many cases, multiple domains are listed for the above sites, including alternates, proxies and other variants that are accessible via various unblocking platforms. All are accused of infringing the rights of Warner Bros. by providing access to its movie and TV shows content without authorization.
“[D]efendant Websites are primarily and substantially engaged in communicating to the public, hosting, streaming and/or making available to the public Plaintiff’s original content without authorization, and/or facilitating the same,” one order reads.
The order covering the above sites notes that Warner investigated and then served legal notices on the platforms ordering them to cease-and-desist. However, it’s reported that none acted to prevent their infringing activities.
To boost its case, Warner also informed the Court that some of the sites have already been blocked in other jurisdictions (including the UK, Portugal, Malaysia, Australia, Belgium, Denmark, Russia, and Italy) for similar behavior.
After consideration, the Court found that there is a prima facie case and Warner should be awarded an interim injunction to prevent the sites from continuing their infringing activities. Furthermore, the sites should have their domains blocked by ISPs in India, to prevent further damage and losses.
The Court also addressed the issue of additional domains or platforms appearing to circumvent any blocking, by granting Warner permission to file additional updates with the Court that will allow for such mechanisms to be disabled by ISPs via an expedited process.
The example order detailed above is very specific, in that it orders ISPs to block the domain names of the sites plus a list of IP addresses. However, the vast majority appear to be using Cloudflare, so it remains to be seen whether the ISPs will use discretion or blindly block, which could cause considerable disruption to other sites using the same IP locations.
In some of the orders, it appears that domain registrars are also required to suspend domain names belonging or connected to various sites, including TamilRockers, Hindilinks4u, Otorrents, Filmlinks4u, Mp4Moviez, Series9.io, uWatchFree, OnlineWatchMovies, MovieRulzFree, and SkyMovies.
Several additional applications from Warner are on record at the Delhi High Court but are yet to be published as interim orders.
The main order detailed above can be found here (pdf), the rest here 2,3,4,5,6,7,8,9,10
Conductor is a workflow orchestration engine developed and open-sourced by Netflix. If you’re new to Conductor, this earlier blogpost and the documentation should help you get started and acclimatized to Conductor.
In the last two years since inception, Conductor has seen wide adoption and is instrumental in running numerous core workflows at Netflix. Many of the Netflix Content and Studio Engineering services rely on Conductor for efficient processing of their business flows. The Netflix Media Database (NMDB) is one such example.
In this blog, we would like to present the latest updates to Conductor, address some of the frequently asked questions and thank the community for their contributions.
How we’re using Conductor at Netflix
Conductor is one of the most heavily used services within Content Engineering at Netflix. Of the multitude of modules that can be plugged into Conductor as shown in the image below, we use the Jersey server module, Cassandra for persisting execution data, Dynomite for persisting metadata, DynoQueues as the queuing recipe built on top of Dynomite, Elasticsearch as the secondary datastore and indexer, and Netflix Spectator + Atlas for Metrics. Our cluster size ranges from 12–18 instances of AWS EC2 m4.4xlarge instances, typically running at ~30% capacity.
We do not maintain an internal fork of Conductor within Netflix. Instead, we use a wrapper that pulls in the latest version of Conductor and adds Netflix infrastructure components and libraries before deployment. This allows us to proactively push changes to the open source version while ensuring that the changes are fully functional and well-tested.
As of writing this blog, Conductor orchestrates 600+ workflow definitions owned by 50+ teams across Netflix. While we’re not (yet) actively measuring the nth percentiles, our production workloads speak for Conductor’s performance. Below is a snapshot of our Kibana dashboard which shows the workflow execution metrics over a typical 7-day period.
Some of the use cases served by Conductor at Netflix can be categorized under:
Content Ingest and Delivery
Content Quality Control
Encodes and Deployments
One of the key features in v2.0 was the introduction of the gRPC framework as an alternative/auxiliary to REST. This was contributed by our counterparts at GitHub, thereby strengthening the value of community contributions to Conductor.
Cassandra Persistence Layer
To enable horizontal scaling of the datastore for large volume of concurrent workflow executions (millions of workflows/day), Cassandra was chosen to provide elastic scaling and meet throughput demands.
External Payload Storage
External payload storage was implemented to prevent the usage of Conductor as a data persistence system and to reduce the pressure on its backend datastore.
Dynamic Workflow Executions
For use cases where the need arises to execute a large/arbitrary number of varying workflow definitions or to run a one-time ad hoc workflow for testing or analytical purposes, registering definitions first with the metadata store in order to then execute them only once, adds a lot of additional overhead. The ability to dynamically create and execute workflows removes this friction. This was another great addition that stemmed from our collaboration with GitHub.
Workflow Status Listener
Conductor can be configured to publish notifications to external systems or queues upon completion/termination of workflows. The workflow status listener provides hooks to connect to any notification system of your choice. The community has contributed an implementation that publishes a message on a dyno queue based on the status of the workflow. An event handler can be configured on these queues to trigger workflows or tasks to perform specific actions upon the terminal state of the workflow.
Bulk Workflow Management
There has always been a need for bulk operations at the workflow level from an operability standpoint. When running at scale, it becomes essential to perform workflow level operations in bulk due to bad downstream dependencies in the worker processes causing task failures or bad task executions. Bulk APIs enable the operators to have macro-level control on the workflows executing within the system.
Decoupling Elasticsearch from Persistence
This inter-dependency was removed by moving the indexing layer into separate persistence modules, exposing a property (workflow.elasticsearch.instanceType) to choose the type of indexing engine. Further, the indexer and persistence layer have been decoupled by moving this orchestration from within the primary persistence layer to a service layer through the ExecutionDAOFacade.
Support for Elasticsearch versions 5 and 6 have been added as part of the major version upgrade to v2.x. This addition also provides the option to use the Elasticsearch RestClient instead of the Transport Client which was enforced in the previous version. This opens the route to using a managed Elasticsearch cluster (a la AWS) as part of the Conductor deployment.
Task Rate Limiting & Concurrent Execution Limits
Task rate limiting helps achieve bounded scheduling of tasks. The task definition parameter rateLimitFrequencyInSeconds sets the duration window, while rateLimitPerFrequency defines the number of tasks that can be scheduled in a duration window. On the other hand, concurrentExecLimit provides unbounded scheduling limits of tasks. I.e the total of current scheduled tasks at any given time will be under concurrentExecLimit. The above parameters can be used in tandem to achieve desired throttling and rate limiting.
Validation was one of the core features missing in Conductor 1.x. To improve usability and operability, we added validations, which in practice has greatly helped find bugs during creation of workflow and task definitions. Validations enforce the user to create and register their task definitions before registering the workflow definitions using these tasks. It also ensures that the workflow definition is well-formed with correct wiring of inputs and outputs in the various tasks within the workflow. Any anomalies found are reported to the user with a detailed error message describing the reason for failure.
Developer Labs, Logging and Metrics
We have been continually improving logging and metrics, and revamped the documentation to reflect the latest state of Conductor. To provide a smooth on boarding experience, we have created developer labs, which guides the user through creating task and workflow definitions, managing a workflow lifecycle, configuring advanced workflows with eventing etc., and a brief introduction to Conductor API, UI and other modules.
New Task Types
System tasks have proven to be very valuable in defining the Workflow structure and control flow. As such, Conductor 2.x has seen several new additions to System tasks, mostly contributed by the community:
Terminate task is useful when workflow logic should terminate with a given output. For example, if a decision task evaluates to false, and we do not want to execute remaining tasks in the workflow, instead of having a DECISION task with a list of tasks in one case and an empty list in the other, this can scope the decide and terminate workflow execution.
Exclusive Join task helps capture task output from a DECISION task’s flow. This is useful to wire task inputs from the outputs of one of the cases within a decision flow. This data will only be available during workflow execution time and the ExclusiveJoin task can be used to collect the output from one of the tasks in any of decision branches.
For in-depth implementation details of the new additions, please refer the documentation.
There are a lot of features and enhancements we would like to add to Conductor. The below wish list could be considered as a long-term road map. It is by no means exhaustive, and we are very much welcome to ideas and contributions from the community. Some of these listed in no particular order are:
Advanced Eventing with Event Aggregation and Distribution
At the moment, event generation and processing is a very simple implementation. An event task can create only one message, and a task can wait for only one event.
We envision an Event Aggregation and Distribution mechanism that would open up Conductor to a multitude of use-cases. A coarse idea is to allow a task to wait for multiple events, and to progress several tasks based on one event.
While the current UI provides a neat way to visualize and track workflow executions, we would like to enhance this with features like:
Creating metadata objects from UI
Support for starting workflows
Visualize execution metrics
Admin dashboard to show outliers
New Task types like Goto, Loop etc.
Conductor has been using a Directed Acyclic Graph (DAG) structure to define a workflow. The Goto and Loop on tasks are valid use cases, which would deviate from the DAG structure. We would like to add support for these tasks without violating the existing workflow execution rules. This would help unlock several other use cases like streaming flow of data to tasks and others that require repeated execution of a set of tasks within a workflow.
Support for reusable commonly used tasks like Email, DatabaseQuery etc.
Similarly, we’ve seen the value of shared reusable tasks that does a specific thing. At Netflix internal deployment of Conductor, we’ve added tasks specific to services that users can leverage over recreating the tasks from scratch. For example, we provide a TitusTask which enables our users to launch a new Titus container as part of their workflow execution.
We would like to extend this idea such that Conductor can offer a repository of commonly used tasks.
Push based task scheduling interface
Current Conductor architecture is based on polling from a worker to get tasks that it will execute. We need to enhance the grpc modules to leverage the bidirectional channel to push tasks to workers as and when they are scheduled, thus reducing network traffic, load on the server and redundant client calls.
Validating Task inputKeys and outputKeys
This is to provide type safety for tasks and define a parameterized interface for task definitions such that tasks are completely re-usable within Conductor once registered. This provides a contract allowing the user to browse through available task definitions to use as part of their workflow where the tasks could have been implemented by another team/user. This feature would also involve enhancing the UI to display this contract.
Implementing MetadataDAO in Cassandra
As mentioned here, Cassandra module provides a partial implementation for persisting only the workflow executions. Metadata persistence implementation is not available yet and is something we are looking to add soon.
Pluggable Notifications on Task completion
Similar to the Workflow status listener, we would like to provide extensible interfaces for notifications on task execution.
Python client in Pypi
We have seen wide adoption of Python client within the community. However, there is no official Python client in Pypi, and lacks some of the newer additions to the Java client. We would like to achieve feature parity and publish a client from Conductor Github repository, and automate the client release to Pypi.
Removing Elasticsearch from critical path
While Elasticsearch is greatly useful in Conductor, we would like to make this optional for users who do not have Elasticsearch set-up. This means removing Elasticsearch from the critical execution path of a workflow and using it as an opt-in layer.
Pluggable authentication and authorization
Conductor doesn’t support authentication and authorization for API or UI, and is something that we feel would add great value and is a frequent request in the community.
Validations and Testing
Dry runs, i.e the ability to evaluate workflow definitions without actually running it through worker processes and all relevant set-up would make it much easier to test and debug execution paths.
If you would like to be a part of the Conductor community and contribute to one of the Wishlist items or something that you think would provide a great value add, please read through this guide for instructions or feel free to start a conversation on our Gitter channel, which is Conductor’s user forum.
We also highly encourage to polish, genericize and share any customizations that you may have built on top of Conductor with the community.
We really appreciate and are extremely proud of the community involvement, who have made several important contributions to Conductor. We would like to take this further and make Conductor widely adopted with a strong community backing.
Netflix Conductor is maintained by the Media Workflow Infrastructure team. If you like the challenges of building distributed systems and are interested in building the Netflix Content and Studio ecosystem at scale, connect with Charles Zhao to get the conversation started.
Thanks to Alexandra Pau, Charles Zhao, Falguni Jhaveri, Konstantinos Christidis and Senthil Sayeebaba.
Pirated copies of movies appear online every day in a variety of formats, such as CAM, DVDRip, WEBRip, and Web-DL.
The latter, which usually come from streaming and download services such as Netflix, Amazon, or iTunes, have proven to be a reliable source for pirates over the years.
In general, that doesn’t apply to 4K releases. These are protected by the highest encryption standards. In the case of Netflix, this is Widevine’s highest level DRM. Cracking this is seen as the holy grail by pirates.
While there is no confirmation that the keys have been cracked, a flurry of new 4K Netflix leaks suggests that there’s at least some type of vulnerability that allows outsiders to decrypt the original steams.
Over the past 24-hours several 4K releases of prominent Netflix titles spread across various pirate sites. It started with the entire third season of the Netflix exclusive “Stranger Things,” which came out yesterday.
The leaked episodes originate from the DEFLATE release group and are all marked as ‘INTERNAL’ releases, such as “Stranger Things S03E01 INTERNAL 2160p WEB H265-DEFLATE.”
In the past, we have seen several 4K videos being ripped from Netflix. In fact, the first rips came out four years ago. However, the WEB tags on today’s releases indicate that these files were directly decrypted from the original files, which means that there’s no loss in quality.
“Untouched releases must be considered as anything that has been losslessly downloaded by official (offered) or unofficial (backdoor) methods,” official Scene rules dictate.
These untouched releases are rare. We’ve only previously seen these types of Netflix leaks for a brief period in 2017. At the time, the releases stopped following a Widevine update, a source informed TorrentFreak.
Exactly how the release group was able to pull off these new leaks is unknown. TorrentFreak reached out to Netflix for a comment on the matter, but at the time of publication, we have yet to hear back.
The DEFLATE release group is no stranger to novel 4K leaks. Earlier this year the same group also released several movies from iTunes, including the entire James Bond collection. That was the first breach of its kind on iTunes.
The first Stranger Things leak was pointed out by Tarnkappe but several other titles have appeared online as well. The release group UHDCANDY, for example, also posted the first episodes from the latest seasons of Marvel’s Jessica Jones and Black Mirror.
The fact that two groups have been able to decrypt the 4K releases indicates that this ‘breach’ is widespread. It wouldn’t be a surprise to see more titles appear during the coming days, until the hole is patched again.
Bringing Rich Experiences to Memory-Constrained TV Devices
By Jason Munning, Archana Kumar, Kris Range
Netflix has over 148M paid members streaming on more than half a billion devices spanning over 1,900 different types. In the TV space alone, there are hundreds of device types that run the Netflix app. We need to support the same rich Netflix experience on not only high-end devices like the PS4 but also memory and processor-constrained consumer electronic devices that run a similar chipset as was used in an iPhone 3Gs.
In this post, we will discuss the development of the Rich Collection row and the iterations we went through to be able to support this experience across the majority of the TV ecosystem.
Rich Collection Row
One of our most ambitious UI projects to date on the TV app is the animated Rich Collection Row. The goal of this experience from a UX design perspective was to bring together a tightly-related set of original titles that, though distinct entities on their own, also share a connected universe. We hypothesized this design would net a far greater visual impact than if the titles were distributed individually throughout the page. We wanted the experience to feel less like scrolling through a row and more like exploring a connected world of stories.
For the collections below, the row is composed of characters representing each title in a collected universe overlaid onto a shared, full-bleed background image which depicts the shared theme for the collection. When the user first scrolls down to the row, the characters are grouped into a lineup of four. The name of the collection animates in along with the logos for each title while a sound clip plays which evokes the mood of the shared world. The characters slide off screen to indicate the first title is selected. As the user scrolls horizontally, characters slide across the screen and the shared backdrop scrolls with a parallax effect. For some of the collections, the character images themselves animate and a full-screen tint is applied using a color that is representative of the show’s creative (see “Character Images” below).
Once the user pauses on a title for more than two seconds, the trailer for that title cross-fades with the background image and begins playing.
As part of developing this type of UI experience on any platform, we knew we would need to think about creating smooth, performant animations with a balance between quality and download size for the images and video previews, all without degrading the performance of the app. Some of the metrics we use to measure performance on the Netflix TV app include animation frames per second (FPS), key input responsiveness (the amount of time before a member’s key press renders a change in the UI), video playback speed, and app start-up time.
UI developers on the Netflix TV app also need to consider some challenges that developers on other platforms often are able to take for granted. One such area is our graphics memory management. While web browsers and mobile phones have gigabytes of memory available for graphics, our devices are constrained to mere MBs. Our UI runs on top of a custom rendering engine which uses what we call a “surface cache” to optimize our use of graphics memory.
Surface cache is a reserved pool in main memory (or separate graphics memory on a minority of systems) that the Netflix app uses for storing textures (decoded images and cached resources). This benefits performance as these resources do not need to be re-decoded on every frame, saving CPU time and giving us a higher frame-rate for animations.
Each device running the Netflix TV application has a limited surface cache pool available so the rendering engine tries to maximize the usage of the cache as much as possible. This is a positive for the end experience because it means more textures are ready for re-use as a customer navigates around the app.
The amount of space a texture requires in surface cache is calculated as:
width * height * 4 bytes/pixel (for rgba)
Most devices currently run a 1280 x 720 Netflix UI. A full-screen image at this resolution will use 1280 * 720 * 4 = 3.5MB of surface cache. The majority of legacy devices run at 28MB of surface cache. At this size, you could fit the equivalent of 8 full-screen images in the cache. Reserving this amount of memory allows us to use transition effects between screens, layering/parallax effects, and to pre-render images for titles that are just outside the viewport to allow scrolling in any direction without images popping in. Devices in the Netflix TVUI ecosystem have a range of surface cache capacity, anywhere from 20MB to 96MB and we are able to enable/disable rich features based on that capacity.
When the limit of this memory pool is approached or exceeded, the Netflix TV app tries to free up space with resources it believes it can purge (i.e. images no longer in the viewport). If the cache is over budget with surfaces that cannot be purged, devices can behave in unpredictable ways ranging from application crashes, displaying garbage on the screen, or drastically slowing down animations.
Surface Cache and the Rich Collection Row
From developing previous rich UI features, we knew that surface cache usage was something to consider with the image-heavy design for the Rich Collection row. We made sure to test memory usage early on during manual testing and did not see any overages so we checked that box and proceeded with development. When we were approaching code-complete and preparing to roll out this experience to all users we ran our new code against our memory-usage automation suite as a sanity check.
The chart below shows an end-to-end automated test that navigates the Netflix app, triggering playbacks, searches, etc to simulate a user session. In this case, the test was measuring surface cache after every step. The red line shows a test run with the Rich Collection row and the yellow line shows a run without. The dotted red line is placed at 28MB which is the amount of memory reserved for surface cache on the test device.
Uh oh! We found some massive peaks (marked in red) in surface cache that exceeded our maximum recommended surface cache usage of 28MB and indicated we had a problem. Exceeding the surface cache limit can have a variety of impacts (depending on the device implementation) to the user from missing images to out of memory crashes. Time to put the brakes on the rollout and debug!
Assessing the Problem
The first step in assessing the problem was to drill down into our automation results to make sure they were valid. We re-ran the automation tests and found the results were reproducible. We could see the peaks were happening on the home screen where the Rich Collection row was being displayed. It was odd that we hadn’t seen the surface cache over budget (SCOB) errors while doing manual testing.
To close the gap we took a look at the configuration settings we were using in our automation and adjusted them to match the settings we use in production for real devices. We then re-ran the automation and still saw the peaks but in the process we discovered that the issue seemed to only present itself on devices running a version of our SDK from 2015. The manual testing hadn’t caught it because we had only been manually testing surface cache on more recent versions of the SDK. Once we did manual testing on our older SDK version we were able to reproduce the issue in our development environment.
During brainstorming with our platform team, we came across an internal bug report from 2017 that described a similar issue to what we were seeing — surfaces that were marked as purgeable in the surface cache were not being fully purged in this older version of our SDK. From the ticket we could see that the inefficiency was fixed in the next release of our SDK but, because not all devices get Netflix SDK updates, the fix could not be back-ported to the 2015 version that had this issue. Considering that a significant share of our actively-used TV devices are running this 2015 version and won’t be updated to a newer SDK, we knew we needed to find a fix that would work for this specific version — a similar situation to the pre-2000 world before browsers auto-updated and developers had to code to specific browser versions.
Finding a Solution
The first step was to take a look at what textures were in the surface cache (especially those marked as un-purgeable) at the time of the overage and see where we might be able to make gains by reducing the size of images. For this we have a debug port that allows us to inspect which images are in the cache. This shows us information about the images in the surface cache including url. The links can then be hovered over to show a small thumbnail of the image.
From snapshots such as this one we could see the Rich Collection row alone filled about 15.3MB of surface cache which is >50% of the 28MB total graphics memory available on devices running our 2015 SDK.
The largest un-purgeable images we found were:
Character images (6 * 1MB)
Background images for the parallax background (2 * 2.9MB)
Unknown — a full screen blank white rectangle (3.5MB)
Some of our rich collections featured the use of animated character assets to give an even richer experience. We created these assets using a Netflix-proprietary animation format called a Scriptable Network Graphic (SNG) which was first supported in 2017 and is similar to an animated PNG. The SNG files have a relatively large download size at ~1.5MB each. In order to ensure these assets are available at the time the rich collection row enters the viewport, we preload the SNGs during app startup and save them to disk. If the user relaunches the app in the future and receives the same collection row, the SNG files can be read from the disk cache, avoiding the need to download them again. Devices running an older version of the SDK fallback to a static character image.
At the time of the overage we found thatsix character images were present in the cache — four on the screen and two preloaded off of the screen. Our first savings came from only preloading one image for a total of five characters in the cache. Right off the bat this saved us almost 7% in surface cache with no observable impact to the experience.
Next we created cropped versions of the static character images that did away with extra transparent pixels (that still count toward surface cache usage!). This required modifications to the image pipeline in order to trim the whitespace but still maintain the relative size of the characters — so the relative heights of the characters in the lineup would still be preserved. The cropped character assets used only half of the surface cache memory of the full-size images and again had no visible impact to the experience.
In order to achieve the illusion of a continuously scrolling parallax background, we were using two full screen background images essentially placed side by side which together accounted for ~38% of the experience’s surface cache usage. We worked with design to create a new full-screen background image that could be used for a fallback experience (without parallax) on devices that couldn’t support loading both of the background images for the parallax effect. Using only one background image saved us 19% in surface cache for the fallback experience.
After trial and error removing React components from our local build and inspecting the surface cache we found that the unknown widget that showed as a full screen blank white rectangle in our debug tool was added by the full-screen tint effect we were using. In order to apply the tint, the graphics layer essentially creates a full screen texture that is colored dynamically and overlaid over the visible viewport. Removing the tint overlay saved us 23% in surface cache.
Removing the tint overlay and using a single background image gave us a fallback experience that used 42% less surface cache than the full experience.
When all was said and done, the surface cache usage of the fallback experience (including fewer preloaded characters, cropped character images, a single background, and no tint overlay) clocked in at about 5MB which gave us a total savings of almost 67% over our initial implementation.
We were able to target this fallback experience to devices running the 2015 and older SDK, while still serving the full rich experience (23% lower surface cache usage than the original implementation) to devices running the new SDKs.
At this point our automation was passing so we began slowly rolling out this experience to all members. As part of any rollout, we have a dashboard of near real-time metrics that we monitor. To our chagrin we saw that another class of devices — those running the 2017 SDK — also were reporting higher SCOB errors than the control.
Thanks to our work on the fallback experience we were able to change the configuration for this class of devices on the fly to serve the fallback experience (without parallax background and tint). We found if we used the fallback experience we could still get away with using the animated characters. So yet another flavor of the experience was born.
Improvements and Takeaways
At Netflix we strive to move fast in innovation and learn from all projects whether they are successes or failures. From this project, we learned that there were gaps in our understanding of how our underlying graphics memory worked and in the tooling we used to monitor that memory. We kicked off an effort to understand this graphics memory space at a low level and compiled a set of best practices for developers beginning work on a project. We also documented a set of tips and tools for debugging and optimizing surface cache should a problem arise.
As part of that effort, we expanded our suite of build-over-build automated tests to increase coverage across our different SDK versions on real and reference devices to detect spikes/regressions in our surface cache usage.
We began logging SCOB errors with more detail in production so we can target the specific areas of the app that we need to optimize. We also are now surfacing surface cache errors as notifications in the dev environment so developers can catch them sooner.
And we improved our surface cache inspector tool to be more user friendly and to integrate with our Chrome DevTools debugger:
As UI engineers on the TVUI platform at Netflix, we have the challenge of delivering ambitious UI experiences to a highly fragmented ecosystem of devices with a wide range of performance characteristics. It’s important for us to reach as many devices as possible in order to give our members the best possible experience.
The solutions we developed while scaling the Rich Collection row have helped inform how we approach ambitious UI projects going forward. With our optimizations and fallback experiences we were able to almost double the number of devices that were able to get the Rich Collection row.
We are now more thoughtful about designing fallback experiences that degrade gracefully as part of the initial design phase instead of just as a reaction to problems we encounter in the development phase. This puts us in a position of being able to scale an experience very quickly with a set of knobs and levers that can be used to tune an experience for a specific class of devices.
Most importantly, we received feedback that our members enjoyed our Rich Collection row experience — both the full and fallback experiences — when we rolled them out globally at the end of 2018.
If this interests you and want to help build the future UIs for discovering and watching shows and movies, join our team!
Hack Days are a big deal at Netflix. They’re a chance to bring together employees from all our different disciplines to explore new ideas and experiment with emerging technologies.
For the most recent hack day, we channeled our creative energy towards our studio efforts. The goal remained the same: team up with new colleagues and have fun while learning, creating, and experimenting. We know even the silliest idea can spur something more.
The most important value of hack days is that they support a culture of innovation. We believe in this work, even if it never ships, and love to share the creativity and thought put into these ideas.
Below, you can find videos made by the hackers of some of our favorite hacks from this event.
Project Rumble Pak
You’re watching your favorite episode of Voltron when, after a suspenseful pause, there’s a huge explosion — and your phone starts to vibrate in your hands.
The Project Rumble Pak hack day project explores how haptics can enhance the content you’re watching. With every explosion, sword clank, and laser blast, you get force feedback to amp up the excitement.
For this project, we synchronized Netflix content with haptic effects using Immersion Corporation technology.
Introducing The Voice of Netflix. We trained a neural net to spot words in Netflix content and reassemble them into new sentences on demand. For our stage demonstration, we hooked this up to a speech recognition engine to respond to our verbal questions in the voice of Netflix’s favorite characters. Try it out yourself at blogofsomeguy.com/v!
TerraVision re-envisions the creative process and revolutionizes the way our filmmakers can search and discover filming locations. Filmmakers can drop a photo of a look they like into an interface and find the closest visual matches from our centralized library of locations photos. We are using a computer vision model trained to recognize places to build reverse image search functionality. The model converts each image into a small dimensional vector, and the matches are obtained by computing the nearest neighbors of the query.
Have you ever found yourself needing to give the Evil Eye™ to colleagues who are hogging your conference room after their meeting has ended?
Our hack is a simple web application that allows employees to select a Netflix meeting room anywhere in the world, and press a button to kick people out of their meeting room if they have overstayed their meeting. First, the app looks up calendar events associated with the room and finds the latest meeting in the room that should have already ended. It then automatically calls in to that meeting and plays walk-off music similar to the Oscar’s to not-so-subtly encourage your colleagues to Get Out! We built this hack using Java (Springboot framework), the Google OAuth and Calendar APIs (for finding rooms) and Twilio API (for calling into the meeting), and deployed it on AWS.
By Pythonistas at Netflix, coordinated by Amjith Ramanujam and edited by Ellen Livengood
As many of us prepare to go to PyCon, we wanted to share a sampling of how Python is used at Netflix. We use Python through the full content lifecycle, from deciding which content to fund all the way to operating the CDN that serves the final video to 148 million members. We use and contribute to many open-source Python packages, some of which are mentioned below. If any of this interests you, check out the jobs site or find us at PyCon. We have donated a few Netflix Originals posters to the PyLadies Auction and look forward to seeing you all there.
Open Connect is Netflix’s content delivery network (CDN). An easy, though imprecise, way of thinking about Netflix infrastructure is that everything that happens before you press Play on your remote control (e.g., are you logged in? what plan do you have? what have you watched so we can recommend new titles to you? what do you want to watch?) takes place in Amazon Web Services (AWS), whereas everything that happens afterwards (i.e., video streaming) takes place in the Open Connect network. Content is placed on the network of servers in the Open Connect CDN as close to the end user as possible, improving the streaming experience for our customers and reducing costs for both Netflix and our Internet Service Provider (ISP) partners.
Various software systems are needed to design, build, and operate this CDN infrastructure, and a significant number of them are written in Python. The network devices that underlie a large portion of the CDN are mostly managed by Python applications. Such applications track the inventory of our network gear: what devices, of which models, with which hardware components, located in which sites. The configuration of these devices is controlled by several other systems including source of truth, application of configurations to devices, and back up. Device interaction for the collection of health and other operational data is yet another Python application. Python has long been a popular programming language in the networking space because it’s an intuitive language that allows engineers to quickly solve networking problems. Subsequently, many useful libraries get developed, making the language even more desirable to learn and use.
Demand Engineering is responsible for Regional Failovers, Traffic Distribution, Capacity Operations, and Fleet Efficiency of the Netflix cloud. We are proud to say that our team’s tools are built primarily in Python. The service that orchestrates failover uses numpy and scipy to perform numerical analysis, boto3 to make changes to our AWS infrastructure, rq to run asynchronous workloads and we wrap it all up in a thin layer of Flask APIs. The ability to drop into a bpython shell and improvise has saved the day more than once.
We are heavy users of Jupyter Notebooks and nteract to analyze operational data and prototype visualization tools that help us detect capacity regressions.
The CORE team uses Python in our alerting and statistical analytical work. We lean on many of the statistical and mathematical libraries (numpy, scipy, ruptures, pandas) to help automate the analysis of 1000s of related signals when our alerting systems indicate problems. We’ve developed a time series correlation system used both inside and outside the team as well as a distributed worker system to parallelize large amounts of analytical work to deliver results quickly.
Python is also a tool we typically use for automation tasks, data exploration and cleaning, and as a convenient source for visualization work.
Monitoring, alerting and auto-remediation
The Insight Engineering team is responsible for building and operating the tools for operational insight, alerting, diagnostics, and auto-remediation. With the increased popularity of Python, the team now supports Python clients for most of their services. One example is the Spectator Python client library, a library for instrumenting code to record dimensional time series metrics. We build Python libraries to interact with other Netflix platform level services. In addition to libraries, the Winston and Bolt products are also built using Python frameworks (Gunicorn + Flask + Flask-RESTPlus).
The information security team uses Python to accomplish a number of high leverage goals for Netflix: security automation, risk classification, auto-remediation, and vulnerability identification to name a few. We’ve had a number of successful Python open sources, including Security Monkey (our team’s most active open source project). We leverage Python to protect our SSH resources using Bless. Our Infrastructure Security team leverages Python to help with IAM permission tuning using Repokid. We use Python to help generate TLS certificates using Lemur.
Some of our more recent projects include Prism: a batch framework to help security engineers measure paved road adoption, risk factors, and identify vulnerabilities in source code. We currently provide Python and Ruby libraries for Prism. The Diffy forensics triage tool is written entirely in Python. We also use Python to detect sensitive data using Lanius.
We use Python extensively within our broader Personalization Machine Learning Infrastructure to train some of the Machine Learning models for key aspects of the Netflix experience: from our recommendation algorithms to artwork personalization to marketing algorithms. For example, some algorithms use TensorFlow, Keras, and PyTorch to learn Deep Neural Networks, XGBoost and LightGBM to learn Gradient Boosted Decision Trees or the broader scientific stack in Python (e.g. numpy, scipy, sklearn, matplotlib, pandas, cvxpy). Because we’re constantly trying out new approaches, we use Jupyter Notebooks to drive many of our experiments. We have also developed a number of higher-level libraries to help integrate these with the rest of our ecosystem (e.g. data access, fact logging and feature extraction, model evaluation, and publishing).
Machine Learning Infrastructure
Besides personalization, Netflix applies machine learning to hundreds of use cases across the company. Many of these applications are powered by Metaflow, a Python framework that makes it easy to execute ML projects from the prototype stage to production.
Metaflow pushes the limits of Python: We leverage well parallelized and optimized Python code to fetch data at 10Gbps, handle hundreds of millions of data points in memory, and orchestrate computation over tens of thousands of CPU cores.
But Python plays a huge role in how we provide those services. Python is a primary language when we need to develop, debug, explore, and prototype different interactions with the Jupyter ecosystem. We use Python to build custom extensions to the Jupyter server that allows us to manage tasks like logging, archiving, publishing, and cloning notebooks on behalf of our users. We provide many flavors of Python to our users via different Jupyter kernels, and manage the deployment of those kernel specifications using Python.
The Big Data Orchestration team is responsible for providing all of the services and tooling to schedule and execute ETL and Adhoc pipelines.
Many of the components of the orchestration service are written in Python. Starting with our scheduler, which uses Jupyter Notebooks with papermill to provide templatized job types (Spark, Presto, …). This allows our users to have a standardized and easy way to express work that needs to be executed. You can see some deeper details on the subject here. We have been using notebooks as real runbooks for situations where human intervention is required — for example: to restart everything that has failed in the last hour.
Internally, we also built an event-driven platform that is fully written in Python. We have created streams of events from a number of systems that get unified into a single tool. This allows us to define conditions to filter events, and actions to react or route them. As a result of this, we have been able to decouple microservices and get visibility into everything that happens on the data platform.
Our team also built the pygenie client which interfaces with Genie, a federated job execution service. Internally, we have additional extensions to this library that apply business conventions and integrate with the Netflix platform. These libraries are the primary way users interface programmatically with work in the Big Data platform.
Finally, it’s been our team’s commitment to contribute to papermill and scrapbook open source projects. Our work there has been both for our own and external use cases. These efforts have been gaining a lot of traction in the open source community and we’re glad to be able to contribute to these shared projects.
The scientific computing team for experimentation is creating a platform for scientists and engineers to analyze AB tests and other experiments. Scientists and engineers can contribute new innovations on three fronts, data, statistics, and visualizations.
The Metrics Repo is a Python framework based on PyPika that allows contributors to write reusable parameterized SQL queries. It serves as an entry point into any new analysis.
The Causal Models library is a Python & R framework for scientists to contribute new models for causal inference. It leverages PyArrow and RPy2 so that statistics can be calculated seamlessly in either language.
The Visualizations library is based on Plotly. Since Plotly is a widely adopted visualization spec, there are a variety of tools that allow contributors to produce an output that is consumable by our platforms.
The Partner Ecosystem group is expanding its use of Python for testing Netflix applications on devices. Python is forming the core of a new CI infrastructure, including controlling our orchestration servers, controlling Spinnaker, test case querying and filtering, and scheduling test runs on devices and containers. Additional post-run analysis is being done in Python using TensorFlow to determine which tests are most likely to show problems on which devices.
Video Encoding and Media Cloud Engineering
Our team takes care of encoding (and re-encoding) the Netflix catalog, as well as leveraging machine learning for insights into that catalog. We use Python for ~50 projects such as vmaf and mezzfs, we build computer vision solutions using a media map-reduce platform called Archer, and we use Python for many internal projects. We have also open sourced a few tools to ease development/distribution of Python projects, like setupmeta and pickley.
Netflix Animation and NVFX
Python is the industry standard for all of the major applications we use to create Animated and VFX content, so it goes without saying that we are using it very heavily. All of our integrations with Maya and Nuke are in Python, and the bulk of our Shotgun tools are also in Python. We’re just getting started on getting our tooling in the cloud, and anticipate deploying many of our own custom Python AMIs/containers.
Content Machine Learning, Science & Analytics
The Content Machine Learning team uses Python extensively for the development of machine learning models that are the core of forecasting audience size, viewership, and other demand metrics for all content.
Back in January, a coalition of companies and organizations with ties to the entertainment industries called on local telecoms regulator CRTC to implement a national website blocking regime.
Under the banner of Fairplay Canada, members including Bell, Cineplex, Directors Guild of Canada, Maple Leaf Sports and Entertainment, Movie Theatre Association of Canada, and Rogers Media, spoke of an industry under threat from marauding pirates. But just how serious is this threat?
The results of a new survey commissioned by Innovation Science and Economic Development Canada (ISED) in collaboration with the Department of Canadian Heritage (PCH) aims to shine light on the problem by revealing the online content consumption habits of citizens in the Great White North.
While there are interesting findings for those on both sides of the site-blocking debate, the situation seems somewhat removed from the Armageddon scenario predicted by the entertainment industries.
Carried out among 3,301 Canadians aged 12 years and over, the Kantar TNS study aims to cover copyright infringement in six key content areas – music, movies, TV shows, video games, computer software, and eBooks. Attitudes and behaviors are also touched upon while measuring the effectiveness of Canada’s copyright measures.
General Digital Content Consumption
In its introduction, the report notes that 28 million Canadians used the Internet in the three-month study period to November 27, 2017. Of those, 22 million (80%) consumed digital content. Around 20 million (73%) streamed or accessed content, 16 million (59%) downloaded content, while 8 million (28%) shared content.
Music, TV shows and movies all battled for first place in the consumption ranks, with 48%, 48%, and 46% respectively.
According to the study, the majority of Canadians do things completely by the book. An impressive 74% of media-consuming respondents said that they’d only accessed material from legal sources in the preceding three months.
The remaining 26% admitted to accessing at least one illegal file in the same period. Of those, just 5% said that all of their consumption was from illegal sources, with movies (36%), software (36%), TV shows (34%) and video games (33%) the most likely content to be consumed illegally.
Interestingly, the study found that few demographic factors – such as gender, region, rural and urban, income, employment status and language – play a role in illegal content consumption.
“We found that only age and income varied significantly between consumers who infringed by downloading or streaming/accessing content online illegally and consumers who did not consume infringing content online,” the report reads.
“More specifically, the profile of consumers who downloaded or streamed/accessed infringing content skewed slightly younger and towards individuals with household incomes of $100K+.”
Licensed services much more popular than pirate haunts
It will come as no surprise that Netflix was the most popular service with consumers, with 64% having used it in the past three months. Sites like YouTube and Facebook were a big hit too, visited by 36% and 28% of content consumers respectively.
Overall, 74% of online content consumers use licensed services for content while 42% use social networks. Under a third (31%) use a combination of peer-to-peer (BitTorrent), cyberlocker platforms, or linking sites. Stream-ripping services are used by 9% of content consumers.
“Consumers who reported downloading or streaming/accessing infringing content only are less likely to use licensed services and more likely to use peer-to-peer/cyberlocker/linking sites than other consumers of online content,” the report notes.
Attitudes towards legal consumption & infringing content
In common with similar surveys over the years, the Kantar research looked at the reasons why people consume content from various sources, both legal and otherwise.
Convenience (48%), speed (36%) and quality (34%) were the most-cited reasons for using legal sources. An interesting 33% of respondents said they use legal sites to avoid using illegal sources.
On the illicit front, 54% of those who obtained unauthorized content in the previous three months said they did so due to it being free, with 40% citing convenience and 34% mentioning speed.
Almost six out of ten (58%) said lower costs would encourage them to switch to official sources, with 47% saying they’d move if legal availability was improved.
Canada’s ‘Notice-and-Notice’ warning system
People in Canada who share content on peer-to-peer systems like BitTorrent without permission run the risk of receiving an infringement notice warning them to stop. These are sent by copyright holders via users’ ISPs and the hope is that the shock of receiving a warning will turn consumers back to the straight and narrow.
The study reveals that 10% of online content consumers over the age of 12 have received one of these notices but what kind of effect have they had?
“Respondents reported that receiving such a notice resulted in the following: increased awareness of copyright infringement (38%), taking steps to ensure password protected home networks (27%), a household discussion about copyright infringement (27%), and discontinuing illegal downloading or streaming (24%),” the report notes.
While these are all positives for the entertainment industries, Kantar reports that almost a quarter (24%) of people who receive a notice simply ignore them.
Once upon a time, people obtaining music via P2P networks was cited as the music industry’s greatest threat but, with the advent of sites like YouTube, so-called stream-ripping is the latest bogeyman.
According to the study, 11% of Internet users say they’ve used a stream-ripping service. They are most likely to be male (62%) and predominantly 18 to 34 (52%) years of age.
“Among Canadians who have used a service to stream-rip music or entertainment, nearly half (48%) have used stream-ripping sites, one-third have used downloader apps (38%), one-in-seven (14%) have used a stream-ripping plug-in, and one-in-ten (10%) have used stream-ripping software,” the report adds.
Set-Top Boxes and VPNs
Few general piracy studies would be complete in 2018 without touching on set-top devices and Virtual Private Networks and this report doesn’t disappoint.
More than one in five (21%) respondents aged 12+ reported using a VPN, with the main purpose of securing communications and Internet browsing (57%).
A relatively modest 36% said they use a VPN to access free content while 32% said the aim was to access geo-blocked content unavailable in Canada. Just over a quarter (27%) said that accessing content from overseas at a reasonable price was the main motivator.
One in ten (10%) of respondents reported using a set-top box, with 78% stating they use them to access paid-for content. Interestingly, only a small number say they use the devices to infringe.
“A minority use set-top boxes to access other content that is not legal or they are unsure if it is legal (16%), or to access live sports that are not legal or they are unsure if it is legal (11%),” the report notes.
“Individuals who consumed a mix of legal and illegal content online are more likely to use VPN services (42%) or TV set-top boxes (21%) than consumers who only downloaded or streamed/accessed legal content.”
Kantar says that the findings of the report will be used to help policymakers evaluate how Canada’s Copyright Act is coping with a changing market and technological developments.
“This research will provide the necessary information required to further develop copyright policy in Canada, as well as to provide a foundation to assess the effectiveness of the measures to address copyright infringement, should future analysis be undertaken,” it concludes.
In April 2017, the first episode of the brand new season of Netflix’s Orange is the New Black was uploaded to The Pirate Bay, months ahead of its official release date.
The leak was the work of a hacking entity calling itself TheDarkOverlord (TDO). One of its members had contacted TorrentFreak months earlier claiming that the content was in its hands but until the public upload, nothing could be confirmed.
TDO told us it had obtained the episodes after hacking the systems of Hollywood-based Larson Studios, an ADR (additional dialogue recorded) studio, back in 2016. TDO had attempted to blackmail the company into paying a bitcoin ransom but when it wasn’t forthcoming, TDO pressed the nuclear button.
Netflix responded by issuing a wave of takedown notices but soon TDO moved onto a new target. In June 2017, TDO followed up on an earlier threat to leak content owned by ABC.
But while TDO was perhaps best known for its video-leaking exploits, the group’s core ‘business’ was hacking what many perceived to be softer targets. TDO ruthlessly slurped confidential data from weakly protected computer systems at medical facilities, private practices, and businesses large and small.
In each case, the group demanded ransoms in exchange for silence and leaked sensitive data to the public if none were paid. With dozens of known targets, TDO found itself at the center of an international investigation, led by the FBI. That now appears to have borne some fruit, with the arrest of an individual in Serbia.
Serbian police say that members of its Ministry of Internal Affairs, Criminal Police Directorate (UCC), in coordination with the Special Prosecution for High-Tech Crime, have taken action against a suspected member of TheDarkOverlord group.
Police say they tracked down a Belgrade resident, who was arrested and taken into custody. Identified only by the initials “S.S”, police say the individual was born in 1980 but have released no further personal details. A search of his apartment and other locations led to the seizure of items of digital equipment.
“According to the order of the Special Prosecutor’s Office for High-Tech Crime, criminal charges will be brought against him because of the suspicion that he committed the criminal offense of unauthorized access to a protected computer, computer networks and electronic processing, and the criminal offense of extortion,” a police statement reads.
In earlier correspondence with TF, the TDO member always gave the impression of working as part of a team but we only had a single contact point which appeared to be the same person. However, Serbian authorities say the larger investigation is aimed at uncovering “a large number of people” who operate under the banner of “TheDarkOverlord”.
Since June 2016, the group is said to have targeted at least 50 victims while demanding bitcoin ransoms to avoid disclosure of their content. Serbian authorities say that on the basis of available data, TDO received payments of more than $275,000.
The cookie settings on this website are set to "allow cookies" to give you the best browsing experience possible. If you continue to use this website without changing your cookie settings or you click "Accept" below then you are consenting to this.