Isometric graphics give 2D games the illusion of depth. Mark Vanstone explains how to make an isometric game map of your own.
Published by Quicksilva in 1983, Ant Attack was one of the earliest games to use isometric graphics. And you threw grenades at giant ants. It was brilliant.
Most early arcade games were 2D, but in 1982, a new dimension emerged: isometric projection. The first isometric game to hit arcades was Sega’s pseudo-3D shooter, Zaxxon. The eye-catching format soon caught on, and other isometric titles followed: Q*bert came out the same year, and in 1983 the first isometric game for home computers was published: Ant Attack, written by Sandy White.
Ant Attack was first released on the ZX Spectrum, and the aim of the game was for the player to find and rescue a hostage in a city infested with giant ants. The isometric map has since been used by countless titles, including Ultimate Play The Game’s classics Knight Lore and Alien 8, and my own educational history series ArcVenture.
Let’s look at how an isometric display is created, and code a simple example of how this can be done in Pygame Zero — so let’s start with the basics. The isometric view displays objects as if you’re looking down at 45 degrees onto them, so the top of a cube looks like a diamond shape. The scene is made by drawing cubes on a diagonal grid so that the cubes overlap and create solid-looking structures. Additional layers can be used above them to create the illusion of height.
Blocks are drawn from the back forward, one line at a time and then one layer on top of another until the whole map is drawn.
The cubes are actually two-dimensional bitmaps, which we start printing at the top of the display and move along a diagonal line, drawing cubes as we go. The map is defined by a three-dimensional list (or array). The list is the width of the map by the height of the map, and has as many layers as we want to represent in the upward direction. In our example, we’ll represent the floor as the value 0 and a block as value 1. We’ll make a border around the map and create some arches and pyramids, but you could use any method you like — such as a map editor — to create the map data.
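As a rough illustration of the map data described above, the sketch below builds a three-dimensional list with a border on the bottom layer and a small stepped pyramid. The sizes, the `map_data[z][y][x]` index order, and the helper names `make_map` and `add_pyramid` are assumptions made for this example, not taken from the original listing.

```python
MAP_WIDTH = 12    # assumed map size, in cubes
MAP_HEIGHT = 12
MAP_LAYERS = 3    # number of layers in the upward direction


def make_map(width, height, layers):
    """Return map_data[z][y][x], where 0 is floor/empty and 1 is a block."""
    map_data = [[[0 for _ in range(width)] for _ in range(height)]
                for _ in range(layers)]
    # Put a border of blocks around the edge of the bottom layer.
    for x in range(width):
        map_data[0][0][x] = 1
        map_data[0][height - 1][x] = 1
    for y in range(height):
        map_data[0][y][0] = 1
        map_data[0][y][width - 1] = 1
    return map_data


def add_pyramid(map_data, left, top, base):
    """Stack squares that shrink by one cube on each side per layer."""
    for z in range(len(map_data)):
        size = base - 2 * z
        if size <= 0:
            break
        for y in range(top + z, top + z + size):
            for x in range(left + z, left + z + size):
                map_data[z][y][x] = 1


map_data = make_map(MAP_WIDTH, MAP_HEIGHT, MAP_LAYERS)
add_pyramid(map_data, left=3, top=3, base=5)
```

Arches or other structures could be added in the same way, by writing 1s into the layers above the floor.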
To make things a bit easier on the processor, we only need to draw cubes that are visible in the window, so we can do a check of the coordinates before we draw each cube. Once we’ve looped over the x, y, and z axes of the data list, we should have a 3D map displayed. The whole map doesn’t fit in the window, and in a full game, the map is likely to be many times the size of the screen. To see more of the map, we can add some keyboard controls.
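One way to sketch the drawing loop is shown below. It converts each map position into a screen position on the diagonal grid and skips cubes that fall outside the window. The cube dimensions (a 64×32 pixel diamond top and 32 pixel layer height) and the function names are assumptions for illustration; the data is indexed as `map_data[z][y][x]`.

```python
HALF_W = 32    # half the width of a cube's diamond top, in pixels (assumed)
HALF_H = 16    # half the height of the diamond top (assumed)
LAYER_H = 32   # how far up the screen each layer is raised (assumed)


def to_screen(x, y, z, origin_x, origin_y):
    """Convert a map position into the screen position of its cube."""
    sx = origin_x + (x - y) * HALF_W
    sy = origin_y + (x + y) * HALF_H - z * LAYER_H
    return sx, sy


def visible_blocks(map_data, origin_x, origin_y, win_w, win_h):
    """Yield (value, sx, sy) for each block to draw, in back-to-front order.

    Layers are drawn from the bottom up, and each layer from the back row
    forward, so nearer and higher cubes overlap the ones behind them.
    Value 0 is skipped here; a full game would probably draw a floor tile.
    """
    for z, layer in enumerate(map_data):
        for y, row in enumerate(layer):
            for x, value in enumerate(row):
                if value == 0:
                    continue
                sx, sy = to_screen(x, y, z, origin_x, origin_y)
                # Only draw cubes that are at least partly inside the window.
                if -2 * HALF_W <= sx < win_w and -2 * LAYER_H <= sy < win_h:
                    yield value, sx, sy
```

In Pygame Zero, the `draw()` function could loop over `visible_blocks(...)` and call `screen.blit()` with a cube image (the image name would be your own) at each `(sx, sy)` position.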
If we detect keyboard presses in the update() function, all we need to do to move the map is change the coordinates we start drawing the map from. If we start drawing further to the left, the right-hand side of the map emerges, and if we draw the map higher, the lower part of the map can be seen.
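The scrolling logic can be sketched as a small function that moves the drawing origin. The scroll speed and the mapping of keys to directions are assumptions for this example; the function takes plain booleans so it can be tried outside Pygame Zero.

```python
SCROLL_SPEED = 4  # pixels per frame (assumed)


def scroll(origin, left=False, right=False, up=False, down=False,
           speed=SCROLL_SPEED):
    """Return the new drawing origin after applying the pressed keys.

    Moving the origin left reveals more of the right-hand side of the map,
    and moving it up reveals more of the lower part.
    """
    x, y = origin
    if right:
        x -= speed
    if left:
        x += speed
    if down:
        y -= speed
    if up:
        y += speed
    return x, y
```

Inside a Pygame Zero `update()` function, the arguments could come from the built-in `keyboard` object, for example `scroll(origin, left=keyboard.left, right=keyboard.right, up=keyboard.up, down=keyboard.down)`.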
We now have a basic map made of cubes that we can move around the window. If we want to make this into a game, we can expand the way the data represents the display. We could add differently shaped blocks represented by different numbers in the data, and we could include a player block which gets drawn in the draw() function and can be moved around the map. We could also have some enemies moving around — and before we know it, we’ll have a game a bit like Ant Attack.
When writing games with large isometric maps, an editor will come in handy. You can write your own, but there are several out there that you can use. One very good one is called Tiled and can be downloaded free from mapeditor.org. Tiled allows you to define your own tilesets and export the data in various formats, including JSON, which can be easily read into Python.
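As a hedged sketch of reading Tiled’s JSON export into Python, the function below turns each tile layer into a list of rows. It assumes the layer data was saved uncompressed, so that each layer’s "data" entry is a flat list of tile numbers; the function name and the tiny inline example map are made up for illustration.

```python
import json


def load_tile_layers(json_text):
    """Return a list of layers, each a list of rows of tile numbers."""
    tiled_map = json.loads(json_text)
    layers = []
    for layer in tiled_map["layers"]:
        # Skip object and image layers; only tile layers hold a tile grid.
        if layer.get("type") != "tilelayer":
            continue
        width = layer["width"]
        data = layer["data"]
        # The flat list is stored row by row, so slice it into rows.
        rows = [data[i:i + width] for i in range(0, len(data), width)]
        layers.append(rows)
    return layers


# A tiny hand-written example in the shape of a Tiled JSON export.
EXAMPLE = """
{"width": 3, "height": 2,
 "layers": [{"type": "tilelayer", "width": 3, "height": 2,
             "data": [1, 1, 1, 1, 0, 1]}]}
"""
layers = load_tile_layers(EXAMPLE)
```

With several tile layers in the editor, the result has the same layer-by-row-by-column shape as the three-dimensional map list used in this article.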
Get your copy of Wireframe issue 15
You can read more features like this one in Wireframe issue 15, available now at Tesco, WHSmith, and all good independent UK newsagents.
Effective animation gave Donkey Kong barrels of personality. Raspberry Pi’s own Rik Cross explains how to create a similar walk cycle.
Donkey Kong wasn’t the first game to feature an animated character who could walk and jump, but on its release in 1981, it certainly had more personality than the games that came before it. You only have to compare Donkey Kong to another Nintendo arcade game that came out just two years earlier — the half-forgotten top-down shooter Sheriff — to see how quickly both technology and pixel art moved on in that brief period. Although simple by modern standards, Donkey Kong’s hero Jumpman (later known as Mario) packed movement and personality into just a few frames of animation.
In this article, I’ll show you how to use Python and Pygame to create a character with a simple walk cycle animation like Jumpman’s in Donkey Kong. The code can, however, be adapted for any game object that requires animation, and even for multiple game object animations, as I’ll explain later.
Jumpman’s (aka Mario’s) walk cycle comprised just three frames of animation.
Firstly, we’ll need some images to animate. As this article is focused on the animation code and not the theory behind creating walk cycle images, I grabbed some suitable images created by Kenney Vleugels and available at opengameart.org.
Let’s start by animating the player with a simple walk cycle. The two images to be used in the animation are stored in an images list, and an animationindex variable keeps track of the index of the current image in the list to display. So, for a very simple animation with just two different frames, the images list will contain two different images:
images = ['walkleft1', 'walkleft2']
To achieve a looping animation, the animationindex is repeatedly incremented, and is reset to 0 once the end of the images list is reached. Displaying the current image can then be achieved by using the animationindex to reference and draw the appropriate image in the animation cycle:
A list of images along with an index is used to loop through an animation cycle.
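A minimal sketch of the looping index described above might look like this. The helper name `next_frame` is an assumption; the `images` list and `animationindex` variable follow the names used in the text.

```python
images = ['walkleft1', 'walkleft2']
animationindex = 0


def next_frame(index, image_list):
    """Move to the next image, wrapping back to 0 at the end of the list."""
    index += 1
    if index >= len(image_list):
        index = 0
    return index


# Advance once and look up the image that should now be displayed.
animationindex = next_frame(animationindex, images)
current_image = images[animationindex]
```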
The problem with the code described so far is that the animationindex is incremented once per frame, and so the walk cycle will happen way too quickly, and won’t look natural. To solve this problem, we need to tell the player to update its animation every few frames, rather than every frame. To achieve this, we need another couple of variables; I’ll use animationdelay to store the number of frames to skip between displayed images, and animationtimer to store the number of frames since the last image change.
Therefore, the code needed to animate the player becomes:
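The sketch below shows one way the delay and timer could fit together. The `Player` class and its method names are assumptions for this example; the attribute names follow the variables described above.

```python
class Player:
    def __init__(self, images, animationdelay=10):
        self.images = images
        self.animationindex = 0
        # Number of frames to wait between image changes.
        self.animationdelay = animationdelay
        # Number of frames since the last image change.
        self.animationtimer = 0

    def update_animation(self):
        """Call once per frame; only changes image every few frames."""
        self.animationtimer += 1
        if self.animationtimer >= self.animationdelay:
            self.animationtimer = 0
            self.animationindex += 1
            if self.animationindex >= len(self.images):
                self.animationindex = 0

    @property
    def image(self):
        return self.images[self.animationindex]
```

With a delay of 10 and a game running at 60 frames per second, the image would change six times a second.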
So we have a player that appears to be walking, but now the problem is that the player walks constantly, and always in the same direction! The rest of this article will show you how to solve these two related problems.
There are a few different ways to approach this problem, but the method I’ll use is to make use of game object states, and then have different animations for each state. This method is a little more complicated, but it’s very adaptable.
The first thing to do is to decide on what the player’s ‘states’ might be — stand, walkleft, and walkright will do as a start. Just as we did with our previous single animation, we can now define a list of images for each of the possible player’s states. Again, there are lots of ways of structuring this data, but I’ve opted for a Python dictionary linking states and image lists:
The correct player state can then be set by getting the keyboard input, setting the player to walkleft if the left arrow key is pressed or walkright if the right arrow key is pressed. If neither key is pressed, the player can be set to a stand state, the image list for which contains a single image of the player facing the camera.
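The state dictionary and key handling described above could be sketched as follows. The image names and the `choose_state` helper are assumptions for illustration.

```python
# Each state maps to the list of images in its animation cycle.
state_images = {
    'stand': ['stand1'],
    'walkleft': ['walkleft1', 'walkleft2'],
    'walkright': ['walkright1', 'walkright2'],
}


def choose_state(left_pressed, right_pressed):
    """Pick the player state from the current keyboard input."""
    if left_pressed:
        return 'walkleft'
    if right_pressed:
        return 'walkright'
    return 'stand'


state = choose_state(left_pressed=False, right_pressed=True)
images = state_images[state]
```

When the state changes, it makes sense to reset the animation index to 0 so that the new cycle starts from its first image.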
Animation cycles can be linked to player ‘states’.
For simplicity, a maximum of two images is used for each animation cycle; adding more images would create a smoother or more realistic animation.
Using the code above, it would also be possible to easily add additional states for, say, jumping or fighting enemies. You could even take things further by defining an Animation() object for each player state. This way, you could specify the speed and other properties (such as whether or not to loop) for each animation separately, giving you greater flexibility.
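One possible shape for such an Animation object is sketched below. The constructor parameters `delay` and `loop` are assumptions chosen to match the properties mentioned above, and the image names are placeholders.

```python
class Animation:
    def __init__(self, images, delay=10, loop=True):
        self.images = images
        self.delay = delay   # frames to wait between image changes
        self.loop = loop     # whether to restart after the last image
        self.index = 0
        self.timer = 0

    def update(self):
        """Call once per frame."""
        self.timer += 1
        if self.timer < self.delay:
            return
        self.timer = 0
        if self.index < len(self.images) - 1:
            self.index += 1
        elif self.loop:
            self.index = 0
        # A non-looping animation simply stays on its last image.

    @property
    def image(self):
        return self.images[self.index]


# One Animation per player state, each with its own speed and loop setting.
animations = {
    'walkleft': Animation(['walkleft1', 'walkleft2'], delay=10),
    'jump': Animation(['jump1', 'jump2'], delay=5, loop=False),
}
```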
Netflix brings delightful customer experiences to homes on a variety of devices that continues to grow each day. The device ecosystem is rich with partners, ranging from System-on-Chip (SoC) manufacturers to Original Design Manufacturer (ODM) and Original Equipment Manufacturer (OEM) vendors.
Partners across the globe leverage the Netflix device certification process on a continual basis to ensure that quality products and experiences are delivered to their customers. The certification process involves verifying each partner’s implementation of the features provided by the Netflix SDK.
The Partner Device Ecosystem organization in Netflix is responsible for ensuring successful integration and testing of the Netflix application on all partner devices. Netflix engineers run a series of tests and benchmarks to validate the device across multiple dimensions including compatibility of the device with the Netflix SDK, device performance, audio-video playback quality, license handling, encryption and security. All this leads to a plethora of test cases, most of them automated, that need to be executed to validate the functionality of a device running Netflix.
With a collection of tests that, by nature, are time consuming to run and sometimes require manual intervention, we need to prioritize and schedule test executions in a way that will expedite detection of test failures. There are several problems efficient test scheduling could help us solve:
Quickly detect a regression in the integration of the Netflix SDK on a consumer electronic or MVPD (multichannel video programming distributor) device.
Detect a regression in a test case. Using the Netflix Reference Application and known good devices, ensure the test case continues to function and tests what is expected.
When code that many test cases depend on has changed, choose the right test cases among thousands of affected tests to quickly validate the change before committing it and running extensive, and expensive, tests.
Choose the most promising subset of tests out of thousands of test cases available when running continuous integration against a device.
Recommend a set of test cases to execute against the device that would increase the probability of failing the device in real-time.
Solving the above problems could help Netflix and our Partners save time and money during the entire lifecycle of device design, build, test, and certification.
These problems could be solved in several different ways. In our quest to be objective, scientific, and in line with the Netflix philosophy of using data to drive solutions for intriguing problems, we proceeded by leveraging machine learning.
In the case of continuously testing a Netflix SDK integration on a new device, we usually lack relevant data for model training in the early phases of integration. In this situation, training a reinforcement learning agent is a great fit, as it allows us to start with very little input data and let the agent explore and exploit the patterns it learns in the process of SDK integration and regression testing. The agent in reinforcement learning is an entity that decides what action to take given the current state of the environment, and gets a reward based on the quality of the action.
We built a system called Lerner that consists of a set of microservices and a Python library that allows scalable agent training and inference for test case scheduling. We also provide an API client in Python.
Lerner works in tandem with our continuous integration framework that executes on-device tests using the Netflix Test Studio platform. Tests are run on Netflix Reference Applications (running as containers on Titus), as well as on physical devices.
There were several motivations that led to building a custom solution:
We wanted to keep the APIs and integrations as simple as possible.
We needed a way to run agents and tie the runs to the internal infrastructure for analytics, reporting, and visualizations.
We wanted the tool to be available as a standalone library as well as a scalable API service.
Lerner provides the ability to set up any number of agents, making it the first component in our re-usable reinforcement learning framework for device certification.
Lerner, as a web-service, relies on Amazon Web Services (AWS) and Netflix’s Open Source Software (OSS) tools. We use Spinnaker to deploy instances and host the API containers on Titus — which allows fast deployment times and rapid scalability. Lerner uses AWS services to store binary versions of the agents, agent configurations, and training data. To maintain the quality of Lerner APIs, we are using the server-less paradigm for Lerner’s own integration testing by utilizing AWS Lambda.
The agent training library is written in Python and supports versions 2.7, 3.5, 3.6, and 3.7. The library is available in the artifactory repository for easy installation. It can be used in Python notebooks — allowing for rapid experimentation in isolated environments without a need to perform API calls. The agent training library exposes different types of learning agents that utilize neural networks to approximate action.
The neural network (NN)-based agent uses a deep net with fully connected layers. The NN gets the state of a particular test case (the input) and outputs a continuous value, where a higher number means an earlier position in a test execution schedule. The inputs to the neural network include: general historical features such as the last N executions and several domain specific features that provide meta-information about a test case.
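To make the idea concrete, here is a toy sketch of scoring test cases with a small fully connected network and sorting them into a schedule. It is not Lerner’s implementation: the two input features, the layer sizes, and the hand-picked weights are all assumptions for illustration, whereas a real agent would learn its weights from rewards.

```python
def relu(values):
    return [max(0.0, v) for v in values]


def dense(inputs, weights, biases):
    """One fully connected layer: a weighted sum per output unit."""
    return [sum(i * w for i, w in zip(inputs, row)) + b
            for row, b in zip(weights, biases)]


def score(features, params):
    """Map a test case's state to a single continuous value."""
    hidden = relu(dense(features, params['w1'], params['b1']))
    return dense(hidden, params['w2'], params['b2'])[0]


def schedule(test_cases, params):
    """Order test case names so that higher scores run earlier."""
    return sorted(test_cases,
                  key=lambda name: score(test_cases[name], params),
                  reverse=True)


# Hypothetical features: [recent failure rate, normalised time since last run]
test_cases = {'a': [0.0, 0.0], 'b': [1.0, 0.0], 'c': [0.0, 1.0]}
# Hand-picked weights, purely for demonstration.
params = {
    'w1': [[1.0, 0.0], [0.0, 1.0]], 'b1': [0.0, 0.0],
    'w2': [[1.0, 0.5]], 'b2': [0.0],
}
order = schedule(test_cases, params)
```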
The Lerner APIs are split into three areas:
Storing execution results.
Getting recommendations based on the current state of the environment.
Assigning a reward to the agent based on the execution result and predicted recommendations.
A process of getting recommendations and rewarding the agent using APIs consists of 4 steps:
From all available test cases for a particular job, form a request that can be interpreted by Lerner. This involves aggregation of historical results and additional features.
Lerner returns a recommendation identified with a unique episode id.
A CI system can execute the recommendation and submit the execution results to Lerner based on the episode id.
Call an API to assign a reward based on the agent id and episode id.
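The four steps above can be sketched with an in-memory stand-in. Everything in this example is hypothetical: the class, its method names, the ranking rule, and the reward formula are invented for illustration and do not reflect the actual Lerner API.

```python
import uuid


class InMemoryLerner:
    """Hypothetical stand-in for the service; not the real Lerner API."""

    def __init__(self):
        self.episodes = {}
        self.rewards = {}

    def recommend(self, agent_id, test_cases):
        """Steps 1 and 2: take a request and return an episode id and order.

        test_cases maps a test name to its historical failure rate; this toy
        version simply ranks the most failure-prone tests first.
        """
        order = sorted(test_cases, key=test_cases.get, reverse=True)
        episode_id = str(uuid.uuid4())
        self.episodes[episode_id] = {'agent_id': agent_id, 'order': order,
                                     'results': None}
        return episode_id, order

    def submit_results(self, episode_id, results):
        """Step 3: store pass/fail results (True means passed)."""
        self.episodes[episode_id]['results'] = results

    def assign_reward(self, agent_id, episode_id):
        """Step 4: toy reward that is higher the earlier a failure appears."""
        episode = self.episodes[episode_id]
        order, results = episode['order'], episode['results']
        failed = [i for i, name in enumerate(order) if not results[name]]
        reward = 0.0 if not failed else 1.0 - failed[0] / len(order)
        self.rewards[(agent_id, episode_id)] = reward
        return reward


lerner = InMemoryLerner()
episode_id, order = lerner.recommend('agent-1', {'t1': 0.1, 't2': 0.9, 't3': 0.5})
lerner.submit_results(episode_id, {'t1': True, 't2': True, 't3': False})
reward = lerner.assign_reward('agent-1', episode_id)
```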
Below is a diagram of the services and persistence layers that support the functionality of the Lerner API.
The self-service nature of the tool makes it easy for service owners to integrate with Lerner, create agents, ask agents for recommendations and reward them after execution results are available.
The metrics relevant to the training and recommendation process are reported to Atlas and visualized using Netflix’s Lumen. Users of the service can track the statistics specific to the agents they set up and deploy, which allows them to build their own dashboards.
We have identified some interesting patterns while doing online reinforcement learning.
The recommendation/execution reward cycle can happen without any prior training data.
We can bootstrap several CI jobs that would use agents with different reward functions, and gain additional insight based on the agents’ performance. It could help us design and implement more targeted reward functions.
We can keep a small amount of historical data to train agents. The data can be truncated after each execution and offloaded to long-term storage for further analysis.
Some of the downsides:
It might take time for an agent to stop exploring and start exploiting the accumulated experience.
As agents are stored in a binary format in the database, an update of an agent from multiple jobs could cause a race condition in its state. Handling concurrency in the training process is cumbersome and requires trade-offs. We achieved the desired state by relying on the locking mechanisms of the underlying persistence layer that stores and serves agent binaries.
Thus, we have the luxury of training as many agents as we want that could prioritize and recommend test cases based on their unique learning experiences.
We are currently piloting the system and have live agents serving predictions for various CI runs. At the moment we run Lerner-based CIs in parallel with CIs that either execute test cases in random order or use simple heuristics, such as sorting test cases by time and executing everything that previously failed.
The system was built with simplicity and performance in mind, so the set of APIs is minimal. We developed client libraries that allow seamless, but opinionated, integration with Lerner.
We collect several metrics to evaluate the performance of a recommendation, with the main metrics being time taken to first failure and time taken to complete a whole scheduled run.
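As a simple illustration of those two metrics, the function below computes them from an ordered list of executions. The input format, a list of (duration in minutes, passed) pairs, and the choice to measure time to first failure at the moment the failing test finishes are assumptions for this sketch.

```python
def schedule_metrics(executions):
    """Return (time_to_first_failure, total_time) for one scheduled run.

    executions is a list of (duration_minutes, passed) pairs in the order
    the test cases ran. time_to_first_failure is None if nothing failed.
    """
    elapsed = 0.0
    first_failure = None
    for duration, passed in executions:
        elapsed += duration
        if not passed and first_failure is None:
            first_failure = elapsed
    return first_failure, elapsed


first_failure, total = schedule_metrics([(5, True), (3, False), (10, True)])
```

A schedule that puts likely failures first lowers the first value without changing the second.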
Lerner-based recommendations are proving to be different and more insightful than random runs, as they allow us to fit a particular time budget and detect patterns such as cases that tend to fail together in a cluster, cases that haven’t been run in a long time, and so on.
The graph below shows a more or less artificial case in which a schedule of 100+ test cases contains several flaky tests. The Y-axis represents how many minutes it took to complete the schedule or reach a first failed test case. In blue, we have random recommendations with no time budget constraints. In green you can see executions based on Lerner recommendations under a time constraint of 60 minutes. The green spikes represent Lerner exploring the environment, while the wiggly lines around 0 are the executions that failed quickly as Lerner was exploiting its policy.
The next phases of the project will focus on:
Reward functions that are aware of a comprehensive domain context, such as assigning appropriate rewards to states where infrastructure is fragile and test cases could not be run properly.
Administrative user-interface to manage agents.
More generic, simple, and user-friendly framework for reinforcement learning and agent deployment.
Using Lerner on all available CI jobs against all SDK versions.
Experiment with different neural network architectures.
Unparalleled depth in a 2D game: PyGame Zero extraordinaire Daniel Pope shows you how to recreate a zooming starfield effect straight out of the eighties arcade classic Gyruss.
The crowded, noisy realm of eighties amusement arcades presented something of a challenge for developers of the time: how can you make your game stand out from all the other ones surrounding it? Gyruss, released by Konami in 1983, came up with one solution. Although it was yet another alien blaster — one of a slew of similar shooters that arrived in the wake of Space Invaders, released in 1978 — it differed in one important respect: its zooming starfield created the illusion that the player’s craft was hurtling through space, and that aliens were emerging from the abyss to attack it.
This made Gyruss an entry in the ‘tube shooter’ genre — one that was first defined by Atari’s classic Tempest in 1981. But where Tempest used a vector display to create a 3D environment where enemies clambered up a series of tunnels, Gyruss used more common hardware and conventional sprites to render its aliens on the screen. Gyruss was designed by Yoshiki Okamoto (who would later go on to produce the hit Street Fighter II, among other games, at Capcom), and was born from his affection for Galaga, a 2D shoot-’em-up created by Namco.
Under the surface, Gyruss is still a 2D game like Galaga, but the cunning use of sprite animation and that zooming star effect created a sense of dynamism that its rivals lacked. The tubular design also meant that the player could move in a circle around the edge of the play area, rather than moving left and right at the bottom of the screen, as in Galaga and other fixed-screen shooters like it. Gyruss was one of the most popular arcade games of its period, probably in part because of its attention-grabbing design.
Daniel Pope’s code sample shows how a zooming star field can work in PyGame Zero and how, thanks to modern hardware, we can heighten the sense of movement in a way that Konami’s engineers couldn’t have hoped to achieve about 30 years ago. The code generates a cluster of stars on the screen, and creates the illusion of depth and movement by redrawing them in a new position in a randomly chosen direction each frame.
At the same time, the stars gradually increase their brightness over time, as if they’re getting closer. As a modern twist, Pope has also added an extra warp factor: holding down the Space bar increases the stars’ velocity, making that zoom into space even more exhilarating.
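The following is a rough sketch of the effect, not Daniel Pope’s original listing. It keeps only the update logic, so the window size, speed range, fade rate, and the `Star` class itself are assumptions made for illustration.

```python
import math
import random

WIDTH, HEIGHT = 800, 600              # assumed window size
CENTRE = (WIDTH / 2, HEIGHT / 2)
FADE_RATE = 200                       # brightness gained per second (assumed)


class Star:
    def __init__(self, rng=random):
        # Each star travels outwards along its own randomly chosen direction.
        angle = rng.uniform(0, 2 * math.pi)
        self.dx = math.cos(angle)
        self.dy = math.sin(angle)
        self.distance = rng.uniform(0, 50)    # start near the centre
        self.speed = rng.uniform(50, 300)     # pixels per second
        self.brightness = 0                   # starts dark, fades in

    def update(self, dt, warp=1.0):
        """Move outwards and brighten; warp > 1 speeds up the zoom."""
        self.distance += self.speed * warp * dt
        self.brightness = min(255, self.brightness + FADE_RATE * dt)

    @property
    def pos(self):
        return (CENTRE[0] + self.dx * self.distance,
                CENTRE[1] + self.dy * self.distance)

    def off_screen(self):
        x, y = self.pos
        return not (0 <= x < WIDTH and 0 <= y < HEIGHT)


stars = [Star() for _ in range(100)]
```

In Pygame Zero, `update(dt)` could pass a larger `warp` value while `keyboard.space` is held, replace any star that has gone off screen with a fresh one, and `draw()` could plot each star at `pos` in a grey level taken from `brightness`.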
Get your copy of Wireframe issue 13
You can read the rest of the feature in Wireframe issue 13, available now at Tesco, WHSmith, and all good independent UK newsagents.
Note from May 10, 2019: We’ve updated a code sample for accuracy.
Today, AWS Secrets Manager introduced a client-side caching library for Python that improves the availability and latency of accessing and distributing credentials to your applications. It can also help you reduce the cost associated with retrieving secrets. In this post, I’ll walk you through the following topics:
An overview of the Secrets Manager client-side caching library for Python
How to use the Python client-side caching library to retrieve a secret
Here are the key benefits of client-side caching libraries:
Improved availability: You can cache secrets to reduce the impact of network availability issues such as increased response times and temporary loss of network connectivity.
Reduced cost: Retrieving secrets from the cache can reduce the number of API requests made to and billed by Secrets Manager.
Automatic refresh of secrets: The library updates the cache by calling Secrets Manager periodically, ensuring your applications use the most current secret value. This ensures any regularly rotated secrets are automatically retrieved.
Implementation in just two steps: Add the Python library dependency to your application, and then provide the identifier of the secret that you want the library to use.
Using the Secrets Manager client-side caching library for Python
First, I’ll walk you through an example in which I retrieve a secret without using the Python cache. Then I’ll show you how to update your code to use the Python client-side caching library.
Retrieving a secret without using a cache
Using the AWS SDK for Python (Boto3), you can retrieve a secret from Secrets Manager using the API call flow, as shown below.
Figure 1: Diagram showing GetSecretValue API call without the Python cache
The code below demonstrates a GetSecretValue API call to AWS Secrets Manager without using the cache feature. Each time the application makes a call, the AWS Secrets Manager GetSecretValue API will also be called. This increases the secret retrieval latency. Additionally, there is a minor cost associated with an API call made to the AWS Secrets Manager API endpoint.
import base64

import boto3
from botocore.exceptions import ClientError

secret_name = "python-cache-test"
region_name = "us-west-2"

# Create a Secrets Manager client
session = boto3.session.Session()
client = session.client(
    service_name='secretsmanager',
    region_name=region_name
)

# In this sample we only handle the specific exceptions for the 'GetSecretValue' API.
# See https://docs.aws.amazon.com/secretsmanager/latest/apireference/API_GetSecretValue.html
# We rethrow the exception by default.
try:
    get_secret_value_response = client.get_secret_value(
        SecretId=secret_name
    )
except ClientError as e:
    if e.response['Error']['Code'] == 'DecryptionFailureException':
        # Secrets Manager can't decrypt the protected secret text using the provided KMS key.
        # Deal with the exception here, and/or rethrow at your discretion.
        raise
    # Rethrow any other error.
    raise
else:
    # Decrypts secret using the associated KMS CMK.
    # Depending on whether the secret is a string or binary, one of these fields will be populated.
    if 'SecretString' in get_secret_value_response:
        secret = get_secret_value_response['SecretString']
    else:
        decoded_binary_secret = base64.b64decode(get_secret_value_response['SecretBinary'])

# Your code goes here.
Using the Python client-side caching library to retrieve a secret
Using the Python cache feature, you can now use the cache library to reduce calls to the AWS Secrets Manager API, improving the availability and latency of your application. As shown in the diagram below, when you implement the Python cache, the call to retrieve the secret is routed to the local cache before reaching the AWS Secrets Manager API. If the secret exists in the cache, the application retrieves the secret from the client-side cache. If the secret does not exist in the client-side cache, the request is routed to the AWS Secrets Manager endpoint to retrieve the secret.
Figure 2: Diagram showing GetSecretValue API call using Python client-side cache
In the example below, I’ll implement a Python cache to retrieve the secret from a local cache, and hence avoid calling the AWS Secrets Manager API:
import boto3
from aws_secretsmanager_caching import SecretCache, SecretCacheConfig
from botocore.exceptions import ClientError

secret_name = "python-cache-test"
region_name = "us-west-2"

# Create a Secrets Manager client
session = boto3.session.Session()
client = session.client(
    service_name='secretsmanager',
    region_name=region_name
)

# Create a cache
cache = SecretCache(SecretCacheConfig(), client)

try:
    # Get secret string from the cache
    get_secret_value_response = cache.get_secret_string(secret_name)
except ClientError as e:
    if e.response['Error']['Code'] == 'DecryptionFailureException':
        # Deal with the exception here, and/or rethrow at your discretion.
        raise
    # Rethrow any other error.
    raise
else:
    secret = get_secret_value_response

# Your code goes here.
The cache allows advanced configuration using the SecretCacheConfig class. This class allows you to define cache configuration parameters to help meet your application security, performance, and cost requirements. The library enforces the configured thresholds on maximum cache size, default secret version stage to request, and secret refresh interval between requests. It also allows configuration of various exception thresholds. Further detail is provided in the library’s documentation.
Based on the secret refresh interval defined in your cache configuration, the cache will check the version of the secret at the defined interval, using the DescribeSecret API to determine if a new version is available. If there is a newer version of the secret, the cache will update to the latest version from AWS Secrets Manager, using the GetSecretValue API. This ensures that an updated version of the secret is available in the cache.
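To illustrate the general idea of a client-side cache with a refresh interval, here is a simplified stand-alone sketch. It is not the library’s implementation: the `RefreshingCache` class is hypothetical, and it simply refetches a value once the interval has passed instead of first checking for a new version as described above.

```python
import time


class RefreshingCache:
    """Simplified illustration of a cache with a refresh interval."""

    def __init__(self, fetch, refresh_interval=3600, clock=time.monotonic):
        self._fetch = fetch                # function that retrieves a secret
        self._interval = refresh_interval  # seconds before a value is stale
        self._clock = clock
        self._entries = {}                 # name -> (value, time fetched)

    def get(self, name):
        now = self._clock()
        entry = self._entries.get(name)
        if entry is None or now - entry[1] >= self._interval:
            # Cache miss or stale entry: go to the backing service.
            value = self._fetch(name)
            self._entries[name] = (value, now)
            return value
        # Cache hit: no call to the backing service is made.
        return entry[0]
```

In a real application the `fetch` function would wrap the call to Secrets Manager; here any function that takes a secret name will do, which also makes the behaviour easy to try with a fake clock.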
Additionally, the Python client-side cache library allows developers to retrieve secrets from the cache directly, using the secret name through decorator functions. An example of using a decorator function is shown below:
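from aws_secretsmanager_caching.decorators import InjectKeywordedSecretString

# 'cache' is the SecretCache created in the previous example. The secret must be
# a JSON string; the values stored under the named keys are passed as arguments.
@InjectKeywordedSecretString('python-cache-test', cache, arg1='secret_key1', arg2='secret_key2')
def my_test_function(arg1, arg2):
    print("arg1: {}".format(arg1))
    print("arg2: {}".format(arg2))

my_test_function()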
To delete the secret created in this post, run the command below:
Rik Cross, Senior Learning Manager here at the Raspberry Pi Foundation, shows you how to recreate the deadly explosions in the classic game, Bomberman.
An early incarnation of Bomberman on the NES; the series is still going strong today under Konami’s wing.
Bomberman was first released in the early 1980s as a tech demo for a BASIC compiler, but soon became a popular series that’s still going today. Bomberman sees players use bombs to destroy enemies and uncover doors behind destructible tiles. In this article, I’ll show you how to recreate the bombs that explode in four directions, destroying parts of the level as well as any players in their path!
The game level is a tilemap stored as a two-dimensional array. Each tile in the map is a Tile object, which contains the tile type and corresponding image. For simplicity, a tile can be set to one of five types: GROUND, WALL, BRICK, BOMB, or EXPLOSION. In this example code, BRICK and GROUND can be exploded with bombs but WALL cannot, although this behaviour can of course be changed.
Each Tile object also has a timer, which is decremented each frame of the game. When a tile’s timer reaches 0, an action is carried out, which is dependent on the tile type. BOMB tiles (and surrounding tiles) turn into EXPLOSION tiles after a short delay, and EXPLOSION tiles eventually turn back into GROUND. At the start of the game, the tilemap for the level is generated, in this case consisting of mostly GROUND, with some WALL and a couple of BRICK tiles. The player starts off in the top-left tile, and moves by using the arrow keys. Pressing the SPACE key will place a bomb in the player’s current tile, which is achieved by setting the Tile at the player’s position to BOMB. The tile’s timer is also set to a small number, and once this timer is decremented to 0, the bomb tile and the tiles around it are set to EXPLOSION.
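The tile timers described above could be sketched like this. The timer lengths, the string constants, and the `step` helper are assumptions for illustration, and the spreading of the explosion to neighbouring tiles is left out here.

```python
GROUND, WALL, BRICK, BOMB, EXPLOSION = (
    'ground', 'wall', 'brick', 'bomb', 'explosion')

BOMB_TIME = 30        # frames before a bomb goes off (assumed)
EXPLOSION_TIME = 10   # frames an explosion stays on screen (assumed)


class Tile:
    def __init__(self, tile_type=GROUND):
        self.type = tile_type
        self.timer = 0

    def set(self, tile_type, timer=0):
        self.type = tile_type
        self.timer = timer


def step(tile):
    """Call once per frame: count down and act when the timer hits 0."""
    if tile.timer > 0:
        tile.timer -= 1
        if tile.timer == 0:
            if tile.type == BOMB:
                tile.set(EXPLOSION, EXPLOSION_TIME)
            elif tile.type == EXPLOSION:
                tile.set(GROUND)


# Placing a bomb on the player's tile when SPACE is pressed:
player_tile = Tile()
player_tile.set(BOMB, BOMB_TIME)
```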
The bomb explodes outwards in four directions, with a range determined by the RANGE, which in our code is 3. As the bomb explodes out to the right, for example, the tile to the right of the bomb is checked. If such a tile exists (i.e. the position isn’t out of the level bounds) and can be exploded, then the tile’s type is set to EXPLOSION and the next tile to the right is checked. If the explosion moves out of the level bounds, or hits a WALL tile, then the explosion will stop radiating in that direction. This process is then repeated for the other directions.
There’s a nice trick for exploding the bomb without repeating the code four times, and it relies on the sine and cosine values for the four direction angles. The angles are 0° (up), 90° (right), 180° (down) and 270° (left). When exploding to the right (at an angle of 90°), sin(90) is 1 and cos(90) is 0, which corresponds to the offset direction on the x- and y-axis respectively. These values can be multiplied by the tile offset, to explode the bomb in all four directions.
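Here is a minimal sketch of that trick. For simplicity, the tilemap is a two-dimensional list of type names rather than Tile objects, and the `explode` function name is an assumption. Because up on the screen means a smaller y value, the y offset uses the negated cosine.

```python
import math

GROUND, WALL, BRICK, BOMB, EXPLOSION = (
    'ground', 'wall', 'brick', 'bomb', 'explosion')
RANGE = 3


def explode(tilemap, bx, by, blast_range=RANGE):
    """Turn the bomb at column bx, row by and the tiles it reaches into explosions."""
    tilemap[by][bx] = EXPLOSION
    for angle in (0, 90, 180, 270):          # up, right, down, left
        # sin gives the x offset and cos the y offset for each direction.
        dx = round(math.sin(math.radians(angle)))
        dy = -round(math.cos(math.radians(angle)))
        for offset in range(1, blast_range + 1):
            x = bx + dx * offset
            y = by + dy * offset
            # Stop at the edge of the level.
            if not (0 <= y < len(tilemap) and 0 <= x < len(tilemap[0])):
                break
            # Walls cannot be exploded and block the blast.
            if tilemap[y][x] == WALL:
                break
            tilemap[y][x] = EXPLOSION
```

Rounding is needed because the sine and cosine of these angles come out as tiny non-zero floating-point values rather than exact zeros.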
Get your copy of Wireframe issue 12
You can read the rest of the feature in Wireframe issue 12, available now at Tesco, WHSmith, and all good independent UK newsagents.
By Pythonistas at Netflix, coordinated by Amjith Ramanujam and edited by Ellen Livengood
As many of us prepare to go to PyCon, we wanted to share a sampling of how Python is used at Netflix. We use Python through the full content lifecycle, from deciding which content to fund all the way to operating the CDN that serves the final video to 148 million members. We use and contribute to many open-source Python packages, some of which are mentioned below. If any of this interests you, check out the jobs site or find us at PyCon. We have donated a few Netflix Originals posters to the PyLadies Auction and look forward to seeing you all there.
Open Connect is Netflix’s content delivery network (CDN). An easy, though imprecise, way of thinking about Netflix infrastructure is that everything that happens before you press Play on your remote control (e.g., are you logged in? what plan do you have? what have you watched so we can recommend new titles to you? what do you want to watch?) takes place in Amazon Web Services (AWS), whereas everything that happens afterwards (i.e., video streaming) takes place in the Open Connect network. Content is placed on the network of servers in the Open Connect CDN as close to the end user as possible, improving the streaming experience for our customers and reducing costs for both Netflix and our Internet Service Provider (ISP) partners.
Various software systems are needed to design, build, and operate this CDN infrastructure, and a significant number of them are written in Python. The network devices that underlie a large portion of the CDN are mostly managed by Python applications. Such applications track the inventory of our network gear: what devices, of which models, with which hardware components, located in which sites. The configuration of these devices is controlled by several other systems including source of truth, application of configurations to devices, and backup. Device interaction for the collection of health and other operational data is yet another Python application. Python has long been a popular programming language in the networking space because it’s an intuitive language that allows engineers to quickly solve networking problems. As a result, many useful libraries get developed, making the language even more desirable to learn and use.
Demand Engineering is responsible for Regional Failovers, Traffic Distribution, Capacity Operations, and Fleet Efficiency of the Netflix cloud. We are proud to say that our team’s tools are built primarily in Python. The service that orchestrates failover uses numpy and scipy to perform numerical analysis, boto3 to make changes to our AWS infrastructure, and rq to run asynchronous workloads, and we wrap it all up in a thin layer of Flask APIs. The ability to drop into a bpython shell and improvise has saved the day more than once.
We are heavy users of Jupyter Notebooks and nteract to analyze operational data and prototype visualization tools that help us detect capacity regressions.
The CORE team uses Python in our alerting and statistical analytical work. We lean on many of the statistical and mathematical libraries (numpy, scipy, ruptures, pandas) to help automate the analysis of thousands of related signals when our alerting systems indicate problems. We’ve developed a time series correlation system used both inside and outside the team as well as a distributed worker system to parallelize large amounts of analytical work to deliver results quickly.
Python is also a tool we typically use for automation tasks, data exploration and cleaning, and as a convenient source for visualization work.
Monitoring, alerting and auto-remediation
The Insight Engineering team is responsible for building and operating the tools for operational insight, alerting, diagnostics, and auto-remediation. With the increased popularity of Python, the team now supports Python clients for most of their services. One example is the Spectator Python client library, a library for instrumenting code to record dimensional time series metrics. We build Python libraries to interact with other Netflix platform level services. In addition to libraries, the Winston and Bolt products are also built using Python frameworks (Gunicorn + Flask + Flask-RESTPlus).
The information security team uses Python to accomplish a number of high leverage goals for Netflix: security automation, risk classification, auto-remediation, and vulnerability identification to name a few. We’ve had a number of successful Python open source projects, including Security Monkey (our team’s most active open source project). We leverage Python to protect our SSH resources using Bless. Our Infrastructure Security team leverages Python to help with IAM permission tuning using Repokid. We use Python to help generate TLS certificates using Lemur.
Some of our more recent projects include Prism: a batch framework to help security engineers measure paved road adoption and risk factors, and identify vulnerabilities in source code. We currently provide Python and Ruby libraries for Prism. The Diffy forensics triage tool is written entirely in Python. We also use Python to detect sensitive data using Lanius.
We use Python extensively within our broader Personalization Machine Learning Infrastructure to train some of the Machine Learning models for key aspects of the Netflix experience: from our recommendation algorithms to artwork personalization to marketing algorithms. For example, some algorithms use TensorFlow, Keras, and PyTorch to learn Deep Neural Networks, XGBoost and LightGBM to learn Gradient Boosted Decision Trees or the broader scientific stack in Python (e.g. numpy, scipy, sklearn, matplotlib, pandas, cvxpy). Because we’re constantly trying out new approaches, we use Jupyter Notebooks to drive many of our experiments. We have also developed a number of higher-level libraries to help integrate these with the rest of our ecosystem (e.g. data access, fact logging and feature extraction, model evaluation, and publishing).
Machine Learning Infrastructure
Besides personalization, Netflix applies machine learning to hundreds of use cases across the company. Many of these applications are powered by Metaflow, a Python framework that makes it easy to execute ML projects from the prototype stage to production.
Metaflow pushes the limits of Python: We leverage well parallelized and optimized Python code to fetch data at 10Gbps, handle hundreds of millions of data points in memory, and orchestrate computation over tens of thousands of CPU cores.
But Python plays a huge role in how we provide those services. Python is a primary language when we need to develop, debug, explore, and prototype different interactions with the Jupyter ecosystem. We use Python to build custom extensions to the Jupyter server that allow us to manage tasks like logging, archiving, publishing, and cloning notebooks on behalf of our users. We provide many flavors of Python to our users via different Jupyter kernels, and manage the deployment of those kernel specifications using Python.
The Big Data Orchestration team is responsible for providing all of the services and tooling to schedule and execute ETL and Adhoc pipelines.
Many of the components of the orchestration service are written in Python. This starts with our scheduler, which uses Jupyter Notebooks with papermill to provide templatized job types (Spark, Presto, …). This allows our users to have a standardized and easy way to express work that needs to be executed. You can see some deeper details on the subject here. We have been using notebooks as real runbooks for situations where human intervention is required, for example to restart everything that has failed in the last hour.
Internally, we also built an event-driven platform that is fully written in Python. We have created streams of events from a number of systems that get unified into a single tool. This allows us to define conditions to filter events, and actions to react or route them. As a result of this, we have been able to decouple microservices and get visibility into everything that happens on the data platform.
Our team also built the pygenie client which interfaces with Genie, a federated job execution service. Internally, we have additional extensions to this library that apply business conventions and integrate with the Netflix platform. These libraries are the primary way users interface programmatically with work in the Big Data platform.
Finally, it’s been our team’s commitment to contribute to papermill and scrapbook open source projects. Our work there has been both for our own and external use cases. These efforts have been gaining a lot of traction in the open source community and we’re glad to be able to contribute to these shared projects.
The scientific computing team for experimentation is creating a platform for scientists and engineers to analyze AB tests and other experiments. Scientists and engineers can contribute new innovations on three fronts: data, statistics, and visualizations.
The Metrics Repo is a Python framework based on PyPika that allows contributors to write reusable parameterized SQL queries. It serves as an entry point into any new analysis.
The Causal Models library is a Python & R framework for scientists to contribute new models for causal inference. It leverages PyArrow and RPy2 so that statistics can be calculated seamlessly in either language.
The Visualizations library is based on Plotly. Since Plotly is a widely adopted visualization spec, there are a variety of tools that allow contributors to produce an output that is consumable by our platforms.
The Partner Ecosystem group is expanding its use of Python for testing Netflix applications on devices. Python is forming the core of a new CI infrastructure, including controlling our orchestration servers, controlling Spinnaker, test case querying and filtering, and scheduling test runs on devices and containers. Additional post-run analysis is being done in Python using TensorFlow to determine which tests are most likely to show problems on which devices.
Video Encoding and Media Cloud Engineering
Our team takes care of encoding (and re-encoding) the Netflix catalog, as well as leveraging machine learning for insights into that catalog. We use Python for ~50 projects such as vmaf and mezzfs, we build computer vision solutions using a media map-reduce platform called Archer, and we use Python for many internal projects. We have also open sourced a few tools to ease development/distribution of Python projects, like setupmeta and pickley.
Netflix Animation and NVFX
Python is the industry standard for all of the major applications we use to create Animated and VFX content, so it goes without saying that we are using it very heavily. All of our integrations with Maya and Nuke are in Python, and the bulk of our Shotgun tools are also in Python. We’re just getting started on getting our tooling in the cloud, and anticipate deploying many of our own custom Python AMIs/containers.
Content Machine Learning, Science & Analytics
The Content Machine Learning team uses Python extensively for the development of machine learning models that are the core of forecasting audience size, viewership, and other demand metrics for all content.
Channel your inner Targaryen by building this voice-activated, colour-changing, 3D-printed Drogon before watching the next episode of Game of Thrones.
Winter has come
This is a spoiler-free zone! I’ve already seen the new episode of season 8, but I won’t ruin anything, I promise.
Even if you’ve never watched an episode of Game of Thrones (if so, that’s fine, I don’t judge you), you’re probably aware that the final season has started.
And you might also know that the show has dragons in it — big, hulking, scaley dragons called Rhaegal, Viserion, and Drogon. They look a little something like this:
Well, not anymore. They look like this now:
Raspberry Pi voice-responsive dragon!
The creator of this project goes by the moniker Botmation. To begin with, they 3D printed a Drogon model they found on Thingiverse. Then, with Dremel in hand, they modified the print, to replace its eyes with RGB LEDs. Before drawing the LEDs through the hollowed-out body of the model, they soldered them to wires connected to a Raspberry Pi Zero W’s GPIO pins.
Located in the tin beneath Drogon, the Pi Zero W is also equipped with a microphone and runs the Python code for the project. And thanks to Google’s Speech to Text API, Drogon’s eyes change colour whenever a GoT character repeats one of two keywords: white turns the eyes blue, while fire turns them red.
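The keyword logic itself is small enough to sketch. The snippet below is not Botmation’s code: it assumes the speech service has already returned a plain-text transcript, and the function name `eye_colour_for` and the RGB values are made up for illustration.

```python
# Keyword-to-colour pairs come from the article; the RGB tuples are assumed values.
KEYWORD_COLOURS = {
    "white": (0, 0, 255),  # "white" turns the eyes blue
    "fire": (255, 0, 0),   # "fire" turns the eyes red
}

def eye_colour_for(transcript, current):
    """Return the new eye colour for a transcript, or the current one if no keyword is heard."""
    colour = current
    for word in transcript.lower().split():
        word = word.strip(".,!?\"'")  # drop punctuation stuck to the word
        if word in KEYWORD_COLOURS:
            colour = KEYWORD_COLOURS[word]  # the last keyword heard wins
    return colour

print(eye_colour_for("The White Walkers are coming!", (0, 0, 0)))
```

On the real build, the returned tuple would then be written to the LED pins.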
If you’d like more information about building your own interactive Drogon, here’s a handy video. At the end, Botmation asks viewers to help improve their code for a cleaner voice-activation experience.
Going into the final season of Game of Thrones with your very own 3D printed Drogon dragon! The eyes are made of LEDs that change between Red and Blue depending on what happens in the show. When you’re watching the show, Drogon will watch the show with you and listen for cues to change the eye color.
Drogon for the throne!
I’ve managed to bag two of the three dragons in the Pi Towers Game of Thrones fantasy league, so I reckon my chances of winning are pretty good thanks to all the points I’ll rack up by killing White Walkers.
Wait — does killing a White Walker count as a kill, since they’re already dead?
Not one for rising with the sun, and getting more and more skilled at throwing their watch across the room to snooze their alarm, Reddit user ravenspired decided to hook up a physical bell to a Raspberry Pi and servo motor to create the ultimate morning wake-up call.
This has to be the harshest thing to wake up to EVER!
Wake up, Boo
“I have difficulty waking up in the morning” admits ravenspired, who goes by the name Darks Pi on YouTube. “My watch isn’t doing its job.”
Therefore, ravenspired attached a bell to a servo motor, and the servo motor to a Raspberry Pi. Then they wrote Python code in Raspbian’s free IDE software Thonny that rings the bell when it’s time to get up.
“A while loop searches for what time it is and checks it against my alarm time. When the alarm is active, it sends commands to the servo to move.”
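A loop like the one described might look something like the sketch below. This is a guess at the structure rather than ravenspired’s actual code: `ring_bell` is a hypothetical callback standing in for the servo commands, and the `clock` and `sleep` parameters exist only so the loop can be tried out without waiting for morning.

```python
import datetime
import time

def alarm_due(now, alarm_time):
    """True when the clock shows the alarm's hour and minute."""
    return (now.hour, now.minute) == (alarm_time.hour, alarm_time.minute)

def run_alarm(alarm_time, ring_bell, clock=datetime.datetime.now, sleep=time.sleep):
    """Poll the clock until the alarm time arrives, then ring once."""
    while not alarm_due(clock(), alarm_time):
        sleep(1)  # check once a second rather than spinning the CPU
    ring_bell()  # hypothetical callback that would move the servo

# Example call (blocks until 07:30):
# run_alarm(datetime.time(7, 30), lambda: print("DING"))
```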
While I’d be concerned about how securely attached the heavy brass bell above my head is, this is still a fun project, and an inventive way to address a common problem.
Atari’s Breakout was one of the earliest video game blockbusters. Here’s how to recreate it in Python.
The original Breakout, designed by Nolan Bushnell and Steve Bristow, and famously built by a young Steve Wozniak.
The games industry owes a lot to the humble bat and ball. Designed by Allan Alcorn in 1972, Pong was a simplified version of table tennis, where the player moved a bat and scored points by ricocheting a ball past their opponent. About four years later, Atari’s Nolan Bushnell and Steve Bristow figured out a way of making Pong into a single-player game. The result was 1976’s Breakout, which rotated Pong’s action 90 degrees and replaced the second player with a wall of bricks.
Points were scored by deflecting the ball off the bat and destroying the bricks; as in Pong, the player would lose the game if the ball left the play area. Breakout was a hit for Atari, and remains one of those game ideas that has never quite faded from view; in the 1980s, Taito’s Arkanoid updated the action with collectible power-ups, multiple stages with different layouts of bricks, and enemies that disrupted the trajectory of the player’s ball.
Breakout had an impact on other genres too: game designer Tomohiro Nishikado came up with the idea for Space Invaders by switching Breakout’s bat with a base that shot bullets, while Breakout’s bricks became aliens that moved and fired back at the player.
The code above, written by Daniel Pope, shows you just how easy it is to get a basic version of Breakout up and running in Python, using the Pygame Zero library. Like Atari’s original, this version draws a wall of blocks on the screen, sets a ball bouncing around, and gives the player a paddle, which can be controlled by moving the mouse left and right. The ball physics are simple to grasp too. The ball has a velocity, vel – which is a vector, or a pair of numbers: vx for the x direction and vy for the y direction.
The program loop checks the position of the ball and whether it’s collided with a brick or the edge of the play area. If the ball hits the left side of the play area, the ball’s x velocity vx is set to positive, thus sending it bouncing to the right. If the ball hits the right side, vx is set to a negative number, so the ball moves left. Likewise, when the ball hits the top or bottom of a brick, we set the sign of the y velocity vy, and so on for the collisions with the bat and the top of the play area and the sides of bricks. Collisions set the sign of vx and vy but never change the magnitude. This is called a perfectly elastic collision.
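To make the "set the sign, keep the magnitude" idea concrete, here is a small standalone sketch. It is not Daniel Pope’s listing: the function name `bounce_off_walls`, the play-area size, and the `radius` parameter are assumptions for this example.

```python
WIDTH, HEIGHT = 640, 480  # assumed play-area size in pixels

def bounce_off_walls(x, y, vx, vy, radius):
    """Return (vx, vy) after bouncing off the left, right, and top edges."""
    if x - radius <= 0:
        vx = abs(vx)    # hit the left edge: travel right
    elif x + radius >= WIDTH:
        vx = -abs(vx)   # hit the right edge: travel left
    if y - radius <= 0:
        vy = abs(vy)    # hit the top edge: travel down
    return vx, vy

print(bounce_off_walls(2, 100, -3, 2, 5))
```

Setting the sign with `abs()` rather than flipping it with `vx = -vx` means a ball that overlaps the wall for two frames in a row can’t get stuck bouncing back and forth inside it.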
To this basic framework, you could add all kinds of additional features: a 2012 talk by developers Martin Jonasson and Petri Purho, which you can watch on YouTube here, shows how the Breakout concept can be given new life with the addition of a few modern design ideas.
You can read this feature and more besides in Wireframe issue 11, available now in Tesco, WHSmith, and all good independent UK newsagents.
Or you can buy Wireframe directly from us – worldwide delivery is available. And if you’d like to own a handy digital version of the magazine, you can also download a free PDF.
The program I made lets me bind “actions” (strobe white, flash blue, disable all colors, etc.) to any input and any input type (hold, knob, trigger, etc.). And each action type has a set of parameters that I bind to the input. For example, I have a knob that changes a strobe’s intensity, and another knob that changes its speed.
The program updates each action, pulls its resulting color, and adds them together, then sends that to the LEDs. I’m using rtmidi for reading the midi device and pigpio for handling the LED output.
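The mixing step could be sketched roughly as follows. The function name `mix_colours` is made up, and clamping each channel to the 0 to 255 range is my assumption about how overflow would be handled; the rtmidi and pigpio calls are left out entirely.

```python
def mix_colours(colours):
    """Add (r, g, b) tuples channel by channel, clamping each channel to 0-255."""
    total = [0, 0, 0]
    for colour in colours:
        for channel, value in enumerate(colour):
            total[channel] += value
    # Assumption: anything brighter than full intensity is simply capped.
    return tuple(min(255, max(0, int(value))) for value in total)

print(mix_colours([(200, 0, 0), (100, 0, 50)]))
```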
Rik Cross, Senior Learning Manager here at Raspberry Pi, shows you how to recreate the spawning of objects found in the balloon-bursting arcade gem Pang.
Pang: bringing balloon-hating to the masses since 1989.
Programmed by Mitchell and distributed by Capcom, Pang was first released as an arcade game in 1989, but was later ported to a whole host of home computers, including the ZX Spectrum, Amiga, and Commodore 64. The aim in Pang is to destroy balloons as they bounce around the screen, either alone or working together with another player, in increasingly elaborate levels. Destroying a balloon can sometimes also spawn a power-up, freezing all balloons for a short time or giving the player a better weapon with which to destroy balloons.
Initially, the player is faced with the task of destroying a small number of large balloons. However, destroying a large balloon spawns two smaller balloons, each of which in turn spawns two smaller balloons, and so on. Each level is only complete once all balloons have been broken up and completely destroyed. To add challenge to the game, different-sized balloons have different attributes – smaller balloons move faster and don’t bounce as high, making them more difficult to destroy.
Rik’s spawning balloons, up and running in Pygame Zero. Hit space to divide them into smaller balloons.
There are a few different ways to achieve this game mechanic, but the approach I’ll take in my example is to use various features of object orientation (as usual, my example code has been written in Python, using the Pygame Zero library). It’s also worth mentioning that for brevity, the example code only deals with simple spawning and destroying of objects, and doesn’t handle balloon movement or collision detection.
The base Enemy class is simply a subclass of Pygame Zero’s Actor class, including a static enemies list to keep track of all enemies that exist within a level. The Enemy subclass also includes a destroy() method, which removes an enemy from the enemies list and deletes the object.
There are then three further subclasses of the Enemy class, called LargeEnemy, MediumEnemy, and SmallEnemy. Each of these subclasses is instantiated with a specific image, and also includes a destroy() method. This method simply calls the same destroy() method of its parent Enemy class, but additionally creates two more objects nearby, with large enemies spawning two medium enemies, and medium enemies spawning two small enemies.
In the example code, initially two LargeEnemy objects are created, with the first object in the enemies list having its destroy() method called each time the Space key is pressed. If you run this code, you’ll see that the first large enemy is destroyed and two medium-sized enemies are created. This chain reaction of destroying and creating enemies continues until all SmallEnemy objects are destroyed (small enemies don’t create any other enemies when destroyed).
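The class structure described here can be approximated without Pygame Zero. In the sketch below, Enemy is a plain Python class rather than an Actor subclass, so it runs anywhere but draws nothing; the pixel offsets and the `on_space_pressed` helper are assumptions for illustration.

```python
class Enemy:
    enemies = []  # class-level list shared by every enemy in the level

    def __init__(self, x, y):
        self.x, self.y = x, y
        Enemy.enemies.append(self)

    def destroy(self):
        Enemy.enemies.remove(self)


class SmallEnemy(Enemy):
    pass  # small enemies spawn nothing when destroyed


class MediumEnemy(Enemy):
    def destroy(self):
        super().destroy()
        SmallEnemy(self.x - 20, self.y)  # offsets are assumed values
        SmallEnemy(self.x + 20, self.y)


class LargeEnemy(Enemy):
    def destroy(self):
        super().destroy()
        MediumEnemy(self.x - 40, self.y)
        MediumEnemy(self.x + 40, self.y)


def on_space_pressed():
    """Destroy the first enemy in the list, as the Space key does in the article."""
    if Enemy.enemies:
        Enemy.enemies[0].destroy()
```

Each large enemy takes seven presses to clear completely: one for itself, two for its medium enemies, and four for their small ones.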
As I mentioned earlier, this isn’t the only way of achieving this behaviour, and there are advantages and disadvantages to this approach. Using subclasses for each size of enemy allows for a lot of customisation, but could get unwieldy if much more than three enemy sizes are required. One alternative is to simply have a single Enemy class, with a size attribute. The enemy’s image, the entities it creates when destroyed, and even the movement speed and bounce height could all depend on the value of the enemy size.
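For comparison, the single-class alternative might look something like this. Again this is only a sketch in plain Python: the class name `Balloon`, the size numbering (3 for large down to 1 for small), and the speed and bounce-height figures are all invented for the example.

```python
class Balloon:
    balloons = []
    # Per-size attributes; the numbers are illustrative assumptions.
    SPEED = {3: 1, 2: 2, 1: 3}
    BOUNCE_HEIGHT = {3: 300, 2: 200, 1: 100}

    def __init__(self, x, y, size=3):
        self.x, self.y, self.size = x, y, size
        self.speed = Balloon.SPEED[size]
        self.bounce_height = Balloon.BOUNCE_HEIGHT[size]
        Balloon.balloons.append(self)

    def destroy(self):
        Balloon.balloons.remove(self)
        if self.size > 1:  # the smallest size spawns nothing
            offset = 20 * self.size
            Balloon(self.x - offset, self.y, self.size - 1)
            Balloon(self.x + offset, self.y, self.size - 1)
```

Adding a fourth or fifth size now means adding entries to the two dictionaries rather than writing a new subclass.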
You can read the rest of the feature in Wireframe issue 10, available now in Tesco, WHSmith, and all good independent UK newsagents.
I do not really have any spare time. (Toddler, job, very demanding cat, lots of LEGO to tidy up.) If I did, I like to imagine that I’d come up with something like this to do with it.
Want to see this collection of junk animate? Scroll down for video.
From someone calling themselves Banjowise (let me know what your real name is in the comments, please, so I can credit you properly here!), here is a pile of junk turned into a weirdly compelling drum machine.
Mechanically speaking, this isn’t too complicated: just a set of solenoids triggered by a Raspberry Pi. The real clever is in the beauteous, browser-based step sequencer Banjowise has built to program the solenoids to wallop things in beautiful rhythm. And in the beauteous, skip-sourced tchotchkes that Banjowise has found for them to wallop. Generously, they’ve made full instructions on making your own available on Instructables. Use any bits and bobs you can get your hands on if old piano hammers and crocodile castanets are not part of the detritus kicking around your house.
My Raspberry Pi based drum / percussion machine. Consisting of 8 12v solenoids, a relay, wooden spoons, a Fullers beer bottle, a crocodile maraca and a few other things. An Instructable on how to build your own is here: https://www.instructables.com/id/A-Raspberry-Pi-Powered-Junk-Drum-Machine/, or take a look at: http://www.banjowise.com/post/automabeat/
The sequencer is lovely: a gorgeously simple user interface that you can run on a tablet, your phone, or anything else with a browser (and it’s very easily adaptable to other projects). The web interface lets Python trigger the GPIO pins over web sockets. There’s a precompiled version available for people who’ve followed Banjowise’s comprehensive wiring instructions, but you can also get the source code from GitHub.
I think I’m getting good, but I can handle criticism.
We love it. Now please excuse me. I need a little while to search online for crocodile castanets.
They add strategy to a genre-defining shooter. Andrew Gillett lifts the lid on Space Invaders’ disintegrating shields.
Released in 1978, Space Invaders introduced ideas so fundamental to video games that it’s hard to imagine a time before them. And it did this using custom-made hardware which by today’s standards is unimaginably slow.
Space Invaders ran on an Intel 8080 CPU operating at 2MHz. With such meagre processing power, merely moving sprites around the screen was a struggle. In modern 2D games, at the start of each frame the entire screen is reset, then all objects are displayed.
For Space Invaders’ hardware, this process would have been too slow. Instead, each time a sprite needs to move, the game first erases the sprite from the screen, then redraws it in the new position. The game also updates only one alien per frame — which leads to the effect of the aliens moving faster when there are fewer of them. These techniques cut down the number of pixels which need to be updated each frame, from nearly 60,000 to around a hundred.
One of Space Invaders’ most notable features is its four shields. These provide shelter from enemy fire, but deteriorate after repeated hits. The player can take advantage of the shields’ destructible nature — by repeatedly firing at the same place on a shield’s underside, a narrow gap can be created which can then be used to take out enemies. (Of course, the player can also be shot through the same gap.)
The system of updating only the minimum necessary number of pixels works well as long as there’s no need for objects to overlap. In the case of the shields, though, what happens when objects do overlap is fundamental to how they work. Whenever a shot hits something, it’s replaced by an explosion sprite. A few frames later, the explosion sprite is deleted from the screen. If the explosion sprite overlapped with a shield, that part of the shield is also deleted.
The code to the right displays four shields, and then bombards them with a series of shots which explode on impact. I’m using sprites which have been scaled up by ten, to make it easier to see what’s going on.
We first create two empty lists. The first, shots, holds details of any shots on screen, including explosions; its contents are displayed on the screen every frame. Each entry in the shots list will be a dictionary data structure containing three values: a position, the sprite to be displayed, and whether the shot is in ‘exploding’ mode, in which case it’s displayed in the same position for a few frames before being deleted.
The second list, to_delete, is for sprites which need to be deleted from the screen. For simplicity, I’m using separate copies of the shot and explosion sprites where the white pixels have been changed to black (the other pixels in these sprites are set as transparent).
The function create_random_shot is called every half-second. The combination of dividing the maximum value by ten, choosing a random whole number between zero and the maximum value, and then multiplying the resulting random number by ten, ensures that the chosen X coordinate is a multiple of ten.
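The divide, pick, and multiply step can be shown on its own. This is a minimal sketch rather than the article’s create_random_shot: the function name `random_shot_x` and the screen width are assumptions.

```python
import random

WIDTH = 800  # assumed screen width in pixels
SCALE = 10   # the sprites are scaled up by ten

def random_shot_x(max_x=WIDTH - SCALE):
    """Pick a random X coordinate that is a multiple of SCALE."""
    # randint includes both ends, so the result lies in 0..max_x in steps of SCALE.
    return random.randint(0, max_x // SCALE) * SCALE

print(random_shot_x())
```

Keeping every shot on a ten-pixel grid means its scaled-up pixels always line up with those of the shields.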
Andrew’s Space Invaders shields up and running in Pygame Zero.
In the draw function, we first check to see if it’s the first frame, as we only want to display the shields on that frame. The screen.blit method is used to display sprites, and Pygame Zero’s images object is used to specify which sprite should be displayed. We then display all sprites in the to_delete list, after which we reset it to being an empty list. Finally we display all sprites in the shots list.
In the update function, we go through all sprites in the shots list, in reverse order. Going through the list backwards avoids problems that can occur when deleting items from a list inside a for loop. For each shot, we first check to see if it’s in ‘exploding’ mode. If so, its timer is reduced each frame — when it hits zero we add the shot to the to_delete list, then delete it from shots.
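Here is a small standalone sketch of that backwards loop. The dictionary keys (`pos`, `exploding`, `timer`) and the function name `update_explosions` are assumptions for this example and may not match the original listing.

```python
shots = [
    {"pos": (10, 0), "exploding": True, "timer": 1},
    {"pos": (20, 0), "exploding": True, "timer": 1},
    {"pos": (30, 0), "exploding": True, "timer": 3},
]
to_delete = []

def update_explosions():
    # Walk the indices from last to first so that deleting shots[i]
    # never shifts an item we have yet to visit.
    for i in range(len(shots) - 1, -1, -1):
        shot = shots[i]
        if shot["exploding"]:
            shot["timer"] -= 1
            if shot["timer"] <= 0:
                to_delete.append(shot)
                del shots[i]

update_explosions()
print(len(shots), len(to_delete))
```

Had the loop run forwards, deleting the first entry would have slid the second into its slot, and the loop would have skipped straight past it.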
If the item is a normal shot rather than an explosion, we add its current position to to_delete, then update the shot’s position to move the sprite down the screen. We next check to see if the sprite has either gone off the bottom of the screen or collided with something. Pygame’s get_at method gives us the colour of a pixel at a given position. If a collision occurs, we switch the shot into ‘exploding’ mode — the explosion sprite will be displayed for five frames.
You can read the rest of the feature in Wireframe issue 9, available now in Tesco, WHSmith, and all good independent UK newsagents.
MezzFS — Mounting object storage in Netflix’s media processing platform
By Barak Alon (on behalf of Netflix’s Media Cloud Engineering team)
MezzFS (short for “Mezzanine File System”) is a tool we’ve developed at Netflix that mounts cloud objects as local files via FUSE. It’s used extensively in our media processing platform, which includes services like Archer and runs features like video encoding and title image generation on tens of thousands of Amazon EC2 instances. There are similar tools out there, but we’ve developed some unique features like “replays” and “adaptive buffering” that we think are worth sharing.
What problem are we solving?
We are constantly innovating on video encoding technology at Netflix, and we have a lot of content to encode. Video encoding is what MezzFS was originally designed for and remains one of its canonical use cases, so we’ll focus on video encoding to describe the problem that MezzFS solves.
Video encoding is the process of converting an uncompressed video into a compressed format defined by a codec, and it’s an essential part of preparing content to be streamed on Netflix. A single movie at Netflix might be encoded dozens of times for different codecs and video resolutions. Encoding is not a one-time process — large portions of the entire Netflix catalog are re-encoded whenever we’ve made significant advancements in encoding technology.
We scale out video encoding by processing segments of an uncompressed video (we segment movies by scene) in parallel. We have one file — the original, raw movie file — and many worker processes, all encoding different segments of the file. That file is stored in our object storage service, which splits and encrypts the file into separate chunks, storing the chunks in Amazon S3. This object storage service also handles content security, auditing, disaster recovery, and more.
The individual video encoders process their segments of the movie with tools like FFmpeg, which doesn’t speak our object storage service’s API and expects to deal with a file on the local filesystem. Furthermore, the movie file is very large (often several 100s of GB), and we want to avoid downloading the entire file for each individual video encoder that might be processing only a small segment of the whole movie.
This is just one of many use cases that MezzFS supports, but all the use cases share a similar theme: stream the right bits of a remote object efficiently and expose those bits as a file on the filesystem.
The solution: MezzFS
MezzFS is a Python application that implements the FUSE interface. It’s built as a Debian package and installed by applications running on our media processing platform, which use MezzFS’s command line interface to mount remote objects as local files.
MezzFS has a number of features, including:
Stream objects — MezzFS exposes multi-terabyte objects without requiring any disk space.
Assemble and decrypt parts — Our object storage service splits objects into many parts and stores them in S3. MezzFS knows how to assemble and decrypt the parts.
Mount multiple objects — Multiple cloud objects can be mounted on the local filesystem simultaneously.
Disk Caching — MezzFS can be configured to cache objects on the local disk.
Mount ranges of objects — Arbitrary ranges of a cloud object can be mounted as separate files on the local file system. This is particularly useful in media computing, where it is common to mount the frames of a movie scene as separate files.
Regional caching — Netflix operates in multiple AWS regions. If an application in region A is using MezzFS to read from an object stored in region B, MezzFS will cache the object in region A. In addition to improving download speed, this is useful for cutting down on cross-region transfer costs when many workers will be processing the same data — we only pay the transfer costs for one worker, and the rest use the cached object.
Replays — More on this below…
Adaptive buffering — More on this below…
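To give a feel for what “stream the right bits” involves when an object is stored as many parts, here is a minimal sketch of working out which parts a read touches. It assumes fixed-size parts and ignores decryption and caching; the function name `parts_for_range` and the layout are hypothetical and are not taken from MezzFS itself.

```python
def parts_for_range(offset, length, part_size):
    """Return (part_index, start_in_part, end_in_part) for each part a read touches."""
    if length <= 0:
        return []
    first = offset // part_size
    last = (offset + length - 1) // part_size
    pieces = []
    for index in range(first, last + 1):
        part_start = index * part_size
        # Clip the requested range to this part, then make it part-relative.
        start = max(offset, part_start) - part_start
        end = min(offset + length, part_start + part_size) - part_start
        pieces.append((index, start, end))
    return pieces

print(parts_for_range(150, 100, 100))
```

A read of 100 bytes at offset 150 over 100-byte parts needs only the second half of part 1 and the first half of part 2, so nothing else has to be downloaded.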
We’ve been using MezzFS in production for 5 years, and have validated it at scale: during a typical week at Netflix, MezzFS performs ~100 million mounts for dozens of different use cases and streams ~25 petabytes of data.
MezzFS has become a crucial tool for us, and we don’t just send it out into the wild with a packed lunch and hope it will be fine.
MezzFS collects metrics on data throughput, download efficiency, resource usage, etc. in Atlas, Netflix’s in-memory dimensional time series database. Its logs are collected in an ELK stack. But one of the more novel tools we’ve developed for debugging and developing is the MezzFS “replay”.
At mount time, MezzFS can be configured to record a “replay” file. This file includes:
Metadata: The remote objects that were mounted, the environment in which MezzFS is running, etc.
File operations: All “open” and “read” operations. That is, all mounted files that were opened and every byte range read that MezzFS received.
Actions: MezzFS records everything it buffers and everything it caches.
Statistics: Finally, MezzFS records various statistics about the mount, including total bytes downloaded, total bytes read, total time spent reading, etc.
A single replay may include millions of file operations, so these files are packed in a custom binary format to minimize their footprint.
Based on these replay files, we’ve built tools that:
Visualize a replay
This has proven very useful for quickly gaining insight into data access patterns and why they might be causing performance issues.
Here’s a GIF of what these visualizations look like:
The bytes of a remote object are represented by pixels on the screen, where the top left is the start of the remote object and the bottom right is the end. The different colors mean different things — green means the bytes have been scheduled for downloading, yellow means the bytes are being actively downloaded, blue means the bytes have been successfully returned, etc. What we see in the above visualization is a very simple access pattern — a remote object is mounted and then streamed through sequentially.
Here is a more interesting, “sparse” access pattern, and one that inspired “adaptive buffering” described later in this post. We can see lots of little green bars quickly sprinkle the screen — these bars represent the bytes that were downloaded:
Rerun a replay
We mount the same objects and rerun all of the operations recorded in the replay file. We use this to debug errors and performance issues in specific mounts.
Rerun a batch of replays
We collect replays from actual MezzFS mounts in production, and we rerun large batches of replays for regression and performance tests. We’ve integrated these tests into our build pipeline, where a build will fail if there are any errors across the gamut of replays or if the performance of a new MezzFS commit falls below some threshold. We parallelize rerun jobs with Titus, Netflix’s container management platform, which allows us to exercise many hundreds of replay files in minutes. The results are aggregated in Elasticsearch, allowing us to quickly analyze MezzFS’s performance across the entire batch.
These replays have proven essential for developing optimizations like “adaptive buffering”.
One of the challenges of efficiently streaming bits in a FUSE system is that the kernel will break reads into chunks. This means that if an application reads, for example, 1 GB from a mounted file, MezzFS might receive that as 16,384 consecutive reads of 64KB. Making 16,384 separate HTTP calls to S3 for 64KB will suffer significant overhead, so it’s better to “read ahead” larger chunks of data from S3, speeding up subsequent reads by anticipating that the data will be read sequentially. We call the size of the chunks being read ahead the “buffer size”.
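As a quick check of the numbers above, here is a small sketch of the arithmetic. It assumes binary units (1 GB = 2^30 bytes, 64KB = 2^16 bytes); the function name and the 8MB read-ahead size used for comparison are illustrative assumptions only.

```python
import math

KIB = 1024
MIB = 1024 ** 2
GIB = 1024 ** 3


def requests_needed(total_bytes, request_size):
    """Number of requests of request_size bytes needed to cover total_bytes."""
    return math.ceil(total_bytes / request_size)


# Reads MezzFS receives when the kernel splits a 1 GB read into 64KB chunks.
kernel_reads = requests_needed(GIB, 64 * KIB)

# Requests to the object store if each one reads ahead 8MB instead (assumed size).
buffered_requests = requests_needed(GIB, 8 * MIB)

print(kernel_reads, buffered_requests)
```

With these assumptions, reading ahead turns thousands of small requests into a much smaller number of large ones, which is the motivation for buffering.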
While large buffer sizes speed up sequential data access, they can slow down “sparse” data access — that is, the application is not reading through the file consecutively, but is reading small segments dispersed throughout the file (as shown in the visualization above). In this scenario, most of the buffered data isn’t actually going to be used, leading to a lot of unnecessary downloading and very slow reads.
One option is to expect applications to specify a buffer size when mounting with MezzFS. This is not always easy for application developers to do, since applications might be using third-party tools and developers might not actually know their access pattern. It gets even messier when an application changes access patterns during a single MezzFS mount.
With “adaptive buffering,” we aimed to make MezzFS “just work” for a variety of access patterns, without requiring application developers to maintain MezzFS configuration.
How it works
MezzFS records a sliding window of the most recent reads. When it receives a read for data that has not already been buffered, it calculates an appropriate buffer size. It does this by first grouping the window of reads into “clusters”, where a cluster is a contiguous set of reads.
Here’s an illustration of how reads relate to clusters:
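In code terms, the relationship can be sketched as follows. This is a minimal illustration, not MezzFS's actual code: it assumes each read is an (offset, length) pair, and that reads which touch or overlap belong to the same cluster. The function name and data layout are hypothetical.

```python
def group_into_clusters(reads):
    """Group (offset, length) reads into clusters of contiguous byte ranges.

    Returns a list of (start, end) tuples, where end is exclusive.
    """
    clusters = []
    # Sort by offset so that reads from interleaved threads line up.
    for offset, length in sorted(reads):
        end = offset + length
        if clusters and offset <= clusters[-1][1]:
            # This read touches or overlaps the previous cluster: extend it.
            clusters[-1][1] = max(clusters[-1][1], end)
        else:
            # There is a gap before this read: start a new cluster.
            clusters.append([offset, end])
    return [(start, end) for start, end in clusters]


window = [(0, 64), (64, 64), (1000, 64), (128, 64)]
print(group_into_clusters(window))
```

In this example window, three of the four reads are back to back and collapse into a single cluster, while the read at offset 1000 forms a cluster of its own.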
If the average number of bytes per read divided by the average number of bytes per cluster is close to 1 (that is, each cluster contains roughly one read), we classify the access pattern as “sparse”. In the “sparse” case, we try to match the buffer size to the average number of bytes per read. If the ratio is closer to 0, we classify the access pattern as “dense”, and we set the buffer size to the maximum allowed buffer size divided by the number of clusters. We divide by the number of clusters to account for a common case in which an application has multiple threads all reading different parts of the same file, with each thread reading its part “densely”. If we used the maximum allowed buffer size for each thread, our buffers would consume too much memory.
Here’s an attempt to represent this logic with some pseudo code:
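One way to sketch this logic in Python is shown below. It is an illustration under stated assumptions rather than MezzFS's implementation: the 0.5 cut-off between “sparse” and “dense”, the 64MB maximum buffer size, and the function and parameter names are not given in this post and are chosen only for the example.

```python
MAX_BUFFER_SIZE = 64 * 1024 * 1024  # assumed maximum allowed buffer size
SPARSE_THRESHOLD = 0.5              # assumed cut-off for "close to 1"


def choose_buffer_size(read_sizes, cluster_sizes,
                       max_buffer_size=MAX_BUFFER_SIZE,
                       sparse_threshold=SPARSE_THRESHOLD):
    """Classify the recent access pattern and pick a buffer size.

    read_sizes:    bytes requested by each read in the sliding window
    cluster_sizes: bytes covered by each cluster of contiguous reads
    Returns a (pattern, buffer_size_in_bytes) tuple.
    """
    if not read_sizes or not cluster_sizes:
        # No history yet: fall back to the maximum (an assumption).
        return "dense", max_buffer_size

    avg_read = sum(read_sizes) / len(read_sizes)
    avg_cluster = sum(cluster_sizes) / len(cluster_sizes)
    ratio = avg_read / avg_cluster

    if ratio >= sparse_threshold:
        # Each cluster is roughly one read: buffer only what a read needs.
        return "sparse", int(avg_read)

    # Many reads per cluster: share the maximum buffer between clusters,
    # since each cluster may be a separate thread reading densely.
    return "dense", max_buffer_size // len(cluster_sizes)


KIB = 1024
print(choose_buffer_size([64 * KIB] * 16, [64 * KIB] * 16))    # isolated reads
print(choose_buffer_size([64 * KIB] * 16, [16 * 64 * KIB]))    # one sequential reader
print(choose_buffer_size([64 * KIB] * 16, [8 * 64 * KIB] * 2)) # two sequential readers
```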
There is a limit on the throughput you can get out of a single HTTP connection to S3. So when the calculated buffer size is large, we divide the buffer into separate requests and parallelize them across multiple threads. So for “sparse” access patterns we improve performance by choosing a small buffer size, and for “dense” access patterns we improve performance by buffering lots of data in parallel.
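The idea of dividing a large buffer into parallel requests can be sketched as follows. Here `fetch_range` is a hypothetical callable standing in for an HTTP range request to S3, and the part size and worker count are assumptions for illustration; the post does not give MezzFS's actual values.

```python
from concurrent.futures import ThreadPoolExecutor


def split_range(start, size, part_size):
    """Split the byte range [start, start + size) into (offset, length) parts."""
    end = start + size
    return [(offset, min(part_size, end - offset))
            for offset in range(start, end, part_size)]


def fetch_parallel(fetch_range, start, size, part_size, max_workers=4):
    """Fetch a byte range as several parts in parallel and reassemble it."""
    parts = split_range(start, size, part_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in the order of `parts`, so the chunks
        # can be joined directly even if they finish out of order.
        return b"".join(pool.map(lambda part: fetch_range(*part), parts))


# Stand-in for a remote object, so the sketch runs without a network.
remote_object = bytes(range(256)) * 100


def fake_fetch(offset, length):
    return remote_object[offset:offset + length]


data = fetch_parallel(fake_fetch, start=100, size=5000, part_size=1024)
print(len(data))
```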
How much faster is this?
We’ve been using adaptive buffering in production across a number of different use cases. For the purpose of clarity in demonstration, we used the “rerun a batch of replays” technique described above to run a quick and dirty test comparing the old buffering technique against the new.
Two replay files that represent two canonical access patterns were used:
Dense/Sequential: Sequentially read 1GB from a remote object.
Sparse/Random: Read 32MB in chunks of 64KB, dispersed randomly throughout a remote object.
And we compared two buffering strategies:
Fixed Sized Buffering: This is the old technique, where the buffer size is fixed at 8MB (we chose 8MB as a “one-size-fits-all” buffer size after running some experiments across MezzFS use cases at the time).
Adaptive Buffering: The shiny new technique described above.
We ran each combination of replay file and buffering strategy 10 times inside containers with 2 Gbps network and 16 CPUs, recording the total time to process all the operations in the replay files. The following table represents the minimum of all 10 runs (while mean and standard deviation often seem like good aggregations, we use minimum here because variability is often caused by other processes interrupting MezzFS, or variability in network conditions outside of MezzFS’s control).
Looking at the dense/sequential replay, fixed buffering has a throughput of ~0.5 Gbps, while adaptive buffering has a throughput of ~1.1 Gbps.
While a handful of seconds might not seem worth all the trouble, these seconds become hours for many of our use cases that stream significantly more bytes. And shaving off hours is especially beneficial in latency sensitive workflows, like encoding videos that are released on Netflix the day they are shot.
MezzFS has become a core part of our media transformation and innovation platform. We’ve built some pretty fancy tools around it that we’re actively using to quickly and confidently develop new features and optimizations.
The next big feature on our roadmap is support for writes, which has exciting potential for our next generation media processing platform and our growing, global network of movie production studios.
Netflix’s media processing platform is maintained by the Media Cloud Engineering (MCE) team. If you’re excited about large-scale distributed computing problems in media processing, we’re hiring!
GPIO Zero started out as a friendly API on top of the RPi.GPIO library, but later we extended it to allow other pin libraries to be used. The pigpio library is supported, and that includes the ability to remotely control GPIO pins over the network, or on a Pi Zero over USB.
This also gave us the opportunity to create a “mock” pin factory, so that we could emulate the effect of pin changes without using real Raspberry Pi hardware. This is useful for prototyping without hardware, and for testing. Try it yourself!
As well as the pin factories we provide with the library (RPi.GPIO, pigpio, RPIO, and native), it’s also possible to write your own. So far, I’m aware of only one custom pin factory, and that has been written by the AIY team at Google, who created their own pin factory for the pins on the AIY Vision Kit. This means that you can connect devices to these pins, and use GPIO Zero to program them, despite the fact they’re not connected to the Pi’s own pins.
We had identified some issues with the results from the DistanceSensor class, and we dealt with them in two ways. Firstly, GPIO Zero co-author Dave Jones did some work under the hood of the pins API to use timing information provided by underlying drivers, so that timing events from pins will be considerably more accurate (see #655). Secondly, Dave found that RPi.GPIO would often miss edges during callbacks, which threw off the timing, so we now drop missed edges and get better accuracy as a result (see #719).
The best DistanceSensor results come when using pigpio as your pin factory, so we recommend changing to this if you want more accuracy, especially if you’re using (or deploying to) a Pi 1 or Pi Zero.
A really neat feature of GPIO Zero is the ability to connect devices together easily. One way to do this is to use callback functions:
Another way is to set the source of one device to the values of another device:
led.source = button.values
In GPIO Zero v1.5, we’ve made connecting devices even easier. You can now use the following method to pair devices together:
led.source = button
Read more about this declarative style of programming in the source/values page in the docs. There are plenty of great examples of how you can create projects with these simple connections:
An important part of software development is automated testing. You write tests to check your code does what you want it to do, especially checking the edge cases. Then you write the code to implement the features you’ve written tests for. Then after every change you make, you run your old tests to make sure nothing got broken. We have tools for automating this (thanks pytest, tox, coverage, and Travis CI).
But how do you test a GPIO library? Well, most of the GPIO parts of our test suite use the mock pins interface, so we can test our API works as intended, abstracted from how the pins behave. And while Travis CI only runs tests with mock pins, we also do real testing on Raspberry Pi: there are additional tests that ensure the pins do what they’re supposed to. See the docs chapter on development to learn more about this process, and try it for yourself.
You may remember that the last major GPIO Zero release introduced the pinout command line tool. We’ve added some new art for the Pi 3A+ and 3B+:
pinout also now supports the -x (or --xyz) option, which opens the website pinout.xyz in your web browser.
Zero boilerplate for hardware
The goal of all this is to remove obstacles to physical computing, and Rachel Rayns has designed a wonderful board that makes a great companion to GPIO Zero for people who are learning. Available from The Pi Hut, the PLAY board provides croc-clip connectors for four GPIO pins, GND, and 3V3, along with a set of compatible components:
Since the board simply breaks out GPIO pins, there’s no special software required. You can use Scratch or Python (or anything else).
This release welcomed seven new contributors to the project, including Claire Pollard from PiBorg and ModMyPi, who provided implementations for TonalBuzzer, PumpkinPi, and the Jam HAT. We also passed 1000 commits!
Watch your tone
As part of the work Claire did to add support for the Jam HAT, she created a new class for working with its buzzer, which works by setting the PWM frequency to emit a particular tone. I took what Claire provided and added some maths to it, then Dave created a whole Tones module to provide a musical API. You can play buzzy jingles, or you can build a theremin:
from gpiozero import TonalBuzzer
from gpiozero.tools import sin_values

buzzer = TonalBuzzer(20)
buzzer.source = sin_values()
The Tones API is a really neat way of creating particular buzzer sounds and chaining them together to make tunes, using a variety of musical notations:
>>> from gpiozero.tones import Tone
We all make mistakes
One of the important things about writing a library to help beginners is knowing when to expect mistakes, and providing help when you can. For example, if a user mistypes an attribute or just gets it wrong (say, they type button.pressed = foo instead of button.when_pressed = foo), they wouldn’t usually get an error; Python would just set a new attribute. In GPIO Zero, though, we prevent new attributes from being created, so you’d get an error if you tried doing this. We provide an FAQ about this, and explain how to get around it if you really need to.
Similarly, it’s common to see people type button.when_pressed = foo() and actually call the function, which isn’t correct, and will usually have the effect of unsetting the callback (as the function returns None). Because this is valid, the user won’t get an error to call their attention to the mistake.
In this release, we’ve added a warning that you’ll see if you set a callback to None when it was previously None. Hopefully that will be useful to people who make this mistake, helping them quickly notice and rectify it.
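To illustrate both safeguards in plain Python, here is a small sketch. It is not GPIO Zero's implementation: the class names, the `_frozen` flag, and the warning text are invented for this example, which only shows the general technique of rejecting unknown attributes and warning on a suspicious None callback.

```python
import warnings


class FrozenAttributes:
    """Reject assignment to attribute names that don't already exist."""

    _frozen = False

    def __setattr__(self, name, value):
        if self._frozen and not hasattr(self, name):
            raise AttributeError(
                f"'{type(self).__name__}' object has no attribute '{name}'")
        super().__setattr__(name, value)


class DemoButton(FrozenAttributes):
    def __init__(self):
        self._when_pressed = None
        self._frozen = True  # from here on, no new attributes are allowed

    @property
    def when_pressed(self):
        return self._when_pressed

    @when_pressed.setter
    def when_pressed(self, callback):
        if callback is None and self._when_pressed is None:
            # Typical cause: button.when_pressed = foo() instead of foo
            warnings.warn("callback set to None when it was already None; "
                          "did you call the function instead of passing it?")
        self._when_pressed = callback


def foo():
    print("pressed")  # returns None, like most callbacks


button = DemoButton()
button.when_pressed = foo        # correct: passes the function itself

try:
    button.pressed = foo         # mistyped attribute name
except AttributeError as error:
    print(error)
```

Assigning `foo()` (with parentheses) to `when_pressed` on a fresh `DemoButton` would run `foo` immediately and then trigger the warning, because the value assigned is its return value, None.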
Update your Raspberry Pi now to get the latest and greatest GPIO Zero goodness in your (operating) system:
Note: it’s currently syncing with the Raspbian repo, so if it’s not available for you yet, it will be soon.
We have plenty more suggestions to be working on. This year we’ll be working on SPI and I2C interfaces, including I2C expander chips. If you’d like to make more suggestions, or contribute yourself, find us over on GitHub.
The first game I ever wrote was named Pooh. It had nothing to do with the bear. In September 1982, I was four years old, and the ZX Spectrum home computer had just been released. It was incredible enough that the Spectrum let you play games on the TV, but like most home computers of the time, it also came with a built-in language called BASIC, and a manual which explained how to program it. In my first game, Pooh (the title was a misspelling), the player controlled a baby, represented by a pound sign, and had to guide it to a potty, represented by the letter O. There were no obstacles, no enemies, and if you tried to walk off the screen, the program would stop with an error message. I didn’t have any idea how to create a graphical game more complex than Pooh. I didn’t even know how to display a sprite on the screen.
The Hobbit, released in 1982, was widely praised for its intuitive parser.
Instead, I focused on writing text adventures, where the game describes scenes to the player (“You are in a comfortable, tunnel-like hall. You can see a door,” from 1982’s The Hobbit) and the player enters commands such as “Go through door” or “Kill goblin with sword.” Although this type of game is comparatively easy to write, I implemented it in the worst way possible. The code was essentially a huge list of IF statements. Each room had its own set of code, which would print out a description of the room and then check to see what the player typed. This ‘hard-coding’ led to the code being much longer than necessary, and more difficult to maintain.
The correct way would have been to separate my code and data. Each room would have had several pieces of data associated with it, such as an ID number, the description of the room (“You are in a small cave”), an array of objects which can be found in the room, and an array of room numbers indicating where the player should end up if they try to move in a particular direction – for example, the first number could indicate which room to go to if the player enters ‘NORTH’. You’d then have the main game code which keeps track of the room the player is currently in, and looks up the data for that room. With that data, it can then take the appropriate action based on the command the player typed.
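A minimal sketch of that data-driven layout, written in Python rather than Spectrum BASIC. The rooms, objects, and exits here are made up for illustration; the point is that each room is just a row of data, and one small piece of code handles movement for every room.

```python
# Index of each direction in a room's list of exits.
NORTH, EAST, SOUTH, WEST = range(4)
NO_EXIT = -1

# Each room's ID is its position in this list.
# Fields: description, objects in the room, exits as [north, east, south, west].
ROOMS = [
    ("You are in a small cave.", ["lamp"], [1, NO_EXIT, NO_EXIT, NO_EXIT]),
    ("You are on a windy hillside.", [], [NO_EXIT, NO_EXIT, 0, NO_EXIT]),
]


def describe(room_id):
    """Return the description stored for a room."""
    return ROOMS[room_id][0]


def move(room_id, direction):
    """Return the room reached by moving in a direction, or stay put."""
    target = ROOMS[room_id][2][direction]
    return room_id if target == NO_EXIT else target


current_room = 0
print(describe(current_room))
current_room = move(current_room, NORTH)
print(describe(current_room))
```

Adding a room means adding a row to `ROOMS`, not another block of IF statements.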
Getting it right
The code below shows how to implement the beginnings of a text adventure game in Python. Instead of numeric IDs and arrays, the code uses string IDs and dictionary data structures, where each piece of data is associated with an ID or ‘key’. This is a more convenient option which wasn’t available in Spectrum BASIC. We first create a list of directions in which the player can potentially move. We then create the class Location which specifies each location’s properties. We store a name, a description, and a dictionary data structure which stores the other locations that the current location is linked to. For example, if you go north from the woods, you’ll reach the lake. The class includes a method named addLink, which adds entries to the linked locations dictionary after checking that the specified direction and destination exist.
Following the class definition, we then create a dictionary named locations. This has two entries, with the keys being woods and lake, and the values being instances of the Location class. Next, we call the addLink method on each of the locations we’ve just created, so that the player will be able to walk between them. The final step of the setup phase is to create the variable currentLocation, specifying where the player will start the game.
We then reach the main game loop, which will repeat indefinitely. We first display the description of the current location, along with the available directions in which the player can move. Then we wait for the player to input a command. In this version of the code, the only valid commands are directions: for example, type ‘north’ at the starting location to go to the lake. When a direction is entered, we check to make sure it’s a valid direction from the current location, then update currentLocation to the new location. When the main loop restarts, the description of the new location is displayed.
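The structure described above can be sketched as follows. This is a reconstruction from the description, not the original listing: the names directions, Location, addLink, locations, and currentLocation come from the text, while the location descriptions, the linkedLocations attribute name, and the helper functions are assumptions. The main loop is wrapped in a function so that the code can be imported without immediately waiting for input.

```python
directions = ["north", "south", "east", "west"]


class Location:
    def __init__(self, name, description):
        self.name = name
        self.description = description
        self.linkedLocations = {}  # maps a direction to a location key

    def addLink(self, direction, destination):
        # Only accept known directions and destinations that exist.
        if direction not in directions:
            raise ValueError("Invalid direction: " + direction)
        if destination not in locations:
            raise ValueError("Invalid destination: " + destination)
        self.linkedLocations[direction] = destination


locations = {
    "woods": Location("woods", "You are in the woods. There are a lot of trees."),
    "lake": Location("lake", "You are by the lake. It is very watery."),
}

locations["woods"].addLink("north", "lake")
locations["lake"].addLink("south", "woods")

currentLocation = locations["woods"]


def describe(location):
    """Build the text shown to the player for a location."""
    lines = [location.description]
    for direction, destination in location.linkedLocations.items():
        lines.append(direction.capitalize() + ": " + locations[destination].name)
    return "\n".join(lines)


def move(location, command):
    """Return the location reached by a command, or the same one if invalid."""
    if command in location.linkedLocations:
        return locations[location.linkedLocations[command]]
    print("You can't go that way.")
    return location


def main():
    """Main game loop: describe, read a command, move, repeat indefinitely."""
    global currentLocation
    while True:
        print(describe(currentLocation))
        command = input("> ").strip().lower()
        currentLocation = move(currentLocation, command)
```

Call `main()` to play; typing north at the starting location takes you from the woods to the lake.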
I moved on from the ZX Spectrum eight years after my dad first unpacked it. Despite the poor design of my code, I’d learned the essentials of programming. Ten years later, I was a game developer.
If you’re keen to learn more about making a text adventure in Python, you could check out Phillip Johnson’s guide to the subject, Make Your Own Python Text Adventure. The author has also written a condensed version of the same guide.
Hey folks, Rob from The MagPi here! Before I head off on my Christmas holidays, I want to introduce you to The MagPi 77, where we teach you how to make with code.
Making made fun! See what we did there?
What do we mean by that? Well, using code to make things – whether that’s scripts, programs, or games on your Pi, or whether you’re controlling LEDs with code, or robots, or massive Rube Goldberg machines. In this feature, we show new Pi users how to get started making practical applications with Python, and hopefully you’ll be inspired to go on and do something special.
Can you make… with code?
Accessories make the Pi
Want to power up your Raspberry Pi with a few extras? We’ve put together a guide to the 20 best Raspberry Pi accessories, covering IoT, robots, media, power solutions, and even industrial add-ons. There’s a lot of stuff you can do with your Pi, and even more if you’ve got the right tool to help.
We have the best accessories for you
More, you say?
Still need more reasons to grab a copy? Well, we have a tutorial on how to make a smart door, we continue developing Pac-Man while checking out the Picade Console, and we have plenty of amazing project showcases like the SelfieBot!
Get The MagPi 77
You can get The MagPi 77 from WHSmith, Tesco, Sainsbury’s, and Asda. If you live in the US, head over to your local Barnes & Noble or Micro Center in the next few days for a print copy. You can also get the issue online: check it out on our store, or digitally via our Android or iOS apps. And don’t forget, there’s always the free PDF.
Software developers have their own preferred tools. Some use powerful editors, others Integrated Development Environments (IDEs) that are tailored for specific languages and platforms. In 2014 I created my first AWS Lambda function using the editor in the Lambda console. Now, you can choose from a rich set of tools to build and deploy serverless applications. For example, the editor in the Lambda console was greatly enhanced last year when AWS Cloud9 was released. For .NET applications, you can use the AWS Toolkit for Visual Studio and AWS Tools for Visual Studio Team Services.
AWS Toolkits for PyCharm, IntelliJ, and Visual Studio Code
Today, we are announcing the general availability of the AWS Toolkit for PyCharm. We are also announcing the developer preview of the AWS Toolkits for IntelliJ and Visual Studio Code, which are under active development in GitHub. These open source toolkits will enable you to easily develop serverless applications, including a full create, step-through debug, and deploy experience in the IDE and language of your choice, be it Python, Java, Node.js, or .NET.
For example, using the AWS Toolkit for PyCharm you can:
Create a new, ready-to-deploy serverless application in your preferred runtime.
Locally test your code with step-through debugging in a Lambda-like execution environment.
Deploy your applications to the AWS region of your choice.
The AWS Toolkit for PyCharm is available via the IDEA Plugin Repository. To install it, in the Settings/Preferences dialog, click Plugins, search for “AWS Toolkit”, use the checkbox to enable it, and click the Install button. You will need to restart your IDE for the changes to take effect.
The AWS Toolkits for IntelliJ and Visual Studio Code are currently in developer preview and under active development. You are welcome to build and install these from the GitHub repositories:
After installing AWS SAM CLI and AWS Toolkit, I create a new project in PyCharm and choose SAM on the left to create a serverless application using the AWS Serverless Application Model. I call my project hello-world in the Location field. Expanding More Settings, I choose which SAM template to use as the starting point for my project. For this walkthrough, I select the “AWS SAM Hello World”.
In PyCharm you can use credentials and profiles from your AWS Command Line Interface (CLI) configuration. You can change AWS region quickly if you have multiple environments. The AWS Explorer shows Lambda functions and AWS CloudFormation stacks in the selected AWS region. Starting from a CloudFormation stack, you can see which Lambda functions are part of it.
The function handler is in the app.py file. After I open the file, I click on the Lambda icon on the left of the function declaration to have the option to run the function locally or start a local step-by-step debugging session.
A local container is used to emulate the Lambda execution environment. This function is implementing a basic web API, and I can check that the result is in the format expected by the API Gateway.
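For reference, a handler that returns a response in the shape API Gateway expects from a Lambda proxy integration might look like the sketch below. This is not the code generated by the SAM template; the message body and the small check function are assumptions added for illustration.

```python
import json


def lambda_handler(event, context):
    """Return a response in the Lambda proxy integration format."""
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        # The body must be a string, so the payload is serialized to JSON.
        "body": json.dumps({"message": "hello world"}),
    }


def looks_like_proxy_response(response):
    """Rough check: an integer status code and a string body."""
    return (isinstance(response.get("statusCode"), int)
            and isinstance(response.get("body"), str))


# Invoke the handler locally with an empty event and no context.
result = lambda_handler({}, None)
print(result["statusCode"], looks_like_proxy_response(result))
```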
After that, I want to get more information on what my code is doing. I set a breakpoint and start a local debugging session. I use the same input event as before. Again, you can choose the credentials and region for the AWS services used by the function.
I step over the HTTP request in the code to inspect the response in the Variables tab. Here you have access to all local variables, including the event and the context provided as input to the function.
After that, I resume the program to reach the end of the debugging session.
Now I am confident enough to deploy the serverless application by right-clicking on the project (or the SAM template file). I can create a new CloudFormation stack, or update an existing one; for example, you can have one stack for production and one for testing. For now, I create a new stack called hello-world-prod. I select an S3 bucket in the region to store the package used for the deployment. If your template has parameters, you can set the values used by this deployment here.
After a few minutes, the stack creation is complete and I can run the function in the cloud with a right-click in the AWS Explorer. Here there is also the option to jump to the source code of the function.
As expected, the result of the remote invocation is the same as the local execution. My serverless application is in production!
Using these toolkits, developers can test locally to find problems before deployment, change the code of their application or the resources they need in the SAM template, and update an existing stack, quickly iterating until they reach their goal. For example, they can add an S3 bucket to store images or documents, or a DynamoDB table to store their users, or change the permissions used by their functions.
I am really excited by how much faster and easier it is to build your ideas on AWS. Now you can use your preferred environment to accelerate even further. I look forward to seeing what you will do with these new tools!