Today we have a guest post from Igalia’s Iago Toral, who has spent the past year working on the Mesa graphic driver stack for Raspberry Pi 4.
It’s been nearly a year since we first announced that we were developing a Vulkan driver for the latest generation of Raspberry Pi devices (Raspberry Pi 4, Raspberry Pi 400, and Compute Module 4).
In June we released the source code for our prototype driver, and last month we announced that the driver had been successfully merged to Mesa upstream.
Today we have some very exciting news to share: as of 24 November the V3DV Vulkan Mesa driver for Raspberry Pi 4 has demonstrated Vulkan 1.0 conformance.
Khronos describes the conformance process as a way to ensure that its standards are consistently implemented by multiple vendors, so as to create a reliable platform for application developers. For each standard, Khronos provides a large conformance test suite (CTS) that implementations must pass successfully to be declared conformant; in the case of Vulkan 1.0, the CTS contains over 100,000 tests.
Vulkan 1.0 conformance is a major milestone in bringing Vulkan to Raspberry Pi, but it isn’t the end of the journey. Our team continues to work on all fronts to expand the Vulkan feature set, improve performance, and fix bugs. So stay tuned for future Vulkan updates!
Today we have another guest post from Igalia’s Iago Toral, who has spent the past year working on the Mesa graphic driver stack for Raspberry Pi 4.
Four months ago we announced that work on the Vulkan effort for Raspberry Pi 4 (v3dv) was progressing well, and that we were moving the development to an open repository.
vkQuake3 on Raspberry Pi 4
This week, the Vulkan driver for Raspberry Pi 4 has been merged with Mesa upstream, becoming one of the official Vulkan Mesa drivers. This brings several advantages:
Easier to find: now anyone willing to test the driver just needs to go to the official Mesa repository
Bug tracking: issues/bugs can now be filed on the official Mesa repository bug tracker. If the problem affects other parts of the project, it will be easier for us to involve other Mesa developers.
Releasing: v3dv will be included in all Mesa releases. In due course, you will no longer need to go to an external repository to obtain the driver, as it will be included in the Mesa package for your distribution.
Maintenance: v3dv will be included in the Mesa Continuous Integration system, so every merge request will be tested to ensure that our driver still builds. More effort can go to new features and bug fixes rather than just keeping up with upstream changes.
Progress, and current status
We said back in June that we were passing over 70,000 tests from the Khronos Conformance Test Suite for Vulkan 1.0, and that we had an implementation for a significant subset of the Vulkan 1.0 API. Now we are passing over 100,000 tests, and have implemented the full Vulkan 1.0 API. Only a handful of CTS tests remain to be fixed.
Sascha Willems’ deferred multisampling demo
This doesn’t mean that our work is done, of course. Although the CTS is a really complete test suite, it is not the same as a real use case. As mentioned some of our updates, we have been testing the driver with Vulkan ports of the original Quake trilogy, but deeper and more detailed testing is needed. So the next step will be to test the driver with more use cases, and fixing any bugs or performance issues that we find during the process.
Join us for Digital Making at Home: this week, young people can explore the graphics side of video game design! Through Digital Making at Home, we invite kids all over the world to code along with us and our new videos every week.
So get ready to design video game graphics with us:
Today we have a guest post from Igalia’s Iago Toral, who has spent the past year working on the Mesa graphic driver stack for Raspberry Pi 4.
It is almost five months since we announced the Vulkan effort for Raspberry Pi 4. It was great to see how many people were excited about this, and today we would like to give you a status update on our progress over these last months.
When we announced the effort back in January we were at the point of rendering a coloured triangle, which required only minimal coverage of the Vulkan 1.0 API in the driver. Today, we are passing over 70,000 tests from the Khronos Conformance Test Suite for Vulkan 1.0 and we have an implementation for a significant subset of the Vulkan 1.0 API.
Progress so far, in pictures
While I could detail here all the features that we have implemented, I am sure that list would get long and boring very quickly for most of you. So, instead, we would like to show you our progress through pics taken from a bunch of the popular Vulkan demos by Sascha Willems running on Raspberry Pi 4:
Hopefully that is more entertaining than a feature checklist and will help you visualize better where we are now compared to January’s coloured triangle.
Before you get too excited though, while these demos are nice, they are still a far cry from actual games and applications. We still have a lot of work to do before the driver can handle these more complex workloads. Even some of Sascha’s demos don’t run yet, whether because of driver bugs or unimplemented Vulkan features. We still have a lot of work ahead of us.
I would also like to give you an overview of some of the things we will be working on in the coming months:
Our first priority is to support the basic Vulkan 1.0 feature set. This will involve, at least, supporting compute shaders, input attachments, texel buffers, storage images, pipeline caches, and multisampling. There are some other features that we need to support in Vulkan 1.0, such as robust buffer access etc, but those are probably the largest ones we are currently missing.
Once we are feature-complete we will probably move focus to CTS conformance, which will be all about bugfixing, and making sure we handle spec corner cases. And once we are close to conformance, the driver should hopefully be stable and robust enough that we should probably start testing actual Vulkan applications and games to drive further bugfixing work.
Finally, there will be a lot of performance tuning and optimization work that we will probably tackle in the last stages of development.
So as I said before, we still have a long way to go!
Moving development to an open repository
Before we end this post, I would also like to share another important piece of news: starting today, we are moving development of the driver to an open repository. You can find instructions on how to build and install the driver here. I know this is something that many of you have been asking for, and I am sorry that it took us a few months to get here. But I think that now that we have a more stable driver infrastructure in place, and we don’t feel like we are constantly making large changes every other day, development should be a lot friendlier to external contributors than it may have been a few months ago.
So that’s everything we wanted to share today – I hope you are still excited about Vulkan and looking forward to future updates. In the meantime, if you have questions or are interested in contributing to the driver, join us on irc.freenode.net, #videocore channel.
Following on from our recent announcement that Raspberry Pi 4 is OpenGL ES 3.1 conformant, we have some more news to share on the graphics front. We have started work on a much requested feature: an open-source Vulkan driver!
Standards body Khronos describes Vulkan as “a new generation graphics and compute API that provides high-efficiency, cross-platform access to modern GPUs”. The Vulkan API has been designed to better accommodate modern GPUs and address common performance bottlenecks in OpenGL, providing graphics developers with new means to squeeze the best performance out of the hardware.
The “first triangle” image is something of a VideoCore graphics tradition: while I arrived at Broadcom too late to witness the VideoCore III version, I still remember the first time James and Gary were able to get a flawless, single-tile, RGB triangle out of VideoCore IV in simulation. So, without further ado, here’s the VideoCore VI Vulkan version.
First triangle out of Vulkan
Before you get too excited, remember that this is just the start of the development process for Vulkan on Raspberry Pi. While there have been community efforts in the direction of Vulkan support (originally on VideoCore IV) as far back as 2018, Igalia has only been working on this new driver for a few weeks, and we still have a very long development roadmap ahead of us before we can put an actual driver in the hands of our users. So don’t hold your breath, and instead look forward to more news from us and Igalia as they make further development progress.
Raspberry Pi Trading engineer James Hughes recently pointed out a project to us that he’d found on the Raspberry Pi forum. Using a Raspberry Pi, forum member Rene Richarz has written a Tektronix 4010, 4013, 4014, 4015, and ARDS terminal emulator. The project sounded cool, but Helen and I didn’t 100% get it, so we asked James to write an introduction for us. You can find that below, followed by the project itself. James’s intro is amazing, because, despite this heat messing with my concentration, I understand the project now! That James – what a treasure. And here he is:
Those of a certain age will remember the vector graphics display of arcade games like Battlezone and Asteroids, and the subsequent colour displays of Star Wars and Tempest. Even earlier than these games came the less sophisticated Tektronic storage tube terminals used by the pioneers of computer graphics, combined with the PDP-11s and Vax’s that were the staple of computer graphics labs of the era.
Unlike the raster displays that everyone uses now, these terminals used a steerable electron beam (the ‘write gun’) to draw lines directly on the phosphor of the monitor, which were kept illuminated by a secondary ‘flood gun’. These devices had very high resolution, up to 1024×1024 pixels, but the big problem was that you could not erase just a bit of the display — you had to erase the whole image!
Rene Richarz’s project emulates these fascinating old displays, even down to the speed of drawing: because the display needed to be charged, the electron gun could only travel at a limited speed of 1500–4000 vector inches/second!
Once memory prices started dropping, the cost of raster displays also dropped significantly, meaning these early computer graphics vector displays were consigned to the annals of history. But their memory lives on, not only in the project we see here but in many of the algorithms and techniques developed in those early years that are still used today.
This video shows the blinking PiDP-11 (https://obsolescence.wixsite.com/obsolescence) running the historical 2.11 BSD Unix (https://github.com/rricharz/pidp11-2.11bsd) with the Tektronix 4014 emulator tek4010 (https://github.com/rricharz/Tek4010). Late 1970s Blinkenlight action and storage tube display action.
As Rene explains on the GitHub repo, his project “makes an effort to emulate the storage tube display of the Tektronix 4010, including the bright drawing spot. It can be used to log into a historical Unix system such as 2.11 BSD on the PiDP-11 or a real historical system. It can also be used to display historical plot data.”
Bringing Rich Experiences to Memory-Constrained TV Devices
By Jason Munning, Archana Kumar, Kris Range
Netflix has over 148M paid members streaming on more than half a billion devices spanning over 1,900 different types. In the TV space alone, there are hundreds of device types that run the Netflix app. We need to support the same rich Netflix experience on not only high-end devices like the PS4 but also memory and processor-constrained consumer electronic devices that run a similar chipset as was used in an iPhone 3Gs.
In this post, we will discuss the development of the Rich Collection row and the iterations we went through to be able to support this experience across the majority of the TV ecosystem.
Rich Collection Row
One of our most ambitious UI projects to date on the TV app is the animated Rich Collection Row. The goal of this experience from a UX design perspective was to bring together a tightly-related set of original titles that, though distinct entities on their own, also share a connected universe. We hypothesized this design would net a far greater visual impact than if the titles were distributed individually throughout the page. We wanted the experience to feel less like scrolling through a row and more like exploring a connected world of stories.
For the collections below, the row is composed of characters representing each title in a collected universe overlaid onto a shared, full-bleed background image which depicts the shared theme for the collection. When the user first scrolls down to the row, the characters are grouped into a lineup of four. The name of the collection animates in along with the logos for each title while a sound clip plays which evokes the mood of the shared world. The characters slide off screen to indicate the first title is selected. As the user scrolls horizontally, characters slide across the screen and the shared backdrop scrolls with a parallax effect. For some of the collections, the character images themselves animate and a full-screen tint is applied using a color that is representative of the show’s creative (see “Character Images” below).
Once the user pauses on a title for more than two seconds, the trailer for that title cross-fades with the background image and begins playing.
As part of developing this type of UI experience on any platform, we knew we would need to think about creating smooth, performant animations with a balance between quality and download size for the images and video previews, all without degrading the performance of the app. Some of the metrics we use to measure performance on the Netflix TV app include animation frames per second (FPS), key input responsiveness (the amount of time before a member’s key press renders a change in the UI), video playback speed, and app start-up time.
UI developers on the Netflix TV app also need to consider some challenges that developers on other platforms often are able to take for granted. One such area is our graphics memory management. While web browsers and mobile phones have gigabytes of memory available for graphics, our devices are constrained to mere MBs. Our UI runs on top of a custom rendering engine which uses what we call a “surface cache” to optimize our use of graphics memory.
Surface cache is a reserved pool in main memory (or separate graphics memory on a minority of systems) that the Netflix app uses for storing textures (decoded images and cached resources). This benefits performance as these resources do not need to be re-decoded on every frame, saving CPU time and giving us a higher frame-rate for animations.
Each device running the Netflix TV application has a limited surface cache pool available so the rendering engine tries to maximize the usage of the cache as much as possible. This is a positive for the end experience because it means more textures are ready for re-use as a customer navigates around the app.
The amount of space a texture requires in surface cache is calculated as:
width * height * 4 bytes/pixel (for rgba)
Most devices currently run a 1280 x 720 Netflix UI. A full-screen image at this resolution will use 1280 * 720 * 4 = 3.5MB of surface cache. The majority of legacy devices run at 28MB of surface cache. At this size, you could fit the equivalent of 8 full-screen images in the cache. Reserving this amount of memory allows us to use transition effects between screens, layering/parallax effects, and to pre-render images for titles that are just outside the viewport to allow scrolling in any direction without images popping in. Devices in the Netflix TVUI ecosystem have a range of surface cache capacity, anywhere from 20MB to 96MB and we are able to enable/disable rich features based on that capacity.
When the limit of this memory pool is approached or exceeded, the Netflix TV app tries to free up space with resources it believes it can purge (i.e. images no longer in the viewport). If the cache is over budget with surfaces that cannot be purged, devices can behave in unpredictable ways ranging from application crashes, displaying garbage on the screen, or drastically slowing down animations.
Surface Cache and the Rich Collection Row
From developing previous rich UI features, we knew that surface cache usage was something to consider with the image-heavy design for the Rich Collection row. We made sure to test memory usage early on during manual testing and did not see any overages so we checked that box and proceeded with development. When we were approaching code-complete and preparing to roll out this experience to all users we ran our new code against our memory-usage automation suite as a sanity check.
The chart below shows an end-to-end automated test that navigates the Netflix app, triggering playbacks, searches, etc to simulate a user session. In this case, the test was measuring surface cache after every step. The red line shows a test run with the Rich Collection row and the yellow line shows a run without. The dotted red line is placed at 28MB which is the amount of memory reserved for surface cache on the test device.
Uh oh! We found some massive peaks (marked in red) in surface cache that exceeded our maximum recommended surface cache usage of 28MB and indicated we had a problem. Exceeding the surface cache limit can have a variety of impacts (depending on the device implementation) to the user from missing images to out of memory crashes. Time to put the brakes on the rollout and debug!
Assessing the Problem
The first step in assessing the problem was to drill down into our automation results to make sure they were valid. We re-ran the automation tests and found the results were reproducible. We could see the peaks were happening on the home screen where the Rich Collection row was being displayed. It was odd that we hadn’t seen the surface cache over budget (SCOB) errors while doing manual testing.
To close the gap we took a look at the configuration settings we were using in our automation and adjusted them to match the settings we use in production for real devices. We then re-ran the automation and still saw the peaks but in the process we discovered that the issue seemed to only present itself on devices running a version of our SDK from 2015. The manual testing hadn’t caught it because we had only been manually testing surface cache on more recent versions of the SDK. Once we did manual testing on our older SDK version we were able to reproduce the issue in our development environment.
During brainstorming with our platform team, we came across an internal bug report from 2017 that described a similar issue to what we were seeing — surfaces that were marked as purgeable in the surface cache were not being fully purged in this older version of our SDK. From the ticket we could see that the inefficiency was fixed in the next release of our SDK but, because not all devices get Netflix SDK updates, the fix could not be back-ported to the 2015 version that had this issue. Considering that a significant share of our actively-used TV devices are running this 2015 version and won’t be updated to a newer SDK, we knew we needed to find a fix that would work for this specific version — a similar situation to the pre-2000 world before browsers auto-updated and developers had to code to specific browser versions.
Finding a Solution
The first step was to take a look at what textures were in the surface cache (especially those marked as un-purgeable) at the time of the overage and see where we might be able to make gains by reducing the size of images. For this we have a debug port that allows us to inspect which images are in the cache. This shows us information about the images in the surface cache including url. The links can then be hovered over to show a small thumbnail of the image.
From snapshots such as this one we could see the Rich Collection row alone filled about 15.3MB of surface cache which is >50% of the 28MB total graphics memory available on devices running our 2015 SDK.
The largest un-purgeable images we found were:
Character images (6 * 1MB)
Background images for the parallax background (2 * 2.9MB)
Unknown — a full screen blank white rectangle (3.5MB)
Some of our rich collections featured the use of animated character assets to give an even richer experience. We created these assets using a Netflix-proprietary animation format called a Scriptable Network Graphic (SNG) which was first supported in 2017 and is similar to an animated PNG. The SNG files have a relatively large download size at ~1.5MB each. In order to ensure these assets are available at the time the rich collection row enters the viewport, we preload the SNGs during app startup and save them to disk. If the user relaunches the app in the future and receives the same collection row, the SNG files can be read from the disk cache, avoiding the need to download them again. Devices running an older version of the SDK fallback to a static character image.
At the time of the overage we found thatsix character images were present in the cache — four on the screen and two preloaded off of the screen. Our first savings came from only preloading one image for a total of five characters in the cache. Right off the bat this saved us almost 7% in surface cache with no observable impact to the experience.
Next we created cropped versions of the static character images that did away with extra transparent pixels (that still count toward surface cache usage!). This required modifications to the image pipeline in order to trim the whitespace but still maintain the relative size of the characters — so the relative heights of the characters in the lineup would still be preserved. The cropped character assets used only half of the surface cache memory of the full-size images and again had no visible impact to the experience.
In order to achieve the illusion of a continuously scrolling parallax background, we were using two full screen background images essentially placed side by side which together accounted for ~38% of the experience’s surface cache usage. We worked with design to create a new full-screen background image that could be used for a fallback experience (without parallax) on devices that couldn’t support loading both of the background images for the parallax effect. Using only one background image saved us 19% in surface cache for the fallback experience.
After trial and error removing React components from our local build and inspecting the surface cache we found that the unknown widget that showed as a full screen blank white rectangle in our debug tool was added by the full-screen tint effect we were using. In order to apply the tint, the graphics layer essentially creates a full screen texture that is colored dynamically and overlaid over the visible viewport. Removing the tint overlay saved us 23% in surface cache.
Removing the tint overlay and using a single background image gave us a fallback experience that used 42% less surface cache than the full experience.
When all was said and done, the surface cache usage of the fallback experience (including fewer preloaded characters, cropped character images, a single background, and no tint overlay) clocked in at about 5MB which gave us a total savings of almost 67% over our initial implementation.
We were able to target this fallback experience to devices running the 2015 and older SDK, while still serving the full rich experience (23% lower surface cache usage than the original implementation) to devices running the new SDKs.
At this point our automation was passing so we began slowly rolling out this experience to all members. As part of any rollout, we have a dashboard of near real-time metrics that we monitor. To our chagrin we saw that another class of devices — those running the 2017 SDK — also were reporting higher SCOB errors than the control.
Thanks to our work on the fallback experience we were able to change the configuration for this class of devices on the fly to serve the fallback experience (without parallax background and tint). We found if we used the fallback experience we could still get away with using the animated characters. So yet another flavor of the experience was born.
Improvements and Takeaways
At Netflix we strive to move fast in innovation and learn from all projects whether they are successes or failures. From this project, we learned that there were gaps in our understanding of how our underlying graphics memory worked and in the tooling we used to monitor that memory. We kicked off an effort to understand this graphics memory space at a low level and compiled a set of best practices for developers beginning work on a project. We also documented a set of tips and tools for debugging and optimizing surface cache should a problem arise.
As part of that effort, we expanded our suite of build-over-build automated tests to increase coverage across our different SDK versions on real and reference devices to detect spikes/regressions in our surface cache usage.
We began logging SCOB errors with more detail in production so we can target the specific areas of the app that we need to optimize. We also are now surfacing surface cache errors as notifications in the dev environment so developers can catch them sooner.
And we improved our surface cache inspector tool to be more user friendly and to integrate with our Chrome DevTools debugger:
As UI engineers on the TVUI platform at Netflix, we have the challenge of delivering ambitious UI experiences to a highly fragmented ecosystem of devices with a wide range of performance characteristics. It’s important for us to reach as many devices as possible in order to give our members the best possible experience.
The solutions we developed while scaling the Rich Collection row have helped inform how we approach ambitious UI projects going forward. With our optimizations and fallback experiences we were able to almost double the number of devices that were able to get the Rich Collection row.
We are now more thoughtful about designing fallback experiences that degrade gracefully as part of the initial design phase instead of just as a reaction to problems we encounter in the development phase. This puts us in a position of being able to scale an experience very quickly with a set of knobs and levers that can be used to tune an experience for a specific class of devices.
Most importantly, we received feedback that our members enjoyed our Rich Collection row experience — both the full and fallback experiences — when we rolled them out globally at the end of 2018.
If this interests you and want to help build the future UIs for discovering and watching shows and movies, join our team!
The cookie settings on this website are set to "allow cookies" to give you the best browsing experience possible. If you continue to use this website without changing your cookie settings or you click "Accept" below then you are consenting to this.