Raspberry Pi is excited to bring the Khronos OpenVX 1.3 API to our line of single-board computers. Here’s Kiriti Nagesh Gowda, AMD’s MTS Software Development Engineer, to tell you more.
OpenVX for computer vision
OpenVX™ is an open, royalty-free API standard for cross-platform acceleration of computer vision applications developed by The Khronos Group. The Khronos Group is an open industry consortium of more than 150 leading hardware and software companies creating advanced, royalty-free acceleration standards for 3D graphics, augmented and virtual reality, vision, and machine learning. Khronos standards include Vulkan®, OpenCL™, SYCL™, OpenVX™, NNEF™, and many others.
Now with added Raspberry Pi
The Khronos Group and Raspberry Pi have come together to work on an open-source implementation of OpenVX™ 1.3 that passes conformance on Raspberry Pi. The implementation passes the Vision, Enhanced Vision, and Neural Net conformance profiles specified in OpenVX 1.3.
Application developers may always freely use Khronos standards when they are available on the target system. To enable companies to test their products for conformance, Khronos has established an Adopters Program for each standard. This helps to ensure that Khronos standards are consistently implemented by multiple vendors to create a reliable platform for developers. Conformant products also enjoy protection from the Khronos IP Framework, ensuring that Khronos members will not assert their IP essential to the specification against the implementation.
OpenVX enables performance- and power-optimized computer vision processing, which is especially important in embedded and real-time use cases such as face, body, and gesture tracking, smart video surveillance, advanced driver assistance systems (ADAS), object and scene reconstruction, augmented reality, visual inspection, robotics, and more. Developers can use this robust API in their applications and know that those applications are portable across all conformant hardware.
Below, we will go over how to build and install the open-source OpenVX 1.3 library on Raspberry Pi 4 Model B. We will run the conformance for the Vision, Enhanced Vision, & Neural Net conformance profiles and create a simple computer vision application to get started with OpenVX on Raspberry Pi.
Netflix continues to invest in content for a global audience with a diverse range of unique tastes and interests. Correspondingly, the member experience must also evolve to connect this global audience to the content that most appeals to each of them. Images that represent titles on Netflix (what we at Netflix call “artwork”) have proven to be one of the most effective ways to help our members discover the content they love to watch. We thus need to have a rich and diverse set of artwork that is tailored for different parts of the Netflix experience (what we call product canvases). We also need to source multiple images for each title representing different themes so we can present an image that is relevant to each member’s taste.
Manual curation and review of these high quality images from scratch for a growing catalog of titles can be particularly challenging for our Product Creative Strategy Producers (referred to as producers in the rest of the article). Below, we discuss how we’ve built upon our previous work of harvesting static images directly from video source files and our computer vision algorithms to produce a set of artwork candidates that covers the major product canvases for the entire content catalog. The artwork generated by this pipeline is used to augment the artwork typically sourced from design agencies. We call this suite of assisted artwork “The Essential Suite”.
Supplement, not replace
Producers from our Creative Production team are the ultimate decision makers when it comes to the selection of artwork that gets published for each title. Our use of computer vision to generate artwork candidates from video sources is thus focused on alleviating the workload for our Creative Production team. The team would rather spend its time on creative and strategic tasks than sift through thousands of frames of a show looking for the most compelling ones. With the “Essential Suite”, we are providing an additional tool in the producers’ toolkit. Through testing, we have learned that with proper checks and human curation in place, assisted artwork candidates can perform on par with agency-designed artwork.
Netflix uses best-in-class design agencies to provide artwork that can be used to promote titles on and off the Netflix service. Netflix producers work closely with design agencies to request, review and approve artwork. All artwork is delivered through a web application made available to the design agencies.
The computer-generated artwork can be considered artwork provided by an “internal agency”. The idea is to generate artwork candidates from video source files and “bubble them up” to the producers on the same artwork portal where they review all other artwork. Ideally, producers do not know whether a given piece is agency-produced or internally curated, so they select what goes on the product purely on the creative quality of the image.
Assisted Artwork Generation Workflow
The artwork generation process involves several steps, starting with the arrival of the video source files and culminating in generated artwork being made available to producers. We use Netflix Conductor, an open-source workflow engine, to run the orchestration. The whole process can be divided into two parts.
This article on AVA provides a good explanation on our technology to extract interesting images from video source files. The artwork generation workflow takes it a step further. For a given product canvas, it selects a handful of images from the hundreds of video stills most suitable for that particular product canvas. The workflow then crops and color-corrects the selected image, picks out the best spot to place the movie’s title based on negative space, selects and resizes the movie title and places it onto the image.
Here is an illustration of what it means if we had to do it manually
Image Selection / Analyze Image
Selecting the right still image is essential to generating good-quality artwork. A lot of work has already been done in AVA to extract a few hundred frames from the hundreds of thousands of frames present in a typical video source. Broadly speaking, we use two methods to extract movie stills from a video source.
AVA: AVA is primarily a character-based algorithm. It picks frames with a clear facial shot, taking into account actors, facial expression, and shot detection.
Cinematics: Cinematics picks aesthetically pleasing cinematic shots.
The combination of these two approaches produces a few hundred movie stills from a typical video source. For a season, this would be a few hundred shots for each episode. Our work here is to pick the stills that work best for the desired canvas.
Both of the above algorithms use a few heatmaps that define which kinds of images have proven to work best in different canvases. The heatmaps are designed by internal artists who are experienced in designing promotional artwork and posters.
We make use of meta-information such as the size of the desired canvas, the “unsafe regions”, and the “regions of interest” to identify which image would serve best. “Unsafe regions” are areas in the image where badges such as the Netflix logo, new episodes, etc. are placed. “Regions of interest” are areas that are always displayed in multi-purpose canvases. These details are stored as metadata for each canvas type and passed to the algorithm by the workflow. Some of our canvases are cropped dynamically for different user interfaces. For such images, the “region of interest” is the area that is always displayed in each crop.
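To make the canvas metadata concrete, here is a minimal sketch of how such regions might be represented and checked. The names `Rect`, `CanvasSpec`, and `subject_is_usable`, the pixel values, and the rule itself (a subject must avoid every unsafe region and sit inside the region of interest) are assumptions for illustration, not a description of the production algorithm.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle in pixels (left/top inclusive, right/bottom exclusive)."""
    left: int
    top: int
    right: int
    bottom: int

    def intersects(self, other: "Rect") -> bool:
        # Two rectangles overlap only if they overlap on both axes.
        return (self.left < other.right and other.left < self.right
                and self.top < other.bottom and other.top < self.bottom)

    def contains(self, other: "Rect") -> bool:
        return (self.left <= other.left and self.top <= other.top
                and self.right >= other.right and self.bottom >= other.bottom)


@dataclass(frozen=True)
class CanvasSpec:
    """Hypothetical per-canvas metadata passed to the selection algorithm."""
    name: str
    width: int
    height: int
    unsafe_regions: Tuple[Rect, ...] = ()
    region_of_interest: Optional[Rect] = None


def subject_is_usable(canvas: CanvasSpec, subject: Rect) -> bool:
    """True if the subject avoids all unsafe regions and lies inside the region of interest (if any)."""
    if any(subject.intersects(region) for region in canvas.unsafe_regions):
        return False
    roi = canvas.region_of_interest
    return roi is None or roi.contains(subject)


# Illustrative values only: a badge area in the top-left corner, subject area on the right.
billboard = CanvasSpec(
    name="billboard",
    width=1280,
    height=720,
    unsafe_regions=(Rect(0, 0, 200, 100),),
    region_of_interest=Rect(320, 0, 1280, 720),
)
print(subject_is_usable(billboard, Rect(700, 150, 1000, 600)))
```

Keeping the regions as plain data per canvas type is what lets a new canvas be supported by adding metadata rather than changing the algorithm.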
This data-driven approach allows for fast turnaround for additional canvases. While selecting images, the algorithms also return suggested coordinates within each image for cropping and title placement. Finally, they associate a “score” with each selected image. This score is the algorithm’s “confidence” in how well the candidate image could perform on the service, based on previously collected stats.
The artwork generation workflow collates image selection results from each video source and picks up the top “n” images based on confidence score.
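The collation step can be sketched as follows. The `Candidate` fields and the `top_candidates` helper are hypothetical names chosen for this example; only the idea of merging per-source results and keeping the top “n” by confidence score comes from the description above.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    source_id: str     # hypothetical identifier of the video source, e.g. an episode
    frame: int         # frame number of the still within that source
    confidence: float  # score returned by the image selection algorithm


def top_candidates(results_by_source, n):
    """Merge per-source selection results and return the n highest-confidence candidates."""
    merged = (c for results in results_by_source.values() for c in results)
    # nlargest returns the candidates sorted from highest to lowest confidence.
    return heapq.nlargest(n, merged, key=lambda c: c.confidence)


results = {
    "episode-1": [Candidate("episode-1", 120, 0.71), Candidate("episode-1", 940, 0.93)],
    "episode-2": [Candidate("episode-2", 33, 0.88), Candidate("episode-2", 410, 0.42)],
}
for candidate in top_candidates(results, 2):
    print(candidate.source_id, candidate.frame, candidate.confidence)
```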
The selected image is then cropped and color-corrected based on coordinates passed by the algorithm. Some canvases also need the movie title to be placed on the image. The process makes use of the heatmap provided by our designers to perform cropping and title placement. As an example, the “Billboard” canvas shown on a movie’s landing page is right aligned, with the title and synopsis shown on the left.
The workers that crop and color-correct images are made available as separate Titus jobs. The workflow invokes the jobs, stores each output in the artwork asset management system, and passes it on for review.
For each artwork candidate generated by the workflow, we want to get as much feedback as possible from the Creative Production team because they have the most context about the title. However, getting producers to provide feedback on hundreds of generated images is not scalable. For this reason, we have split the review process into two rounds.
Technical Quality Control (QC)
This round of review filters out images that look obviously wrong to the human eye. Images showing, for example, actors with an open mouth, inappropriate facial expressions, or an incorrect body position are filtered out in this round.
For the purpose of reviewing these images, we use a video/image annotation application that provides a simple interface to add tags for a given list of videos or images. For our purposes, for each image, we ask the very basic question “Should this image be used for artwork?”
The team reviewing these assets treats each image individually and looks only at technical aspects of the image, regardless of the theme or genre of the title or the number of images presented for a given title.
When an image is rejected, a few follow up questions are asked to ascertain why the image is not suitable to be used as artwork.
All this review data is fed back to the image selection, cropping, and color-correction algorithms to train and improve them.
Unlike technical QC, which is title agnostic, editorial QC is done by producers who are deeply familiar with the themes, storylines and characters in the title, to select artwork that will represent the title best on the Netflix service.
The application used to review generated artwork is the same application that producers use to place and review artwork requests fulfilled by design agencies. A screenshot of how generated artwork is presented to producers is shown below
Similar to technical QC, the option here for each artwork is whether to approve or reject the artwork. The producers are encouraged to provide reasons why they are rejecting an artwork.
Approved artwork makes its way to the artwork’s asset management system, where it resides alongside other agency-fulfilled artwork. From here, producers have the ability to publish it to the Netflix service.
We have learned a lot from our work on generating artwork. Artwork that looks good might not be the best depiction of the title’s story, and a very clear character image might be a content spoiler. All of these decisions are best made by humans, and we intend to keep it that way.
However, assisted artwork generation has a place in supporting our creative team by providing them with another avenue to pick up their assets from, and with careful supervision will help in their challenge of sourcing artwork at scale.
Hack Day at Netflix is an opportunity to build and show off a feature, tool, or quirky app. The goal is simple: experiment with new ideas/technologies, engage with colleagues across different disciplines, and have fun!
We know even the silliest idea can spur something more.
The most important value of our Hack Days is that they support a culture of innovation. We believe in this work, even if it never ships, and enjoy sharing the creativity and thought put into these ideas.
Below, you can find videos made by the hackers of some of our favorite hacks from this event.
Nostalgiflix is a Chrome extension that transforms your Netflix web browser into an interactive TV time machine covering three decades (the ’80s, ’90s, and ’00s). By dragging the UI slider around, you can view titles originally released within the selected year (based on their historic box office and episode air dates). More importantly, you can also adjust the video filters in real time to creatively downgrade the viewing experience, further enhancing the nostalgic effect. We think this feature could encourage our users to watch more of our older content while having fun reliving those moments of cinematic history.
This is a real-time visualization of all contacts around the world. Each square on the map represents one of our global contact centers, spanning from Salt Lake City to Brazil, India, and Japan. The heatmap in the background is a historical trend of calls over the last hour, showing which countries are currently most active in contacting customer service. Every line you see is a live customer contact, starting at the customer’s country and ending at the contact center it was routed to. Four different types of contacts are represented in this visualization: white for regular phone calls, light blue for chats, green for calls that are initiated through our mobile apps on Android and iOS, and red for contacts that are escalated from one representative to another.
Audio Descriptive tracks provide descriptive narration in addition to dialog, helping visually impaired and blind members enjoy our shows. For the Hack Day project, we explored using recent research¹ to automatically generate descriptions, then used our own internal authoring tools to refine the output. We then used synthetic audio and automated mixing techniques to deliver a final audio description track.
Hack Days are a big deal at Netflix. They’re a chance to bring together employees from all our different disciplines to explore new ideas and experiment with emerging technologies.
For the most recent hack day, we channeled our creative energy towards our studio efforts. The goal remained the same: team up with new colleagues and have fun while learning, creating, and experimenting. We know even the silliest idea can spur something more.
The most important value of hack days is that they support a culture of innovation. We believe in this work, even if it never ships, and love to share the creativity and thought put into these ideas.
Below, you can find videos made by the hackers of some of our favorite hacks from this event.
Project Rumble Pak
You’re watching your favorite episode of Voltron when, after a suspenseful pause, there’s a huge explosion — and your phone starts to vibrate in your hands.
The Project Rumble Pak hack day project explores how haptics can enhance the content you’re watching. With every explosion, sword clank, and laser blast, you get force feedback to amp up the excitement.
For this project, we synchronized Netflix content with haptic effects using Immersion Corporation technology.
Introducing The Voice of Netflix. We trained a neural net to spot words in Netflix content and reassemble them into new sentences on demand. For our stage demonstration, we hooked this up to a speech recognition engine to respond to our verbal questions in the voice of Netflix’s favorite characters. Try it out yourself at blogofsomeguy.com/v!
TerraVision re-envisions the creative process and revolutionizes the way our filmmakers can search and discover filming locations. Filmmakers can drop a photo of a look they like into an interface and find the closest visual matches from our centralized library of location photos. We are using a computer vision model trained to recognize places to build reverse image search functionality. The model converts each image into a low-dimensional vector, and the matches are obtained by computing the nearest neighbors of the query.
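The nearest-neighbor step can be illustrated with a small sketch. It assumes the embeddings have already been computed by some model and uses cosine similarity with a brute-force scan; the similarity measure, the three-dimensional toy vectors, and the function names are assumptions for this example, and a real library of photos would likely need an approximate nearest-neighbor index.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest_locations(query, library, k=3):
    """Return the k library keys whose embeddings are most similar to the query embedding."""
    ranked = sorted(
        library,
        key=lambda name: cosine_similarity(query, library[name]),
        reverse=True,
    )
    return ranked[:k]


# Toy embeddings; real ones would come from the place-recognition model.
library = {
    "desert_road": [0.9, 0.1, 0.0],
    "city_alley": [0.1, 0.9, 0.2],
    "canyon": [0.8, 0.2, 0.1],
}
print(nearest_locations([1.0, 0.1, 0.0], library, k=2))
```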
Have you ever found yourself needing to give the Evil Eye™ to colleagues who are hogging your conference room after their meeting has ended?
Our hack is a simple web application that allows employees to select a Netflix meeting room anywhere in the world, and press a button to kick people out of their meeting room if they have overstayed their meeting. First, the app looks up calendar events associated with the room and finds the latest meeting in the room that should have already ended. It then automatically calls in to that meeting and plays walk-off music similar to the Oscars to not-so-subtly encourage your colleagues to Get Out! We built this hack using Java (Spring Boot framework), the Google OAuth and Calendar APIs (for finding rooms), and the Twilio API (for calling into the meeting), and deployed it on AWS.
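The lookup of “the latest meeting that should have already ended” can be sketched in a few lines. The hack itself was written in Java; this Python version, the event dictionary keys, and the function name are assumptions for illustration, and the calendar lookup and phone call are left out.

```python
from datetime import datetime


def latest_overrun_meeting(events, now):
    """Return the event with the most recent end time that is already in the past, or None.

    `events` is a list of dicts with hypothetical keys 'title', 'start', and 'end'.
    """
    ended = [event for event in events if event["end"] <= now]
    return max(ended, key=lambda event: event["end"], default=None)


now = datetime(2018, 3, 1, 15, 5)
room_events = [
    {"title": "Standup", "start": datetime(2018, 3, 1, 9, 0), "end": datetime(2018, 3, 1, 9, 30)},
    {"title": "Design review", "start": datetime(2018, 3, 1, 14, 0), "end": datetime(2018, 3, 1, 15, 0)},
    {"title": "Planning", "start": datetime(2018, 3, 1, 15, 0), "end": datetime(2018, 3, 1, 16, 0)},
]
overrun = latest_overrun_meeting(room_events, now)
print(overrun["title"] if overrun else "nobody to kick out")
```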
A Rock-Paper-Scissors game using computer vision and machine learning on the Raspberry Pi. Project GitHub page: https://github.com/DrGFreeman/rps-cv PROJECT ORIGIN: This project results from a challenge my son gave me when I was teaching him the basics of computer programming making a simple text based Rock-Paper-Scissors game in Python.
Virtual rock paper scissors
Here’s why you should always leave comments on our blog: this project from Julien de la Bruère-Terreault instantly had our attention when he shared it on our recent Android Things post.
Julien and his son were building a text-based version of rock paper scissors in Python when his son asked him: “Could you make a rock paper scissors game that uses the camera to detect hand gestures?” Obviously, Julien really had no choice but to accept the challenge.
“The game uses a Raspberry Pi computer and Raspberry Pi Camera Module installed on a 3D-printed support with LED strips to achieve consistent images,” Julien explains in the tutorial for the build. “The pictures taken by the camera are processed and fed to an image classifier that determines whether the gesture corresponds to ‘Rock’, ‘Paper’, or ‘Scissors’ gestures.”
How does it work?
Physically, the build uses a Pi 3 Model B and a Camera Module V2 alongside 3D-printed parts. The parts are all green, since a consistent colour allows easy subtraction of background from the captured images. You can download the files for the setup from Thingiverse.
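As a rough illustration of why a uniformly green backdrop helps, the sketch below marks every pixel where green clearly dominates as background. It works on plain lists of RGB tuples to stay dependency-free; the `dominance` threshold and the function name are assumptions, and this is not taken from Julien’s code, which may well use a different colour space or an image library.

```python
def foreground_mask(pixels, dominance=1.3):
    """Return a 2D list of booleans that is True where a pixel is NOT background green.

    `pixels` is a 2D list of (r, g, b) tuples with channel values from 0 to 255.
    A pixel counts as background when green exceeds both red and blue by the
    given factor (an illustrative threshold, not a calibrated one).
    """
    def is_green(r, g, b):
        return g > dominance * r and g > dominance * b

    return [[not is_green(*pixel) for pixel in row] for row in pixels]


# A tiny 2x2 "image": three green backdrop pixels and one skin-toned pixel.
image = [
    [(20, 200, 30), (210, 180, 160)],
    [(25, 190, 40), (30, 210, 35)],
]
print(foreground_mask(image))
```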
To illustrate how the software works, Julien has created a rather delightful pipeline demonstrating where computer vision and machine learning come in.
The way the software works means the game doesn’t need to be limited to the standard three hand signs. If you wanted to, you could add other signs such as ‘lizard’ and ‘Spock’! Or ‘fire’ and ‘water balloon’. Or any other alterations made to the game in your pop culture favourites.
Check out Julien’s full tutorial to build your own AI-powered rock paper scissors game here on Julien’s GitHub. Massive kudos to Julien for spending a year learning the skills required to make it happen. And a massive thank you to Julien’s son for inspiring him! This is why it’s great to do coding and digital making with kids — they have the best project ideas!
Sharing is caring
If you’ve built your own project using Raspberry Pi, please share it with us in the comments below, or via social media. As you can tell from today’s blog post, we love to see them and share them with the whole community!
Take a selfie, wait for the image to appear, and behold a cartoon version of yourself. Or, at least, behold a cartoon version of whatever the camera thought it saw. Welcome to Draw This by maker Dan Macnish.
Dan has made code, instructions, and wiring diagrams available to help you bring this beguiling weirdery into your own life.
Neural networks, object recognition, and cartoons
One of the fun things about this re-imagined polaroid is that you never get to see the original image. You point, and shoot – and out pops a cartoon; the camera’s best interpretation of what it saw. The result is always a surprise. A food selfie of a healthy salad might turn into an enormous hot dog, or a photo with friends might be photobombed by a goat.
OK. Let’s take this one step at a time.
Pi + camera + button + LED
Draw This uses a Raspberry Pi 3 and a Camera Module, with a button and a useful status LED connected to the GPIO pins via a breadboard. You press the button, and the camera captures a still image while the LED comes on and stays lit for a couple of seconds while the Pi processes the image. So far, so standard Pi camera build.
Interpreting and re-interpreting the camera image
Dan uses Python to process the captured photograph, employing a pre-trained machine learning model from Google to recognise multiple objects in the image. Now he brings the strangeness. The Pi matches the things it sees in the photo with doodles from Google’s huge open-source Quick, Draw! dataset, and generates a new image that represents the objects in the original image as doodles. Then a thermal printer connected to the Pi’s GPIO pins prints the results.
Kangaroos from the Quick, Draw! dataset (I got distracted)
Potential for peculiar
Reading about this build leaves me yearning to see its oddest interpretation of a scene, so if you make this and you find it really does turn you or your friend into a goat, please do share that with us.
And as you can see from my kangaroo digression above, there is a ton of potential for bizarro makes that use the Quick, Draw! dataset, object recognition models, or both; it’s not just the marsupials that are inexplicably compelling (I dare you to go and look and see how long it takes you to get back to whatever you were in the middle of). If you’re planning to make this, or something inspired by this, check out Dan’s cartoonify GitHub repo. And tell us all about it in the comments.
If you are anything like me, Artificial Intelligence (AI), Machine Learning (ML), and Deep Learning are completely fascinating and exciting topics. As AI, ML, and Deep Learning become more widely used, for me it means that the science fiction written by Dr. Isaac Asimov, the robotics and medical advancements in Star Wars, and the technologies that enabled Captain Kirk and his Star Trek crew “to boldly go where no man has gone before” can become achievable realities.
Most people interested in the aforementioned topics are familiar with the AI and ML solutions enabled by Deep Learning, such as Convolutional Neural Networks for Image and Video Classification, Speech Recognition, Natural Language interfaces, and Recommendation Engines. However, it is not always an easy task setting up the infrastructure, environment, and tools to enable data scientists, machine learning practitioners, research scientists, and deep learning hobbyists/advocates to dive into these technologies. Most developers desire to go quickly from getting started with deep learning to training models and developing solutions using deep learning technologies.
For these reasons, I would like to share some resources that will help to quickly build deep learning solutions whether you are an experienced data scientist or a curious developer wanting to get started.
Deep Learning Resources
Apache MXNet is Amazon’s deep learning framework of choice. With the power of the Apache MXNet framework and NVIDIA GPU computing, you can launch your scalable deep learning projects and solutions easily on the AWS Cloud. As you get started on your MXNet deep learning quest, there are a variety of self-service tutorials and datasets available to you:
Launch an AWS Deep Learning AMI: This guide walks you through the steps to launch the AWS Deep Learning AMI with Ubuntu
MXNet – Create a computer vision application: This hands-on tutorial uses a pre-built notebook to walk you through using neural networks to build a computer vision application to identify handwritten digits
AWS Machine Learning Datasets: AWS hosts datasets for Machine Learning on the AWS Marketplace that you can access for free. These large datasets are available for anyone to analyze the data without requiring the data to be downloaded or stored.
Predict and Extract – Learn to use pre-trained models for predictions: This hands-on tutorial will walk you through how to use a pre-trained model for prediction and feature extraction using the full ImageNet dataset.
AWS Deep Learning AMIs
AWS offers Amazon Machine Images (AMIs) for use on Amazon EC2 for quick deployment of the infrastructure needed to start your deep learning journey. The AWS Deep Learning AMIs are pre-configured with popular deep learning frameworks, are built using Amazon EC2 instances on Amazon Linux and Ubuntu, and can be launched for AI-targeted solutions and models. The deep learning frameworks supported and pre-configured on the deep learning AMI are:
Microsoft Cognitive Toolkit (CNTK)
Additionally, the AWS Deep Learning AMIs install preconfigured libraries for Jupyter notebooks with Python 2.7/3.4, the AWS SDK for Python, and other data science-related Python packages and dependencies. The AMIs also come with the NVIDIA CUDA and NVIDIA CUDA Deep Neural Network (cuDNN) libraries preinstalled for all the supported deep learning frameworks, and the Intel Math Kernel Library is installed for the Apache MXNet framework. You can launch any of the Deep Learning AMIs by visiting the AWS Marketplace using the Try the Deep Learning AMIs link.
It is a great time to dive into Deep Learning. You can accelerate your work in deep learning by using the AWS Deep Learning AMIs running on the AWS cloud to get your deep learning environment running quickly or get started learning more about Deep Learning on AWS with MXNet using the AWS self-service resources. Of course, you can learn even more information about Deep Learning, Machine Learning, and Artificial Intelligence on AWS by reviewing the AWS Deep Learning page, the Amazon AI product page, and the AWS AI Blog.
A candy dispenser running Android Things that exchanges photos for candies. It uses computer vision to classify the image. https://github.com/alvarowolfx/ai-candy-dispenser https://www.hackster.io/alvarowolfx/android-things-a-i-candy-dispenser-a47e74
Released late last year, Android Things is Google’s Android-based operating system for low-cost Internet of Things (IoT) devices such as the Raspberry Pi.
Invented in ancient India, candy is a scrumptious treat often made of sugar and/or chocolate.
The Android Things Candy Dispenser
Via its 20×4 display, Alvaro’s candy dispenser asks for an image, for example of a cat or a dog. Produce the requested image in front of the onboard Camera Module and the dispenser releases a delicious reward for you.
Inside the dispenser
Alvaro’s schematic provides all the information you need to build your own Android Things candy dispenser
The dispenser uses a Raspberry Pi to control both the image detection and the candy release. Press the button, and the Raspberry Pi displays one of several image subjects on the screen. Via the camera, the Pi records the image you present and sends it for processing via Google’s Cloud Vision API. Cloud Vision supplies image annotations and metadata, and if these match the image request, boom, free candy!
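The matching step might look something like the sketch below. Alvaro’s project is an Android Things app, so this Python version is only an illustration: it assumes the label annotations have already been fetched and reduced to dictionaries with `description` and `score` keys, and the `min_score` threshold and function name are made up for the example.

```python
def matches_request(requested, annotations, min_score=0.6):
    """True if any sufficiently confident label matches the requested subject.

    `annotations` is a list of dicts with 'description' and 'score' keys, as
    might be extracted from a label detection response. The 0.6 threshold is
    an illustrative assumption.
    """
    wanted = requested.strip().lower()
    return any(
        annotation["description"].strip().lower() == wanted
        and annotation["score"] >= min_score
        for annotation in annotations
    )


# Made-up annotations for a photo of a dog.
labels = [
    {"description": "Dog", "score": 0.97},
    {"description": "Mammal", "score": 0.93},
    {"description": "Cat", "score": 0.31},
]
if matches_request("dog", labels):
    print("Release the candy!")
```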
To discover more about Google Cloud Vision, check out this video from the Cloud Vision team.
Alvaro provides full instructions for the build, including all necessary code and peripherals, on both Instructables and GitHub.
Building with Android Things
Given that Android Things has not been available for very long, we have yet to see many complete builds using it with the Raspberry Pi. If you’d like to try out this OS, Alvaro’s project is a great entry point. You should also check out the Pimoroni Rainbow HAT Android Things Starter Kit that provides everything you need to begin making. And if you have a Pi and are raring to go, follow the official ‘Android Things on Raspberry Pi’ setup guide here.
If you’ve already built a project using the Android Things platform, we’d love to see it! Make sure to share your project link in the comments below.
Today we are adding image moderation to Rekognition. If your web site or application allows users to upload profile photos or other imagery, you will love this new Rekognition feature.
Rekognition can now identify images that contain suggestive or explicit content that may not be appropriate for your site. The moderation labels provide detailed sub-categories, allowing you to fine-tune the filters that you use to determine what kinds of images you deem acceptable or objectionable. You can use this feature to improve photo sharing sites, forums, dating apps, content platforms for children, e-commerce platforms and marketplaces, and more.
To access this feature, call the DetectModerationLabels function from your code. The response will include a set of moderation labels drawn from a built-in taxonomy.
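As a minimal sketch, here is how a call through the AWS SDK for Python (boto3) and a simple filter on the response might look. The bucket and key names, the 80% threshold, the helper names, and the sample labels are assumptions for illustration; check the label names against the taxonomy returned for your own images.

```python
def flagged_labels(response, blocked_categories, min_confidence=80.0):
    """Return the sorted names of labels that fall under any blocked category.

    A label is flagged when its own name or its parent's name is in
    `blocked_categories` and its confidence meets the threshold.
    """
    blocked = set(blocked_categories)
    return sorted(
        label["Name"]
        for label in response.get("ModerationLabels", [])
        if label["Confidence"] >= min_confidence
        and (label["Name"] in blocked or label.get("ParentName") in blocked)
    )


def moderate_s3_image(bucket, key, blocked_categories, min_confidence=80.0):
    """Run DetectModerationLabels on an image stored in S3 and filter the result."""
    import boto3  # imported here so flagged_labels can be used without AWS access

    client = boto3.client("rekognition")
    response = client.detect_moderation_labels(
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        MinConfidence=min_confidence,
    )
    return flagged_labels(response, blocked_categories, min_confidence)


# Illustrative response shape; real confidence values and labels will differ.
sample = {
    "ModerationLabels": [
        {"Name": "Suggestive", "ParentName": "", "Confidence": 92.1},
        {"Name": "Female Swimwear Or Underwear", "ParentName": "Suggestive", "Confidence": 92.1},
        {"Name": "Explicit Nudity", "ParentName": "", "Confidence": 41.0},
    ]
}
print(flagged_labels(sample, {"Suggestive"}))
```

Separating the filter from the API call keeps the acceptance policy (which categories to block, at what confidence) easy to adjust per site.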
Computers and chess have been a potent combination ever since the appearance of the first chess-playing computers in the 1970s. You might even be able to play a game of chess on the device you are using to read this blog post! For digital makers, though, adding a Raspberry Pi into the mix can be the first step to building something a little more exciting. Allow us to introduce you to Joey Meyer’s chess-playing robot, the Raspberry Turk.
Image credit: Joey Meyer
Since Joey is both an experienced software engineer with an interest in machine learning and a skilled chess player, it’s not surprising that he was interested in tinkering with chess programs. What is really stunning, though, is the scale and complexity of the build he came up with. Fascinated by a famous historical hoax, Joey used his skills in programming and robotics to build an open-source Raspberry Pi-powered recreation of the celebrated Mechanical Turk automaton.
The Raspberry Turk is a robot that can play chess. It’s entirely open source, based on Raspberry Pi, and inspired by the 18th-century chess-playing machine, the Mechanical Turk. Website: http://www.raspberryturk.com Source Code: https://github.com/joeymeyer/raspberryturk
A historical hoax
Joey explains that he first encountered the Mechanical Turk through a book by Tom Standage. A famous example of mechanical trickery, the original Turk was advertised as a chess-playing automaton, capable of defeating human opponents and solving complex puzzles.
Its inner workings a secret, the Turk toured Europe for the best part of a century, confounding everyone who encountered it. Unfortunately, it turned out not to be a fabulous example of early robotic engineering after all. Instead, it was just an elaborate illusion. The awesome chess moves were not being worked out by the clockwork brain of the automaton, but rather by a human chess master who was cunningly concealed inside the casing.
Building a modern Turk
A modern version of the Mechanical Turk was constructed in the 1980s. However, the build cost $120,000. At that price, it would have been impossible for most makers to create their own version. Impossible, that is, until now: Joey uses a Raspberry Pi 3 to drive the Raspberry Turk, while a Raspberry Pi Camera Module handles computer vision.
The Raspberry Turk in the middle of a game Image credit: Joey Meyer
Joey’s Raspberry Turk is built into a neat wooden table. All of the electronics are housed in a box on one side. The chessboard is painted directly onto the table’s surface. In order for the robot to play, a Camera Module located in a 3D-printed housing above the table takes an image of the chessboard. The image is then analysed to determine which pieces are in which positions at that point. By tracking changes in the positions of the pieces, the Raspberry Turk can determine which moves have been made, and which piece should move next. To train the system, Joey had to build a large dataset to validate a computer vision model. This involved painstakingly moving pieces by hand and collecting multiple images of each possible position.
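The idea of working out a move by comparing board positions can be sketched as below. It assumes the vision step has already labelled every square as holding a white piece, a black piece, or nothing; the snapshot format and the `detect_move` name are assumptions for this example rather than the Raspberry Turk’s actual code.

```python
def detect_move(before, after):
    """Infer a simple move or capture from two board snapshots.

    Each snapshot maps a square name such as 'e2' to 'white', 'black', or
    None (empty). Returns (from_square, to_square), or None when the change
    is not a single piece moving, e.g. castling, en passant, or a misread board.
    """
    changed = [sq for sq in before if before[sq] != after.get(sq)]
    vacated = [sq for sq in changed if after.get(sq) is None]
    filled = [sq for sq in changed if after.get(sq) is not None]
    if len(vacated) == 1 and len(filled) == 1 and after[filled[0]] == before[vacated[0]]:
        return vacated[0], filled[0]
    return None


# Build an otherwise empty board with a white piece on e2 and a black piece on d5.
squares = [f + r for f in "abcdefgh" for r in "12345678"]
before = dict.fromkeys(squares)
before["e2"] = "white"
before["d5"] = "black"

after = dict(before)
after["e2"] = None
after["e4"] = "white"

print(detect_move(before, after))
```

Because only piece colour per square is compared, the sketch cannot tell which kind of piece moved; the game state kept by the chess engine would supply that.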
Look, no hands!
A key feature of the Mechanical Turk was that the automaton appeared to move the chess pieces entirely by itself. Of course, its movements were actually being controlled by a person hidden inside the machine. The Raspberry Turk, by contrast, does move the chess pieces itself. To achieve this, Joey used a robotic arm attached to the table. The arm is made primarily out of Actobotics components. Joey explains:
The motion is controlled by the rotation of two servos which are attached to gears at the base of each link of the arm. At the end of the arm is another servo which moves a beam up and down. At the bottom of the beam is an electromagnet that can be dynamically activated to lift the chess pieces.
Joey individually fitted the chess pieces with tiny sections of metal dowel so that the magnet on the arm could pick them up.
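For an arm with two rotating links like the one described above, the servo angles needed to reach a given square can be worked out with standard two-link inverse kinematics. The source does not say how Joey computes the angles, so the sketch below is a textbook approach under assumed conditions: a planar arm, hypothetical link lengths, and angles in radians.

```python
import math
from typing import Tuple


def two_link_angles(x: float, y: float, l1: float, l2: float) -> Tuple[float, float]:
    """Return (shoulder, elbow) angles in radians that place the arm tip at (x, y).

    l1 and l2 are the link lengths; the values used below are hypothetical.
    """
    d2 = x * x + y * y
    # Law of cosines gives the elbow angle from the distance to the target.
    cos_elbow = (d2 - l1 * l1 - l2 * l2) / (2 * l1 * l2)
    if not -1.0 <= cos_elbow <= 1.0:
        raise ValueError("target is out of reach")
    elbow = math.acos(cos_elbow)
    shoulder = math.atan2(y, x) - math.atan2(
        l2 * math.sin(elbow), l1 + l2 * math.cos(elbow)
    )
    return shoulder, elbow


def tip_position(shoulder: float, elbow: float, l1: float, l2: float) -> Tuple[float, float]:
    """Forward kinematics, used here to check the inverse solution."""
    x = l1 * math.cos(shoulder) + l2 * math.cos(shoulder + elbow)
    y = l1 * math.sin(shoulder) + l2 * math.sin(shoulder + elbow)
    return x, y


shoulder, elbow = two_link_angles(10.0, 5.0, 8.0, 8.0)
print(tip_position(shoulder, elbow, 8.0, 8.0))
```

Feeding the computed angles back through `tip_position` should return the original target to within floating-point error, which is a quick sanity check on the geometry.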
Programming the Raspberry Turk
The Raspberry Turk is controlled by a daemon process that runs a perception/action sequence, and the status updates automatically as the pieces are moved. The code is written almost entirely in Python. It is all available on Joey’s GitHub repo for the project, together with his notebooks on the project.
Image credit: Joey Meyer
The AI backend that gives the robot its chess-playing ability is currently Stockfish, a strong open-source chess engine. Joey says he would like to build his own engine when he has time. For the moment, though, he’s confident that this AI will prove a worthy opponent.
The project website goes into much more detail than we are able to give here. We’d definitely recommend checking it out. If you have been experimenting with any robotics or computer vision projects like this, please do let us know in the comments!
Today, Yahoo Mail introduced a feature that allows you to automatically sync your mobile photos to Yahoo Mail so that they’re readily available when you’re composing an email from your computer. A key technology behind this feature is a new photo and video platform called “Tripod,” which was born out of the innovations and capabilities of Flickr.
For 13 years, Flickr has served as one of the world’s largest photo-sharing communities and as a platform for millions of people who have collectively uploaded more than 13 billion photos globally. Tripod provides a great opportunity to bring some of the most-loved and useful Flickr features to the Yahoo network of products, including Yahoo Mail, Yahoo Messenger, and Yahoo Answers Now.
Tripod and its Three Services
As the name suggests, Tripod offers three services:
The Pixel Service: for uploading, storing, resizing, editing, and serving photos and videos.
The Enrichment Service: for enriching media metadata using image recognition algorithms. For example, the algorithms might identify and tag scenes, actions, and objects.
The Aggregation Service: for in-application and cross-application metadata aggregation, filtering, and search.
The combination of these three services makes Tripod an end-to-end platform for smart image services. There is also an administrative console for configuring the integration of an application with Tripod, and an identity service for authentication and authorization.
Figure 1: Tripod Architecture
The Pixel Service
Flickr has achieved a highly-scalable photo upload and resizing pipeline. Particularly in the case of large-scale ingestion of thousands of photos and videos, Flickr’s mobile and API teams tuned techniques, like resumable upload and deduplication, to create a high-quality photo-sync experience. On serving, Flickr tackled the challenge of optimizing storage without impacting photo quality, and added dynamic resizing to support more diverse client photo layouts.
Over many years at Flickr, we’ve demonstrated sustained uploads of more than 500 photos per second. The full pipeline includes the PHP Upload API endpoint, backend Java services (Image Daemon, Storage Master), hot-hot uploads across the US West and East Coasts, and five worldwide photo caches, plus a massive CDN.
In Tripod’s Pixel Service, we leverage all of this core technology infrastructure as-is, except for the API endpoint, which is now written in Java and implements a new bucket-based data model.
The Enrichment Service
In 2013, Flickr made an exciting leap. Yahoo acquired two Computer Vision technology companies, IQ Engines and LookFlow, and rolled these incredible teams into Flickr. Using their image recognition algorithms, we enhanced Flickr Search and introduced Magic View to the Flickr Camera Roll.
In Tripod, the Enrichment Service applies the image recognition technology to each photograph, resulting in rich metadata that can be used to enhance filtering, indexing, and searching. The Enrichment Service can identify places, themes, landmarks, objects, colors, text, media similarity, NSFW content, and best thumbnail. It also performs OCR text recognition and applies an aesthetic score to indicate the overall quality of the photograph.
The Aggregation Service
The Aggregation Service lets an application, such as Yahoo Mail, find media based on any criteria. For example, it can return all the photos belonging to a particular person within a particular application, all public photos, or all photos belonging to a particular person taken in a specific location during a specific time period (e.g. San Francisco between March 1, 2015 and May 31, 2015).
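To make those example queries concrete, here is a minimal in-memory sketch of the kind of filtering involved. The `MediaItem` fields and the `find_media` function are hypothetical names invented for illustration; they are not the Tripod Aggregation API.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional


@dataclass
class MediaItem:
    # Hypothetical metadata fields, chosen to match the examples in the text.
    owner: str
    app: str
    place: str
    taken: date
    public: bool = False


def find_media(
    items: Iterable[MediaItem],
    *,
    app: Optional[str] = None,
    owner: Optional[str] = None,
    place: Optional[str] = None,
    start: Optional[date] = None,
    end: Optional[date] = None,
    public: Optional[bool] = None,
) -> List[MediaItem]:
    """Return the items that match every criterion that was supplied."""
    result = []
    for item in items:
        if app is not None and item.app != app:
            continue
        if owner is not None and item.owner != owner:
            continue
        if place is not None and item.place != place:
            continue
        if start is not None and item.taken < start:
            continue
        if end is not None and item.taken > end:
            continue
        if public is not None and item.public != public:
            continue
        result.append(item)
    return result


items = [
    MediaItem("alice", "mail", "San Francisco", date(2015, 4, 2)),
    MediaItem("alice", "mail", "Paris", date(2015, 4, 9)),
    MediaItem("bob", "mail", "San Francisco", date(2015, 4, 2), public=True),
]
matches = find_media(
    items,
    app="mail",
    owner="alice",
    place="San Francisco",
    start=date(2015, 3, 1),
    end=date(2015, 5, 31),
)
print(len(matches))  # 1
```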
Vespa, Yahoo’s internal search engine, indexes all metadata for each media item. If the Enrichment Service has been run on the media, the metadata is indexed in Vespa and is available to the Aggregation API. The result set from a call to the Aggregation Service depends on authentication and the read permissions defined by an API key.
APIs and SDKs
Each service is expressed as a set of APIs. We upgraded our API technology stack, switching from PHP to Spring MVC on a Java Jetty servlet container, and made use of the latest Spring features such as Spring Data, Spring Boot, and Spring Security with OAuth 2.0. Tripod’s API is defined and documented using Swagger. Each service is developed and deployed semi-autonomously from a separate Git repository with a separate build lifecycle to an independent micro-service container.
Figure 2: Tripod API
Swagger Editor makes it easy to auto-generate SDKs in many languages, depending on the needs of Yahoo product developers. The mobile SDKs for iOS and Android are most commonly used, as is the JS SDK for Yahoo’s web developers. The SDKs make integration with Tripod by a web or mobile application easy. For example, in the case of the Yahoo Mail photo upload feature, the Yahoo Mail mobile app includes the embedded Tripod SDK to manage the photo upload process.
Buckets and API Keys
The Tripod data model differs in some important ways from the Flickr data model. Tripod applications, buckets, and API keys introduce the notion of multi-tenancy, with a strong access control boundary. An application is simply the name of the application that is using Tripod (e.g. Yahoo Mail). Buckets are logical containers for the application’s media, and media in an application is further affected by bucket settings such as compression rate, capacity, media time-to-live, and the selection of enrichments to compute.
Figure 3: Creating a new Bucket
Beyond Tripod’s generic attributes, a bucket may also have custom organizing attributes that are defined by an application’s developers. API keys control read/write permissions on buckets and are used to generate OAuth tokens for anonymous or user-authenticated access to a bucket.
Figure 4: Creating a new API Key
App developers at Yahoo use the Tripod Console to:
Create the buckets and API keys that they will use with their application
Define the bucket settings and the access control rules for each API key
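The relationship between applications, buckets, and API keys can be pictured with a small data model. This is a sketch under assumptions: the class names, field names, and the `can` helper are all hypothetical, and only the settings listed above (compression rate, capacity, time-to-live, enrichments, read/write permissions) come from the text.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Bucket:
    # Hypothetical field names mirroring the bucket settings described above.
    app: str
    name: str
    compression_rate: float
    capacity: int
    ttl_days: int
    enrichments: Set[str] = field(default_factory=set)


@dataclass
class ApiKey:
    key_id: str
    # Maps a bucket name to the actions this key may perform on it.
    permissions: Dict[str, Set[str]] = field(default_factory=dict)

    def can(self, action: str, bucket: Bucket) -> bool:
        """Return True if this key grants `action` ("read" or "write") on the bucket."""
        return action in self.permissions.get(bucket.name, set())


uploads = Bucket(
    app="Yahoo Mail",
    name="mail-uploads",
    compression_rate=0.8,
    capacity=10_000,
    ttl_days=30,
    enrichments={"ocr", "aesthetic_score"},
)
read_only = ApiKey("key-1", {"mail-uploads": {"read"}})
print(read_only.can("read", uploads), read_only.can("write", uploads))  # True False
```

Keeping permissions on the key rather than on the bucket reflects the access-control boundary the text describes, where each key defines what its holder may do.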
Another departure from the Flickr API is that Tripod can handle media that is not user-generated content (UGC). This is critical for storing curated content, as is required by many Yahoo applications.
Architecture and Implementation
Going from a monolithic architecture to a microservices architecture has had its challenges. In particular, we’ve had to find the right internal communication process between the services. At the core of this is our Pulsar Event Bus, over which we send Avro messages backed by a strong schema registry. This lets each Tripod team move fast, without introducing incompatible changes that would break another Tripod service.
But, what about the Flickr API? Why not just use that?
Flickr APIs are being used by hundreds of thousands of third-party developers around the world. Flickr’s API was designed for interacting with Flickr Accounts, Photos, and Groups, generally at a lower scale than the Flickr site itself; it was not designed for independent, highly configurable, multi-tenant core photo management at large scale.
How can I join the team?
We’re hiring and we’d love to talk to you about our open opportunities! Just email [email protected] to start the conversation.
Fishbowl existence is tough. There you are, bobbing up and down in the same dull old environment, day in, day out; your view unchanging, your breakfast boringly identical every morning; that clam thing in the bottom of the tank opening and closing monotonously – goldfish can live for up to 20 years. That’s a hell of a long time to watch a clam thing for.
Two fish are in a tank. One says “How do you drive this thing?”
Indeed, fishbowl existence is so tough that several countries have banned the boring round bowls altogether. (There’s a reason that your childhood goldfish didn’t live for 20 years. You put it in an environment that bored it to death.) So this build comes with a caveat – we are worried that this particular fish is being driven from understimulus to overstimulus and back again, and that she might be prevented from making it to the full 20 years as a result. Please be kind to your fish.
What’s going on here? Over in Pittsburgh, at Carnegie Mellon University, Alex Kent and friends have widened the goldfish’s horizons, by giving it wheels. Meet the free-range fish.
Alex K, negligent fishparent, says that the speed and direction of the build are determined by the position of the fish relative to the centre of the tank. The battery lasts for five hours, and by all accounts the fish is still alive. Things are a bit jerky in this prototype build. Alex explains:
The jerking is actually caused by the Computer Vision algorithm losing track of the fish because of the reflection off of the lid, condensation on the lid, water ripples, etc.
Alex and co: before you look at more expensive solutions, try fixing a polarising filter to the camera you’re using.
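The steering rule Alex describes, where speed and direction follow the fish’s offset from the centre of the tank, might look something like the sketch below. The function name, the normalisation by tank radius, and the dead zone around the centre are all assumptions for illustration; the build’s actual code is not shown in the source.

```python
import math
from typing import Tuple


def steer(
    fish_x: float,
    fish_y: float,
    centre_x: float,
    centre_y: float,
    tank_radius: float,
    max_speed: float = 1.0,
    dead_zone: float = 0.1,
) -> Tuple[float, float]:
    """Map the fish's position to (speed, heading in radians).

    Coordinates are in camera pixels. The dead zone (an assumption) keeps the
    tank still while the fish hovers near the centre.
    """
    dx = fish_x - centre_x
    dy = fish_y - centre_y
    # Offset from the centre as a fraction of the tank radius.
    offset = math.hypot(dx, dy) / tank_radius
    if offset < dead_zone:
        return 0.0, 0.0
    speed = min(offset, 1.0) * max_speed
    heading = math.atan2(dy, dx)
    return speed, heading


print(steer(90, 50, 50, 50, 40))  # (1.0, 0.0)
```

When tracking drops out, as in the jerky prototype, one simple option is to keep using the last known position for a few frames rather than stopping at once.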
Abstract: Pattern lock is widely used as a mechanism for authentication and authorization on Android devices. In this paper, we demonstrate a novel video-based attack to reconstruct Android lock patterns from video footage filmed using a mobile phone camera. Unlike prior attacks on pattern lock, our approach does not require the video to capture any content displayed on the screen. Instead, we employ a computer vision algorithm to track the fingertip movements to infer the pattern. Using the geometry information extracted from the tracked fingertip motions, our approach is able to accurately identify a small number of (often one) candidate patterns to be tested by an adversary. We thoroughly evaluated our approach using 120 unique patterns collected from 215 independent users, by applying it to reconstruct patterns from video footage filmed using smartphone cameras. Experimental results show that our approach can break over 95% of the patterns in five attempts before the device is automatically locked by the Android system. We discovered that, in contrast to many people’s belief, complex patterns do not offer stronger protection under our attacking scenarios. This is demonstrated by the fact that we are able to break all but one of the complex patterns (with a 97.5% success rate) as opposed to 60% of the simple patterns in the first attempt. Since our threat model is common in day-to-day lives, our work calls for the community to revisit the risks of using Android pattern lock to protect sensitive information.
You might simply see an animal. Maybe you see a pet, a dog, or a Golden Retriever. The association between the image and these labels is not hard-wired into your brain. Instead, you learned the labels after seeing hundreds or thousands of examples. Operating on a number of different levels, you learned to distinguish an animal from a plant, a dog from a cat, and a Golden Retriever from other dog breeds.
Deep Learning for Image Detection
Giving computers the same level of comprehension has proven to be a very difficult task. Over the course of decades, computer scientists have taken many different approaches to the problem. Today, a broad consensus has emerged that the best way to tackle this problem is via deep learning. Deep learning uses a combination of feature abstraction and neural networks to produce results that can be (as Arthur C. Clarke once said) indistinguishable from magic. However, it comes at a considerable cost. First, you need to put a lot of work into the training phase. In essence, you present the learning network with a broad spectrum of labeled examples (“this is a dog”, “this is a pet”, and so forth) so that it can correlate features in the image with the labels. This phase is computationally expensive due to the size and the multi-layered nature of the neural networks. After the training phase is complete, evaluating new images against the trained network is far easier. The results are traditionally expressed in confidence levels (0 to 100%) rather than as cold, hard facts. This allows you to decide just how much precision is appropriate for your applications.
Introducing Amazon Rekognition
Today I would like to tell you about Amazon Rekognition. Powered by deep learning and built by our Computer Vision team over the course of many years, this fully-managed service already analyzes billions of images daily. It has been trained on thousands of objects and scenes, and is now available for you to use in your own applications. You can use the Rekognition Demos to put the service through its paces before you dive in and start writing code that uses the Rekognition API.
Rekognition was designed from the get-go to run at scale. It comprehends scenes, objects, and faces. Given an image, it will return a list of labels. Given an image with one or more faces, it will return bounding boxes for each face, along with attributes. Let’s see what it has to say about the picture of my dog (her name is Luna, by the way):
As you can see, Rekognition labeled Luna as an animal, a dog, a pet, and as a golden retriever with a high degree of confidence. It is important to note that these labels are independent, in the sense that the deep learning model does not explicitly understand the relationship between, for example, dogs and animals. It just so happens that both of these labels were simultaneously present on the dog-centric training material presented to Rekognition.
Let’s see how it does with a picture of my wife and me:
Amazon Rekognition found our faces, set up bounding boxes, and let me know that my wife was happy (the picture was taken on her birthday, so I certainly hope she was).
You can also use Rekognition to compare faces and to see if a given image contains any one of a number of faces that you have asked it to recognize.
All of this power is accessible from a set of API functions (the console is great for quick demos). For example, you can call DetectLabels to programmatically reproduce my first example, or DetectFaces to reproduce my second one. You can make multiple calls to IndexFaces to prepare Rekognition to recognize some faces. Each time you do this, Rekognition extracts some features (known as face vectors) from the image, stores the vectors, and discards the image. You can create one or more Rekognition collections and store related groups of face vectors in each one.
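As a rough sketch of what a DetectLabels call could look like from Python, the function below wraps the `detect_labels` operation and returns label names with their confidence scores. It assumes a client object shaped like the one returned by `boto3.client("rekognition")`; the bucket and key names in the usage note are hypothetical, and the choice of boto3 is mine rather than something stated in this post.

```python
from typing import Any, List, Tuple


def label_image(
    client: Any, bucket: str, key: str, min_confidence: float = 75.0
) -> List[Tuple[str, float]]:
    """Call DetectLabels on an image stored in S3.

    Returns (label name, confidence) pairs. Labels below `min_confidence`
    (0 to 100) are filtered out by the service itself.
    """
    response = client.detect_labels(
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        MaxLabels=10,
        MinConfidence=min_confidence,
    )
    return [(label["Name"], label["Confidence"]) for label in response["Labels"]]
```

With boto3 installed and credentials configured, a call might read `label_image(boto3.client("rekognition"), "my-photo-bucket", "luna.jpg")`. Raising or lowering `min_confidence` is how you trade precision against recall for your application.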
Applications for Rekognition
So, what can you use this for? I’ve got plenty of ideas to get you started!
If you have a large collection of photos, you can tag and index them using Amazon Rekognition. Because Rekognition is a service, you can process millions of photos per day without having to worry about setting up, running, or scaling any infrastructure. You can implement visual search, tag-based browsing, and all sorts of interactive discovery models.
You can use Rekognition in several different authentication and security contexts. You can compare a face on a webcam to a badge photo before allowing an employee to enter a secure zone. You can perform visual surveillance, inspecting photos for objects or people of interest or concern.
You can build “smart” marketing billboards that collect demographic data about viewers.
Now Available
Rekognition is now available in the US East (Northern Virginia), US West (Oregon), and EU (Ireland) Regions and you can start using it today. As part of the AWS Free Tier, you can analyze up to 5,000 images per month and store up to 1,000 face vectors each month for an entire year. After that (and at higher volume), you will pay tiered pricing based on the number of images that you analyze and the number of face vectors that you store.
This is a guest post from my colleagues Naveen Swamy and Joseph Spisak.
Machine learning is a field of computer science that enables computers to learn without being explicitly programmed. It focuses on algorithms that can learn from and make predictions on data.
Most recently, one branch of machine learning, called deep learning, has been deployed successfully in production with higher accuracy than traditional techniques, enabling capabilities such as speech recognition, image recognition, and video analytics. This higher accuracy comes, however, at the cost of significantly higher compute requirements for training these deep models.
One of the major reasons for this rebirth and rapid progress is the availability and democratization of cloud-scale computing. Training state-of-the-art deep neural networks can be time-consuming, with larger networks like ResidualNet taking several days to weeks to train, even on the latest GPU hardware. Because of this, a scale-out approach is required.
Accelerating training time has multiple benefits, including:
Enabling faster iterative research, allowing scientists to push the state of the art faster in domains such as computer vision or speech recognition.
Reducing the time-to-market for intelligent applications, allowing AI applications that consume trained, deep learning models to access newer models faster.
Absorbing new data faster, helping to keep deep learning models current.
AWS CloudFormation, which creates and configures Amazon Web Services resources with a template, simplifies the process of setting up a distributed deep learning cluster. The CloudFormation Deep Learning template uses the Amazon Deep Learning AMI (supporting MXNet, TensorFlow, Caffe, Theano, Torch, and CNTK frameworks) to launch a cluster of Amazon EC2 instances and other AWS resources needed to perform distributed deep learning. CloudFormation creates all resources in the customer account.
EC2 Cluster Architecture
Resources created by the Deep Learning template
The Deep Learning template creates a stack that contains the following resources:
A VPC in the customer account.
The requested number of worker instances in an Auto Scaling group within the VPC. These worker instances are launched in a private subnet.
A master instance in a separate Auto Scaling group that acts as a proxy to enable connectivity to the cluster via SSH. CloudFormation places this instance within the VPC and connects it to both the public and private subnets. This instance has both a public IP address and a public DNS name.
A security group that allows external SSH access to the master instance.
Two security groups that open ports on the private subnet for communication between the master and workers.
An IAM role that allows users to access and query Auto Scaling groups and the private IP addresses of the EC2 instances.
A NAT gateway used by the instances within the VPC to talk to the outside world.
The startup script enables SSH forwarding on all hosts. Enabling SSH is essential because frameworks such as MXNet make use of SSH for communication between master and worker instances during distributed training. The startup script queries the private IP addresses of all the hosts in the stack, appends the IP address and worker alias to /etc/hosts, and writes the list of worker aliases to /opt/deeplearning/workers.
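The bookkeeping the startup script performs can be illustrated with a short sketch. This is not the template’s actual script: the `build_host_entries` function and the `deeplearning-worker` alias prefix are assumptions, and the function only builds the text that would be appended to /etc/hosts and written to /opt/deeplearning/workers rather than touching either file.

```python
from typing import List, Tuple


def build_host_entries(
    private_ips: List[str], alias_prefix: str = "deeplearning-worker"
) -> Tuple[str, str]:
    """Return (lines for /etc/hosts, contents of the workers file).

    Each host gets an alias made from the (hypothetical) prefix and a
    1-based index, in the order the IP addresses were supplied.
    """
    aliases = [f"{alias_prefix}{i}" for i in range(1, len(private_ips) + 1)]
    hosts_lines = "".join(f"{ip} {alias}\n" for ip, alias in zip(private_ips, aliases))
    workers_file = "".join(f"{alias}\n" for alias in aliases)
    return hosts_lines, workers_file


hosts, workers = build_host_entries(["10.0.1.5", "10.0.1.6"])
print(hosts, end="")
print(workers, end="")
```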
The startup script sets up the following environment variables:
$DEEPLEARNING_WORKERS_PATH: The file path that contains the list of workers
$DEEPLEARNING_WORKERS_COUNT: The total number of workers
$DEEPLEARNING_WORKER_GPU_COUNT: The number of GPUs on the instance
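A training script could read these variables to size its job. The sketch below is one way to do that; the `read_cluster_layout` helper is hypothetical, and the total GPU figure assumes that every instance in the stack has the same number of GPUs.

```python
import os
from typing import Mapping, Tuple


def read_cluster_layout(env: Mapping[str, str] = os.environ) -> Tuple[str, int, int]:
    """Return (workers file path, worker count, total GPUs across workers).

    Raises KeyError if a variable is missing, which usually means the
    startup script has not run in this shell.
    """
    workers_path = env["DEEPLEARNING_WORKERS_PATH"]
    workers_count = int(env["DEEPLEARNING_WORKERS_COUNT"])
    gpus_per_worker = int(env["DEEPLEARNING_WORKER_GPU_COUNT"])
    # Assumes a homogeneous cluster: same GPU count on every instance.
    return workers_path, workers_count, workers_count * gpus_per_worker


example_env = {
    "DEEPLEARNING_WORKERS_PATH": "/opt/deeplearning/workers",
    "DEEPLEARNING_WORKERS_COUNT": "3",
    "DEEPLEARNING_WORKER_GPU_COUNT": "8",
}
print(read_cluster_layout(example_env))  # ('/opt/deeplearning/workers', 3, 24)
```

Passing the environment as a parameter keeps the function easy to test outside the cluster.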
For SSHLocation, choose a valid CIDR IP address range to allow SSH access to the master instance and stack.
For Worker Count, type a value. The stack provisions the worker count + 1, with the additional instance acting as the master. The master also participates in the training/evaluation. Choose Next.
(Optional) Under Tags, type values for Key and Value. This allows you to assign metadata to your resources. (Optional) Under Permissions, you can choose the IAM role that CloudFormation uses to create the stack. Choose Next.
Under Capabilities, select the checkbox to agree to allow CloudFormation to create an IAM role. An IAM role is required for correctly setting up a stack.
To create the CloudFormation stack, choose Create.
To see the status of your stack, choose Events. If stack creation fails, for example, because of an access issue or an unsupported number of workers, troubleshoot the issue. For information about troubleshooting the creation of stacks, see Troubleshooting AWS CloudFormation. The event log records the reason for failure.
1. How do I change the IP addresses that are allowed to SSH to the master instance?
The CloudFormation stack output contains the security group that controls the inbound IP addresses for SSH access to the master instance. Use this security group to change your inbound IP addresses.
2. When an instance is replaced, are the IP addresses of the instances updated?
No. You must update IP addresses manually.
3. Does the master instance participate in training/validation?
Yes. Because most deep learning tasks involve GPUs, the master instance acts both as a proxy and as a distributed training/validation instance.
4. Why are the instances in an Auto Scaling group?
An Auto Scaling group maintains the number of desired instances by launching a new instance if an existing instance fails. There are two Auto Scaling groups: one for the master and one for the workers in the private subnet. Because only the master instance has a public endpoint to access the hosts in the stack, if the master instance becomes unavailable, you can terminate it and the associated Auto Scaling group automatically launches a new master instance with a new public endpoint.
5. When a new worker instance is added or an existing instance replaced, does CloudFormation update the IP addresses on the master instance?
No, this template does not have the capability to automatically update the IP address of the replacement instance.
Roy Ben-Alta is Sr. Business Development Manager at AWS – Big Data & Machine Learning
We can’t believe that there are just a couple of weeks left before re:Invent 2016. If you are attending this year, you will want to check out our Big Data sessions! Unlike in previous years, these sessions are covered in multiple tracks, such as Big Data & Analytics, Architecture, Databases, and IoT. We will also have—for the first time—two mini-conferences: Big Data and Machine Learning. These resource mini-conferences include full-day technical deep dives on a broad variety of topics, including big data, IoT, machine learning, and more.
This year, we have over 40 sessions!
We have great sessions from Netflix, Chick-fil-A, Under Armour, FINRA, King.com, Beeswax, GE, Toyota Racing Development, Quantcast, Groupon, Amazon.com, Scholastic, Thomson Reuters, DataXu, Sony, EA, and many more. All sessions are recorded and made available on YouTube. Also, all slide decks from the sessions are made available on SlideShare.net after the conference.
Today, I highlight the sessions to be presented as part of the Big Data & Machine Learning mini-conferences, Big Data analytics, and relevant sessions from other tracks. The following sessions are in this year’s session catalog. Choose any link to learn more or to add a session to your schedule.
We are looking forward to meeting you at re:Invent.
BDM205 – Big Data Mini-Con State of the Union – Tuesday Join us for this general session where AWS big data experts present an in-depth look at the current state of big data. Learn about the latest big data trends and industry use cases. Hear how other organizations are using the AWS big data platform to innovate and remain competitive. Take a look at some of the most recent AWS big data announcements, as we kick off the Big Data Mini-Con.
MAC206 – Amazon Machine Learning State of the Union Mini-Con – Wednesday With the growing number of business cases for artificial intelligence (AI), machine learning and deep learning continue to drive the development of state-of-the-art technology. We see this manifested in computer vision, predictive modeling, natural language understanding, and recommendation engines. During this full day of sessions and workshops, learn how we use some of these technologies within Amazon, and how you can develop your applications to leverage the benefits of these AI services.
Deep dive customer use case sessions
ARC306 – Event Handling at Scale: Designing an Auditable Ingestion and Persistence Architecture for 10K+ events/second How does McGraw-Hill Education use the AWS platform to scale and reliably receive 10,000 learning events per second? How do we provide near-real-time reporting and event-driven analytics for hundreds of thousands of concurrent learners in a reliable, secure, and auditable manner that is cost effective? MHE designed and implemented a robust solution that integrates AWS API Gateway, AWS Lambda, Amazon Kinesis, Amazon S3, Amazon Elasticsearch Service, Amazon DynamoDB, HDFS, Amazon EMR, Amazon EC2, and other technologies to deliver this cloud-native platform across the US and soon the world. This session describes the challenges we faced, architecture considerations, how we gained confidence for a successful production roll-out, and the behind-the-scenes lessons we learned.
ARC308 – Metering Big Data at AWS: From 0 to 100 Million Records in 1 Second Learn how AWS processes millions of records per second to support accurate metering across AWS and our customers. This session shows how we migrated from traditional frameworks to AWS managed services to support a broad processing pipeline. You gain insights on how we used AWS services to build a reliable, scalable, and fast processing system using Amazon Kinesis, Amazon S3, and Amazon EMR. Along the way, we dive deep into use cases that deal with scaling and accuracy constraints. Attend this session to see AWS’s end-to-end solution that supports metering at AWS.
BDA203 – Billions of Rows Transformed in Record Time Using Matillion ETL for Amazon Redshift GE Power & Water develops advanced technologies to help solve some of the world’s most complex challenges related to water availability and quality. They had amassed billions of rows of data on on-premises databases but decided to migrate some of their core big data projects to the AWS Cloud. When they decided to transform and store it all in Amazon Redshift, they knew they needed an ETL/ELT tool that could handle this enormous amount of data and safely deliver it to its destination.
In this session, Ryan Oates, Enterprise Architect at GE Water, shares his use case, requirements, outcomes and lessons learned. He also shares the details of his solution stack, including Amazon Redshift and Matillion ETL for Amazon Redshift in AWS Marketplace. You learn best practices on Amazon Redshift ETL supporting enterprise analytics and big data requirements, simply and at scale. You learn how to simplify data loading, transformation and orchestration on to Amazon Redshift and how to build out a real data pipeline.
BDA204 – Leverage the Power of the Crowd To Work with Amazon Mechanical Turk With Amazon Mechanical Turk (MTurk), you can leverage the power of the crowd for a host of tasks ranging from image moderation and video transcription to data collection and user testing. You simply build a process that submits tasks to the Mechanical Turk marketplace and get results quickly, accurately, and at scale. In this session, Russ, from Rainforest QA, shares best practices and lessons learned from his experience using MTurk. The session covers the key concepts of MTurk, getting started as a Requester, and using MTurk via the API. You learn how to set and manage Worker incentives, achieve great Worker quality, and how to integrate and scale your crowdsourced application. By the end of this session, you have a comprehensive understanding of MTurk and know how to get started harnessing the power of the crowd.
BDA205 – Delighting Customers Through Device Data with Salesforce IoT Cloud and AWS IoT The Internet of Things (IoT) produces vast quantities of data that promise a deep, always connected view into customer experiences through their devices. In this connected age, the question is no longer how do you gather customer data, but what do you do with all that data. How do you ingest at massive scale and develop meaningful experiences for your customers? In this session, you’ll learn how Salesforce IoT Cloud works in concert with the AWS IoT engine to ingest and transform all of the data generated by every one of your customers, partners, devices, and sensors into meaningful action. You’ll also see how customers are using Salesforce and AWS together to process massive quantities of data, build business rules with simple, intuitive tools, and engage proactively with customers in real time. Session sponsored by Salesforce.
BDM203 – FINRA: Building a Secure Data Science Platform on AWS Data science is a key discipline in a data-driven organization. Through analytics, data scientists can uncover previously unknown relationships in data to help an organization make better decisions. However, data science is often performed from local machines with limited resources and multiple datasets on a variety of databases. Moving to the cloud can help organizations provide scalable compute and storage resources to data scientists, while freeing them from the burden of setting up and managing infrastructure. In this session, FINRA, the Financial Industry Regulatory Authority, shares best practices and lessons learned when building a self-service, curated data science platform on AWS, a project that allowed us to remove the technology middleman and empower users to choose the best compute environment for their workloads. Understand the architecture and underlying data infrastructure services to provide a secure, self-service portal to data scientists, learn how we built consensus for tooling from our data science community, hear about the benefits of increased collaboration among the scientists due to the standardized tools, and learn how you can retain the freedom to experiment with the latest technologies while retaining information security boundaries within a virtual private cloud (VPC).
BDM204 – Visualizing Big Data Insights with Amazon QuickSight Amazon QuickSight is a fast BI service that makes it easy for you to build visualizations, perform ad-hoc analysis, and quickly get business insights from your data. QuickSight is built to harness the power and scalability of the cloud, so you can easily run analysis on large datasets, and support hundreds of thousands of users. In this session, we’ll demonstrate how you can easily get started with Amazon QuickSight, uploading files, connecting to Amazon S3 and Amazon Redshift and creating analyses from visualizations that are optimized based on the underlying data. After we’ve built our analysis and dashboard, we’ll show you easy it is to share it with colleagues and stakeholders in just a few seconds.
BDM303 – JustGiving: Serverless Data Pipelines, Event-Driven ETL, and Stream Processing Organizations need to gain insight and knowledge from a growing number of Internet of Things (IoT), application programming interfaces (API), clickstreams, unstructured and log data sources. However, organizations are also often limited by legacy data warehouses and ETL processes that were designed for transactional data. Building scalable big data pipelines with automated extract-transform-load (ETL) and machine learning processes can address these limitations. JustGiving is the world’s social platform for giving. In this session, we describe how we created several scalable and loosely coupled event-driven ETL and ML pipelines as part of our in-house data science platform called RAVEN. You learn how to leverage AWS Lambda, Amazon S3, Amazon EMR, Amazon Kinesis, and other services to build serverless, event-driven, data and stream processing pipelines in your organization. We review common design patterns, lessons learned, and best practices, with a focus on serverless big data architectures with AWS Lambda.
BDM306 – Netflix: Using Amazon S3 as the fabric of our big data ecosystem Amazon S3 is the central data hub for Netflix’s big data ecosystem. We currently have over 1.5 billion objects and 60+ PB of data stored in S3. As we ingest, transform, transport, and visualize data, we find this data naturally weaving in and out of S3. Amazon S3 provides us the flexibility to use an interoperable set of big data processing tools like Spark, Presto, Hive, and Pig. It serves as the hub for transporting data to additional data stores / engines like Teradata, Amazon Redshift, and Druid, as well as exporting data to reporting tools like Microstrategy and Tableau. Over time, we have built an ecosystem of services and tools to manage our data on S3. We have a federated metadata catalog service that keeps track of all our data. We have a set of data lifecycle management tools that expire data based on business rules and compliance. We also have a portal that allows users to see the cost and size of their data footprint. In this talk, we’ll dive into these major uses of S3, as well as many smaller cases, where S3 smoothly addresses an important data infrastructure need. We also provide solutions and methodologies on how you can build your S3 big data hub.
BDM402 – Best Practices for Data Warehousing with Amazon Redshift In this session, we take an in-depth look at data warehousing with Amazon Redshift for big data analytics. We cover best practices to take advantage of Amazon Redshift’s columnar technology and parallel processing capabilities to deliver high throughput and query performance. You also learn from king.com how to design optimal schemas, load data efficiently, and use workload management.
DAT202 – Migrating Your Data Warehouse to Amazon Redshift Amazon Redshift is a fast, simple, cost-effective data warehousing solution, and in this session, we look at the tools and techniques you can use to migrate your existing data warehouse to Amazon Redshift. We then present a case study on Scholastic’s migration to Amazon Redshift. Scholastic, a large 100-year-old publishing company, was running their business with older, on-premises data warehousing and analytics solutions, which could not keep up with business needs and were expensive. Scholastic also needed to include new capabilities like streaming data and real-time analytics. Scholastic migrated to Amazon Redshift, and achieved agility and faster time to insight while dramatically reducing costs. In this session, Scholastic discusses how they achieved this, including options considered, technical architecture implemented, results, and lessons learned.
DAT204 – How Thermo Fisher Is Reducing Mass Spectrometry Experiment Times from Days to Minutes with MongoDB & AWS Mass spectrometry is the gold standard for determining chemical compositions, with spectrometers often measuring the mass of a compound down to a single electron. This level of granularity produces an enormous amount of hierarchical data that doesn’t fit well into rows and columns. In this talk, learn how Thermo Fisher is using MongoDB Atlas on AWS to allow their users to get near real-time insights from mass spectrometry experiments—a process that used to take days. We also share how the underlying database service used by Thermo Fisher was built on AWS.
DAT205 – Fanatics Migrates Data to Hadoop on the AWS Cloud Using Attunity CloudBeam in AWS Marketplace Keeping a data warehouse current and relevant can be challenging because of the time and effort required to insert new data. The world’s most licensed sports merchandiser, Fanatics, used Attunity CloudBeam in AWS Marketplace to transform their data from Microsoft SQL, Oracle, and other sources to Amazon S3, where they consume the data in Hadoop and Amazon Redshift. Fanatics can now analyze the huge volumes of data from their transactional, e-commerce, and back office systems, and make this data available immediately. In this session, Fanatics shares their use case, requirements, outcomes and lessons learned. You’ll learn best practices on implementing a data lake, using Apache Kafka and how to consistently replicate data to Amazon Redshift and Amazon S3.
DAT308 – Fireside chat with Groupon, Intuit, and LifeLock on solving Big Data database challenges with Redis Redis Labs’ CMO is hosting a fireside chat with leaders from multiple industries including Groupon (e-commerce), Intuit (Finance), and LifeLock (Identity Protection). This conversation-style session covers the Big Data related challenges faced by these leading companies as they scale their applications, ensure high availability, serve the best user experience at lowest latencies, and optimize between cloud and on-premises operations. The introductory level session can appeal to both developer and DevOps functions. Attendees hear about diverse use cases such as recommendations engine, hybrid transactions and analytics operations, and time-series data analysis. The audience learns how the Redis in-memory database platform addresses the above use cases with its multi-model capability and in a cost effective manner to meet the needs of the next generation applications. Session sponsored by Redis Labs.
DAT309 – How Fulfillment by Amazon (FBA) and Scopely Improved Results and Reduced Costs with a Serverless Architecture In this session, we share an overview of leveraging serverless architectures to support high-performance data intensive applications. Fulfillment by Amazon (FBA) built the Seller Inventory Authority Platform (IAP) using Amazon DynamoDB Streams, AWS Lambda functions, Amazon Elasticsearch Service, and Amazon Redshift to improve results and reduce costs. Scopely shares how they used a flexible logging system built on Amazon Kinesis, Lambda, and Amazon ES to provide high-fidelity reporting on hotkeys in Memcached and DynamoDB, and drastically reduce the incidence of hotkeys. Both of these customers are using managed services and serverless architecture to build scalable systems that can meet the projected business growth without a corresponding increase in operational costs.
DAT310 – Building Real-Time Campaign Analytics Using AWS Services Quantcast provides its advertising clients the ability to run targeted ad campaigns reaching millions of online users. The real-time bidding for campaigns runs on thousands of machines across the world. When Quantcast wanted to collect and analyze campaign metrics in real time, they turned to AWS to rapidly build a scalable, resilient, and extensible framework. Quantcast used Amazon Kinesis streams to stage data, Amazon EC2 instances to shuffle and aggregate the data, and Amazon DynamoDB and Amazon ElastiCache for building scalable time-series databases. With Elastic Load Balancing and Auto Scaling groups, they can set up distributed microservices with minimal operation overhead. This session discusses their use case, how they architected the application with AWS technologies integrated with their existing home-grown stack, and the lessons they learned.
DAT311 – How Toyota Racing Development Makes Racing Decisions in Real Time with AWS In this session, you learn how Toyota Racing Development (TRD) developed a robust and highly performant real-time data analysis tool for professional racing. In this talk, learn how we structured a reliable, maintainable, decoupled architecture built around Amazon DynamoDB as both a streaming mechanism and a long-term persistent data store. In racing, milliseconds matter and even moments of downtime can cost a race. You’ll see how we used DynamoDB together with Amazon Kinesis Streams and Amazon Kinesis Firehose to build a real-time streaming data analysis tool for competitive racing.
DAT312 – How DataXu scaled its Attribution System to handle billions of events per day with Amazon DynamoDB “Attribution” is the marketing term of art for allocating full or partial credit to advertisements that eventually lead to purchase, sign up, download, or other desired consumer interaction. DataXu shares how we use DynamoDB at the core of our attribution system to store terabytes of advertising history data. The system is cost effective and dynamically scales from 0 to 300K requests per second on demand with predictable performance and low operational overhead.
DAT313 – 6 Million New Registrations in 30 Days: How the Chick-fil-A One App Scaled with AWS Chris leads the team providing back-end services for the massively popular Chick-fil-A One mobile app that launched in June 2016. Chick-fil-A follows AWS best practices for web services and leverages numerous AWS services, including Elastic Beanstalk, DynamoDB, Lambda, and Amazon S3. This was the largest technology-dependent promotion in Chick-fil-A history. To ensure their architecture would perform at unknown and massive scale, Chris worked with AWS Support through an AWS Infrastructure Event Management (IEM) engagement and leaned on automated operations to enable load testing before launch.
DAT316 – How Telltale Games migrated its story analytics from Apache CouchDB to Amazon DynamoDB Every choice made in Telltale Games titles influences how your character develops and how the world responds to you. With millions of users making thousands of choices in a single episode, Telltale Games tracks this data and leverages it to build more relevant stories in real time as the season is developed. In this session, you’ll learn about Telltale Games’ migration from Apache CouchDB to Amazon DynamoDB, the challenges of adjusting capacity to handle spikes in database activity, and how it streamlined its analytics storage to provide new perspectives on player interaction to improve its games.
DAT318 – Migrating from RDBMS to NoSQL: How Sony Moved from MySQL to Amazon DynamoDB In this session, you learn the key differences between a relational database management system (RDBMS) and non-relational (NoSQL) databases like Amazon DynamoDB. You learn about suitable and unsuitable use cases for NoSQL databases. You’ll learn strategies for migrating from an RDBMS to DynamoDB through a 5-phase, iterative approach. See how Sony migrated an on-premises MySQL database to the cloud with Amazon DynamoDB, and see the results of this migration.
GAM301 – How EA Leveraged Amazon Redshift and AWS Partner 47Lining to Gather Meaningful Player Insights In November 2015, Capital Games launched a mobile game accompanying a major feature film release. The back end of the game is hosted in AWS and uses big data services like Amazon Kinesis, Amazon EC2, Amazon S3, Amazon Redshift, and AWS Data Pipeline. Capital Games describes some of the challenges in their initial setup and usage of Amazon Redshift and Amazon EMR. They then go over their engagement with AWS Partner 47Lining and talk about specific best practices regarding solution architecture, data transformation pipelines, and system maintenance using AWS big data services. Attendees of this session should expect a candid view of the process of implementing a big data solution, from problem statement identification to visualizing data, with an in-depth look at the technical challenges and hurdles along the way.
LFS303 – How to Build a Big Data Analytics Data Lake For discovery phase research, life sciences companies have to support infrastructure that processes millions to billions of transactions. The advent of a data lake to accomplish such a task is showing itself to be a stable and productive data platform pattern to meet the goal. We discuss how to build a data lake on AWS, using services and techniques such as AWS CloudFormation, Amazon EC2, Amazon S3, IAM, and AWS Lambda. We also review a reference architecture from Amgen that uses a data lake to aid in their Life Science Research.
SVR301 – Real-time Data Processing Using AWS Lambda, Amazon Kinesis In this session, you learn from Thomson Reuters how they leverage AWS for its Product Insight service. The service provides insights to collect usage analytics for Thomson Reuters products. They walk through its architecture and demonstrate how they leverage Amazon Kinesis Streams, Amazon Kinesis Firehose, AWS Lambda, Amazon S3, Amazon Route 53, and AWS KMS for near real-time access to data being collected around the globe. They also outline how applying AWS methodologies benefited its business, such as time-to-market and cross-region ingestion, auto-scaling capabilities, low-latency, security features, and extensibility.
SVR305 – ↑↑↓↓←→←→ BA Lambda Start Ever wished you had a list of cheat codes to unleash the full power of AWS Lambda for your production workload? Come learn how to build a robust, scalable, and highly available serverless application using AWS Lambda. In this session, we discuss hacks and tricks for maximizing your AWS Lambda performance, such as leveraging container reuse, using the 500 MB scratch space and local cache, creating custom metrics for managing operations, aligning upstream and downstream services to scale along with Lambda, and many other workarounds and optimizations across your entire function lifecycle. You also learn how Hearst converted its real-time clickstream analytics data pipeline from a server-based model to a serverless one. The infrastructure of the data pipeline relied on Amazon EC2 instances and cron jobs to shepherd data through the process. In 2016, Hearst converted its data pipeline architecture to a serverless process that is based on event triggers and the power of AWS Lambda. By moving from a time-based process to a trigger-based process, Hearst improved its pipeline latency times by 50%.
SVR308 – Content and Data Platforms at Vevo: Rebuilding and Scaling from Zero in One Year Vevo has undergone a complete strategic and technical reboot, driven not only by product but also by engineering. Since November 2015, Vevo has been replacing monolithic, legacy content services with a modern, modular, microservices architecture, all while developing new features and functionality. In parallel, Vevo has built its data platform from scratch to power the internal analytics as well as a unique music video consumption experience through a new personalized feed of recommendations, all in less than one year. This has been a monumental effort that was made possible in this short time span largely because of AWS technologies. The content team has been heavily using serverless architectures and AWS Lambda in the form of microservices, taking a similar approach to functional programming, which has helped us speed up the development process and time to market. The data team has been building the data platform by heavily leveraging Amazon Kinesis for data exchange across services, Amazon Aurora for consumer-facing services, Apache Spark on Amazon EMR for ETL + Machine Learning, as well as Amazon Redshift as the core analytics data store.
Machine learning sessions
MAC201 – Getting to Ground Truth with Amazon Mechanical Turk Jump-start your machine learning project by using the crowd to build your training set. Before you can train your machine learning algorithm, you need to take your raw inputs and label, annotate, or tag them to build your ground truth. Learn how to use the Amazon Mechanical Turk marketplace to perform these tasks. We share Amazon’s best practices, developed while training our own machine learning algorithms and walk you through quickly getting affordable and high-quality training data.
MAC202 – Deep Learning in Alexa Neural networks have a long and rich history in automatic speech recognition. In this talk, we present a brief primer on the origin of deep learning in spoken language, and then explore today’s world of Alexa. Alexa is the AWS service that understands spoken language and powers Amazon Echo. Alexa relies heavily on machine learning and deep neural networks for speech recognition, text-to-speech, language understanding, and more. We also discuss the Alexa Skills Kit, which lets any developer teach Alexa new skills.
MAC205 – Deep Learning at Cloud Scale: Improving Video Discoverability by Scaling Up Caffe on AWS Deep learning continues to push state of the art in domains such as video analytics, computer vision, and speech recognition. Deep networks are powered by amazing levels of representational power, feature learning, and abstraction. This approach comes at the cost of a significant increase in required compute power, which makes the AWS cloud an excellent environment for training. Innovators in this space are applying deep learning to a variety of applications. One such innovator, Vilynx, a startup based in Palo Alto, realized that the current pre-roll advertising-based models for mobile video weren’t returning publishers’ desired levels of engagement. In this session, we explain the algorithmic challenges of scaling across multiple nodes, and what Intel is doing on AWS to overcome them. We describe the benefits of using AWS CloudFormation to set up a distributed training environment for deep networks. We also showcase Vilynx’s contributions to video discoverability and explain how Vilynx uses AWS tools to understand video content.
MAC301 – Transforming Industrial Processes with Deep Learning Deep learning has revolutionized computer vision by significantly increasing the accuracy of recognition systems. This session discusses how the Amazon Fulfillment Technologies Computer Vision Research team has harnessed deep learning to identify inventory defects in Amazon’s warehouses. Beginning with a brief overview of how orders on Amazon.com are fulfilled, the session describes a combination of hardware and software that uses computer vision and deep learning to visually examine bins of Amazon inventory and locate possible mismatches between the physical inventory and inventory records. With the growth of deep learning, the emphasis of new system design shifts from clever algorithms to innovative ways to harness available data.
MAC302 – Leveraging Amazon Machine Learning, Amazon Redshift, and an Amazon Simple Storage Service Data Lake for Strategic Advantage in Real Estate The Howard Hughes Corporation partnered with 47Lining to develop a managed enterprise data lake based on Amazon S3. The purpose of the managed EDL is to fuse relevant on-premises and third-party data to enable Howard Hughes to answer its most valuable business questions. Their first analysis was a lead-scoring model that uses Amazon Machine Learning (Amazon ML) to predict propensity to purchase high-end real estate. The model is based on a combined set of public and private data sources, including all publicly recorded real estate transactions in the US for the past 35 years. By changing their business process for identifying and qualifying leads to use the results of data-driven analytics from their managed data lake in AWS, Howard Hughes increased the number of identified qualified leads in their pipeline by over 400% and reduced the acquisition cost per lead by more than 10 times. In this session, you see a practical example of how to use Amazon ML to improve business results, how to architect a data lake with Amazon S3 that fuses on-premises, third-party, and public datasets, and how to train and run an Amazon ML model to attain predictions.
MAC303 – Developing Classification and Recommendation Engines with Amazon EMR and Apache Spark Customers are adopting Apache Spark‒a set of open-source distributed machine learning algorithms‒on Amazon EMR for large-scale machine learning workloads, especially for applications that power customer segmentation and content recommendation. By leveraging Spark ML, customers can quickly build and execute massively parallel machine learning jobs. Additionally, Spark applications can train models in streaming or batch contexts and can access data from Amazon S3, Amazon Kinesis, Apache Kafka, Amazon Elasticsearch Service, Amazon Redshift, and other services. This session explains how to quickly and easily create scalable Spark clusters with Amazon EMR, build and share models using Apache Zeppelin notebooks, and create a sample application using Spark Streaming, which updates models with real-time data.
MAC306 – Using MXNet for Recommendation Modeling at Scale For many companies, recommendation systems solve important machine learning problems. But as recommendation systems grow to millions of users and millions of items, they pose significant challenges when deployed at scale. The user-item matrix can have trillions of entries (or more), most of which are zero. To make common ML techniques practical, sparse data requires special techniques. Learn how to use MXNet to build neural network models for recommendation systems that can scale efficiently to large sparse datasets.
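To make the sparsity point in MAC306 concrete, here is a small sketch in plain Python. It is not MXNet code; the `SparseRatings` class and the toy numbers are illustrative assumptions. It stores only the nonzero entries of a user-item matrix and reports what fraction of the full matrix they represent.

```python
from collections import defaultdict


class SparseRatings:
    """Hypothetical user-item matrix that stores only nonzero entries."""

    def __init__(self, num_users, num_items):
        self.num_users = num_users
        self.num_items = num_items
        self._rows = defaultdict(dict)  # user -> {item: rating}

    def set(self, user, item, rating):
        if rating == 0:
            # Zeros are implicit, so drop any stored value instead.
            self._rows[user].pop(item, None)
        else:
            self._rows[user][item] = rating

    def get(self, user, item):
        # Missing entries read back as zero without allocating storage.
        return self._rows.get(user, {}).get(item, 0)

    def nonzeros(self):
        return sum(len(items) for items in self._rows.values())

    def density(self):
        return self.nonzeros() / (self.num_users * self.num_items)


# A million users by a million items is 10**12 cells, but only the
# three ratings below are actually stored.
ratings = SparseRatings(num_users=1_000_000, num_items=1_000_000)
ratings.set(42, 7, 5.0)
ratings.set(42, 99, 3.0)
ratings.set(7, 7, 4.0)
print(ratings.nonzeros(), ratings.density())
```

A dense array of the same shape would not fit in memory, which is why sparse representations matter before any model is trained on such data.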
MAC307 – Predicting Customer Churn with Amazon Machine Learning In this session, we take a specific business problem—predicting Telco customer churn—and explore the practical aspects of building and evaluating an Amazon Machine Learning model. We explore considerations ranging from assigning a dollar value to applying the model using the relative cost of false positive and false negative errors. We discuss all aspects of putting Amazon ML to practical use, including how to build multiple models to choose from, put models into production, and update them. We also discuss using Amazon Redshift and Amazon S3 with Amazon ML.
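One way to read "applying the model using the relative cost of false positive and false negative errors" in MAC307 is to pick the score cutoff that minimizes total cost on labeled data. The sketch below assumes that interpretation. It does not use the Amazon ML API, and the scores, labels, and dollar costs are made up for illustration.

```python
def total_cost(scores, labels, threshold, cost_fp, cost_fn):
    """Cost of flagging every customer whose score is >= threshold."""
    cost = 0.0
    for score, churned in zip(scores, labels):
        predicted = score >= threshold
        if predicted and not churned:
            cost += cost_fp   # retention offer spent on a loyal customer
        elif not predicted and churned:
            cost += cost_fn   # churner the model failed to flag
    return cost


def best_threshold(scores, labels, cost_fp, cost_fn):
    """Try each distinct score as a cutoff and keep the cheapest one."""
    candidates = sorted(set(scores)) + [float("inf")]
    return min(candidates,
               key=lambda t: total_cost(scores, labels, t, cost_fp, cost_fn))


# Toy data: model scores and whether each customer actually churned.
scores = [0.1, 0.4, 0.35, 0.8, 0.65, 0.9]
labels = [0, 0, 1, 1, 0, 1]

print(best_threshold(scores, labels, cost_fp=10, cost_fn=10))
print(best_threshold(scores, labels, cost_fp=10, cost_fn=100))
```

On this toy data the cutoff drops from 0.8 to 0.35 once a missed churner is assumed to cost ten times as much as a wasted offer, which shows how the cost ratio, not accuracy alone, drives where the model is applied.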
Services sessions: Architecture and best practices
BDM201 – Big Data Architectural Patterns and Best Practices on AWS The world is producing an ever-increasing volume, velocity, and variety of big data. Consumers and businesses are demanding up-to-the-second (or even millisecond) analytics on their fast-moving data, in addition to classic batch processing. AWS delivers many technologies for solving big data problems. But what services should you use, why, when, and how? In this session, we simplify big data processing as a data bus comprising various stages: ingest, store, process, and visualize. Next, we discuss how to choose the right technology in each stage based on criteria such as data structure, query latency, cost, request rate, item size, data volume, durability, and so on. Finally, we provide reference architecture, design patterns, and best practices for assembling these technologies to solve your big data problems at the right cost.
BDM301 – Best Practices for Apache Spark on Amazon EMR Organizations need to perform increasingly complex analysis on data — streaming analytics, ad-hoc querying, and predictive analytics — in order to get better customer insights and actionable business intelligence. Apache Spark has recently emerged as the framework of choice to address many of these challenges. In this session, we show you how to use Apache Spark on AWS to implement and scale common big data use cases such as real-time data processing, interactive data science, predictive analytics, and more. We talk about common architectures, best practices to quickly create Spark clusters using Amazon EMR, and ways to integrate Spark with other big data services in AWS.
BDM302 – Real-Time Data Exploration and Analytics with Amazon Elasticsearch Service and Kibana Elasticsearch is a fully featured search engine used for real-time analytics, and Amazon Elasticsearch Service makes it easy to deploy Elasticsearch clusters on AWS. With Amazon ES, you can ingest and process billions of events per day, and explore the data using Kibana to discover patterns. In this session, we use Apache web logs as an example and show you how to build an end-to-end analytics solution. First, we cover how to configure an Amazon ES cluster and ingest data into it using Amazon Kinesis Firehose. We look at best practices for choosing instance types, storage options, shard counts, and index rotations based on the throughput of incoming data. Then we demonstrate how to set up a Kibana dashboard and build custom dashboard widgets. Finally, we dive deep into the Elasticsearch query DSL and review approaches for generating custom, ad-hoc reports.
BDM304 – Analyzing Streaming Data in Real-time with Amazon Kinesis Analytics As more and more organizations strive to gain real-time insights into their business, streaming data has become ubiquitous. Typical streaming data analytics solutions require specific skills and complex infrastructure. However, with Amazon Kinesis Analytics, you can analyze streaming data in real time with standard SQL—there is no need to learn new programming languages or processing frameworks. In this session, we dive deep into the capabilities of Amazon Kinesis Analytics using real-world examples. We’ll present an end-to-end streaming data solution using Amazon Kinesis Streams for data ingestion, Amazon Kinesis Analytics for real-time processing, and Amazon Kinesis Firehose for persistence. We review in detail how to write SQL queries using streaming data and discuss best practices to optimize and monitor your Amazon Kinesis Analytics applications. Lastly, we discuss how to estimate the cost of the entire system.
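A common pattern in streaming analytics of the kind BDM304 describes is a windowed aggregate. The sketch below shows the idea of a tumbling (fixed, non-overlapping) window count in plain Python. It is not Amazon Kinesis Analytics SQL, and the function name and event format are assumptions made for illustration.

```python
from collections import Counter


def tumbling_window_counts(events, window_seconds):
    """Count events per (window_start, key) over fixed, non-overlapping windows.

    `events` is an iterable of (timestamp_seconds, key) pairs.
    """
    counts = Counter()
    for timestamp, key in events:
        # Every timestamp belongs to exactly one window.
        window_start = (timestamp // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    return dict(counts)


# Toy clickstream: (seconds since start, page key).
events = [(0, "a"), (5, "a"), (9, "b"), (10, "a"), (19, "b"), (20, "b")]
print(tumbling_window_counts(events, window_seconds=10))
```

A streaming SQL engine expresses the same grouping declaratively and emits each window's result as the window closes, rather than after reading the whole input as this batch-style sketch does.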
BDM401 – Deep Dive: Amazon EMR Best Practices & Design Patterns Amazon EMR is one of the largest Hadoop operators in the world. In this session, we introduce you to Amazon EMR design patterns such as using Amazon S3 instead of HDFS, taking advantage of both long and short-lived clusters, and other Amazon EMR architectural best practices. We talk about how to scale your cluster up or down dynamically and introduce you to ways you can fine-tune your cluster. We also share best practices to keep your Amazon EMR cluster cost-efficient. Finally, we dive into some of our recent launches to keep you current on our latest features.
DAT304 – Deep Dive on Amazon DynamoDB Explore Amazon DynamoDB capabilities and benefits in detail and learn how to get the most out of your DynamoDB database. We go over best practices for schema design with DynamoDB across multiple use cases, including gaming, AdTech, IoT, and others. We explore designing efficient indexes, scanning, and querying, and go into detail on a number of recently released features, including JSON document support, DynamoDB Streams, and more. We also provide lessons learned from operating DynamoDB at scale, including provisioning DynamoDB for IoT.
BDM202 – Workshop: Building Your First Big Data Application with AWS Want to get ramped up on how to use Amazon’s big data web services and launch your first big data application on AWS? Join us in this workshop as we build a big data application in real-time using Amazon EMR, Amazon Redshift, Amazon Kinesis, Amazon DynamoDB, and Amazon S3. We review architecture design patterns for big data solutions on AWS and give you access to a take-home lab so that you can rebuild and customize the application yourself.
IOT306 – IoT Visualizations and Analytics In this workshop, we focus on visualizations of IoT data using ELK, Amazon Elasticsearch Service, Logstash, and Kibana or Amazon Kinesis. We dive into how these visualizations can give you new capabilities and understanding when interacting with your device data from the context they provide on the world around them.
MAC401 – Scalable Deep Learning Using MXNet Deep learning continues to push the state of the art in domains such as computer vision, natural language understanding, and recommendation engines. One of the key reasons for this progress is the availability of highly flexible and developer friendly deep learning frameworks. During this workshop, members of the Amazon Machine Learning team provide a short background on Deep Learning focusing on relevant application domains and an introduction to using the powerful and scalable Deep Learning framework, MXNet. At the end of this tutorial, you’ll gain hands-on experience targeting a variety of applications including computer vision and recommendation engines as well as exposure to how to use preconfigured Deep Learning AMIs and CloudFormation Templates to help speed your development.
STG312 – Workshop: Working with AWS Snowball – Accelerating Data Ingest into the Cloud This workshop provides customers with the opportunity to work hands-on with the AWS Snowball service, with attendees broken out into small teams to perform various on-premises to cloud data transfer scenarios using actual Snowball devices. These scenarios include migrating backup & archive data to S3-IA and Amazon Glacier, HDFS cluster migration to S3 for use with Amazon EMR and Amazon Redshift, and leveraging the Snowball API & SDK to build AWS Snowball service integration into a custom application. The session opens with an overview of the service, objectives, and guidance on where to find resources. Attendees should bring their own laptops and should have a basic familiarity with AWS storage services (S3 and Amazon Glacier). Prerequisites: Participants should have an AWS account established and available for use during the workshop. Please bring your own laptop.
Virtual reality (VR) 360° videos are the next frontier of how we engage with and consume content. Unlike a traditional scenario in which a person views a screen in front of them, VR places the user inside an immersive experience. A viewer is “in” the story, and not on the sidelines as an observer.
Ivan Sutherland, widely regarded as the father of computer graphics, laid out the vision for virtual reality in his famous speech, “Ultimate Display” in 1965. In it, he said, “You shouldn’t think of a computer screen as a way to display information, but rather as a window into a virtual world that could eventually look real, sound real, move real, interact real, and feel real.”
Over the years, significant advancements have been made to bring reality closer to that vision. With the advent of headgear capable of rendering 3D spatial audio and video, realistic sound and visuals can be virtually reproduced, delivering immersive experiences to consumers.
When it comes to entertainment and sports, streaming in VR has become the new 4K HEVC/UHD of 2016. This has been accelerated by the release of new camera capture hardware like GoPro and streaming capabilities such as 360° video streaming from Facebook and YouTube. Yahoo streams lots of engaging sports, finance, news, and entertainment video content to tens of millions of users. Producing and streaming such content in 360° VR gives Yahoo a unique opportunity to offer new types of engagement and bring users a sense of depth and visceral presence.
While this is not an experience that is live in product, it is an area we are actively exploring. In this blog post, we take a look at what’s involved in building an end-to-end VR streaming workflow for both Live and Video on Demand (VOD). Our experiments and research go from camera rig setup, to video stitching, to encoding, to the eventual rendering of videos on video players on desktop and VR headsets. We also discuss challenges yet to be solved and the opportunities they present in streaming VR.
1. The Workflow
Yahoo’s video platform has a workflow that is used internally to enable streaming to an audience of tens of millions with the click of a few buttons. During experimentation, we enhanced this same proven platform and set of APIs to build a complete 360°/VR experience. The diagram below shows the end-to-end workflow for streaming 360°/VR that we built on Yahoo’s video platform.
Figure 1: VR Streaming Workflow at Yahoo
1.1. Capturing 360° video
In order to capture a virtual reality video, you need access to a 360°-capable video camera. Such a camera uses either fish-eye lenses or has an array of wide-angle lenses to collectively cover a 360 (θ) by 180 (ϕ) sphere as shown below.
Though it sounds simple, there is a real challenge in capturing a scene in 3D 360° as most of the 360° video cameras offer only 2D 360° video capture.
In initial experiments, we tried capturing 3D video using two cameras side-by-side, for left and right eyes and arranging them in a spherical shape. However this required too many cameras – instead we use view interpolation in the stitching step to create virtual cameras.
Another important consideration with 360° video is the number of axes the camera is capturing video with. In traditional 360° video that is captured using only a single axis (what we refer to as horizontal video), a user can turn their head from left to right. But this setup of cameras does not support a user tilting their head at 90°.
To achieve true 3D in our setup, we went with 6-12 GoPro cameras having 120° field of view (FOV) arranged in a ring, and an additional camera each on top and bottom, with each one outputting 2.7K at 30 FPS.
1.2. Stitching 360° video
Because a 360° view is a spherical video, the surface of this sphere needs to be projected onto a planar surface in 2D so that video encoders can process it. There are two popular layouts:
Equirectangular layout: This is the most widely-used format in computer graphics to represent spherical surfaces in a rectangular form with an aspect ratio of 2:1. This format has redundant information at the poles which means some pixels are over-represented, introducing distortions at the poles compared to the equator (as can be seen in the equirectangular mapping of the sphere below).
Figure 2: Equirectangular Layout 
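As a rough illustration of the equirectangular layout, the sketch below maps a direction on the sphere to a pixel in a 2:1 frame, and estimates how much a row is horizontally stretched relative to the equator. The function names and the convention that ϕ is measured from the top pole (0° to 180°) are assumptions made for this example, not something specified above.

```python
import math

def sphere_to_equirect(theta_deg, phi_deg, width, height):
    """Map longitude theta (0-360) and polar angle phi (0-180, assumed
    measured from the top pole) to an integer pixel (x, y)."""
    x = (theta_deg % 360.0) / 360.0 * width
    y = min(max(phi_deg, 0.0), 180.0) / 180.0 * height
    # Clamp so the bottom row and right column stay inside the frame.
    return min(int(x), width - 1), min(int(y), height - 1)

def horizontal_stretch(phi_deg):
    """Every row has the same pixel count, but the circle it represents
    shrinks with sin(phi); the ratio is the over-representation factor."""
    return 1.0 / math.sin(math.radians(phi_deg))

print(sphere_to_equirect(180, 90, 3840, 1920))
print(round(horizontal_stretch(30), 3))
```

The stretch factor is 1 at the equator and grows without bound toward the poles, which is the redundancy described above.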
CubeMap layout: CubeMap layout is a format that has also been used in computer graphics. It contains six individual 2D textures that map to six sides of a cube. The figure below is a typical cubemap representation. In a cubemap layout, the sphere is projected onto six faces and the images are folded out into a 2D image, so pieces of a video frame map to different parts of a cube, which leads to extremely efficient compact packing. Cubemap layouts require about 25% fewer pixels compared to equirectangular layouts.
Figure 3: CubeMap Layout 
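The roughly 25% pixel saving can be checked with a little arithmetic. The sketch below assumes that each cube face covers a quarter of the equator, so its edge is one quarter of the equirectangular width; that sizing rule is an assumption for illustration, and real pipelines may pick face sizes differently.

```python
def equirect_pixels(width):
    # A 2:1 equirectangular frame: width x (width / 2).
    return width * (width // 2)

def cubemap_pixels(width):
    # Assumption: four faces span the 360-degree equator, so the face
    # edge is width / 4; six such square faces make up the cubemap.
    face = width // 4
    return 6 * face * face

w = 3840
saving = 1 - cubemap_pixels(w) / equirect_pixels(w)
print(f"{saving:.0%}")  # 25%
```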
In our setup, we experimented with a couple of stitching software packages. One was from Vahana VR, and the other was a modified version of the open-source Surround360 technology that works with a GoPro rig. Both packages output equirectangular panoramas for the left and the right eye. Here are the steps involved in stitching together a 360° image:
Raw frame image processing: Converts uncompressed raw video data to RGB, which involves several steps starting from black-level adjustment, to applying Demosaic algorithms in order to figure out RGB color parts for each pixel based on the surrounding pixels. This also involves gamma correction, color correction, and anti vignetting (undoing the reduction in brightness on the image periphery). Finally, this stage applies sharpening and noise-reduction algorithms to enhance the image and suppress the noise.
Calibration: During the calibration step, stitching software takes steps to avoid vertical parallax while stitching overlapping portions in adjacent cameras in the rig. The purpose is to align everything in the scene, so that both eyes see every point at the same vertical coordinate. This step essentially matches the key points in images among adjacent camera pairs. It uses computer vision algorithms for feature detection like Binary Robust Invariant Scalable Keypoints (BRISK) and AKAZE.
Optical Flow: During stitching, to cover the gaps between adjacent real cameras and provide an interpolated view, optical flow is used to create virtual cameras. The optical flow algorithm finds the pattern of apparent motion of image objects between two consecutive frames caused by the movement of the object or camera. It uses OpenCV algorithms to find the optical flow.
Below are the frames produced by the GoPro camera rig:
Figure 4: Individual frames from 12-camera rig
Figure 5: Stitched frame output with PtGui
Figure 6: Stitched frame with barrel distortion using Surround360
Figure 7: Stitched frame after removing barrel distortion using Surround360
To get the full depth in stereo, the rig is set up so that i = r * sin(FOV/2 – 360/n), where:
i = IPD/2, where IPD is the inter-pupillary distance between the eyes.
r = Radius of the rig.
FOV = Field of view of the GoPro cameras, 120 degrees.
n = Number of cameras, which is 12 in our setup.
Given that the IPD is normally 6.4 cm, i should be greater than 3.2 cm. This implies that with a 12-camera setup, the radius of the rig comes to 14 cm. Usually, if there are more cameras it is easier to avoid black stripes.
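To make the rig geometry concrete, here is a minimal sketch that evaluates the relation i = r * sin(FOV/2 – 360/n) for the numbers quoted above and checks that i clears half of a 6.4 cm IPD. The function name is hypothetical, and angles are taken to be in degrees as in the text.

```python
import math

def half_baseline_cm(radius_cm, fov_deg, n_cameras):
    """Evaluate i = r * sin(FOV/2 - 360/n), with angles in degrees."""
    angle_deg = fov_deg / 2 - 360 / n_cameras
    return radius_cm * math.sin(math.radians(angle_deg))

i = half_baseline_cm(radius_cm=14, fov_deg=120, n_cameras=12)
print(round(i, 2))   # 7.0
print(i > 6.4 / 2)   # True: i exceeds IPD/2 = 3.2 cm
```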
For a truly immersive experience, users expect 4K (3840 x 2160) quality resolution at 60 frames per second (FPS) or higher. Given typical HMDs have a FOV of 120 degrees, a full 360° video needs a resolution of at least 12K (11520 x 6480). 4K streaming needs a bandwidth of 25 Mbps. So for 12K resolution, this effectively translates to > 75 Mbps and even more for higher framerates. However, average Wi-Fi in the US has a bandwidth of 15 Mbps.
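The resolution and bandwidth figures follow from scaling the 4K viewport by the ratio of the full circle to the HMD's field of view. The sketch below reproduces that arithmetic; applying the same factor of 3 to the 25 Mbps figure mirrors the lower bound quoted above and is not a bitrate model, since actual bitrates depend on the codec and content.

```python
def full_sphere_resolution(view_w, view_h, hmd_fov_deg):
    """Scale a viewport resolution up to cover 360 degrees."""
    scale = 360 / hmd_fov_deg
    return round(view_w * scale), round(view_h * scale), scale

width, height, scale = full_sphere_resolution(3840, 2160, 120)
min_mbps = 25 * scale  # lower bound, using the same factor as the text
print(width, height, min_mbps)  # 11520 6480 75.0
```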
One way to address the bandwidth issue is by reducing the resolution of areas that are out of the field of view. Spatial sub-sampling is used during transcoding to produce multiple viewport-specific streams. Each viewport-specific stream has high resolution in a given viewport and low resolution in the rest of the sphere.
On the player side, we can modify traditional adaptive streaming logic to take into account field of view. Depending on the video, if the user moves his head around a lot, it could result in multiple buffer fetches and could result in rebuffering. Ideally, this will work best in videos where the excessive motion happens in one field of view at a time and does not span across multiple fields of view at the same time. This work is still in an experimental stage.
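One way to picture the player-side logic is as a lookup from head orientation to the viewport-specific stream whose high-resolution region is closest. The sketch below is purely illustrative: it assumes the streams are centered at evenly spaced yaw angles, and the function names are hypothetical rather than part of any player API.

```python
def angular_distance(a_deg, b_deg):
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a_deg - b_deg) % 360
    return min(d, 360 - d)

def pick_viewport(yaw_deg, n_viewports):
    """Index of the stream whose center is nearest to the user's yaw.
    Assumes centers at 0, 360/n, 2*360/n, ... degrees."""
    centers = [i * 360 / n_viewports for i in range(n_viewports)]
    return min(range(n_viewports),
               key=lambda i: angular_distance(yaw_deg, centers[i]))

print(pick_viewport(100, 4))  # 1 (center at 90 degrees)
print(pick_viewport(350, 4))  # 0 (wraps around to the 0-degree stream)
```

Each change in the returned index would trigger a fetch of a different stream, which is why frequent head movement can lead to rebuffering.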
The default output format from the stitching software of both Surround360 and Vahana VR is equirectangular. In order to reduce the size further, we pass it through a cubemap filter transform integrated into ffmpeg to get an additional pixel reduction of ~25%.
At the end of the above steps, the stitching pipeline produces high-resolution stereo 3D panoramas which are then ingested into the existing Yahoo Video transcoding pipeline to produce multi-bitrate HLS streams.
1.3. Adding a stitching step to the encoding pipeline
Live – In order to prepare for multi-bitrate streaming over the Internet, a live 360° video-stitched stream in RTMP is ingested into Yahoo’s video platform. A live Elemental encoder was used to re-encode and package the live input into multiple bitrates for adaptive streaming on any device (iOS, Android, Browser, Windows, Mac, etc.).
Video on Demand – The existing Yahoo video transcoding pipeline was used to package multi-bitrate HLS streams from raw equirectangular mp4 source videos.
1.4. Rendering 360° video into the player
The spherical video stream is delivered to the Yahoo player in multiple bitrates. As a user changes their viewing angle, different portions of the frame are shown, presenting a 360° immersive experience. There are two types of VR players currently supported at Yahoo:
VRDisplayCapabilities: Has attributes to indicate position support, orientation support, and whether there is an external display.
VRLayer: Contains the HTML5 canvas element which is presented by the VRDisplay when its submitFrame is called. It also contains attributes defining the left-bound and right-bound textures within the source canvas for presenting to an eye.
VREyeParameters: Has the information required to correctly render a scene for a given eye. For each eye, it has an offset: the distance from the middle of the user’s eyes to the center point of one eye, which is half of the interpupillary distance (IPD). In addition, it maintains the current FOV of the eye, and the recommended renderWidth and renderHeight of each eye viewport.
getVRDisplays: Returns a list of VRDisplay HMDs accessible to the browser.
For web devices which support only monoscopic rendering, like desktop browsers without an HMD, it creates a single PerspectiveCamera object specifying the FOV and aspect ratio. Each time the device’s requestAnimationFrame callback fires, it renders a new frame. As part of rendering the frame, it first calculates the projection matrix for the FOV and sets the X (user’s right), Y (up), Z (behind the user) coordinates of the camera position.
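For readers who want to see what calculating a projection matrix from a FOV and aspect ratio involves, here is a minimal sketch of the standard OpenGL-style perspective matrix, in which the camera looks down the negative Z axis (consistent with Z pointing behind the user). This is the textbook formulation written out by hand, not the player's actual code; the near and far clipping planes are parameters introduced for the example.

```python
import math

def perspective_matrix(fov_y_deg, aspect, near, far):
    """Row-major 4x4 perspective projection for a vertical FOV in degrees."""
    f = 1.0 / math.tan(math.radians(fov_y_deg) / 2)
    return [
        [f / aspect, 0.0, 0.0, 0.0],
        [0.0, f, 0.0, 0.0],
        [0.0, 0.0, (far + near) / (near - far), 2 * far * near / (near - far)],
        [0.0, 0.0, -1.0, 0.0],
    ]

# Example: a 90-degree vertical FOV on a 16:9 canvas.
for row in perspective_matrix(90, 16 / 9, 0.1, 1000.0):
    print([round(v, 4) for v in row])
```

A wider FOV gives a smaller focal term f, so more of the sphere fits on screen at the cost of magnification.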
For devices that support stereoscopic rendering, like mobile phones used with Samsung Gear, the webvr player creates two PerspectiveCamera objects, one for the left eye and one for the right eye. Each PerspectiveCamera queries the VR device capabilities to get eye parameters like FOV, renderWidth, and renderHeight every time a frame needs to be rendered at the native refresh rate of the HMD. The key difference between stereoscopic and monoscopic is the perceived sense of depth that the user experiences, as the video frames separated by an offset are rendered by separate canvas elements to each individual eye.
Cardboard VR – Google provides a VR SDK for both iOS and Android. This simplifies common VR tasks like lens distortion correction, spatial audio, head tracking, and stereoscopic side-by-side rendering. For iOS, we integrated Cardboard VR functionality into our Yahoo Video SDK, so that users can watch stereoscopic 3D videos on iOS using Google Cardboard.
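The per-eye offset can be sketched as two camera positions displaced from the head position by half the IPD along the X axis. The function below is a hypothetical illustration using the axis convention described earlier (X to the user's right); the default IPD of 0.064 m corresponds to the 6.4 cm figure mentioned in the stitching section.

```python
def eye_positions(head_position, ipd_m=0.064):
    """Return (left_eye, right_eye) positions for a head at (x, y, z).
    Each eye sits half the IPD away from the head center along X."""
    x, y, z = head_position
    half = ipd_m / 2
    return (x - half, y, z), (x + half, y, z)

left, right = eye_positions((0.0, 1.6, 0.0))
print(left, right)
```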
With all the pieces in place, and experimentation done, we were able to successfully do a 360° live streaming of an internal company-wide event.
Figure 8: 360° Live streaming of Yahoo internal event
In addition to demonstrating our live streaming capabilities, we are also experimenting with showing 360° VOD videos produced with a GoPro-based camera rig. Here is a screenshot of one of the 360° videos being played in the Yahoo player.
Figure 9: Yahoo Studios produced 360° VOD content in the Yahoo Player
3. Challenges and Opportunities
3.1. Enormous amounts of data
As we alluded to in the video processing section of this post, delivering 4K resolution videos for each eye for each FOV at a high frame-rate remains a challenge. While FOV-adaptive streaming does reduce the size by providing high resolution streams separately for each FOV, providing an impeccable 60 FPS or more viewing experience still requires a lot more data than the current internet pipes can handle. Some of the other possible options which we are closely paying attention to are:
Compression efficiency with HEVC and VP9 – New codecs like HEVC and VP9 have the potential to provide significant compression gains. HEVC open source codecs like x265 have shown a 40% compression performance gain compared to the currently ubiquitous H.264/AVC codec. Likewise, the VP9 codec from Google has shown similar 40% compression performance gains. The key challenges are hardware decoding support and browser support. But with Apple and Microsoft very much behind HEVC, and Firefox and Chrome already supporting VP9, we believe most browsers will support HEVC or VP9 within a year.
Using 10 bit color depth vs 8 bit color depth – Traditional monitors support 8 bpc (bits per channel) for displaying images. Given each pixel has 3 channels (RGB), 8 bpc maps to 256x256x256 color/luminosity combinations to represent 16 million colors. With 10 bit color depth, you have the potential to represent even more colors. But the biggest stated advantage of using 10 bit color depth is with respect to compression during encoding even if the source only uses 8 bits per channel. Both x264 and x265 codecs support 10 bit color depth, with ffmpeg already supporting encoding at 10 bit color depth.
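The color counts behind this comparison are simple powers of two. The short sketch below computes them for 8 and 10 bits per channel; the function name is just for illustration.

```python
def color_count(bits_per_channel, channels=3):
    """Number of distinct colors for a given per-channel bit depth."""
    return (2 ** bits_per_channel) ** channels

print(color_count(8))   # 16777216 (256 x 256 x 256)
print(color_count(10))  # 1073741824 (1024 x 1024 x 1024)
```

Going from 8 to 10 bits per channel multiplies the number of representable colors by 64.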
3.2. Six degrees of freedom
With current camera rig workflows, users viewing the streams through an HMD are able to achieve three degrees of freedom (DoF), i.e., the ability to move up/down, clockwise/anti-clockwise, and swivel. But you still can’t get a different perspective when you move inside it, i.e., move forward/backward. Until now, this true six DoF immersive VR experience has only been possible in CG VR games. In video streaming, LightField technology-based video cameras produced by Lytro are the first ones to capture light field volume data from all directions. But LightField-based videos require an order of magnitude more data than traditional fixed FOV, fixed IPD, fixed lens camera rigs like GoPro. As bandwidth problems get resolved via better compression and better networks, achieving true immersion should be possible.
VR streaming is an emerging medium and with the addition of 360° VR playback capability, Yahoo’s video platform provides us a great starting point to explore the opportunities in video with regard to virtual reality. As we continue to work to delight our users by showing immersive video content, we remain focused on optimizing the rendering of high-quality 4K content in our players. We’re looking at building FOV-based adaptive streaming capabilities and better compression during delivery. These capabilities, and the enhancement of our webvr player to play on more HMDs like HTC Vive and Oculus Rift, will set us on track to offer streaming capabilities across the entire spectrum. At the same time, we are keeping a close watch on advancements in supporting spatial audio experiences, as well as advancements in the ability to stream volumetric lightfield videos to achieve true six degrees of freedom, with the aim of realizing the true potential of VR.
Glossary – VR concepts:
VR – Virtual reality, commonly referred to as VR, is an immersive computer-simulated reality experience that places viewers inside an experience. It “transports” viewers from their physical reality into a closed virtual reality. VR usually requires a headset device that takes care of sights and sounds, while the most-involved experiences can include external motion tracking, and sensory inputs like touch and smell. For example, when you put on VR headgear you suddenly start feeling immersed in the sounds and sights of another universe, like the deck of the Star Trek Enterprise. Though you remain physically at your place, VR technology is designed to manipulate your senses in a manner that makes you truly feel as if you are on that ship, moving through the virtual environment and interacting with the crew.
360 degree video – A 360° video is created with a camera system that simultaneously records all 360 degrees of a scene. It is a flat equirectangular video projection that is morphed into a sphere for playback on a VR headset. A standard world map is an example of equirectangular projection, which maps the surface of the world (sphere) onto orthogonal coordinates.
Spatial Audio – Spatial audio gives the creator the ability to place sound around the user. Unlike traditional mono/stereo/surround audio, it responds to head rotation in sync with video. While listening to spatial audio content, the user receives a real-time binaural rendering of an audio stream.
FOV – A human can naturally see 170 degrees of viewable area (field of view). Most consumer-grade head-mounted displays (HMDs) like Oculus Rift and HTC Vive now display 90 degrees to 120 degrees.
Monoscopic video – A monoscopic video means that both eyes see a single flat image, or video file. A common camera setup involves six cameras filming six different fields of view. Stitching software is used to form a single equirectangular video. Max output resolution on 2D scopic videos on Gear VR is 3480×1920 at 30 frames per second.
Presence – Presence is a kind of immersion where the low-level systems of the brain are tricked to such an extent that they react just as they would to non-virtual stimuli.
Latency – It’s the time between when you move your head and when you see physical updates on the screen. An acceptable latency is anywhere from 11 ms (for games) to 20 ms (for watching 360° VR videos).
Head Tracking – There are two forms:
Positional tracking – movements and related translations of your body, e.g., swaying side to side.
Traditional head tracking – left, right, up, down, roll like clock rotation.
Automatically identifying that an image is not suitable/safe for work (NSFW), including offensive and adult images, is an important problem which researchers have been trying to tackle for decades. Since images and user-generated content dominate the Internet today, filtering NSFW images becomes an essential component of Web and mobile applications. With the evolution of computer vision, improved training data, and deep learning algorithms, computers are now able to automatically classify NSFW image content with greater precision.
Defining NSFW material is subjective and the task of identifying these images is non-trivial. Moreover, what may be objectionable in one context can be suitable in another. For this reason, the model we describe below focuses only on one type of NSFW content: pornographic images. The identification of NSFW sketches, cartoons, text, images of graphic violence, or other types of unsuitable content is not addressed with this model.
To the best of our knowledge, there is no open source model or algorithm for identifying NSFW images. In the spirit of collaboration and with the hope of advancing this endeavor, we are releasing our deep learning model that will allow developers to experiment with a classifier for NSFW detection, and provide feedback to us on ways to improve the classifier.
Our general purpose Caffe deep neural network model (Github code) takes an image as input and outputs a probability (i.e., a score between 0 and 1) which can be used to detect and filter NSFW images. Developers can use this score to filter images below a certain suitable threshold based on a ROC curve for specific use-cases, or use this signal to rank images in search results.
Convolutional Neural Network (CNN) architectures and tradeoffs
In recent years, CNNs have become very successful in image classification problems. Since 2012, new CNN architectures have continuously improved the accuracy of the standard ImageNet classification challenge. Some of the major breakthroughs include AlexNet (2012), GoogLeNet, VGG (2013), and Residual Networks (2015). These networks have different tradeoffs in terms of runtime, memory requirements, and accuracy. The main indicators for runtime and memory requirements are:
Flops or connections – The number of connections in a neural network determines the number of compute operations during a forward pass, which is proportional to the runtime of the network while classifying an image.
Parameters – The number of parameters in a neural network determines the amount of memory needed to load the network.
Ideally we want a network with minimum flops and minimum parameters, which would achieve maximum accuracy.
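To give a feel for these two indicators, the sketch below counts the parameters and multiply-accumulate operations of a single convolutional layer. Counting multiply-accumulates as a stand-in for flops is a common convention that we assume here, and the example layer shape (3 input channels, 64 filters of size 7×7, a 112×112 output map) is chosen only for illustration.

```python
def conv_layer_cost(in_ch, out_ch, kernel, out_h, out_w):
    """Return (parameters, multiply-accumulates) for one conv layer."""
    weights_per_filter = in_ch * kernel * kernel
    params = out_ch * (weights_per_filter + 1)       # +1 for each bias
    macs = out_ch * weights_per_filter * out_h * out_w
    return params, macs

params, macs = conv_layer_cost(in_ch=3, out_ch=64, kernel=7,
                               out_h=112, out_w=112)
print(params)  # 9472
print(macs)    # 118013952
```

Parameters depend only on the filter shapes, while the operation count also scales with the size of the output feature map, which is why the two indicators can diverge between architectures.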
Training a deep neural network for NSFW classification
We train the models using a dataset of positive (i.e. NSFW) images and negative (i.e. SFW – suitable/safe for work) images. We are not releasing the training images or other details due to the nature of the data, but instead we open source the output model which can be used for classification by a developer.
We use the Caffe deep learning library and CaffeOnSpark; the latter is a powerful open source framework for distributed learning that brings Caffe deep learning to Hadoop and Spark clusters for training models (Big shout out to Yahoo’s CaffeOnSpark team!).
While training, the images were resized to 256×256 pixels, horizontally flipped for data augmentation, and randomly cropped to 224×224 pixels, and were then fed to the network. For training residual networks, we used scale augmentation as described in the ResNet paper, to avoid overfitting. We evaluated various architectures to experiment with tradeoffs of runtime vs accuracy.
MS_CTC – This architecture was proposed in Microsoft’s constrained time cost paper. It improves on top of AlexNet in terms of speed and accuracy, maintaining a combination of convolutional and fully-connected layers.
SqueezeNet – This architecture introduces the fire module, which contains layers to squeeze and then expand the input data blob. This helps to reduce the number of parameters while keeping the ImageNet accuracy as good as AlexNet, and the memory requirement is only 6MB.
VGG – This architecture has 13 conv layers and 3 FC layers.
GoogLeNet – GoogLeNet introduces inception modules and has 20 convolutional layer stages. It also uses hanging loss functions in intermediate layers to tackle the problem of diminishing gradients for deep networks.
ResNet-50 – ResNets use shortcut connections to solve the problem of diminishing gradients. We used the 50-layer residual network released by the authors.
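The flip-and-crop augmentation can be sketched in a few lines. The version below works on an image stored as a plain list of pixel rows that has already been resized to 256×256; the 50% flip probability and the function name are assumptions for this example, since the exact settings used in training are not given here.

```python
import random

def augment(image, crop=224, rng=random):
    """Randomly crop a crop x crop patch, then flip it horizontally
    with probability 0.5 (assumed). `image` is a list of rows."""
    height, width = len(image), len(image[0])
    top = rng.randint(0, height - crop)
    left = rng.randint(0, width - crop)
    patch = [row[left:left + crop] for row in image[top:top + crop]]
    if rng.random() < 0.5:
        patch = [row[::-1] for row in patch]  # mirror each row
    return patch

# Dummy 256x256 "image" whose pixels record their own coordinates.
image = [[(r, c) for c in range(256)] for r in range(256)]
patch = augment(image, rng=random.Random(0))
print(len(patch), len(patch[0]))  # 224 224
```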
ResNet-50-thin – The model was generated using our pynetbuilder tool and replicates the Residual Network paper’s 50-layer network (with half number of filters in each layer). You can find more details on how the model was generated and trained here.
Tradeoffs of different architectures: accuracy vs number of flops vs number of params in network.
The deep models were first pre-trained on the ImageNet 1000 class dataset. For each network, we replace the last layer (FC1000) with a 2-node fully-connected layer. Then we fine-tune the weights on the NSFW dataset. Note that we keep the learning rate multiplier for the last FC layer 5 times the multiplier of other layers, which are being fine-tuned. We also tune the hyper parameters (step size, base learning rate) to optimize the performance.
We observe that the performance of the models on NSFW classification tasks is related to the performance of the pre-trained model on ImageNet classification tasks, so if we have a better pretrained model, it helps in fine-tuned classification tasks. The graph below shows the relative performance on our held-out NSFW evaluation set. Please note that the false positive rate (FPR) at a fixed false negative rate (FNR) shown in the graph is specific to our evaluation dataset, and is shown here for illustrative purposes. To use the models for NSFW filtering, we suggest that you plot the ROC curve using your dataset and pick a suitable threshold.
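As a starting point for picking a threshold on your own data, the sketch below computes the false positive rate and false negative rate at each candidate threshold and then selects the threshold with the lowest FPR whose FNR stays within a chosen limit. The function names are hypothetical and the scores and labels are made-up toy values, not results from our evaluation set.

```python
def roc_points(scores, labels):
    """Return (threshold, fpr, fnr) for each distinct score, treating
    score >= threshold as a positive (NSFW) prediction. Labels: 1 = NSFW."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = []
    for t in sorted(set(scores)):
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        points.append((t, fp / neg, fn / pos))
    return points

def pick_threshold(scores, labels, max_fnr):
    """Lowest-FPR operating point whose FNR does not exceed max_fnr."""
    ok = [p for p in roc_points(scores, labels) if p[2] <= max_fnr]
    return min(ok, key=lambda p: p[1])

# Toy data for illustration only.
scores = [0.1, 0.2, 0.4, 0.6, 0.8, 0.9]
labels = [0, 0, 1, 0, 1, 1]
print(pick_threshold(scores, labels, max_fnr=0.0)[0])  # 0.4
```

Loosening the FNR limit lets the threshold rise, trading missed NSFW images for fewer false alarms.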
Comparison of performance of models on ImageNet and their counterparts fine-tuned on NSFW dataset.
We are releasing the thin ResNet 50 model, since it provides a good tradeoff in terms of accuracy, and the model is lightweight in terms of runtime (takes < 0.5 sec on CPU) and memory (~23 MB). Please refer to our git repository for instructions and usage of our model. We encourage developers to try the model for their NSFW filtering use cases. For any questions or feedback about the performance of the model, we encourage creating an issue and we will respond ASAP.
Results can be improved by fine-tuning the model for your dataset or use case. If you achieve improved performance or you have trained an NSFW model with a different architecture, we encourage contributing to the model or sharing the link on our description page.
Disclaimer: The definition of NSFW is subjective and contextual. This model is a general purpose reference model, which can be used for the preliminary filtering of pornographic images. We do not provide guarantees of accuracy of output, rather we make this available for developers to explore and enhance as an open source project.
He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. “Deep residual learning for image recognition.” arXiv preprint arXiv:1512.03385 (2015).
Simonyan, Karen, and Andrew Zisserman. “Very deep convolutional networks for large-scale image recognition.” arXiv preprint arXiv:1409.1556 (2014).
Iandola, Forrest N., Matthew W. Moskewicz, Khalid Ashraf, Song Han, William J. Dally, and Kurt Keutzer. “SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and 1MB model size.” arXiv preprint arXiv:1602.07360 (2016).
He, Kaiming, and Jian Sun. “Convolutional neural networks at constrained time cost.” In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5353-5360. 2015.
Szegedy, Christian, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. “Going deeper with convolutions.” In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1-9. 2015.
Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. “Imagenet classification with deep convolutional neural networks.” In Advances in neural information processing systems, pp. 1097-1105. 2012.
Matt visited Google I/O yesterday, and sent back some pretty incredible pictures. This event looks more like a music festival than a tech conference.
He was sending pictures and excited snippets of text back to Pi Towers all through the event, and then, when he got home, shared this video. I’ve been so excited about it that I’ve had it playing on repeat, and we all thought you’d like to see it too.
This is a demo of a Raspberry Pi robot working with Google’s Cloud Vision API – and it’s got such potential for your projects.
Cloud Vision API provides powerful Image Analytics capabilities as easy to use APIs. It enables application developers to build the next generation of applications that can see and understand the content within the images. The service enables customers to detect a broad set of entities within an image from everyday objects to faces and product logos.
The robot is taking pictures and sending them to the cloud, where they’re analysed and sent back in real time. There’s facial detection – along with detection of what emotion is showing on those faces. And Cloud Vision offers you image recognition, so you should be able to get your robot to distinguish limes from green apples. You can then get the robot to act on that data – so you could set it to gather apples and not limes, for example.
We’re pretty excited about the opportunities this API offers makers of all kinds of Raspberry Pi devices. You can learn more here – please let us know if you start integrating it into your own projects!