Hey folks, Rob here with good news about the latest edition of The MagPi! Issue 71, out right now, is all about running Android on Raspberry Pi with the help of emteria.OS and Android Things.
Android and Raspberry Pi, two great tastes that go great together!
Android and Raspberry Pi
A big part of our main feature looks at emteria.OS, a version of Android that runs directly on the Raspberry Pi. By running it on a touchscreen setup, you can use your Pi just like an Android tablet — one that’s easily customisable and hackable for all your embedded computing needs. Inside the issue, we’ve got a special emteria.OS discount code for readers.
We also look at Android Things, the official Android release for Raspberry Pi that focuses on IoT applications, and we show you some of the amazing projects that have been built with it.
On top of that, we’ve included guides on how to get started with TensorFlow AI and on building an oscilloscope.
We really loved this card scanning project! Read all about it in issue 71.
All this, along with our usual varied selection of project showcases, excellent tutorials, and definitive reviews!
Get The MagPi 71
You can get The MagPi 71 today from WHSmith, Tesco, Sainsbury’s, and Asda. If you live in the US, head over to your local Barnes & Noble or Micro Center in the next few days for a print copy. You can also get the new issue online from our store, or digitally via our Android or iOS apps. And don’t forget, there’s always the free PDF as well.
New subscription offer!
Want to support the Raspberry Pi Foundation and the magazine? We’ve launched a new way to subscribe to the print version of The MagPi: you can now take out a monthly £4 subscription to the magazine, effectively creating a rolling pre-order system that saves you money on each issue.
You can also take out a twelve-month print subscription and get a Pi Zero W plus case and adapter cables absolutely free! This offer does not currently have an end date.
Today, at the AWS Summit in Tokyo, we announced a number of updates and new features for Amazon SageMaker. Starting today, SageMaker is available in Asia Pacific (Tokyo)! SageMaker also now supports CloudFormation. A new machine learning framework, Chainer, is now available in the SageMaker Python SDK, in addition to MXNet and TensorFlow. Finally, support for running Chainer models on several devices was added to AWS Greengrass Machine Learning.
Amazon SageMaker Chainer Estimator
Chainer is a popular, flexible, and intuitive deep learning framework. Chainer networks work on a “Define-by-Run” scheme, where the network topology is defined dynamically via forward computation. This is in contrast to many other frameworks, which work on a “Define-and-Run” scheme where the topology of the network is defined separately from the data. Many developers enjoy the Chainer scheme because it lets them write their networks with native Python constructs and tools.
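To make that concrete, here is a generic define-by-run sketch (plain Chainer, nothing specific to SageMaker): the forward pass is ordinary Python code, so loops and conditionals shape the network as it runs.

import chainer
import chainer.functions as F
import chainer.links as L

class MLP(chainer.Chain):
    def __init__(self, n_units=64, n_out=10):
        super(MLP, self).__init__()
        with self.init_scope():
            self.l1 = L.Linear(None, n_units)  # input size is inferred on the first call
            self.l2 = L.Linear(n_units, n_out)

    def __call__(self, x):
        # The topology is defined here, during the forward computation,
        # so native Python control flow works naturally.
        h = F.relu(self.l1(x))
        return self.l2(h)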
Luckily, using Chainer with SageMaker is just as easy as using a TensorFlow or MXNet estimator. In fact, it might even be a bit easier, since it’s likely you can take your existing scripts and use them to train on SageMaker with very few modifications. With TensorFlow or MXNet, users have to implement a train function with a particular signature. With Chainer, your scripts can be a little more portable: you simply read a few environment variables, such as SM_MODEL_DIR and SM_NUM_GPUS. We can wrap our existing script in an if __name__ == '__main__': guard and invoke it locally or on SageMaker.
import argparse
import os

if __name__ == '__main__':
    parser = argparse.ArgumentParser()

    # Hyperparameters sent by the client are passed as command-line arguments to the script.
    parser.add_argument('--epochs', type=int, default=10)
    parser.add_argument('--batch-size', type=int, default=64)
    parser.add_argument('--learning-rate', type=float, default=0.05)

    # Data, model, and output directories
    parser.add_argument('--output-data-dir', type=str, default=os.environ['SM_OUTPUT_DATA_DIR'])
    parser.add_argument('--model-dir', type=str, default=os.environ['SM_MODEL_DIR'])
    parser.add_argument('--train', type=str, default=os.environ['SM_CHANNEL_TRAIN'])
    parser.add_argument('--test', type=str, default=os.environ['SM_CHANNEL_TEST'])

    args, _ = parser.parse_known_args()

    # ... load from args.train and args.test, train a model, write it to args.model_dir.
Then, we can run that script locally or use the SageMaker Python SDK to launch it on some GPU instances in SageMaker. The hyperparameters are passed to the script as command-line arguments, and the environment variables above are populated automatically. When we call fit, the input channels we pass are exposed through the SM_CHANNEL_* environment variables.
from sagemaker.chainer.estimator import Chainer

# Create my estimator
chainer_estimator = Chainer(
    entry_point='example.py',
    train_instance_count=1,
    train_instance_type='ml.p3.2xlarge',
    hyperparameters={'epochs': 10, 'batch-size': 64}
)

# Train my estimator
chainer_estimator.fit({'train': train_input, 'test': test_input})

# Deploy my estimator to a SageMaker Endpoint and get a Predictor
predictor = chainer_estimator.deploy(
    instance_type='ml.m4.xlarge',
    initial_instance_count=1
)
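Once the endpoint is live, the returned predictor can be invoked directly. The input below is just a placeholder; its shape and dtype need to match whatever your own training script's model expects.

import numpy as np

# Placeholder input -- adjust the shape to match your model
sample = np.random.rand(1, 784).astype('float32')
prediction = predictor.predict(sample)
print(prediction)

# Clean up the endpoint when you are done to avoid ongoing charges
predictor.delete_endpoint()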
Now, instead of bringing your own Docker container for training and hosting with Chainer, you can just maintain your script. The full sagemaker-chainer-containers project is available on GitHub. One of my favorite features of the new container is built-in ChainerMN for easy multi-node distribution of your Chainer training jobs.
There’s a lot more documentation and information available in both the README and the example notebooks.
AWS Greengrass ML with Chainer
AWS Greengrass ML now includes a pre-built Chainer package for devices powered by Intel Atom, NVIDIA Jetson TX2, and Raspberry Pi. So Greengrass ML now provides pre-built packages for TensorFlow, Apache MXNet, and Chainer! You can train your models on SageMaker and then easily deploy them to any Greengrass-enabled device using Greengrass ML.
JAWS UG
I want to give a quick shout out to all of our wonderful and inspirational friends in the JAWS UG who attended the AWS Summit in Tokyo today. I’ve very much enjoyed seeing your pictures of the summit. Thanks for making Japan an amazing place for AWS developers! I can’t wait to visit again and meet with all of you.
This video demos a real-life Pokedex, complete with visual recognition, that I created using a Raspberry Pi, Python, and Deep Learning. You can find the entire blog post, including code, using this link: https://www.pyimagesearch.com/2018/04/30/a-fun-hands-on-deep-learning-project-for-beginners-students-and-hobbyists/ Music credit to YouTube user “No Copyright” for providing royalty-free music: https://www.youtube.com/watch?v=PXpjqURczn8
The history of Pokémon in 30 seconds
The Pokémon franchise was created by video game designer Satoshi Tajiri in 1995. In the fictional world of Pokémon, Pokémon Trainers explore the vast landscape, catching and training small creatures called Pokémon. To date, there are 802 different types of Pokémon. They range from the ever recognisable Pikachu, a bright yellow electric Pokémon, to the highly sought-after Shiny Charizard, a metallic, playing-card-shaped Pokémon that your mate Alex claims she has in mint condition, but refuses to show you.
In the world of Pokémon, children as young as ten-year-old protagonist and all-round annoyance Ash Ketchum are allowed to leave home and wander the wilderness. There, they hunt vicious, deadly creatures in the hope of becoming a Pokémon Master.
Adrian’s deep learning Pokédex
Adrian is a bit of a deep learning pro, as demonstrated by his Santa/Not Santa detector, which we wrote about last year. For that project, he also provided a great explanation of what deep learning actually is. In a nutshell:
…a subfield of machine learning, which is, in turn, a subfield of artificial intelligence (AI). While AI embodies a large, diverse set of techniques and algorithms related to automatic reasoning (inference, planning, heuristics, etc.), the machine learning subfields are specifically interested in pattern recognition and learning from data.
As with his earlier Raspberry Pi project, Adrian uses the Keras deep learning library with the TensorFlow backend, plus a few other packages such as Adrian’s own imutils functions and OpenCV.
Adrian trained a Convolutional Neural Network using Keras on a dataset of 1191 Pokémon images, obtaining 96.84% accuracy. As Adrian explains, this model is able to identify Pokémon via still image and video. It’s perfect for creating a Pokédex – an interactive Pokémon catalogue that should, according to the franchise, be able to identify and read out information on any known Pokémon when captured by camera. More information on model training can be found on Adrian’s blog.
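Adrian's actual training and inference code lives on his blog; purely as an illustrative sketch (the model file, labels, and input size below are hypothetical), classifying a single image with a trained Keras model looks roughly like this:

import cv2
import numpy as np
from keras.models import load_model
from keras.preprocessing.image import img_to_array

# Hypothetical artifacts from a training run
model = load_model('pokedex.model')
labels = ['bulbasaur', 'charmander', 'mewtwo', 'pikachu', 'squirtle']

# Load an image and preprocess it the same way the training images were
image = cv2.imread('pikachu.png')
image = cv2.resize(image, (96, 96))              # match the network's input size
image = img_to_array(image).astype('float') / 255.0
image = np.expand_dims(image, axis=0)            # add a batch dimension

# Predict and report the most likely Pokémon
probabilities = model.predict(image)[0]
print(labels[int(np.argmax(probabilities))], float(probabilities.max()))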
For the physical build, a Raspberry Pi 3 with camera module is paired with the Raspberry Pi 7″ touch display to create a portable Pokédex. And while Adrian comments that the same result can be achieved using your home computer and a webcam, that’s not how Adrian rolls as a Raspberry Pi fan.
Plus, the smaller size of the Pi is perfect for one of you to incorporate this deep learning model into a 3D-printed Pokédex for ultimate Pokémon glory, pretty please, thank you.
Adrian has gone into impressive detail about how the project works and how you can create your own on his blog, pyimagesearch. So if you’re interested in learning more about deep learning, and making your own Pokédex, be sure to visit.
Discover new sounds and explore the role of machine learning in music production and sound research with the NSynth Super, an ongoing project from Google’s Magenta research team that you can build at home.
Part of the ongoing Magenta research project within Google, NSynth Super explores the ways in which machine learning tools help artists and musicians be creative.
“Technology has always played a role in creating new types of sounds that inspire musicians — from the sounds of distortion to the electronic sounds of synths,” explains the team behind the NSynth Super. “Today, advances in machine learning and neural networks have opened up new possibilities for sound generation.”
Using TensorFlow, the Magenta team builds tools and interfaces that let artists and musicians use machine learning in their work. The NSynth Super AI algorithm uses deep neural networking to investigate the character of sounds. It then builds new sounds based on these characteristics instead of simply mixing sounds together.
Using an autoencoder, it extracts 16 defining temporal features from each input. These features are then interpolated linearly to create new embeddings (mathematical representations of each sound). These new embeddings are then decoded into new sounds, which have the acoustic qualities of both inputs.
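As a toy illustration of that interpolation step (the real implementation lives in Magenta's open-source NSynth code; the vectors here are made up), blending two embeddings linearly is just a weighted average:

import numpy as np

# Hypothetical 16-dimensional feature vectors for two source sounds
embedding_a = np.random.rand(16)
embedding_b = np.random.rand(16)

def blend(a, b, alpha):
    # alpha=0.0 returns sound A's embedding, alpha=1.0 returns sound B's
    return (1.0 - alpha) * a + alpha * b

new_embedding = blend(embedding_a, embedding_b, alpha=0.5)
# The trained decoder would then turn new_embedding back into audio that
# carries acoustic qualities of both inputs.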
The team publishes all hardware designs and software that are part of their ongoing research under open-source licences, allowing you to build your own synth.
Build your own NSynth Super
Using these open-source tools, Andrew Black has produced his own NSynth Super, demoed in the video above. Andrew’s list of build materials includes a Raspberry Pi 3, potentiometers, rotary encoders, and the Adafruit 1.3″ OLED display. Magenta also provides Gerber files for you to fabricate your own PCB.
Once fabricated, the PCB is labelled to show where each component should be added.
The build isn’t easy — it requires soldering skills or access to someone who can assemble PCBs. Take a look at Andrew’s blog post and the official NSynth GitHub repo to see whether you’re up to the challenge.
Music and Raspberry Pi
The Raspberry Pi has been widely used for music production and music builds. Be it retrofitting a boombox, distributing music atop Table Mountain, or coding tracks with Sonic Pi, the Pi offers endless opportunities for musicians and music lovers to expand their repertoire of builds and instruments.
If you’d like to try more music-based projects using the Raspberry Pi, you can check out our free resources. And if you’ve used a Raspberry Pi in your own musical project, please share it with us in the comments or via our social network accounts.
What happens when you combine the Internet of Things, Machine Learning, and Edge Computing? Before I tell you, let’s review each one and discuss what AWS has to offer.
Internet of Things (IoT) – Devices that connect the physical world and the digital one. The devices, often equipped with one or more types of sensors, can be found in factories, vehicles, mines, fields, homes, and so forth. Important AWS services include AWS IoT Core, AWS IoT Analytics, AWS IoT Device Management, and Amazon FreeRTOS, along with others that you can find on the AWS IoT page.
Machine Learning (ML) – Systems that can be trained using an at-scale dataset and statistical algorithms, and used to make inferences from fresh data. At Amazon we use machine learning to drive the recommendations that you see when you shop, to optimize the paths in our fulfillment centers, to fly drones, and much more. We support leading open source machine learning frameworks such as TensorFlow and MXNet, and make ML accessible and easy to use through Amazon SageMaker. We also provide Amazon Rekognition for images and for video, Amazon Lex for chatbots, and a wide array of language services for text analysis, translation, speech recognition, and text to speech.
Edge Computing – The power to have compute resources and decision-making capabilities in disparate locations, often with intermittent or no connectivity to the cloud. AWS Greengrass builds on AWS IoT, giving you the ability to run Lambda functions and keep device state in sync even when not connected to the Internet.
ML Inference at the Edge
Today I would like to toss all three of these important new technologies into a blender! You can now perform Machine Learning inference at the edge using AWS Greengrass. This allows you to use the power of the AWS cloud (including fast, powerful instances equipped with GPUs) to build, train, and test your ML models before deploying them to small, low-powered, intermittently-connected IoT devices running in those factories, vehicles, mines, fields, and homes that I mentioned.
Here are a few of the many ways that you can put Greengrass ML Inference to use:
Precision Farming – With an ever-growing world population and unpredictable weather that can affect crop yields, the opportunity to use technology to increase yields is immense. Intelligent devices that are literally in the field can process images of soil, plants, pests, and crops, taking local corrective action and sending status reports to the cloud.
Physical Security – Smart devices (including the AWS DeepLens) can process images and scenes locally, looking for objects, watching for changes, and even detecting faces. When something of interest or concern arises, the device can pass the image or the video to the cloud and use Amazon Rekognition to take a closer look.
Industrial Maintenance – Smart, local monitoring can increase operational efficiency and reduce unplanned downtime. The monitors can run inference operations on power consumption, noise levels, and vibration to flag anomalies, predict failures, and detect faulty equipment.
Greengrass ML Inference Overview
There are several different aspects to this new AWS feature. Let’s take a look at each one:
Machine Learning Models – Precompiled TensorFlow and MXNet libraries, optimized for production use on the NVIDIA Jetson TX2 and Intel Atom devices, and development use on 32-bit Raspberry Pi devices. The optimized libraries can take advantage of GPU and FPGA hardware accelerators at the edge in order to provide fast, local inferences.
Model Building and Training – The ability to use Amazon SageMaker and other cloud-based ML tools to build, train, and test your models before deploying them to your IoT devices. To learn more about SageMaker, read Amazon SageMaker – Accelerated Machine Learning.
Model Deployment – SageMaker models can (if you give them the proper IAM permissions) be referenced directly from your Greengrass groups. You can also make use of models stored in S3 buckets. You can add a new machine learning resource to a group with a couple of clicks in the console.
Amazon SageMaker continues to iterate quickly and release new features on behalf of customers. Starting today, SageMaker adds support for many new instance types, local testing with the SDK, and Apache MXNet 1.1.0 and TensorFlow 1.6.0. Let’s take a quick look at each of these updates.
New Instance Types
Amazon SageMaker customers now have additional options for right-sizing their workloads for notebooks, training, and hosting. Notebook instances now support almost all T2, M4, P2, and P3 instance types, with the exception of t2.micro, t2.small, and m4.large instances. Model training now supports nearly all M4, M5, C4, C5, P2, and P3 instances, with the exception of m4.large, c4.large, and c5.large instances. Finally, model hosting now supports nearly all T2, M4, M5, C4, C5, P2, and P3 instances, with the exception of m4.large instances. Many customers can take advantage of the newest P3, C5, and M5 instances to get the best price/performance for their workloads. Customers can also take advantage of the burstable compute model on T2 instances for endpoints or notebooks that are used less frequently.
Open Sourced Containers, Local Mode, and TensorFlow 1.6.0 and MXNet 1.1.0
Today Amazon SageMaker has open sourced the MXNet and TensorFlow deep learning containers that power the MXNet and TensorFlow estimators in the SageMaker SDK. The ability to write Python scripts that conform to a simple interface is still one of my favorite SageMaker features, and now those containers can be further customized to include any additional libraries. You can download these containers locally to iterate and experiment, which can accelerate your debugging cycle. When you’re ready to go from local testing to production training and hosting, you just change one line of code.
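As a minimal sketch of local mode (the entry point script and IAM role name here are hypothetical), the only difference between experimenting on your laptop and training on managed instances is the instance type you request:

from sagemaker.mxnet import MXNet

# 'train.py' is a hypothetical training script that follows the SageMaker interface
estimator = MXNet(
    entry_point='train.py',
    role='SageMakerRole',            # an IAM role with SageMaker permissions
    framework_version='1.1.0',
    train_instance_count=1,
    train_instance_type='local',     # run inside a container on this machine
)
estimator.fit('file:///tmp/my-training-data')

# When you're ready for production, change only the instance type
# (for example, train_instance_type='ml.p3.2xlarge') and point at data in S3.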
These containers launch with support for TensorFlow 1.6.0 and MXNet 1.1.0 as well. TensorFlow 1.6.0 brings a number of new features, including support for CUDA 9.0, cuDNN 7, and AVX instructions, which allow for significant speedups in many training applications. MXNet 1.1.0 adds a number of new features, including a text API, mxnet.text, with support for text processing, indexing, glossaries, and more. Two of the really cool pre-trained embeddings included are GloVe and fastText.
Available Now
All of the features mentioned above are available today. As always, please let us know on Twitter or in the comments below if you have any questions or if you’re building something interesting. Now, if you’ll excuse me, I’m going to go experiment with some of those new MXNet APIs!
The video is a demo of my “Not Santa” detector that I deployed to the Raspberry Pi. I trained the detector using deep learning, Keras, and Python. You can find the full source code and tutorial here: https://www.pyimagesearch.com/2017/12/18/keras-deep-learning-raspberry-pi/
Ho-ho-how does it work?
Note: Adrian Rosebrock is not Santa. But he does a good enough impression of the jolly old fellow that his disguise can fool a Raspberry Pi into thinking otherwise.
We jest, but has anyone seen Adrian and Santa in the same room together? Image c/o Adrian Rosebrock
But how is the Raspberry Pi able to detect the Santa-ness or Not-Santa-ness of people who walk into the frame?
Two words: deep learning
If you’re not sure what deep learning is, you’re not alone. It’s a hefty topic, and one that Adrian has written a book about, so I grilled him for a bluffers’ guide. In his words, deep learning is:
…a subfield of machine learning, which is, in turn, a subfield of artificial intelligence (AI). While AI embodies a large, diverse set of techniques and algorithms related to automatic reasoning (inference, planning, heuristics, etc.), the machine learning subfields are specifically interested in pattern recognition and learning from data.
Artificial Neural Networks (ANNs) are a class of machine learning algorithms that can learn from data. We have been using ANNs successfully for over 60 years, but something special happened in the past 5 years — (1) we’ve been able to accumulate massive datasets, orders of magnitude larger than previous datasets, and (2) we have access to specialized hardware to train networks faster (i.e., GPUs).
Given these large datasets and specialized hardware, deeper neural networks can be trained, leading to the term “deep learning”.
Now that we have a bird’s-eye view of deep learning, how does the detector actually detect?
Components for Santa detection Image c/o Adrian Rosebrock
The camera captures footage of Santa in the wild, while the Christmas tree add-on provides a twinkly notification, accompanied by a resonant ho, ho, ho from the speakers.
A deeper deep dive into deep learning
A full breakdown of the project and the workings of the Not Santa detector can be found on Adrian’s blog, PyImageSearch, which includes links to other deep learning and image classification tutorials using TensorFlow and Keras. It’s an excellent place to start if you’d like to understand more about deep learning.
Build your own Santa detector
Santa might catch on to Adrian’s clever detector and start avoiding the camera, and for that eventuality, we have our own Santa detector. It uses motion detection to notify you of his presence (and your presents!).
Check out our Santa Detector resource here and use a passive infrared sensor, Raspberry Pi, and Scratch to catch the big man in action.
My colleague Sandy Carter delivered the Enterprise Innovation State of the Union last week at AWS re:Invent. She wrote the guest post below to recap the announcements that she made from the stage.
“I want my company to innovate, but I am not convinced we can execute successfully.” Far too many times I have heard this fear expressed by senior executives that I have met at different points in my career. In fact, a recent study published by PricewaterhouseCoopers found that while 93% of executives depend on innovation to drive growth, more than half find it challenging to take innovative ideas to market quickly in a scalable way.
Many customers are struggling with how to drive enterprise innovation, so I was thrilled to share the stage at AWS re:Invent this past week with several senior executives who have successfully broken this mold to drive amazing enterprise innovation. In particular, I want to thank Parag Karnik from Johnson & Johnson, Bill Rothe from Hess Corporation, Dave Williams from Just Eat, and Olga Lagunova from Pitney Bowes for sharing their stories of innovation, creativity, and solid execution.
Among the many new announcements from AWS this past week, I am particularly excited about the following newly-launched AWS products and programs that I announced at re:Invent to drive new innovations by our enterprise customers:
AI: New Deep Learning Amazon Machine Image (AMI) on EC2 Windows
As I shared at re:Invent, customers such as Infor are already successfully leveraging artificial intelligence tools on AWS to deliver tailored, industry-specific applications to their customers. We want to make it easy for more of our Windows developers to get started quickly with AI, leveraging machine learning based tools with popular deep learning frameworks, such as Apache MXNet, TensorFlow, and Caffe2. In order to enable this, I announced at re:Invent that AWS now offers a new Deep Learning AMI for Microsoft Windows. The AMI is tailored to facilitate large scale training of deep-learning models, and enables quick and easy setup of Windows Server-based compute resources for machine learning applications.
IoT: Visualize and Analyze SQL and IoT Data
Forecasts predict as many as 31 billion IoT devices by 2020. AWS wants every Windows customer to take advantage of the data available from their devices. Pitney Bowes, for example, now has more than 130,000 IoT devices streaming data to AWS. Using machine learning, Pitney Bowes enriches and analyzes data to enhance their customer experience, improve efficiencies, and create new data products. AWS IoT Analytics can now be leveraged to run analytics on IoT data and get insights that help you make better and more accurate decisions for IoT applications and machine learning use cases. AWS IoT Analytics can automatically enrich IoT device data with contextual metadata such as your SQL Server transactional data.
New Capabilities for .NET Developers on AWS
In addition to all of the enhancements we’ve introduced to deliver a first-class experience to Windows developers on AWS, we announced that we are including .NET Core 2.0 support in AWS Lambda and AWS CodeBuild, which will be available for broader use early next year. .NET Core 2.0 packs a number of new features, such as Razor pages, better compatibility with the .NET Framework, more than double the number of APIs compared to previous versions, and much more. With this announcement, you will be able to take advantage of all the latest .NET Core features on Lambda and CodeBuild for building modern serverless and DevOps-centric solutions.
License optimization for BYOL
AWS provides you with a wide variety of instance types and families that best meet your workload needs. If you are using software licensed by the number of vCPUs, you want the ability to further tweak vCPU count to optimize license spend. I announced the upcoming ability to optimize CPUs for EC2, giving you greater control over your EC2 instances on two fronts:
You can specify a custom number of vCPUs when launching new instances to save on vCPU-based licensing costs, such as SQL Server licensing spend.
You can disable Hyper-Threading Technology for workloads that perform well with single-threaded CPUs, like some high-performance computing (HPC) applications.
Using these capabilities, customers who bring their own licenses (BYOL) will be able to optimize their license usage and save on license costs.
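As an illustrative sketch of what such a launch could look like with boto3 once the capability is available (the AMI ID is a placeholder, and the CpuOptions parameter reflects the EC2 RunInstances API):

import boto3

ec2 = boto3.client('ec2')

# Placeholder AMI -- substitute your own licensed SQL Server or HPC image
response = ec2.run_instances(
    ImageId='ami-xxxxxxxx',
    InstanceType='r4.4xlarge',
    MinCount=1,
    MaxCount=1,
    CpuOptions={
        'CoreCount': 4,        # fewer cores than the default, to limit per-vCPU license spend
        'ThreadsPerCore': 1,   # disable Hyper-Threading for single-threaded HPC workloads
    },
)
print(response['Instances'][0]['InstanceId'])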
Server Migration Service for Hyper-V Virtual Machines
As Bill Rothe from Hess Corporation shared at re:Invent, Hess has successfully migrated a wide range of workloads to the cloud, including SQL Server, SharePoint, SAP HANA, and many others. AWS Server Migration Service (SMS) now supports Hyper-V virtual machine (VM) migration, in order to further support enterprise migrations like these. AWS Server Migration Service will enable you to more easily coordinate large-scale server migrations from on-premises Hyper-V environments to AWS. AWS Server Migration Service allows you to automate, schedule, and track incremental replications of live server volumes. The replicated volumes are encrypted in transit and saved as a new Amazon Machine Image (AMI), which can be launched as an EC2 instance on AWS.
Microsoft Premier Support for AWS End-Customers
I was pleased to announce that Microsoft and AWS have developed new areas of support integration to help ensure a great customer experience. Microsoft Premier Support is on board to help AWS assist end customers. AWS Support engineers can escalate directly to Microsoft Support on behalf of AWS customers running Microsoft workloads.
Best Practice Tools: HIPAA Compliance and Digital Innovation Workshop
In November, we updated our HIPAA-focused white paper, outlining how you can use AWS to create HIPAA-compliant applications. In the first quarter of next year, we will publish a HIPAA Implementation Guide that expands on our HIPAA Quick Start to enable you to follow strict security, compliance, and risk management controls for common healthcare use cases. I was also pleased to award a Digital Innovation Workshop to one of our customers in my re:Invent session, and look forward to seeing more customers take advantage of this workshop.
AWS: The Continuous Innovation Cloud
A common thread we see across customers is that continuous innovation from AWS enables their ongoing reinvention. Continuous innovation means that you are always getting a newer, better offering every single day. Sometimes it is in the form of brand new services and capabilities, and sometimes it is happening invisibly, under the covers where your environment just keeps getting better. I invite you to learn more about how you can accelerate your innovation journey with recently launched AWS services and AWS best practices. If you are migrating Windows workloads, speak with your AWS sales representative or an AWS Microsoft Workloads Competency Partner to learn how you can leverage our re:Think for Windows program for credits to start your migration.
Contributed by Tiffany Jernigan, Developer Advocate for Amazon ECS
Get ready for takeoff!
We made sure that this year’s re:Invent is chock-full of containers: there are over 40 sessions! New to containers? No problem, we have several introductory sessions for you to dip your toes into. Been using containers for years and know the ins and outs? Don’t miss our technical deep dives and interactive chalk talks led by container experts.
If you can’t make it to Las Vegas, you can catch the keynotes and session recaps from our livestream and on Twitch.
Session types
Not everyone learns the same way, so we have multiple types of breakout content:
Birds of a Feather An interactive discussion with industry leaders about containers on AWS.
Breakout sessions 60-minute presentations about building on AWS. Sessions are delivered by both AWS experts and customers and span all content levels.
Workshops 2.5-hour, hands-on sessions that teach how to build on AWS. AWS credits are provided. Bring a laptop, and have an active AWS account.
Chalk Talks 1-hour, highly interactive sessions with a smaller audience. They begin with a short lecture delivered by an AWS expert, followed by a discussion with the audience.
Session levels
Whether you’re new to containers or you’ve been using them for years, you’ll find useful information at every level.
Introductory Sessions are focused on providing an overview of AWS services and features, with the assumption that attendees are new to the topic.
Advanced Sessions dive deeper into the selected topic. Presenters assume that the audience has some familiarity with the topic, but may or may not have direct experience implementing a similar solution.
Expert Sessions are for attendees who are deeply familiar with the topic, have implemented a solution on their own already, and are comfortable with how the technology works across multiple services, architectures, and implementations.
Session locations
All container sessions are located in the Aria Resort.
MONDAY 11/27
Breakout sessions
Level 200 (Introductory)
CON202 – Getting Started with Docker and Amazon ECS By packaging software into standardized units, Docker gives code everything it needs to run, ensuring consistency from your laptop all the way into production. But once you have your code ready to ship, how do you run and scale it in the cloud? In this session, you become comfortable running containerized services in production using Amazon ECS. We cover container deployment, cluster management, service auto-scaling, service discovery, secrets management, logging, monitoring, security, and other core concepts. We also cover integrated AWS services and supplementary services that you can take advantage of to run and scale container-based services in the cloud.
Chalk talks
Level 200 (Introductory)
CON211 – Reducing your Compute Footprint with Containers and Amazon ECS Tomas Riha, platform architect for Volvo, shows how Volvo transitioned its WirelessCar platform from using Amazon EC2 virtual machines to containers running on Amazon ECS, significantly reducing cost. Tomas dives deep into the architecture that Volvo used to achieve the migration in under four months, including Amazon ECS, Amazon ECR, Elastic Load Balancing, and AWS CloudFormation.
CON212 – Anomaly Detection Using Amazon ECS, AWS Lambda, and Amazon EMR Learn about the architecture that Cisco CloudLock uses to enable automated security and compliance checks throughout the entire development lifecycle, from the first line of code through runtime. It includes integration with IAM roles, Amazon VPC, and AWS KMS.
Level 400 (Expert)
CON410 – Advanced CICD with Amazon ECS Control Plane Mohit Gupta, product and engineering lead for Clever, demonstrates how to extend the Amazon ECS control plane to optimize management of container deployments and how the control plane can be broadly applied to take advantage of new AWS services. This includes ark, an AWS CLI-based deployment tool for Amazon ECS; Dapple, a Slack-based automation system for deployments and notifications; and Kayvee, log and event routing libraries based on Amazon Kinesis.
Workshops
Level 200 (Introductory)
CON209 – Interstella 8888: Learn How to Use Docker on AWS Interstella 8888 is an intergalactic trading company that deals in rare resources, but their antiquated monolithic logistics systems are causing the business to lose money. Join this workshop to get hands-on experience with Docker as you containerize Interstella 8888’s aging monolithic application and deploy it using Amazon ECS.
CON213 – Hands-on Deployment of Kubernetes on AWS In this workshop, attendees get hands-on experience using Kubernetes and Kops (Kubernetes Operations), as described in our recent blog post. Attendees learn how to provision a cluster, assign role-based permissions and security, and launch a container. If you’re interested in learning best practices for running Kubernetes on AWS, don’t miss this workshop.
TUESDAY 11/28
Breakout Sessions
Level 200 (Introductory)
CON206 – Docker on AWS In this session, Docker Technical Staff Member Patrick Chanezon discusses how Finnish Rail, the national train system for Finland, is using Docker on Amazon Web Services to modernize their customer facing applications, from ticket sales to reservations. Patrick also shares the state of Docker development and adoption on AWS, including explaining the opportunities and implications of efforts such as Project Moby, Docker EE, and how developers can use and contribute to Docker projects.
CON208 – Building Microservices on AWS Increasingly, organizations are turning to microservices to help them empower autonomous teams, letting them innovate and ship software faster than ever before. But implementing a microservices architecture comes with a number of new challenges that need to be dealt with. Chief among these is finding an appropriate platform to help manage a growing number of independently deployable services. In this session, Sam Newman, author of Building Microservices and a renowned expert in microservices strategy, discusses strategies for building scalable and robust microservices architectures. He also tells you how to choose the right platform for building microservices, and about common challenges and mistakes organizations make when they move to microservices architectures.
Level 300 (Advanced)
CON302 – Building a CICD Pipeline for Containers on AWS Containers can make it easier to scale applications in the cloud, but how do you set up your CICD workflow to automatically test and deploy code to containerized apps? In this session, we explore how developers can build effective CICD workflows to manage their containerized code deployments on AWS.
Ajit Zadgaonkar, Director of Engineering and Operations at Edmunds, walks through best practices for CICD architectures used by his team to deploy containers. We also deep dive into topics such as how to create an accessible CICD platform and architect for safe blue/green deployments.
CON307 – Building Effective Container Images Sick of getting paged at 2am and wondering “where did all my disk space go?” New Docker users often start with a stock image in order to get up and running quickly, but this can cause problems as your application matures and scales. Creating efficient container images is important to maximize resources, and deliver critical security benefits.
In this session, AWS Sr. Technical Evangelist Abby Fuller covers how to create effective images to run containers in production. This includes an in-depth discussion of how Docker image layers work, things you should think about when creating your images, working with Amazon ECR, and mise en place for installing dependencies. Prakash Janakiraman, Co-Founder and Chief Architect at Nextdoor, discusses high-level and language-specific best practices for building images and how Nextdoor uses these practices to successfully scale their containerized services with a small team.
CON309 – Containerized Machine Learning on AWS Image recognition is a field of deep learning that uses neural networks to recognize the subject and traits for a given image. In Japan, Cookpad uses Amazon ECS to run an image recognition platform on clusters of GPU-enabled EC2 instances. In this session, hear from Cookpad about the challenges they faced building and scaling this advanced, user-friendly service to ensure high-availability and low-latency for tens of millions of users.
CON320 – Monitoring, Logging, and Debugging for Containerized Services As containers become more embedded in the platform tools, debug tools, traces, and logs become increasingly important. Nare Hayrapetyan, Senior Software Engineer, and Calvin French-Owen, Senior Technical Officer for Segment, discuss the principles of monitoring and debugging containers and the tools Segment has implemented and built for logging, alerting, metric collection, and debugging of containerized services running on Amazon ECS.
Chalk Talks
Level 300 (Advanced)
CON314 – Automating Zero-Downtime Production Cluster Upgrades for Amazon ECS Containers make it easy to deploy new code into production to update the functionality of a service, but what happens when you need to update the Amazon EC2 compute instances that your containers are running on? In this talk, we’ll deep dive into how to upgrade the Amazon EC2 infrastructure underlying a live production Amazon ECS cluster without affecting service availability. Matt Callanan, Engineering Manager at Expedia, walks through Expedia’s “PRISM” project that safely relocates hundreds of tasks onto new Amazon EC2 instances with zero downtime to applications.
CON322 – Maximizing Amazon ECS for Large-Scale Workloads Head of Mobfox DevOps, David Spitzer, shows how Mobfox used Docker and Amazon ECS to scale the Mobfox services and development teams, achieving low-latency networking and automatic scaling. This session covers Mobfox’s ecosystem architecture, comparing 2015 with today, the challenges Mobfox faced in growing their platform, and how they overcame them.
CON323 – Microservices Architectures for the Enterprise Salva Jung, Principal Engineer for Samsung Mobile, shares how Samsung Connect is architected as microservices running on Amazon ECS to securely, stably, and efficiently handle requests from millions of mobile and IoT devices around the world.
CON324 – Windows Containers on Amazon ECS Docker containers are commonly regarded as powerful and portable runtime environments for Linux code, but Docker also offers API and toolchain support for running Windows Server in containers. In this talk, we discuss the various options for running Windows-based applications in containers on AWS.
CON326 – Remote Sensing and Image Processing on AWS Learn how Encirca services by DuPont Pioneer uses Amazon ECS powered by GPU instances and Amazon EC2 Spot Instances to run proprietary image-processing algorithms against satellite imagery. Mark Lanning and Ethan Harstad, engineers at DuPont Pioneer, show how this architecture has allowed them to process satellite imagery multiple times a day for each agricultural field in the United States in order to identify crop health changes.
Workshops
Level 300 (Advanced)
CON317 – Advanced Container Management at Catsndogs.lol Catsndogs.lol is a (fictional) company that needs help deploying and scaling its container-based application. During this workshop, attendees join the new DevOps team at CatsnDogs.lol and help the company manage their applications using Amazon ECS, and help release new features to make their customers happier than ever. Attendees get hands-on with service and container-instance auto-scaling, spot-fleet integration, container placement strategies, service discovery, secrets management with AWS Systems Manager Parameter Store, time-based and event-based scheduling, and automated deployment pipelines. If you are a developer interested in learning more about how Amazon ECS can accelerate your application development and deployment workflows, or if you are a systems administrator or DevOps person interested in understanding how Amazon ECS can simplify the operational model associated with running containers at scale, then this workshop is for you. You should have basic familiarity with Amazon ECS, Amazon EC2, and IAM.
Additional requirements:
The AWS CLI or AWS Tools for PowerShell installed
An AWS account with administrative permissions (including the ability to create IAM roles and policies) created at least 24 hours in advance.
WEDNESDAY 11/29
Birds of a Feather (BoF)
CON01 – Birds of a Feather: Containers and Open Source at AWS Cloud native architectures take advantage of on-demand delivery, global deployment, elasticity, and higher-level services to enable developer productivity and business agility. Open source is a core part of making cloud native possible for everyone. In this session, we welcome thought leaders from the CNCF, Docker, and AWS to discuss the cloud’s direction for growth and enablement of the open source community. We also discuss how AWS is integrating open source code into its container services and its contributions to open source projects.
Breakout Sessions
Level 300 (Advanced)
CON308 – Mastering Kubernetes on AWS Much progress has been made on how to bootstrap a cluster since Kubernetes’ first commit, and it now takes only a matter of minutes to go from zero to a running cluster on Amazon Web Services. However, evolving a simple Kubernetes architecture to be ready for production in a large enterprise can quickly become overwhelming with options for configuration and customization.
In this session, Arun Gupta, Open Source Strategist for AWS and Raffaele Di Fazio, software engineer at leading European fashion platform Zalando, show the common practices for running Kubernetes on AWS and share insights from experience in operating tens of Kubernetes clusters in production on AWS. We cover options and recommendations on how to install and manage clusters, configure high availability, perform rolling upgrades and handle disaster recovery, as well as continuous integration and deployment of applications, logging, and security.
CON310 – Moving to Containers: Building with Docker and Amazon ECS If you’ve ever considered moving part of your application stack to containers, don’t miss this session. We cover best practices for containerizing your code, implementing automated service scaling and monitoring, and setting up automated CI/CD pipelines with fail-safe deployments. Manjeeva Silva and Thilina Gunasinghe show how McDonalds implemented their home delivery platform in four months using Docker containers and Amazon ECS to serve tens of thousands of customers.
Level 400 (Expert)
CON402 – Advanced Patterns in Microservices Implementation with Amazon ECS Scaling a microservice-based infrastructure can be challenging in terms of both technical implementation and developer workflow. In this talk, AWS Solutions Architect Pierre Steckmeyer is joined by Will McCutchen, Architect at BuzzFeed, to discuss Amazon ECS as a platform for building a robust infrastructure for microservices. We look at the key attributes of microservice architectures and how Amazon ECS supports these requirements in production, from configuration to sophisticated workload scheduling to networking capabilities to resource optimization. We also examine what it takes to build an end-to-end platform on top of the wider AWS ecosystem, and what it’s like to migrate a large engineering organization from a monolithic approach to microservices.
CON404 – Deep Dive into Container Scheduling with Amazon ECS As your application’s infrastructure grows and scales, well-managed container scheduling is critical to ensuring high availability and resource optimization. In this session, we deep dive into the challenges and opportunities around container scheduling, as well as the different tools available within Amazon ECS and AWS to carry out efficient container scheduling. We discuss patterns for container scheduling available with Amazon ECS, the Blox scheduling framework, and how you can customize and integrate third-party scheduler frameworks to manage container scheduling on Amazon ECS.
Chalk Talks
Level 300 (Advanced)
CON312 – Building a Selenium Fleet on the Cheap with Amazon ECS with Spot Fleet Roberto Rivera and Matthew Wedgwood, engineers at RetailMeNot, give a practical overview of setting up a fleet of Selenium nodes running on Amazon ECS with Spot Fleet. They discuss the challenges of running Selenium with high availability at minimum cost, using Amazon ECS container introspection to connect the Selenium Hub with its nodes.
CON315 – Virtually There: Building a Render Farm with Amazon ECS Learn how 8i Corp scales its multi-tenanted, volumetric render farm up to thousands of instances using AWS, Docker, and an API-driven infrastructure. This render farm enables them to turn the video footage from an array of synchronized cameras into a photo-realistic hologram capable of playback on a range of devices, from mobile phones to high-end head mounted displays. Join Owen Evans, VP of Engineering for 8i, as they dive deep into how 8i’s rendering infrastructure is built and maintained by just a handful of people and powered by Amazon ECS.
CON325 – Developing Microservices – from Your Laptop to the Cloud Wesley Chow, Staff Engineer at AdRoll, shows how his team extends Amazon ECS by enabling local development capabilities. Hologram, AdRoll’s local development program, brings the capabilities of the Amazon EC2 instance metadata service to non-EC2 hosts, so that developers can run the same software on local machines with the same credentials source as in production.
CON327 – Patterns and Considerations for Service Discovery Roven Drabo, head of cloud operations at Kaplan Test Prep, illustrates Kaplan’s complete container automation solution using Amazon ECS along with how his team uses NGINX and HashiCorp Consul to provide an automated approach to service discovery and container provisioning.
CON328 – Building a Development Platform on Amazon ECS Quinton Anderson, Head of Engineering for Commonwealth Bank of Australia, walks through how they migrated their internal development and deployment platform from Mesos/Marathon to Amazon ECS. The platform uses a custom DSL to abstract a layered application architecture, in a way that makes it easy to plug or replace new implementations into each layer in the stack.
Workshops
Level 300 (Advanced)
CON318 – Interstella 8888: Monolith to Microservices with Amazon ECS Interstella 8888 is an intergalactic trading company that deals in rare resources, but their antiquated monolithic logistics systems are causing the business to lose money. Join this workshop to get hands-on experience deploying Docker containers as you break Interstella 8888’s aging monolithic application into containerized microservices. Using Amazon ECS and an Application Load Balancer, you create API-based microservices and deploy them leveraging integrations with other AWS services.
CON332 – Build a Java Spring Application on Amazon ECS This workshop teaches you how to lift and shift existing Spring and Spring Cloud applications onto the AWS platform. Learn how to build a Spring application container, understand bootstrap secrets, push container images to Amazon ECR, and deploy the application to Amazon ECS. Then, learn how to configure the deployment for production.
THURSDAY 11/30
Breakout Sessions
Level 200 (Introductory)
CON201 – Containers on AWS – State of the Union Just over four years after the first public release of Docker, and three years to the day after the launch of Amazon ECS, the use of containers has surged to run a significant percentage of production workloads at startups and enterprise organizations. Join Deepak Singh, General Manager of Amazon Container Services, as he covers the state of containerized application development and deployment trends, new container capabilities on AWS that are available now, options for running containerized applications on AWS, and how AWS customers successfully run container workloads in production.
Level 300 (Advanced)
CON304 – Batch Processing with Containers on AWS Batch processing is useful to analyze large amounts of data. But configuring and scaling a cluster of virtual machines to process complex batch jobs can be difficult. In this talk, we show how to use containers on AWS for batch processing jobs that can scale quickly and cost-effectively. We also discuss AWS Batch, our fully managed batch-processing service. You also hear from GoPro and Here about how they use AWS to run batch processing jobs at scale including best practices for ensuring efficient scheduling, fine-grained monitoring, compute resource automatic scaling, and security for your batch jobs.
Level 400 (Expert)
CON406 – Architecting Container Infrastructure for Security and Compliance While organizations gain agility and scalability when they migrate to containers and microservices, they also benefit from compliance and security, advantages that are often overlooked. In this session, Kelvin Zhu, lead software engineer at Okta, joins Mitch Beaumont, enterprise solutions architect at AWS, to discuss security best practices for containerized infrastructure. Learn how Okta built their development workflow with an emphasis on security through testing and automation. Dive deep into how containers enable automated security and compliance checks throughout the development lifecycle. Also understand best practices for implementing AWS security and secrets management services for any containerized service architecture.
Chalk Talks
Level 300 (Advanced)
CON329 – Full Software Lifecycle Management for Containers Running on Amazon ECS Learn how The Washington Post uses Amazon ECS to run Arc Publishing, a digital journalism platform that powers The Washington Post and a growing number of major media websites. Amazon ECS enabled The Washington Post to containerize their existing microservices architecture, avoiding a complete rewrite that would have delayed the platform’s launch by several years. In this session, Jason Bartz, Technical Architect at The Washington Post, discusses the platform’s architecture. He addresses the challenges of optimizing Arc Publishing’s workload, and managing the application lifecycle to support 2,000 containers running on more than 50 Amazon ECS clusters.
CON330 – Running Containerized HIPAA Workloads on AWS Nihar Pasala, Engineer at Aetion, discusses the Aetion Evidence Platform, a system for generating the real-world evidence used by healthcare decision makers to implement value-based care. This session discusses the architecture Aetion uses to run HIPAA workloads using containers on Amazon ECS, best practices, and learnings.
Level 400 (Expert)
CON408 – Building a Machine Learning Platform Using Containers on AWS DeepLearni.ng develops and implements machine learning models for complex enterprise applications. In this session, Thomas Rogers, Engineer at DeepLearni.ng, discusses how they worked with Scotiabank to leverage Amazon ECS, Amazon ECR, Docker, GPU-accelerated Amazon EC2 instances, and TensorFlow to develop a retail risk model that helps manage payment collections for millions of Canadian credit card customers.
Workshops
Level 300 (Advanced)
CON319 – Interstella 8888: CICD for Containers on AWS Interstella 8888 is an intergalactic trading company that deals in rare resources, but their antiquated monolithic logistics systems are causing the business to lose money. Join this workshop to learn how to set up a CI/CD pipeline for containerized microservices. You get hands-on experience deploying Docker container images using Amazon ECS, AWS CloudFormation, AWS CodeBuild, and AWS CodePipeline, automating everything from code check-in to production.
FRIDAY 12/1
Breakout Sessions
Level 400 (Expert)
CON405 – Moving to Amazon ECS – the Not-So-Obvious Benefits If you ask 10 teams why they migrated to containers, you will likely get answers like ‘developer productivity’, ‘cost reduction’, and ‘faster scaling’. But teams often find there are several other ‘hidden’ benefits to using containers for their services. In this talk, Franziska Schmidt, Platform Engineer at Mapbox, and Yaniv Donenfeld from AWS discuss the obvious and not-so-obvious benefits of moving to a containerized architecture. These include using Docker and Amazon ECS to achieve shared libraries for dev teams, separating private infrastructure from shareable code, and making it easier for non-ops engineers to run services.
Chalk Talks
Level 300 (Advanced)
CON331 – Deploying a Regulated Payments Application on Amazon ECS Travelex discusses how they built an FCA-compliant international payments service using a microservices architecture on AWS. This chalk talk covers the challenges of designing and operating an Amazon ECS-based PaaS in a regulated environment using a DevOps model.
Workshops
Level 400 (Expert)
CON407 – Interstella 8888: Advanced Microservice Operations Interstella 8888 is an intergalactic trading company that deals in rare resources, but their antiquated monolithic logistics systems are causing the business to lose money. In this workshop, you help Interstella 8888 build a modern microservices-based logistics system to save the company from financial ruin. We give you the hands-on experience you need to run microservices in the real world. This includes implementing advanced container scheduling and scaling to deal with variable service requests, implementing a service mesh, issue tracing with AWS X-Ray, container and instance-level logging with Amazon CloudWatch, and load testing.
Know before you go
Want to brush up on your container knowledge before re:Invent? Here are some helpful resources to get started:
Driven by customer demand and made possible by ongoing advances in the state of the art, we’ve come a long way since the original m1.small instance that we launched in 2006, with instances that emphasize compute power, burstable performance, memory size, local storage, and accelerated computing.
The New P3
Today we are making the next generation of GPU-powered EC2 instances available in four AWS regions. Powered by up to eight NVIDIA Tesla V100 GPUs, the P3 instances are designed to handle compute-intensive machine learning, deep learning, computational fluid dynamics, computational finance, seismic analysis, molecular modeling, and genomics workloads.
P3 instances use customized Intel Xeon E5-2686v4 processors running at up to 2.7 GHz. They are available in three sizes (all VPC-only and EBS-only):
Model       | NVIDIA Tesla V100 GPUs | GPU Memory | NVIDIA NVLink | vCPUs | Main Memory | Network Bandwidth | EBS Bandwidth
p3.2xlarge  | 1                      | 16 GiB     | n/a           | 8     | 61 GiB      | Up to 10 Gbps     | 1.5 Gbps
p3.8xlarge  | 4                      | 64 GiB     | 200 GBps      | 32    | 244 GiB     | 10 Gbps           | 7 Gbps
p3.16xlarge | 8                      | 128 GiB    | 300 GBps      | 64    | 488 GiB     | 25 Gbps           | 14 Gbps
Each of the NVIDIA GPUs is packed with 5,120 CUDA cores and another 640 Tensor cores and can deliver up to 125 TFLOPS of mixed-precision floating point, 15.7 TFLOPS of single-precision floating point, and 7.8 TFLOPS of double-precision floating point. On the two larger sizes, the GPUs are connected together via NVIDIA NVLink 2.0 running at a total data rate of up to 300 GBps. This allows the GPUs to exchange intermediate results and other data at high speed, without having to move it through the CPU or the PCI-Express fabric.
What’s a Tensor Core?
I had not heard the term Tensor core before starting to write this post. According to this very helpful post on the NVIDIA Blog, Tensor cores are designed to speed up the training and inference of large, deep neural networks. Each core is able to quickly and efficiently multiply a pair of 4×4 half-precision (also known as FP16) matrices together, add the resulting 4×4 matrix to another half- or single-precision (FP32) matrix, and store the resulting 4×4 matrix in either half- or single-precision form. Here’s a diagram from NVIDIA’s blog post:
This operation is in the innermost loop of the training process for a deep neural network, and is an excellent example of how today’s NVIDIA GPU hardware is purpose-built to address a very specific market need. By the way, the mixed-precision qualifier on the Tensor core performance means that it is flexible enough to work with a combination of 16-bit and 32-bit floating point values.
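Here is a rough numerical illustration of that operation in NumPy (running on a CPU, of course, not on actual Tensor core hardware): multiply two FP16 matrices and accumulate into an FP32 matrix.

import numpy as np

# Two 4x4 half-precision (FP16) input matrices
a = np.random.rand(4, 4).astype(np.float16)
b = np.random.rand(4, 4).astype(np.float16)

# A 4x4 single-precision (FP32) accumulator matrix
c = np.random.rand(4, 4).astype(np.float32)

# D = A x B + C: FP16 inputs, accumulation carried out in FP32
d = a.astype(np.float32) @ b.astype(np.float32) + c
print(d.dtype, d.shape)  # float32 (4, 4)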
Performance in Perspective I always like to put raw performance numbers into a real-world perspective so that they are easier to relate to and (hopefully) more meaningful. This turned out to be surprisingly difficult, given that the eight NVIDIA Tesla V100 GPUs on a single p3.16xlarge can do 125 trillion single-precision floating point multiplications per second.
Let’s go back to the dawn of the microprocessor era, and consider the Intel 8080A chip that powered the MITS Altair that I bought in the summer of 1977. With a 2 MHz clock, it was able to do about 832 multiplications per second (I used this data and corrected it for the faster clock speed). The p3.16xlarge is roughly 150 billion times faster. However, just 1.2 billion seconds have gone by since that summer. In other words, I can do 100x more calculations today in one second than my Altair could have done in the last 40 years!
What about the innovative 8087 math coprocessor that was an optional accessory for the IBM PC that was announced in the summer of 1981? With a 5 MHz clock and purpose-built hardware, it was able to do about 52,632 multiplications per second. 1.14 billion seconds have elapsed since then, p3.16xlarge is 2.37 billion times faster, so the poor little PC would be barely halfway through a calculation that would run for 1 second today.
Ok, how about a Cray-1? First delivered in 1976, this supercomputer was able to perform vector operations at 160 MFLOPS, making the p3.16xlarge 781,000 times faster. It could have iterated on some interesting problem 1500 times over the years since it was introduced.
Comparisons between the P3 and today's scale-out supercomputers are harder to make, given that you can think of the P3 as a step-and-repeat component of a supercomputer that you can launch on an as-needed basis.
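If you would like to check the back-of-the-envelope arithmetic yourself, the ratios above follow directly from the quoted multiply rates:

# Approximate single-precision multiply rates quoted above.
p3_16xlarge = 125e12   # multiplications per second
altair_8080a = 832     # MITS Altair (Intel 8080A, 2 MHz)
ibm_pc_8087 = 52632    # IBM PC with the 8087 math coprocessor
cray_1 = 160e6         # Cray-1 vector unit (160 MFLOPS)

print(p3_16xlarge / altair_8080a)   # ~1.5e11, about 150 billion times faster
print(p3_16xlarge / ibm_pc_8087)    # ~2.37e9 times faster
print(p3_16xlarge / cray_1)         # ~781,000 times faster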
Run One Today In order to take full advantage of the NVIDIA Tesla V100 GPUs and the Tensor cores, you will need to use CUDA 9 and cuDNN 7. These drivers and libraries have already been added to the newest versions of the Windows AMIs and will be included in an updated Amazon Linux AMI that is scheduled for release on November 7th. New packages are already available in our repos if you want to install them on your existing Amazon Linux AMI.
The newest AWS Deep Learning AMIs come preinstalled with the latest releases of Apache MXNet, Caffe2, and TensorFlow (each with support for the NVIDIA Tesla V100 GPUs), and will be updated to support P3 instances with other machine learning frameworks such as Microsoft Cognitive Toolkit and PyTorch as soon as those frameworks release support for the NVIDIA Tesla V100 GPUs. You can also use the NVIDIA Volta Deep Learning AMI for NGC.
P3 instances are available in the US East (Northern Virginia), US West (Oregon), EU (Ireland), and Asia Pacific (Tokyo) Regions in On-Demand, Spot, Reserved Instance, and Dedicated Host form.
Today, AWS and Microsoft announced Gluon, a new open source deep learning interface which allows developers to more easily and quickly build machine learning models, without compromising performance.
Gluon provides a clear, concise API for defining machine learning models using a collection of pre-built, optimized neural network components. Developers who are new to machine learning will find this interface more familiar to traditional code, since machine learning models can be defined and manipulated just like any other data structure. More seasoned data scientists and researchers will value the ability to build prototypes quickly and utilize dynamic neural network graphs for entirely new model architectures, all without sacrificing training speed.
Gluon is available in Apache MXNet today, a forthcoming Microsoft Cognitive Toolkit release, and in more frameworks over time.
Neural Networks vs Developers Machine learning with neural networks (including 'deep learning') has three main components: data for training, a neural network model, and an algorithm which trains the neural network. You can think of the neural network in a similar way to a directed graph; it has a series of inputs (which represent the data), which connect to a series of outputs (the prediction), through a series of connected layers and weights. During training, the algorithm adjusts the weights in the network based on the error in the network output. This is the process by which the network learns; it is a memory- and compute-intensive process which can take days.
Deep learning frameworks such as Caffe2, Cognitive Toolkit, TensorFlow, and Apache MXNet are, in part, an answer to the question 'how can we speed this process up?'. Just like query optimizers in databases, the more a training engine knows about the network and the algorithm, the more optimizations it can make to the training process (for example, it can infer what needs to be re-computed on the graph based on what else has changed, and skip the unaffected weights to speed things up). These frameworks also provide parallelization to distribute the computation process, and reduce the overall training time.
However, in order to achieve these optimizations, most frameworks require the developer to do some extra work: specifically, by providing a formal definition of the network graph, up-front, and then ‘freezing’ the graph, and just adjusting the weights.
The network definition, which can be large and complex with millions of connections, usually has to be constructed by hand. Not only are deep learning networks unwieldy, but they can be difficult to debug and it’s hard to re-use the code between projects.
The result is complexity that can be daunting for beginners and a time-consuming chore for more experienced researchers. At AWS, we've been experimenting with some ideas in MXNet around new, flexible, more approachable ways to define and train neural networks. Microsoft is also a contributor to the open source MXNet project, and was interested in some of these same ideas. Based on this, we got talking, and found we had a similar vision: to use these techniques to reduce the complexity of machine learning, making it accessible to more developers.
Enter Gluon: dynamic graphs, rapid iteration, scalable training Gluon introduces four key innovations.
Friendly API: Gluon networks can be defined using simple, clear, concise code – this is easier for developers to learn, and much easier to understand than some of the more arcane and formal ways of defining networks and their associated weighted scoring functions.
Dynamic networks: the network definition in Gluon is dynamic: it can bend and flex just like any other data structure. This is in contrast to the more common, formal, symbolic definition of a network, which the deep learning framework effectively has to carve into stone in order to optimize computation during training. Dynamic networks are easier to manage, and with Gluon, developers can easily 'hybridize' between these fast symbolic representations and the more friendly, dynamic 'imperative' definitions of the network and algorithms.
The algorithm can define the network: the model and the training algorithm are brought much closer together. Instead of separate definitions, the algorithm can adjust the network dynamically during definition and training. Not only does this mean that developers can use standard programming loops and conditionals to create these networks, but researchers can now define even more sophisticated algorithms and models which were not possible before. They are all easier to create, change, and debug.
High-performance operators for training: these make it possible to have a friendly, concise API and dynamic graphs without sacrificing training speed. This is a huge step forward in machine learning. Some frameworks bring a friendly API or dynamic graphs to deep learning, but these previous methods all incur a cost in terms of training speed. As with other areas of software, abstraction can slow down computation since it needs to be negotiated and interpreted at run time. Gluon can efficiently blend together a concise API with the formal definition under the hood, without the developer having to know about the specific details or to accommodate the compiler optimizations manually.
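To give a feel for what this looks like in code, here is a minimal training-step sketch using the Gluon interface in Apache MXNet. The layer sizes, hyperparameters, and toy data are arbitrary choices for illustration; the point is that the network is defined imperatively and a single call to hybridize() lets MXNet build the fast symbolic graph under the hood.

import mxnet as mx
from mxnet import autograd, gluon, nd

# Define the network imperatively, like any other Python object.
net = gluon.nn.HybridSequential()
net.add(gluon.nn.Dense(64, activation='relu'),
        gluon.nn.Dense(10))
net.initialize(mx.init.Xavier())

# Hybridize: MXNet caches a symbolic graph without changing how the code reads.
net.hybridize()

loss_fn = gluon.loss.SoftmaxCrossEntropyLoss()
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.1})

# One toy training step: a batch of 32 examples with 100 features each.
data = nd.ones((32, 100))
label = nd.array([i % 10 for i in range(32)])   # toy class labels
with autograd.record():
    loss = loss_fn(net(data), label)
loss.backward()
trainer.step(batch_size=32)
print(loss.mean().asscalar())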
The team here at AWS, and our collaborators at Microsoft, couldn’t be more excited to bring these improvements to developers through Gluon. We’re already seeing quite a bit of excitement from developers and researchers alike.
Getting started with Gluon Gluon is available today in Apache MXNet, with support coming for the Microsoft Cognitive Toolkit in a future release. We’re also publishing the front-end interface and the low-level API specifications so it can be included in other frameworks in the fullness of time.
You can get started with Gluon today. Fire up the AWS Deep Learning AMI with a single click and jump into one of 50 fully worked, notebook examples. If you’re a contributor to a machine learning framework, check out the interface specs on GitHub.
If you are anything like me, Artificial Intelligence (AI), Machine Learning (ML), and Deep Learning are completely fascinating and exciting topics. As AI, ML, and Deep Learning become more widely used, for me it means that the science fiction written by Dr. Isaac Asimov, the robotics and medical advancements in Star Wars, and the technologies that enabled Captain Kirk and his Star Trek crew “to boldly go where no man has gone before” can become achievable realities.
Most people interested in the aforementioned topics are familiar with the AI and ML solutions enabled by Deep Learning, such as Convolutional Neural Networks for image and video classification, speech recognition, natural language interfaces, and recommendation engines. However, setting up the infrastructure, environment, and tools that enable data scientists, machine learning practitioners, research scientists, and deep learning hobbyists/advocates to dive into these technologies is not always an easy task. Most developers want to move quickly from getting started with deep learning to training models and developing solutions using deep learning technologies.
For these reasons, I would like to share some resources that will help to quickly build deep learning solutions whether you are an experienced data scientist or a curious developer wanting to get started.
Deep Learning Resources
Apache MXNet is Amazon's deep learning framework of choice. With the power of the Apache MXNet framework and NVIDIA GPU computing, you can launch your scalable deep learning projects and solutions easily on the AWS Cloud. As you get started on your MXNet deep learning quest, there are a variety of self-service tutorials and datasets available to you:
Launch an AWS Deep Learning AMI: This guide walks you through the steps to launch the AWS Deep Learning AMI with Ubuntu
MXNet – Create a computer vision application: This hands-on tutorial uses a pre-built notebook to walk you through using neural networks to build a computer vision application to identify handwritten digits
AWS Machine Learning Datasets: AWS hosts datasets for Machine Learning on the AWS Marketplace that you can access for free. These large datasets are available for anyone to analyze without having to download or store the data first.
Predict and Extract – Learn to use pre-trained models for predictions: This hands-on tutorial walks you through using pre-trained models for prediction and feature extraction with the full ImageNet dataset.
AWS Deep Learning AMIs
AWS offers Amazon Machine Images (AMIs) for use on Amazon EC2 for quick deployment of the infrastructure needed to start your deep learning journey. The AWS Deep Learning AMIs are pre-configured with popular deep learning frameworks, are available for both Amazon Linux and Ubuntu, and can be launched on Amazon EC2 for AI-targeted solutions and models. The deep learning frameworks supported and pre-configured on the Deep Learning AMI are:
Apache MXNet
TensorFlow
Microsoft Cognitive Toolkit (CNTK)
Caffe
Caffe2
Theano
Torch
Keras
Additionally, the AWS Deep Learning AMIs come with Jupyter notebooks preconfigured for Python 2.7/3.4, the AWS SDK for Python, and other data-science-related Python packages and dependencies. The AMIs also include the NVIDIA CUDA and NVIDIA CUDA Deep Neural Network (cuDNN) libraries preinstalled for all the supported deep learning frameworks, and the Intel Math Kernel Library is installed for the Apache MXNet framework. You can launch any of the Deep Learning AMIs by visiting the AWS Marketplace using the Try the Deep Learning AMIs link.
Summary
It is a great time to dive into Deep Learning. You can accelerate your work in deep learning by using the AWS Deep Learning AMIs running on the AWS cloud to get your deep learning environment running quickly or get started learning more about Deep Learning on AWS with MXNet using the AWS self-service resources. Of course, you can learn even more information about Deep Learning, Machine Learning, and Artificial Intelligence on AWS by reviewing the AWS Deep Learning page, the Amazon AI product page, and the AWS AI Blog.
We’re excited to co-host the 10th Annual Hadoop Summit, the leading conference for the Apache Hadoop community, taking place on June 13 – 15 at the San Jose Convention Center. In the last few years, the Hadoop Summit has expanded to cover all things data beyond just Apache Hadoop – such as data science, cloud and operations, IoT and applications – and has been aptly renamed the DataWorks Summit. The three-day program is bursting at the seams! Here are just a few of the reasons why you cannot miss this must-attend event:
Familiarize yourself with the cutting edge in Apache project developments from the committers
Learn from your peers and industry experts about innovative and real-world use cases, development and administration tips and tricks, success stories and best practices to leverage all your data – on-premise and in the cloud – to drive predictive analytics, distributed deep-learning and artificial intelligence initiatives
Attend one of our more than 170 technical deep dive breakout sessions from nearly 200 speakers across eight tracks
Check out our keynotes, meetups, trainings, technical crash courses, birds-of-a-feather sessions, Women in Big Data and more
Similar to previous years, we look forward to continuing Yahoo’s decade-long tradition of thought leadership at this year’s summit. Join us for an in-depth look at Yahoo’s Hadoop culture and for the latest in technologies such as Apache Tez, HBase, Hive, Data Highway Rainbow, Mail Data Warehouse and Distributed Deep Learning at the breakout sessions below. Or, stop by Yahoo kiosk #700 at the community showcase.
Also, as a co-host of the event, Yahoo is pleased to offer a 20% discount for the summit with the code MSPO20. Register here for Hadoop Summit, San Jose, California!
Andy Feng – VP Architecture, Big Data and Machine Learning
Lee Yang – Sr. Principal Engineer
In this talk, we will introduce a new framework, TensorFlowOnSpark, for scalable TensorFlow learning, that was open sourced in Q1 2017. This new framework enables easy experimentation for algorithm designs, and supports scalable training & inferencing on Spark clusters. It supports all TensorFlow functionalities including synchronous & asynchronous learning, model & data parallelism, and TensorBoard. It provides architectural flexibility for data ingestion to TensorFlow and network protocols for server-to-server communication. With a few lines of code changes, an existing TensorFlow algorithm can be transformed into a scalable application.
2:10 – 2:50 P.M. Handling Kernel Upgrades at Scale – The Dirty Cow Story
Samy Gawande – Sr. Operations Engineer
Savitha Ravikrishnan – Site Reliability Engineer
Apache Hadoop at Yahoo is a massive platform with 36 different clusters spread across YARN, Apache HBase, and Apache Storm deployments, totaling 60,000 servers made up of 100s of different hardware configurations accumulated over generations, presenting unique operational challenges and a variety of unforeseen corner cases. In this talk, we will share methods, tips and tricks to deal with large scale kernel upgrade on heterogeneous platforms within tight timeframes with 100% uptime and no service or data loss through the Dirty COW use case (privilege escalation vulnerability found in the Linux Kernel in late 2016).
5:00 – 5:40 P.M. Data Highway Rainbow – Petabyte Scale Event Collection, Transport, and Delivery at Yahoo
Nilam Sharma – Sr. Software Engineer
Huibing Yin – Sr. Software Engineer
This talk presents the architecture and features of Data Highway Rainbow, Yahoo’s hosted multi-tenant infrastructure which offers event collection, transport and aggregated delivery as a service. Data Highway supports collection from multiple data centers & aggregated delivery in primary Yahoo data centers which provide a big data computing cluster. From a delivery perspective, Data Highway supports endpoints/sinks such as HDFS, Storm and Kafka; with Storm & Kafka endpoints tailored towards latency sensitive consumers.
DAY 2. WEDNESDAY June 14, 2017
9:05 – 9:15 A.M. Yahoo General Session – Shaping Data Platform for Lasting Value
Sumeet Singh – Sr. Director, Products
With a long history of open innovation with Hadoop, Yahoo continues to invest in and expand the platform capabilities by pushing the boundaries of what the platform can accomplish for the entire organization. In the last 11 years (yes, it is that old!), the Hadoop platform has shown no signs of giving up or giving in. In this talk, we explore what makes the shared multi-tenant Hadoop platform so special at Yahoo.
12:20 – 1:00 P.M. CaffeOnSpark Update – Recent Enhancements and Use Cases
Mridul Jain – Sr. Principal Engineer
Jun Shi – Principal Engineer
By combining salient features from the deep learning framework Caffe and the big-data frameworks Apache Spark and Apache Hadoop, CaffeOnSpark enables distributed deep learning on a cluster of GPU and CPU servers. We released CaffeOnSpark as an open source project in early 2016, and shared its architecture design and basic usage at Hadoop Summit 2016. In this talk, we will update audiences about the recent development of CaffeOnSpark. We will highlight new features and capabilities: a unified data layer which supports multi-label datasets, distributed LSTM training, interleaved testing with training, a monitoring/profiling framework, and Docker deployment.
12:20 – 1:00 P.M. Tez Shuffle Handler – Shuffling at Scale with Apache Hadoop
Jon Eagles – Principal Engineer
Kuhu Shukla – Software Engineer
In this talk we introduce a new Shuffle Handler for Tez, a YARN Auxiliary Service, that addresses the shortcomings and performance bottlenecks of the legacy MapReduce Shuffle Handler, the default shuffle service in Apache Tez. The Apache Tez Shuffle Handler adds composite fetch, which supports multi-partition fetch to mitigate performance slowdowns, and provides deletion APIs to reduce disk usage for long-running Tez sessions. As an emerging technology, we will outline the future roadmap for the Apache Tez Shuffle Handler and provide performance evaluation results from real-world jobs at scale.
2:10 – 2:50 P.M. Achieving HBase Multi-Tenancy with RegionServer Groups and Favored Nodes
Thiruvel Thirumoolan – Principal Engineer
Francis Liu – Sr. Principal Engineer
At Yahoo! HBase has been running as a hosted multi-tenant service since 2013. In a single HBase cluster we have around 30 tenants running various types of workloads (ie batch, near real-time, ad-hoc, etc). We will walk through multi-tenancy features explaining our motivation, how they work as well as our experiences running these multi-tenant clusters. These features will be available in Apache HBase 2.0.
2:10 – 2:50 P.M. Data Driving Yahoo Mail Growth and Evolution with a 50 PB Hadoop Warehouse
Nick Huang – Director, Data Engineering, Yahoo Mail
Saurabh Dixit – Sr. Principal Engineer, Yahoo Mail
Since 2014, the Yahoo Mail Data Engineering team took on the task of revamping the Mail data warehouse and analytics infrastructure in order to drive the continued growth and evolution of Yahoo Mail. Along the way we have built a 50 PB Hadoop warehouse, and surrounding analytics and machine learning programs that have transformed the way data plays in Yahoo Mail. In this session we will share our experience from this 3 year journey, from the system architecture, analytics systems built, to the learnings from development and drive for adoption.
DAY 3. THURSDAY June 15, 2017
2:10 – 2:50 P.M. OracleStore – A Highly Performant RawStore Implementation for Hive Metastore
Chris Drome – Sr. Principal Engineer
Jin Sun – Principal Engineer
Today, Yahoo uses Hive in many different spaces, from ETL pipelines to ad hoc user queries. Increasingly, we are investigating the practicality of applying Hive to real-time queries, such as those generated by interactive BI reporting systems. In order for Hive to succeed in this space, it must be performant in all aspects of query execution, from query compilation to job execution. One such component is the interaction with the underlying database at the core of the Metastore. As an alternative to ObjectStore, we created OracleStore as a proof-of-concept. Freed of the restrictions imposed by DataNucleus, we were able to design a more performant database schema that better met our needs. Then, we implemented OracleStore with specific goals built-in from the start, such as ensuring the deduplication of data. In this talk we will discuss the details behind OracleStore and the gains that were realized with this alternative implementation. These include a reduction of 97%+ in the storage footprint of multiple tables, as well as query performance that is 13x faster than ObjectStore with DirectSQL and 46x faster than ObjectStore without DirectSQL.
3:00 P.M. – 3:40 P.M. Bullet – A Real Time Data Query Engine
Akshai Sarma – Sr. Software Engineer
Michael Natkovich – Director, Engineering
Bullet is an open sourced, lightweight, pluggable querying system for streaming data without a persistence layer implemented on top of Storm. It allows you to filter, project, and aggregate on data in transit. It includes a UI and WS. Instead of running queries on a finite set of data that arrived and was persisted or running a static query defined at the startup of the stream, our queries can be executed against an arbitrary set of data arriving after the query is submitted. In other words, it is a look-forward system. Bullet is a multi-tenant system that scales independently of the data consumed and the number of simultaneous queries. Bullet is pluggable into any streaming data source. It can be configured to read from systems such as Storm, Kafka, Spark, Flume, etc. Bullet leverages Sketches to perform its aggregate operations such as distinct, count distinct, sum, count, min, max, and average.
3:00 P.M. – 3:40 P.M. Yahoo – Moving Beyond Running 100% of Apache Pig Jobs on Apache Tez
Rohini Palaniswamy – Sr. Principal Engineer
Last year at Yahoo, we spent great effort scaling, stabilizing, and making Pig on Tez production ready, and by the end of the year we retired running Pig jobs on MapReduce. This talk will detail the performance and resource utilization improvements Yahoo achieved after migrating all Pig jobs to run on Tez. After the successful migration and the improved performance, we shifted our focus to addressing some of the bottlenecks we identified and new optimization ideas that we came up with to make it go even faster. We will go over the new features and work done in Tez to make that happen, like a custom YARN ShuffleHandler, reworking DAG scheduling order, serialization changes, etc. We will also cover exciting new features that were added to Pig for performance, such as bloom join and byte code generation.
4:10 P.M. – 4:50 P.M. Leveraging Docker for Hadoop Build Automation and Big Data Stack Provisioning
Evans Ye, Software Engineer
Apache Bigtop, an open source Hadoop distribution, focuses on developing packaging, testing, and deployment solutions that help infrastructure engineers build their own customized big data platforms as easily as possible. However, packages deployed in production require a solid CI testing framework to ensure their quality, and a number of Hadoop components must be verified to work together as well. In this presentation, we'll talk about how Bigtop delivers its containerized CI framework, which can be directly replicated by Bigtop users. The core innovations here are the newly developed Docker Provisioner, which leverages Docker for Hadoop deployment, and the Docker Sandbox, which lets developers quickly start a big data stack. The talk covers the containerized CI framework, the technical details of the Docker Provisioner and Docker Sandbox, the hierarchy of Docker images we designed, and several components we developed, such as the Bigtop Toolchain, to achieve build automation.
My colleague Mia Champion is a scientist (check out her publications), an AWS Certified Solutions Architect, and an AWS Certified Developer. The time that she spent doing research on large datasets gave her an appreciation for the value of cloud computing in the bioinformatics space, which she summarizes and explains in the guest post below!
Technological advances in scientific research continue to enable the collection of exponentially growing datasets that are also increasing in the complexity of their content. The global pace of innovation is now also fueled by the recent cloud-computing revolution, which provides researchers with a seemingly boundless scalable and agile infrastructure. Now, researchers can remove the hindrances of having to own and maintain their own sequencers, microscopes, compute clusters, and more. Using the cloud, scientists can easily store, manage, process and share datasets for millions of patient samples with gigabytes and more of data for each individual. As American physicist, John Bardeen once said: “Science is a collaborative effort. The combined results of several people working together is much more effective than could be that of an individual scientist working alone”.
Prioritizing Reproducible Innovation, Democratization, and Data Protection Today, we have many individual researchers and organizations leveraging secure cloud enabled data sharing on an unprecedented scale and producing innovative, customized analytical solutions using the AWS cloud. But, can secure data sharing and analytics be done on such a collaborative scale as to revolutionize the way science is done across a domain of interest or even across discipline/s of science? Can building a cloud-enabled consortium of resources remove the analytical variability that leads to diminished reproducibility, which has long plagued the interpretability and impact of research discoveries? The answers to these questions are ‘yes’ and initiatives such as the Neuro Cloud Consortium, The Global Alliance for Genomics and Health (GA4GH), and The Sage Bionetworks Synapse platform, which powers many research consortiums including the DREAM challenges, are starting to put into practice model cloud-initiatives that will not only provide impactful discoveries in the areas of neuroscience, infectious disease, and cancer, but are also revolutionizing the way in which scientific research is done.
Bringing Crowd Developed Models, Algorithms, and Functions to the Data Collaborative projects have traditionally allowed investigators to download datasets such as those used for comparative sequence analysis or for training a deep learning algorithm on medical imaging data. Investigators were then able to develop and execute their analysis using institutional clusters, local workstations, or even laptops:
This method of collaboration is problematic for many reasons. The first concern is data security, since dataset download essentially permits “chain-data-sharing” with any number of recipients. Second, analytics performed in compute environments that are not templated at some level introduce the risk of variability that is not reproducible by a different investigator, or even by the same investigator using a different compute environment. Third, the required data dump, processing, and then re-upload or distribution to the collaborative group is highly inefficient and dependent upon each individual's networking and compute capabilities. Overall, traditional methods of scientific collaboration have introduced practices in which security is compromised and time to discovery is hampered.
Using the AWS cloud, collaborative researchers can share datasets easily and securely by taking advantage of Identity and Access Management (IAM) policy restrictions for user bucket access as well as S3 bucket policies or Access Control Lists (ACLs). To streamline analysis and ensure data security, many researchers are eliminating the necessity to download datasets entirely by leveraging resources that facilitate moving the analytics to the data source and/or taking advantage of remote API requests to access a shared database or data lake. One way our customers are accomplishing this is to leverage container based Docker technology to provide collaborators with a way to submit algorithms or models for execution on the system hosting the shared datasets:
Docker container images have all of the application’s dependencies bundled together, and therefore provide a high degree of versatility and portability, which is a significant advantage over using other executable-based approaches. In the case of collaborative machine learning projects, each docker container will contain applications, language runtime, packages and libraries, as well as any of the more popular deep learning frameworks commonly used by researchers including: MXNet, Caffe, TensorFlow, and Theano.
A common feature in these frameworks is the ability to leverage a host machine’s Graphical Processing Units (GPUs) for significant acceleration of the matrix and vector operations involved in the machine learning computations. As such, researchers with these objectives can leverage EC2’s new P2 instance types in order to power execution of submitted machine learning models. In addition, GPUs can be mounted directly to containers using the NVIDIA Docker tool and appear at the system level as additional devices. By leveraging Amazon EC2 Container Service and the EC2 Container Registry, collaborators are able to execute analytical solutions submitted to the project repository by their colleagues in a reproducible fashion as well as continue to build on their existing environment. Researchers can also architect a continuous deployment pipeline to run their docker-enabled workflows.
In conclusion, emerging cloud-enabled consortium initiatives serve as models for the broader research community for how cloud-enabled community science can expedite discoveries in Precision Medicine while also providing a platform where data security and discovery reproducibility is inherent to the project execution.
The TensorFlow 1.0 release is available, bringing an API stability guarantee to this machine-learning library from Google. “TensorFlow 1.0 introduces a high-level API for TensorFlow, with tf.layers, tf.metrics, and tf.losses modules. We’ve also announced the inclusion of a new tf.keras module that provides full compatibility with Keras, another popular high-level neural networks library.”
By Lee Yang, Jun Shi, Bobbie Chern, and Andy Feng (@afeng76), Yahoo Big ML team
Introduction
Today, we are pleased to offer TensorFlowOnSpark to the community, our latest open source framework for distributed deep learning on big-data clusters.
Deep learning (DL) has evolved significantly in recent years. At Yahoo, we’ve found that in order to gain insight from massive amounts of data, we need to deploy distributed deep learning. Existing DL frameworks often require us to set up separate clusters for deep learning, forcing us to create multiple programs for a machine learning pipeline (see Figure 1 below). Having separate clusters requires us to transfer large datasets between them, introducing unwanted system complexity and end-to-end learning latency.
Last year we addressed scale-out issues by developing and publishing CaffeOnSpark, our open source framework that allows distributed deep learning and big-data processing on identical Spark and Hadoop clusters. We use CaffeOnSpark at Yahoo to improve our NSFW image detection, to automatically identify eSports game highlights from live-streamed videos, and more. With the community's valuable feedback and contributions, CaffeOnSpark has been upgraded with LSTM support, a new data layer, training and test interleaving, a Python API, and deployment on Docker containers. This has been great for our Caffe users, but what about those who use the deep learning framework TensorFlow? We're taking a page from our own playbook and doing for TensorFlow what we did for Caffe.
After TensorFlow’s initial publication, Google released an enhanced TensorFlow with distributed deep learning capabilities in April 2016. In October 2016, TensorFlow introduced HDFS support. Outside of the Google cloud, however, users still needed a dedicated cluster for TensorFlow applications. TensorFlow programs could not be deployed on existing big-data clusters, thus increasing the cost and latency for those who wanted to take advantage of this technology at scale.
To address this limitation, several community projects wired TensorFlow onto Spark clusters. SparkNet added the ability to launch TensorFlow networks in Spark executors. DataBricks proposed TensorFrame to manipulate Apache Spark’s DataFrames with TensorFlow programs. While these approaches are a step in the right direction, after examining their code, we learned we would be unable to get the TensorFlow processes to communicate with each other directly, we would not be able to implement asynchronous distributed learning, and we would have to expend significant effort to migrate existing TensorFlow programs.
TensorFlowOnSpark
Our new framework, TensorFlowOnSpark (TFoS), enables distributed TensorFlow execution on Spark and Hadoop clusters. As illustrated in Figure 2 above, TensorFlowOnSpark is designed to work along with SparkSQL, MLlib, and other Spark libraries in a single pipeline or program (e.g. Python notebook).
TensorFlowOnSpark supports all types of TensorFlow programs, enabling both asynchronous and synchronous training and inferencing. It supports model parallelism and data parallelism, as well as TensorFlow tools such as TensorBoard on Spark clusters.
Any TensorFlow program can be easily modified to work with TensorFlowOnSpark. Typically, changes to fewer than 10 lines of Python code are needed. Many developers at Yahoo who use TensorFlow have easily migrated their TensorFlow programs for execution with TensorFlowOnSpark.
TensorFlowOnSpark supports direct tensor communication among TensorFlow processes (workers and parameter servers). Process-to-process direct communication enables TensorFlowOnSpark programs to scale easily by adding machines. As illustrated in Figure 3, TensorFlowOnSpark doesn’t involve Spark drivers in tensor communication, and thus achieves similar scalability as stand-alone TensorFlow clusters.
TensorFlowOnSpark provides two different modes to ingest data for training and inference:
TensorFlow QueueRunners: TensorFlowOnSpark leverages TensorFlow’s file readers and QueueRunners to read data directly from HDFS files. Spark is not involved in accessing data.
Spark Feeding: Spark RDD data is fed to each Spark executor, which subsequently feeds the data into the TensorFlow graph via feed_dict.
Figure 4 illustrates how the synchronous distributed training of the Inception image classification network scales in TFoS using QueueRunners with a simple setting: 1 GPU, 1 reader, and a batch size of 32 for each worker. Four TFoS jobs were launched to train 100,000 steps. When these jobs completed after 2+ days, the top-5 accuracies of these jobs were 0.730, 0.814, 0.854, and 0.879. Reaching a top-5 accuracy of 0.730 takes 46 hours for a 1-worker job, 22.5 hours for a 2-worker job, 13 hours for a 4-worker job, and 7.5 hours for an 8-worker job. TFoS thus achieves near-linear scalability for Inception model training. This is very encouraging, although TFoS scalability will vary for different models and hyperparameters.
RDMA for Distributed TensorFlow
In Yahoo’s Hadoop clusters, GPU nodes are connected by both Ethernet and Infiniband. Infiniband provides faster connectivity and supports direct access to other servers’ memories over RDMA. Current TensorFlow releases, however, only support distributed learning using gRPC over Ethernet. To speed up distributed learning, we have enhanced the TensorFlow C++ layer to enable RDMA over Infiniband.
In conjunction with our TFoS release, we are introducing a new protocol for TensorFlow servers in addition to the default "grpc" protocol. Any distributed TensorFlow program can leverage our enhancement by specifying protocol="grpc_rdma" in tf.train.ServerDef() or tf.train.Server().
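As a minimal illustration of that one-line change, a distributed TensorFlow 1.x server might be created as follows. The host names and ports are placeholders, and the "grpc_rdma" protocol is only available in a TensorFlow build that includes our RDMA enhancement.

import tensorflow as tf

# Placeholder cluster layout; host names and ports are illustrative.
cluster = tf.train.ClusterSpec({
    "ps":     ["ps0.example.com:2222"],
    "worker": ["worker0.example.com:2222", "worker1.example.com:2222"],
})

# Request the RDMA transport instead of the default "grpc" protocol.
server = tf.train.Server(cluster,
                         job_name="worker",
                         task_index=0,
                         protocol="grpc_rdma")
server.join()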
With this new protocol, an RDMA rendezvous manager is created to ensure that tensors are written directly into the memory of remote servers. We minimize tensor buffer creation: tensor buffers are allocated once at the beginning, and then reused across all training steps of a TensorFlow job. From our early experimentation with large models like the VGG-19 network, our RDMA implementation has demonstrated a significant speedup in training time compared with the existing gRPC implementation.
Since RDMA support is a highly requested capability (see TensorFlow issue #2916), we decided to make our current implementation available as an alpha release to the TensorFlow community. In the coming weeks, we will polish our RDMA implementation further, and share detailed benchmark results.
Simple CLI and API
TFoS programs are launched by the standard Apache Spark command, spark-submit. As illustrated below, users can specify the number of Spark executors, the number of GPUs per executor, and the number of parameter servers in the CLI. A user can also state whether they want to use TensorBoard (--tensorboard) and/or RDMA (--rdma).
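A hedged sketch of such a submission is shown below. The script name is a placeholder, and the application-level flag spellings (--num_ps, --tensorboard, --rdma) are assumptions based on the description above rather than the project's documented syntax; the executor count and memory are whatever suits your cluster.

# Illustrative sketch only: the script name and application-level flags are assumptions.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 4 \
  --executor-memory 27G \
  my_tfos_program.py \
  --num_ps 1 \
  --tensorboard \
  --rdma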
TFCluster.reserve() … construct a TensorFlow cluster from Spark executors
TFCluster.start() … launch the TensorFlow program on the executors
TFCluster.train() or TFCluster.inference() … feed RDD data to the TensorFlow processes
TFCluster.shutdown() … shut down TensorFlow execution on the executors
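Putting those calls together, a driver program looks roughly like the sketch below. This is a sketch only: the argument names and order are assumptions inferred from the call names listed above, not the library's exact signatures, and my_mnist_dist is a hypothetical module holding the TensorFlow worker function.

# Rough sketch: argument names/order are assumptions, not exact TFoS signatures.
from pyspark import SparkContext
from tensorflowonspark import TFCluster
import my_mnist_dist   # hypothetical module containing the TensorFlow map function

sc = SparkContext(appName="tfos_example")
num_executors = 4
num_ps = 1

# Reserve a TensorFlow cluster on the Spark executors and launch the program.
cluster = TFCluster.reserve(sc, num_executors, num_ps,
                            tensorboard=True, input_mode=TFCluster.InputMode.SPARK)
cluster.start(my_mnist_dist.map_fun, args=None)

# Spark Feeding mode: feed RDD data to the TensorFlow processes, then shut down.
images_and_labels = sc.textFile("hdfs:///mnist/csv/train").map(lambda line: line.split(","))
cluster.train(images_and_labels)
cluster.shutdown()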
Open Source
Yahoo is happy to release TensorFlowOnSpark at github.com/yahoo/TensorFlowOnSpark and an RDMA enhancement of TensorFlow at github.com/yahoo/tensorflow/tree/yahoo. Multiple example programs (including mnist, cifar10, inception, and VGG) are provided to illustrate the simple process of converting TensorFlow programs to TensorFlowOnSpark and of leveraging RDMA. An Amazon Machine Image is also available for applying TensorFlowOnSpark on AWS EC2.
Going forward, we will advance TensorFlowOnSpark as we continue to do with CaffeOnSpark. We welcome the community’s continued feedback and contributions to CaffeOnSpark, and are interested in thoughts on ways TensorFlowOnSpark can be enhanced.
Happy 2017, everybody! We’re back in the office (for values of “we” equal to me and a cup of coffee – the rest of your friendly Comms team is still on vacation). I hope your New Year’s resolutions are still unbroken. Mine involves that coffee, which doesn’t have any sugar in it and is making January feel much bleaker than necessary. I’ll be fascinated to see how long I can keep it up.
On to the Pi stuff.
I spotted this magnificently creepy art installation from David Bowen just before Christmas, and have been looking forward to showing it to you, because I like to know I'm not the only person having specific nightmares. In this project, a Raspberry Pi AI is mothering a colony of flies: whenever it spots and correctly identifies a fly, it releases a dose of nutrients and water.
flyAI creates a situation where the fate of a colony of living houseflies is determined by the accuracy of artificial intelligence software. The installation uses the TensorFlow machine learning image recognition library to classify images of live houseflies. As the flies fly and land in front of a camera, their image is captured.
David says: “The system is setup to run indefinitely with an indeterminate outcome.”
Which means there’s potential for an awful lot of tiny corpses.
It all sounds simple enough, but there’s something about the build – the choice of AI voice, the achingly slow process of enunciating everything it believes it might have seen before it feeds its wards…the fact that the horrible space-helmet-bubble thing is full of flies – that makes for the most unsettling project we’ve seen in a long time.
If you are inspired by this arthropod chamber of horrors, you can read about more of David's projects on his blog. You'll be delighted to learn that this is not the only one employing house-fly labourers. More power to all six of your elbows, David.
Jupyter Notebook (formerly IPython) is one of the most popular user interfaces for running Python, R, Julia, Scala, and other languages to process and visualize data, perform statistical analysis, and train and run machine learning models. Jupyter notebooks are self-contained documents that can include live code, charts, narrative text, and more. The notebooks can be easily converted to HTML, PDF, and other formats for sharing.
Amazon EMR is a popular hosted big data processing service that allows users to easily run Hadoop, Spark, Presto, and other Hadoop ecosystem applications, such as Hive and Pig.
Python, Scala, and R provide support for Spark and Hadoop, and running them in Jupyter on Amazon EMR makes it easy to take advantage of:
the big-data processing capabilities of Hadoop applications.
the large selection of Python and R packages for analytics and visualization.
JupyterHub is a multi-user environment for Jupyter. You can use a bootstrap action (BA) to install Jupyter and JupyterHub on Amazon EMR. In addition to the default Python kernel, the BA can install the following kernels:
Apache Toree (which provides the Spark, PySpark, SparkR, and SparkSQL kernels)
Julia
Ruby
JavaScript
CoffeeScript
Torch
The BA will install Jupyter, JupyterHub, and sample notebooks on the master node.
Commonly used Python and R data science and machine learning packages can be optionally installed on all nodes.
The following arguments can be passed to the BA:
--r: Install the IRKernel for R.
--toree: Install the Apache Toree kernel that supports Scala, PySpark, SQL, and SparkR for Apache Spark.
--julia: Install the IJulia kernel for Julia.
--torch: Install the iTorch kernel for Torch (machine learning and visualization).
--ruby: Install the iRuby kernel for Ruby.
--ds-packages: Install the Python data science-related packages (scikit-learn, pandas, statsmodels).
--ml-packages: Install the Python machine learning-related packages (theano, keras, tensorflow).
--python-packages: Install specific Python packages (for example, ggplot and nilearn).
--port: Set the port for the Jupyter notebook. The default is 8888.
--password: Set the password for the Jupyter notebook.
--localhost-only: Restrict Jupyter to listen on localhost only. The default is to listen on all IP addresses.
--jupyterhub: Install JupyterHub.
--jupyterhub-port: Set the port for JupyterHub. The default is 8000.
--notebook-dir: Specify the notebook folder. This can be a local directory or an S3 bucket.
--cached-install: Use cached dependency artifacts on S3 to speed up installation.
--ssl: Enable SSL. For production, make sure to use your own certificate and key files.
--copy-samples: Copy sample notebooks to the notebook folder.
--s3fs: Use s3fs instead of s3nb (the default) for storing notebooks on Amazon S3. This argument can cause slowness if the S3 bucket has lots of files. (The Upload file and Create folder menu options do not work with s3nb.)
By default (with no --password and --port arguments), Jupyter will run on port 8888 with no password protection; JupyterHub will run on port 8000. The --port and --jupyterhub-port arguments can be used to override the default ports to avoid conflicts with other applications.
The --r option installs the IRKernel for R. It also installs SparkR and sparklyr for R, so make sure Spark is one of the selected EMR applications to be installed. You’ll need the Spark application if you use the --toree argument.
If you used --jupyterhub, use Linux users to sign in to JupyterHub. (Be sure to create passwords for the Linux users first.) hadoop, the default admin user for JupyterHub, can be used to set up other users. The --password option sets the password for Jupyter and for the hadoop user for JupyterHub.
Jupyter on EMR allows users to save their work on Amazon S3 rather than on local storage on the EMR cluster (master node).
To store notebooks on S3, use:
--notebook-dir <s3://your-bucket/folder/>
To store notebooks in a directory different from the user’s home directory, use:
--notebook-dir <local directory>
The following example CLI command can be used to launch a five-node (c3.4xlarge) EMR 5.2.0 cluster with the bootstrap action. The BA will install all the available kernels. It will also install the ggplot and nilearn Python packages, copy the sample notebooks, and set the Jupyter port to 8880 and the JupyterHub port to 8001.
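The exact command isn't preserved here, but a hedged reconstruction assembled from the arguments described above looks like this; the bootstrap action script location is a placeholder for wherever you host the BA script:

# Hedged reconstruction: the BA script path below is a placeholder.
aws emr create-cluster --release-label emr-5.2.0 \
  --name "Jupyter on EMR" \
  --applications Name=Hadoop Name=Spark Name=Hive \
  --ec2-attributes KeyName=<your-ec2-key> \
  --use-default-roles \
  --instance-type c3.4xlarge --instance-count 5 \
  --bootstrap-actions Name="Install Jupyter",Path=<s3://path-to-your-BA/install-jupyter.sh>,Args=[--r,--julia,--toree,--torch,--ruby,--python-packages,'ggplot nilearn',--port,8880,--jupyterhub,--jupyterhub-port,8001,--copy-samples,--notebook-dir,s3://<your-s3-bucket>/notebooks/]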
Replace <your-ec2-key> with the name of your EC2 key pair and <your-s3-bucket> with the S3 bucket where you store notebooks. You can also change the instance types to suit your needs and budget.
If you are using the EMR console to launch a cluster, you can specify the bootstrap action as follows:
When the cluster is available, set up the SSH tunnel and web proxy. The Jupyter notebook should be available at localhost:8880 (as specified in the example CLI command).
After you have signed in, you will see the home page, which displays the notebook files:
If JupyterHub is installed, the Sign in page should be available at port 8001 (as specified in the CLI example):
After you are signed in, you’ll see the JupyterHub and Jupyter home pages are the same. The JupyterHub URL, however, is /user/<username>/tree instead of /tree.
The JupyterHub Admin page is used for managing users:
You can install Jupyter extensions from the Nbextensions tab:
If you specified the --copy-samples option in the BA, you should see sample notebooks on the home page. To try the samples, first open and run the CopySampleDataToHDFS.ipynb notebook to copy some sample data files to HDFS. In the CLI example, --python-packages,'ggplot nilearn' is used to install the ggplot and nilearn packages. You can verify those packages were installed by running the Py-ggplot and PyNilearn notebooks.
The CreateUser.ipynb notebook contains examples for setting up JupyterHub users.
The PySpark.ipynb and ScalaSpark.ipynb notebooks contain the Python and Scala versions of some machine learning examples from the Spark distribution (Logistic Regression, Neural Networks, Random Forest, and Support Vector Machines):
PyHivePrestoHDFS.ipynb shows how to access Hive, Presto, and HDFS in Python. (Be sure to run the CreateHivePrestoS3Tables.ipynb first to create tables.) The %%time and %%timeit cell magics can be used to benchmark Hive and Presto queries (and other executable code):
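For instance, a cell that benchmarks a Presto query with %%time might look like the sketch below; the connection details and table name are placeholders rather than the notebook's actual contents.

%%time
# Hedged sketch: connection details and table name are placeholders.
from pyhive import presto

cursor = presto.connect(host='localhost', port=8889).cursor()
cursor.execute('SELECT COUNT(*) FROM sample_table')
print(cursor.fetchall())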
Here are some other sample notebooks for you to try.
SparkSQLParquetJSON.ipynb:
plot_separating_hyperplane.ipynb:
R-SVMLinearNonLinear.ipynb:
plot_iris.ipynb:
Julia-IrisPlot.ipynb:
PyIrisPlot.ipynb:
R-RandomForestVisualization.ipynb:
GrangerCausality.ipynb:
SQLite.ipynb:
GraphvizDot.ipynb:
Conclusion
Data scientists who run Jupyter and JupyterHub on Amazon EMR can use Python, R, Julia, and Scala to process, analyze, and visualize big data stored in Amazon S3. Jupyter notebooks can be saved to S3 automatically, so users can shut down and launch new EMR clusters, as needed. EMR makes it easy to spin up clusters with different sizes and CPU/memory configurations to suit different workloads and budgets. This can greatly reduce the cost of data-science investigations.
If you have questions about using Jupyter and JupyterHub on EMR or would like share your use cases, please leave a comment below.
This is a guest post from my colleagues Naveen Swamy and Joseph Spisak.
———————————
Machine learning is a field of computer science that enables computers to learn without being explicitly programmed. It focuses on algorithms that can learn from and make predictions on data.
Most recently, one branch of machine learning, called deep learning, has been deployed successfully in production with higher accuracy than traditional techniques, enabling capabilities such as speech recognition, image recognition, and video analytics. This higher accuracy comes, however, at the cost of significantly higher compute requirements for training these deep models.
One of the major reasons for this rebirth and rapid progress is the availability and democratization of cloud-scale computing. Training state-of-the-art deep neural networks can be time-consuming, with larger networks like ResidualNet taking several days to weeks to train, even on the latest GPU hardware. Because of this, a scale-out approach is required.
Accelerating training time has multiple benefits, including:
Enabling faster iterative research, allowing scientists to push the state of the art faster in domains such as computer vision or speech recognition.
Reducing the time-to-market for intelligent applications, allowing AI applications that consume trained, deep learning models to access newer models faster.
Absorbing new data faster, helping to keep deep learning models current.
AWS CloudFormation, which creates and configures Amazon Web Services resources with a template, simplifies the process of setting up a distributed deep learning cluster. The CloudFormation Deep Learning template uses the Amazon Deep Learning AMI (supporting MXNet, TensorFlow, Caffe, Theano, Torch, and CNTK frameworks) to launch a cluster of Amazon EC2 instances and other AWS resources needed to perform distributed deep learning. CloudFormation creates all resources in the customer account.
EC2 Cluster Architecture
Resources created by the Deep Learning template
The Deep Learning template creates a stack that contains the following resources:
A VPC in the customer account.
The requested number of worker instances in an Auto Scaling group within the VPC. These worker instances are launched in a private subnet.
A master instance in a separate Auto Scaling group that acts as a proxy to enable connectivity to the cluster via SSH. CloudFormation places this instance within the VPC and connects it to both the public and private subnets. This instance has both public IP addresses and DNS.
A security group that allows external SSH access to the master instance.
Two security groups that open ports on the private subnet for communication between the master and workers.
An IAM role that allows users to access and query Auto Scaling groups and the private IP addresses of the EC2 instances.
A NAT gateway used by the instances within the VPC to talk to the outside world.
The startup script enables SSH forwarding on all hosts. Enabling SSH is essential because frameworks such as MXNet make use of SSH for communication between the master and worker instances during distributed training. The startup script queries the private IP addresses of all the hosts in the stack, appends the IP address and worker alias to /etc/hosts, and writes the list of worker aliases to /opt/deeplearning/workers.
The startup script sets up the following environment variables:
$DEEPLEARNING_WORKERS_PATH: The file path that contains the list of workers
$DEEPLEARNING_WORKERS_COUNT: The total number of workers
$DEEPLEARNING_WORKER_GPU_COUNT: The number of GPUs on the instance
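Once you can SSH to the master instance, a quick sanity check of these variables (assuming the stack launched successfully) looks like this:

# Quick check of the variables set by the startup script.
echo "workers file:   $DEEPLEARNING_WORKERS_PATH"
echo "worker count:   $DEEPLEARNING_WORKERS_COUNT"
echo "GPUs per host:  $DEEPLEARNING_WORKER_GPU_COUNT"
cat "$DEEPLEARNING_WORKERS_PATH"   # one worker alias per line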
For SSHLocation, choose a valid CIDR IP address range to allow SSH access to the master instance and stack.
For Worker Count, type a value. The stack provisions the worker count + 1, with the additional instance acting as the master. The master also participates in the training/evaluation. Choose Next.
(Optional) Under Tags, type values for Key and Value. This allows you to assign metadata to your resources. (Optional) Under Permissions, you can choose the IAM role that CloudFormation uses to create the stack. Choose Next.
Under Capabilities, select the checkbox to agree to allow CloudFormation to create an IAM role. An IAM role is required for correctly setting up a stack.
To create the CloudFormation stack, choose Create.
To see the status of your stack, choose Events. If stack creation fails, for example, because of an access issue or an unsupported number of workers, troubleshoot the issue. For information about troubleshooting the creation of stacks, see Troubleshooting AWS CloudFormation. The event log records the reason for failure.
The CloudFormation stack output contains the Auto Scaling group in which the master instance is launched. Note the Auto Scaling group ID for MasterAutoScalingGroup.
In the navigation pane, under Auto Scaling, choose Auto Scaling Groups.
On the Auto Scaling page, search for the group ID and select it.
On the Instances tab, find the instance ID of the master instance.
Choose the instance to find the public DNS/IP address used for login.
Enable SSH agent forwarding.
This enables communication with all instances in the private subnet. Using the DNS/IP address from Step 1, modify the SSH configuration to include these lines:
Host IP/DNS-from-above
ForwardAgent yes
Run MXNet distributed training.
The following example shows how to run MNIST with data parallelism. Note the use of the DEEPLEARNING_* environment variables:
#terminate all running Python processes across workers
while read -u 10 host; do ssh $host "pkill -f python" ; done 10<$DEEPLEARNING_WORKERS_PATH
#navigate to the mnist image-classification example directory
cd ~/src/mxnet/example/image-classification
#run the MNIST distributed training example
../../tools/launch.py -n $DEEPLEARNING_WORKERS_COUNT -H $DEEPLEARNING_WORKERS_PATH python train_mnist.py --gpus $(seq -s , 0 1 $(($DEEPLEARNING_WORKER_GPU_COUNT - 1))) --network lenet --kv-store dist_sync
These steps are only a subset. For more information about running distributed training, see Run MXNet on Multiple Devices.
FAQ
1. How do I change the IP addresses that are allowed to SSH to the master instance?
The CloudFormation stack output contains the security group that controls the inbound IP addresses for SSH access to the master instance. Use this security group to change your inbound IP addresses.
2. When an instance is replaced, are the IP addresses of the instances updated?
No. You must update IP addresses manually.
3. Does the master instance participate in training/validation?
Yes. Because most deep learning tasks involve GPUs, the master instance acts both as a proxy and as a distributed training/validation instance.
4. Why are the instances in an Auto Scaling group?
An Auto Scaling group maintains the number of desired instances by launching a new instance if an existing instance fails. There are two Auto Scaling groups: one for the master and one for the workers in the private subnet. Because only the master instance has a public endpoint to access the hosts in the stack, if the master instance becomes unavailable, you can terminate it and the associated Auto Scaling group automatically launches a new master instance with a new public endpoint.
5. When a new worker instance is added or an existing instance replaced, does CloudFormation update the IP addresses on the master instance?
No, this template does not have the capability to automatically update the IP address of the replacement instance.
One of the dreaded parking enforcement Interceptors! Source: Wikipedia
John explained that the parking spots near his home only allow two hour parking. But he had figured out that you only get caught exceeding that if the parking enforcement officers see your car in the same spot for more than two hours. If he could somehow know when a meter maid’s Interceptor drives by, he would have a two hour heads-up before he had to move his vehicle.
Here’s how the Raspberry Pi comes into play:
“I used a Raspberry Pi with the Camera Module and OpenCV as a motion detector,” Naulty explains, rattling off the long list of tech that went into creating Meter Maid Monitor. “The camera monitors traffic and takes photos. The pictures are uploaded to AWS, where an EC2 instance running the TensorFlow supervised learning platform does the image recognition. I’ve trained it to recognise meter maid cars. Finally, if there’s more than a 75 percent chance of the car being a meter maid, it sends me a text message using Twilio, so I can move my car before I get a ticket”
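John's code for both the Pi and the AWS side is on his GitHub repo (linked below); as a rough sketch of the notification step he describes, the 75 percent threshold check and the Twilio text might look something like this. The credentials, phone numbers, and the classify() helper are placeholders, not his actual implementation.

# Rough sketch of the notification step described above. Credentials, phone
# numbers, and the classify() helper are placeholders, not the project's code.
from twilio.rest import Client

ACCOUNT_SID = "ACXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
AUTH_TOKEN = "your_auth_token"
client = Client(ACCOUNT_SID, AUTH_TOKEN)

def notify_if_meter_maid(image_path, classify, threshold=0.75):
    """Text me if the classifier thinks a meter maid car was photographed."""
    probability = classify(image_path)   # placeholder for the TensorFlow model call
    if probability > threshold:
        client.messages.create(
            to="+15551234567",
            from_="+15557654321",
            body="Meter maid spotted ({:.0%} confidence) - move your car!".format(probability),
        )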
If this all feels a bit nefarious and subversive to you, hopefully you can at the very least appreciate John’s clever use of technology. Either way, if you want to see his code for the Raspberry Pi and for the AWS instance, head on over to his GitHub repo for this project. If you have any other smart ideas for using Raspberry Pi to make city parking more bearable, let’s hear ’em!