AWS Contributes to Milestone 1.0 Release and Adds Model Serving Capability for Apache MXNet

Post Syndicated from Ana Visneski original https://aws.amazon.com/blogs/aws/aws-contributes-to-milestone-1-0-release-and-adds-model-serving-capability-for-apache-mxnet/

Post by Dr. Matt Wood

Today AWS announced contributions to the milestone 1.0 release of the Apache MXNet deep learning engine, including the introduction of a new model-serving capability for MXNet. The new capabilities in MXNet provide the following benefits to users:

1) MXNet is easier to use: The model server for MXNet is a new capability introduced by AWS, and it packages, runs, and serves deep learning models in seconds with just a few lines of code, making them accessible over the internet via an API endpoint and thus easy to integrate into applications. The 1.0 release also includes an advanced indexing capability that enables users to perform matrix operations in a more intuitive manner.

  • Model serving enables setting up an API endpoint for prediction: It saves developers time and effort by reducing the work of setting up an API endpoint for running prediction functionality and integrating it into an application to just a few lines of code. It bridges the gap between Python-based deep learning frameworks and production systems through a Docker container-based deployment model.
  • Advanced indexing for array operations in MXNet: It is now more intuitive for developers to leverage the powerful array operations in MXNet by applying their existing knowledge of NumPy/SciPy arrays. For example, advanced indexing supports an MXNet NDArray or a NumPy ndarray as the index, e.g. a[mx.nd.array([1, 2], dtype='int32')] (see the sketch after this list).
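A minimal sketch of the new indexing in action (assuming an MXNet 1.0 installation; the array values are arbitrary):

import mxnet as mx

a = mx.nd.array([[1, 2], [3, 4], [5, 6]])

# Select rows 1 and 2 with an NDArray index, NumPy-style.
rows = a[mx.nd.array([1, 2], dtype='int32')]
print(rows.asnumpy())  # rows 1 and 2: [[3. 4.] [5. 6.]]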

2) MXNet is faster: The 1.0 release includes implementations of cutting-edge features that optimize the performance of training and inference. Gradient compression enables users to train models up to five times faster by reducing the communication bandwidth between compute nodes, without loss in convergence rate or accuracy. For speech recognition acoustic modeling, such as the models that power Alexa, this feature can reduce network bandwidth by up to three orders of magnitude during training. With support for the NVIDIA Collective Communication Library (NCCL), users can train a model 20% faster on multi-GPU systems.

  • Optimize network bandwidth with gradient compression: In distributed training, each machine must communicate frequently with the others to update the weight vectors and thereby collectively build a single model, leading to high network traffic. The gradient compression algorithm enables users to train models up to five times faster by compressing the model changes communicated by each instance.
  • Optimize training performance by taking advantage of NCCL: NCCL implements multi-GPU and multi-node collective communication primitives that are performance-optimized for NVIDIA GPUs, providing communication routines tuned to achieve high bandwidth over the interconnect between GPUs. MXNet supports NCCL to train models about 20% faster on multi-GPU systems (see the sketch after this list).
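As a rough sketch of how these two optimizations are enabled in code (the kvstore types and 2-bit compression parameters come from MXNet's distributed-training API; the threshold value here is an arbitrary example, and a real distributed job also needs MXNet's job launcher):

import mxnet as mx

# Gradient compression: quantize the gradients exchanged between workers
# to 2 bits, cutting network traffic at a small cost in update precision.
kv = mx.kv.create('dist_sync')
kv.set_gradient_compression({'type': '2bit', 'threshold': 0.5})

# NCCL: use the NCCL-backed kvstore for multi-GPU training on one machine.
kv_nccl = mx.kv.create('nccl')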

3) MXNet provides easy interoperability: MXNet now includes a tool for converting neural network code written with the Caffe framework to MXNet code, making it easier for users to take advantage of MXNet’s scalability and performance.

  • Migrate Caffe models to MXNet: It is now possible to migrate existing Caffe code to MXNet easily, using the new source code translation tool.

MXNet has helped developers and researchers make progress with everything from language translation to autonomous vehicles and behavioral biometric security. We are excited to see the broad base of users building production artificial intelligence applications powered by neural network models developed and trained with MXNet. For example, the autonomous driving company TuSimple recently piloted a self-driving truck on a 200-mile journey from Yuma, Arizona to San Diego, California using MXNet. This release also includes a full-featured and performance-optimized version of the Gluon programming interface. Its ease of use, combined with an extensive set of tutorials, has led to significant adoption among developers new to deep learning. The flexibility of the interface has also driven interest within the research community, especially in the natural language processing domain.

Getting started with MXNet
Getting started with MXNet is simple. To learn more about the Gluon interface and deep learning, you can reference this comprehensive set of tutorials, which covers everything from an introduction to deep learning to how to implement cutting-edge neural network models. If you’re a contributor to a machine learning framework, check out the interface specs on GitHub.

To get started with the Model Server for Apache MXNet, install the library with the following command:

$ pip install mxnet-model-server

The Model Server library has a Model Zoo with 10 pre-trained deep learning models, including the SqueezeNet 1.1 object classification model. You can start serving the SqueezeNet model with just the following command:

$ mxnet-model-server \
  --models squeezenet=https://s3.amazonaws.com/model-server/models/squeezenet_v1.1/squeezenet_v1.1.model \
  --service dms/model_service/mxnet_vision_service.py
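Once the server is running, you can request predictions over HTTP. Below is a hedged sketch using Python's requests library; the port, URL path, and input0 field name follow the server's defaults as documented at the time of writing, and kitten.jpg stands in for any local test image:

import requests

# POST a local image to the SqueezeNet prediction endpoint.
with open('kitten.jpg', 'rb') as f:
    response = requests.post('http://127.0.0.1:8080/squeezenet/predict',
                             files={'input0': f})
print(response.json())  # top predicted classes with probabilities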

Learn more about the Model Server and view the source code, reference examples, and tutorials here: https://github.com/awslabs/mxnet-model-server/

-Dr. Matt Wood

Introducing Gluon: a new library for machine learning from AWS and Microsoft

Post Syndicated from Ana Visneski original https://aws.amazon.com/blogs/aws/introducing-gluon-a-new-library-for-machine-learning-from-aws-and-microsoft/

Post by Dr. Matt Wood

Today, AWS and Microsoft announced Gluon, a new open source deep learning interface which allows developers to more easily and quickly build machine learning models, without compromising performance.

Gluon provides a clear, concise API for defining machine learning models using a collection of pre-built, optimized neural network components. Developers who are new to machine learning will find this interface familiar from traditional code, since machine learning models can be defined and manipulated just like any other data structure. More seasoned data scientists and researchers will value the ability to build prototypes quickly and to utilize dynamic neural network graphs for entirely new model architectures, all without sacrificing training speed.
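For a flavor of the API, here is a minimal sketch of stacking pre-built components into a model (the layer sizes and count are arbitrary):

import mxnet as mx
from mxnet import gluon

# Stack pre-built, optimized layers much like appending to a list.
net = gluon.nn.Sequential()
with net.name_scope():
    net.add(gluon.nn.Dense(128, activation='relu'))
    net.add(gluon.nn.Dense(64, activation='relu'))
    net.add(gluon.nn.Dense(10))  # e.g. ten output classes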

Gluon is available in Apache MXNet today, is coming in a future Microsoft Cognitive Toolkit release, and will be available in more frameworks over time.

Neural Networks vs Developers
Machine learning with neural networks (including ‘deep learning’) has three main components: data for training, a neural network model, and an algorithm that trains the neural network. You can think of a neural network as something like a directed graph: it has a series of inputs (which represent the data) that connect to a series of outputs (the prediction) through a series of connected layers and weights. During training, the algorithm adjusts the weights in the network based on the error in the network’s output. This is the process by which the network learns; it is a memory- and compute-intensive process that can take days.
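That adjust-the-weights-from-the-error loop looks something like the following sketch, written with MXNet's Gluon interface (the one-layer model, random data, and hyperparameters are all placeholders):

import mxnet as mx
from mxnet import autograd, gluon, nd

net = gluon.nn.Dense(1)                  # a stand-in one-layer model
net.initialize()
loss_fn = gluon.loss.L2Loss()
trainer = gluon.Trainer(net.collect_params(), 'sgd',
                        {'learning_rate': 0.1})

x = nd.random.normal(shape=(16, 4))      # toy inputs
y = nd.random.normal(shape=(16, 1))      # toy targets

for epoch in range(10):
    with autograd.record():              # record the computation graph
        loss = loss_fn(net(x), y)
    loss.backward()                      # error gradients w.r.t. the weights
    trainer.step(batch_size=16)          # adjust the weights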

Deep learning frameworks such as Caffe2, Cognitive Toolkit, TensorFlow, and Apache MXNet are, in part, an answer to the question ‘how can we speed this process up?’ Just like query optimizers in databases, the more a training engine knows about the network and the algorithm, the more optimizations it can make to the training process (for example, it can infer what needs to be re-computed on the graph based on what else has changed, and skip the unaffected weights to speed things up). These frameworks also provide parallelization to distribute the computation process, reducing the overall training time.

However, in order to achieve these optimizations, most frameworks require the developer to do some extra work: specifically, to provide a formal definition of the network graph up front, and then to ‘freeze’ the graph and just adjust the weights.

The network definition, which can be large and complex with millions of connections, usually has to be constructed by hand. Not only are deep learning networks unwieldy, but they can be difficult to debug, and the code is hard to reuse between projects.

This complexity can be difficult for beginners to navigate and is time-consuming for more experienced researchers. At AWS, we’ve been experimenting with some ideas in MXNet around new, flexible, more approachable ways to define and train neural networks. Microsoft is also a contributor to the open source MXNet project and was interested in some of these same ideas. Based on this, we got talking and found we had a similar vision: to use these techniques to reduce the complexity of machine learning, making it accessible to more developers.

Enter Gluon: dynamic graphs, rapid iteration, scalable training
Gluon introduces four key innovations.

  1. Friendly API: Gluon networks can be defined using simple, clear, concise code – this is easier for developers to learn, and much easier to understand than some of the more arcane and formal ways of defining networks and their associated weighted scoring functions.
  2. Dynamic networks: The network definition in Gluon is dynamic: it can bend and flex just like any other data structure. This is in contrast to the more common, formal, symbolic definition of a network, which the deep learning framework effectively has to carve in stone in order to optimize computation during training. Dynamic networks are easier to manage, and with Gluon, developers can easily ‘hybridize’ between these fast symbolic representations and the friendlier, dynamic ‘imperative’ definitions of the network and algorithms (see the sketch after this list).
  3. The algorithm can define the network: The model and the training algorithm are brought much closer together. Instead of separate definitions, the algorithm can adjust the network dynamically during definition and training. Not only does this mean that developers can use standard programming loops and conditionals to create these networks, but researchers can now define even more sophisticated algorithms and models that were not possible before. They are all easier to create, change, and debug.
  4. High-performance operators for training: These make it possible to have a friendly, concise API and dynamic graphs without sacrificing training speed. This is a huge step forward in machine learning. Some frameworks have brought a friendly API or dynamic graphs to deep learning, but these previous methods all incur a cost in terms of training speed. As with other areas of software, abstraction can slow down computation, since it needs to be negotiated and interpreted at run time. Gluon can efficiently blend the concise API with the formal definition under the hood, without the developer having to know the specific details or accommodate the compiler optimizations manually.
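To make the dynamic-networks and algorithm-defines-the-network points concrete, here is a minimal sketch (the layer sizes and depth are arbitrary) of a Gluon network whose depth is chosen by a plain Python loop at call time, plus the hybridize() call that switches a static network to the fast symbolic form:

import mxnet as mx
from mxnet import gluon, nd

# An imperative network whose structure is ordinary Python control flow.
class DynamicNet(gluon.Block):
    def __init__(self, **kwargs):
        super(DynamicNet, self).__init__(**kwargs)
        with self.name_scope():
            self.proj = gluon.nn.Dense(64, activation='relu')
            self.mid = gluon.nn.Dense(64, activation='relu')
            self.out = gluon.nn.Dense(10)

    def forward(self, x, depth):
        x = self.proj(x)
        for _ in range(depth):           # depth decided per call
            x = self.mid(x)
        return self.out(x)

net = DynamicNet()
net.initialize()
print(net(nd.random.normal(shape=(4, 32)), 3))  # a 3-layer-deep pass

# A network without data-dependent control flow can be 'hybridized' so
# Gluon compiles it down to the fast symbolic representation.
static_net = gluon.nn.HybridSequential()
with static_net.name_scope():
    static_net.add(gluon.nn.Dense(64, activation='relu'))
    static_net.add(gluon.nn.Dense(10))
static_net.initialize()
static_net.hybridize()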

The team here at AWS, and our collaborators at Microsoft, couldn’t be more excited to bring these improvements to developers through Gluon. We’re already seeing quite a bit of excitement from developers and researchers alike.

Getting started with Gluon
Gluon is available today in Apache MXNet, with support coming for the Microsoft Cognitive Toolkit in a future release. We’re also publishing the front-end interface and the low-level API specifications so that they can be included in other frameworks in the fullness of time.

You can get started with Gluon today. Fire up the AWS Deep Learning AMI with a single click and jump into one of 50 fully worked notebook examples. If you’re a contributor to a machine learning framework, check out the interface specs on GitHub.

-Dr. Matt Wood