All posts by davglass

Open Sourcing a Deep Learning Solution for Detecting NSFW Images

Post Syndicated from davglass original https://yahooeng.tumblr.com/post/151148689421

By Jay Mahadeokar and Gerry Pesavento

Automatically identifying that an image is not suitable/safe for work (NSFW), including offensive and adult images, is an important problem which researchers have been trying to tackle for decades. Since images and user-generated content dominate the Internet today, filtering NSFW images becomes an essential component of Web and mobile applications. With the evolution of computer vision, improved training data, and deep learning algorithms, computers are now able to automatically classify NSFW image content with greater precision.

Defining NSFW material is subjective and the task of identifying these images is non-trivial. Moreover, what may be objectionable in one context can be suitable in another. For this reason, the model we describe below focuses only on one type of NSFW content: pornographic images. The identification of NSFW sketches, cartoons, text, images of graphic violence, or other types of unsuitable content is not addressed with this model.

To the best of our knowledge, there is no open source model or algorithm for identifying NSFW images. In the spirit of collaboration and with the hope of advancing this endeavor, we are releasing our deep learning model that will allow developers to experiment with a classifier for NSFW detection, and provide feedback to us on ways to improve the classifier.

Our general purpose Caffe deep neural network model (code on GitHub) takes an image as input and outputs a probability (i.e., a score between 0 and 1) which can be used to detect and filter NSFW images. Developers can use this score to keep only images that fall below a threshold chosen from a ROC curve for their specific use case, or use this signal to rank images in search results.
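As a rough illustration of how such a score might be consumed, the sketch below filters and ranks images by their NSFW probability. The function names, file names, scores, and the 0.5 threshold are all hypothetical; they are not part of the released model or its repository.

```python
from typing import Dict, List


def filter_sfw(scores: Dict[str, float], threshold: float) -> List[str]:
    """Return the image names whose NSFW score is below the threshold."""
    return [name for name, score in scores.items() if score < threshold]


def rank_by_safety(scores: Dict[str, float]) -> List[str]:
    """Order image names from lowest to highest NSFW score."""
    return sorted(scores, key=lambda name: scores[name])


# Hypothetical scores, standing in for the model's output probabilities.
example_scores = {"a.jpg": 0.02, "b.jpg": 0.91, "c.jpg": 0.35}
kept = filter_sfw(example_scores, threshold=0.5)
ranked = rank_by_safety(example_scores)
```

In practice the threshold would come from a ROC curve computed on your own data rather than a fixed value like the one used here.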

Convolutional Neural Network (CNN) architectures and tradeoffs

In recent years, CNNs have become very successful in image classification problems [1] [5] [6]. Since 2012, new CNN architectures have continuously improved accuracy on the standard ImageNet classification challenge. Some of the major breakthroughs include AlexNet (2012) [6], GoogLeNet [5], VGG (2014) [2] and Residual Networks (2015) [1]. These networks have different tradeoffs in terms of runtime, memory requirements, and accuracy. The main indicators for runtime and memory requirements are:

  1. Flops or connections – The number of connections in a neural network determines the number of compute operations during a forward pass, which is proportional to the runtime of the network while classifying an image.
  2. Parameters – The number of parameters in a neural network determines the amount of memory needed to load the network.

Ideally, we want a network that achieves maximum accuracy with minimum flops and minimum parameters.
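To make the two indicators concrete, here is a minimal sketch that counts parameters and multiply-accumulate operations for a single convolutional layer. The layer shape is a hypothetical example, and the formulas assume a plain square-kernel convolution with a bias term and no grouping.

```python
def conv_params(kernel: int, c_in: int, c_out: int) -> int:
    """Weights plus biases of a square-kernel convolution layer."""
    return kernel * kernel * c_in * c_out + c_out


def conv_macs(kernel: int, c_in: int, c_out: int, h_out: int, w_out: int) -> int:
    """Multiply-accumulates for one forward pass: one kernel application per output value."""
    return kernel * kernel * c_in * c_out * h_out * w_out


# Hypothetical layer: 3x3 kernel, 64 input channels, 128 output channels,
# producing a 56x56 output feature map.
params = conv_params(3, 64, 128)
macs = conv_macs(3, 64, 128, 56, 56)
memory_bytes = params * 4  # assuming 32-bit floating point weights
```

Summing these counts over every layer gives the totals that the runtime and memory comparisons rely on.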

Training a deep neural network for NSFW classification

We train the models using a dataset of positive (i.e. NSFW) images and negative (i.e. SFW – suitable/safe for work) images. We are not releasing the training images or other details due to the nature of the data, but instead we open source the output model which can be used for classification by a developer.

We use the Caffe deep learning library and CaffeOnSpark; the latter is a powerful open source framework for distributed learning that brings Caffe deep learning to Hadoop and Spark clusters for training models (Big shout out to Yahoo’s CaffeOnSpark team!).

While training, the images were resized to 256×256 pixels, horizontally flipped for data augmentation, and randomly cropped to 224×224 pixels, and were then fed to the network. For training residual networks, we used scale augmentation as described in the ResNet paper [1], to avoid overfitting. We evaluated various architectures to experiment with tradeoffs of runtime vs accuracy.

  1. MS_CTC [4] – This architecture was proposed in Microsoft’s constrained time cost paper. It improves on AlexNet in speed and accuracy while maintaining a combination of convolutional and fully-connected layers.
  2. SqueezeNet [3] – This architecture introduces the fire module, which contains layers to squeeze and then expand the input data blob. This reduces the number of parameters while keeping ImageNet accuracy as good as AlexNet’s, and the memory requirement is only 6MB.
  3. VGG [2] – This architecture has 13 conv layers and 3 FC layers.
  4. GoogLeNet [5] – GoogLeNet introduces inception modules and has 20 convolutional layer stages. It also uses hanging loss functions in intermediate layers to tackle the problem of diminishing gradients for deep networks.
  5. ResNet-50 [1] – ResNets use shortcut connections to solve the problem of diminishing gradients. We used the 50-layer residual network released by the authors.
  6. ResNet-50-thin – The model was generated using our pynetbuilder tool and replicates the Residual Network paper’s 50-layer network (with half the number of filters in each layer). You can find more details on how the model was generated and trained here.

Tradeoffs of different architectures: accuracy vs number of flops vs number of params in network.
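The training-time preprocessing described earlier (resize to 256×256, horizontal flip, random 224×224 crop) can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the image is a nested list standing in for a real pixel array, the resize step is assumed to have already happened, and the 0.5 flip probability is an assumption that the post does not state.

```python
import random
from typing import List, Optional

Image = List[List[int]]  # rows of pixel values; a stand-in for a real image array


def random_flip_and_crop(image: Image, crop: int = 224,
                         rng: Optional[random.Random] = None) -> Image:
    """Randomly flip an image horizontally, then take a random square crop."""
    rng = rng or random.Random()
    height, width = len(image), len(image[0])
    if rng.random() < 0.5:  # assumed flip probability
        image = [row[::-1] for row in image]
    top = rng.randint(0, height - crop)   # randint is inclusive at both ends
    left = rng.randint(0, width - crop)
    return [row[left:left + crop] for row in image[top:top + crop]]


# A 256x256 placeholder "image" (already resized), cropped to 224x224.
resized = [[(r * 256 + c) % 256 for c in range(256)] for r in range(256)]
patch = random_flip_and_crop(resized, rng=random.Random(0))
```

Each training pass sees a slightly different patch of the same image, which is what makes this a form of data augmentation.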

The deep models were first pre-trained on the ImageNet 1000 class dataset. For each network, we replace the last layer (FC1000) with a 2-node fully-connected layer. Then we fine-tune the weights on the NSFW dataset. Note that we set the learning rate multiplier of the new last FC layer to 5 times the multiplier of the other layers, which are being fine-tuned. We also tune the hyperparameters (step size, base learning rate) to optimize the performance.
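The interaction between the base learning rate, the step size, and the per-layer multiplier can be sketched as below. The step schedule follows Caffe's "step" policy (base rate times gamma raised to the number of completed steps); the numeric values are hypothetical, since the post does not give the ones actually used.

```python
def step_lr(base_lr: float, gamma: float, step_size: int, iteration: int) -> float:
    """Global learning rate under a step schedule: decay by gamma every step_size iterations."""
    return base_lr * gamma ** (iteration // step_size)


def layer_lr(global_lr: float, lr_mult: float) -> float:
    """Effective learning rate of a layer: the global rate scaled by the layer's multiplier."""
    return global_lr * lr_mult


# Hypothetical hyperparameters, for illustration only.
BASE_LR, GAMMA, STEP_SIZE = 0.001, 0.1, 10000

current = step_lr(BASE_LR, GAMMA, STEP_SIZE, iteration=25000)
pretrained_layer_rate = layer_lr(current, lr_mult=1.0)
new_fc_layer_rate = layer_lr(current, lr_mult=5.0)  # last FC layer learns 5x faster
```

The larger multiplier lets the freshly initialized 2-node layer adapt quickly while the pre-trained layers change more gently.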

We observe that the performance of the models on NSFW classification tasks is related to the performance of the pre-trained model on ImageNet classification tasks, so a better pre-trained model helps in fine-tuned classification tasks. The graph below shows the relative performance on our held-out NSFW evaluation set. Please note that the false positive rate (FPR) at a fixed false negative rate (FNR) shown in the graph is specific to our evaluation dataset, and is shown here for illustrative purposes. To use the models for NSFW filtering, we suggest that you plot the ROC curve using your dataset and pick a suitable threshold.

Comparison of performance of models on Imagenet and their counterparts fine-tuned on NSFW dataset.
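The suggestion to evaluate on your own dataset and pick a threshold can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: the scores and labels are made up, labels use 1 for NSFW and 0 for SFW, both classes are assumed to be present, and the selection rule (lowest FPR subject to a cap on FNR) is one reasonable choice among several.

```python
from typing import List, Tuple


def rates_at_threshold(scores: List[float], labels: List[int],
                       threshold: float) -> Tuple[float, float]:
    """Return (FPR, FNR) when images scoring >= threshold are flagged as NSFW."""
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    return fp / labels.count(0), fn / labels.count(1)


def pick_threshold(scores: List[float], labels: List[int],
                   max_fnr: float) -> Tuple[float, float, float]:
    """Among the observed scores, pick the threshold with the lowest FPR whose FNR <= max_fnr."""
    best = None
    for t in sorted(set(scores)):
        fpr, fnr = rates_at_threshold(scores, labels, t)
        if fnr <= max_fnr and (best is None or fpr < best[1]):
            best = (t, fpr, fnr)
    if best is None:
        raise ValueError("no threshold satisfies the FNR limit")
    return best


# Made-up evaluation data: model scores and ground-truth labels.
eval_scores = [0.1, 0.2, 0.4, 0.6, 0.7, 0.9]
eval_labels = [0, 0, 1, 0, 1, 1]
threshold, fpr, fnr = pick_threshold(eval_scores, eval_labels, max_fnr=0.34)
```

Sweeping the threshold over all observed scores and recording the (FPR, 1 − FNR) pairs yields the points of the ROC curve itself.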

We are releasing the thin ResNet 50 model, since it provides a good tradeoff in terms of accuracy, and the model is lightweight in terms of runtime (takes < 0.5 sec on CPU) and memory (~23 MB). Please refer to our git repository for instructions and usage of our model. We encourage developers to try the model for their NSFW filtering use cases. For any questions or feedback about the performance of the model, we encourage creating an issue and we will respond ASAP.

Results can be improved by fine-tuning the model for your dataset or use case. If you achieve improved performance or you have trained an NSFW model with a different architecture, we encourage contributing to the model or sharing the link on our description page.

Disclaimer: The definition of NSFW is subjective and contextual. This model is a general purpose reference model, which can be used for the preliminary filtering of pornographic images. We do not provide guarantees of accuracy of output, rather we make this available for developers to explore and enhance as an open source project.

We would like to thank Sachin Farfade, Amar Ramesh Kamat, Armin Kappeler, and Shraddha Advani for their contributions to this work.

References:

[1] He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. “Deep residual learning for image recognition.” arXiv preprint arXiv:1512.03385 (2015).

[2] Simonyan, Karen, and Andrew Zisserman. “Very deep convolutional networks for large-scale image recognition.” arXiv preprint arXiv:1409.1556 (2014).

[3] Iandola, Forrest N., Matthew W. Moskewicz, Khalid Ashraf, Song Han, William J. Dally, and Kurt Keutzer. “SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and 1MB model size.” arXiv preprint arXiv:1602.07360 (2016).

[4] He, Kaiming, and Jian Sun. “Convolutional neural networks at constrained time cost.” In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5353-5360. 2015.

[5] Szegedy, Christian, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. “Going deeper with convolutions.” In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1-9. 2015.

[6] Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. “ImageNet classification with deep convolutional neural networks.” In Advances in Neural Information Processing Systems, pp. 1097-1105. 2012.

Personalized Group Recommendations Are Here | code.flickr.com

Post Syndicated from davglass original https://yahooeng.tumblr.com/post/151144204266

There are two primary paradigms for the discovery of digital content. First is the search paradigm, in which the user is actively looking for specific content using search terms and filters (e.g., Google web search, Flickr image search, Yelp restaurant search, etc.). Second is a passive approach, in which the user browses content presented to them (e.g., NYTimes news, Flickr Explore, and Twitter trending topics). Personalization benefits both approaches by providing relevant content that is tailored to users’ tastes (e.g., Google News, Netflix homepage, LinkedIn job search, etc.). We believe personalization can improve the user experience at Flickr by guiding both new as well as more experienced members as they explore photography. Today, we’re excited to bring you personalized group recommendations.

Read more over at code.flickr.com

Moving Beyond Flash: The Yahoo HTML5 Video Player – Streaming Media Magazine

Post Syndicated from davglass original https://yahooeng.tumblr.com/post/150727511601

Adobe Flash, once the de facto standard for media playback on the web, has lost favor in the industry due to increasing concerns over security and performance. At the same time, requiring a plugin for video playback in browsers is losing favor among users as well. As a result, the industry is moving toward HTML5 for video playback.

Yahoo Account Key – Signing in Has Never Been Easier

Post Syndicated from davglass original https://yahooeng.tumblr.com/post/131218006711

yahoo:

By Dylan Casey, VP of Product Management
Earlier this year, we launched on-demand passwords so you can sign into your Yahoo account using an SMS code, instead of memorizing a complicated password. It was the first step toward a password-free future.
Today, we’re excited to take user convenience a step further by introducing Yahoo Account Key, which uses push notifications to provide a quick and simple way for you to access a Yahoo account using your mobile device.
Passwords are usually simple to hack and easy to forget. Account Key streamlines the sign-in process with a secure, elegant and easy-to-use interface that makes access as easy as tapping a button. It’s also more secure than a traditional password because once you activate Account Key – even if someone gets access to your account info – they can’t sign in.
Account Key is now available globally for the new Yahoo Mail app and will be rolling out to other Yahoo apps this year. We’re thrilled about this next step towards a password-free future!

Yahoo Daily Fantasy: Everyone’s Invited—and We Mean “Everyone”

Post Syndicated from davglass original https://yahooeng.tumblr.com/post/129855575131

imbrianj:

Photo of a Yahoo accessibility specialist assisting a colleague with keyboard navigation. The Fantasy Sport logo is superimposed.
When we’re building products at Yahoo we get really excited about our work. No surprise. We envision that millions of people are going to love our products and be absolutely delighted when using them.

With our new Yahoo Sports Daily Fantasy game, we wanted to include everyone.

We support all major modern browsers on desktop and mobile as well as native apps. However, that, in and of itself, won’t ensure that the billion individuals around the world who use assistive technology will be able to manage and play our fantasy games. One billion. That’s a lot of everyone.

Daily Fantasy baked in accessibility. Baked in. Important point. In order to ensure that everyone is able to compete in our games at the same level, accessibility can’t be an add-on.

Check out our pages. Title and ARIA attributes. Structured headers. Brilliant labels. TabIndex and other attributes that are convenience features for many of us and a necessity for a great experience for others—especially our assistive technology users. There are a lot of them and if we work to make our pages and apps accessible, well, we figure, there can be a lot more of them using Daily Fantasy.

Think about it: whether you’re a sighted user and just need to hover over an icon to get the full description of what it indicates—or a totally blind user who would otherwise miss that valuable content—it makes sense to work on making our game as enjoyable and as easy to use as possible for everyone.

So, the technical bits. What specific things did we do to ensure good accessibility on Daily Fantasy?

A properly accessible site starts on a foundation of good, semantic markup. We work to ensure that content is presented in the markup in the order that makes the most sense, then worry about how to style it to look as we desire. The markup we choose is also important: while <div> and <span> are handy wrappers, we try to make sure the context is appropriate. Should this player info be a <dl>? Should this alert be a <p>?

One of the biggest impacts on screen reader users comes from the appropriate use of header tags and well-written labels. With these, a user can quickly navigate to the appropriate part of the page based on the headers presented (skipping some of the navigation that sighted users take for granted) and know exactly what they can do when, for example, they need to add a player to or remove a player from their roster. When content changes, we make use of ARIA attributes. Because this is a single-page web app (one that does not do a page refresh as you navigate), we use ARIA’s role="alert" to cue users to what change has occurred. Similarly, we’ve tried to ensure that some components, such as our tab selectors and sliders, are compatible with assistive technology and present information that is as helpful as possible. With our scrolling table headers, we had to use ARIA to “ignore” them: they would be redundant for screen readers, because the natural <th> elements were intentionally left in place but visually hidden.
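As a rough illustration of how the heading structure and alert regions mentioned above could be checked automatically, here is a small sketch using Python's standard html.parser module. The markup is hypothetical and is not taken from the Daily Fantasy pages, and the "skipped level" rule is just one simple heuristic, not a complete accessibility audit.

```python
from html.parser import HTMLParser
from typing import List, Tuple


class HeadingAudit(HTMLParser):
    """Collect heading levels and count elements marked role="alert"."""

    def __init__(self) -> None:
        super().__init__()
        self.levels: List[int] = []
        self.alerts = 0

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased; h1..h6 are the heading elements.
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.levels.append(int(tag[1]))
        if dict(attrs).get("role") == "alert":
            self.alerts += 1


def skipped_levels(levels: List[int]) -> List[Tuple[int, int]]:
    """Return (previous, current) pairs where a heading jumps more than one level deeper."""
    return [(a, b) for a, b in zip(levels, levels[1:]) if b > a + 1]


# Hypothetical markup for illustration only.
sample = """
<h1>Contest lobby</h1>
<h2>Your roster</h2>
<h4>Bench</h4>
<p role="alert">Player added to roster.</p>
"""
audit = HeadingAudit()
audit.feed(sample)
problems = skipped_levels(audit.levels)
```

A check like this only catches structural slips; testing with an actual screen reader remains the real measure.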

Although we have done some testing with VoiceOver on OS X, our primary testing platform is NVDA on Windows using Chrome. NVDA’s support has been good, and it’s free and open source. Even if you’re on OS X, you can install a free Windows VM for testing thanks to a program Microsoft has set up (thank you!). These free tools make it possible for anyone to ensure a great experience for all users:

https://dev.modern.ie/tools/vms/mac/
https://www.virtualbox.org/wiki/Downloads
http://www.nvaccess.org/download/
http://www.google.com/chrome/

Accessibility should not be considered a competitive advantage. It’s something everyone should strive for and something we should all be supporting. If you’re interested in participating in the conversation, give us a tweet, reblog, join in the forum conversations or drop us a line! We share your love of Daily Fantasy games and want to make sure everyone’s invited.

If you have a suggestion on what could improve our product, please let us know! For Daily Fantasy we personally lurk in some of the more popular forums and have gotten some really great feedback from their users. It’s not uncommon to read a comment and have a fix in to address it within hours.

Did I mention that we are excited about our work and delighting users—everyone?

– Gary, Darren and Brian