Tag Archives: Sketches

More power to your Pi

Post Syndicated from James Adams original https://www.raspberrypi.org/blog/pi-power-supply-chip/

It’s been just over three weeks since we launched the new Raspberry Pi 3 Model B+. Although the product is branded Raspberry Pi 3B+ and not Raspberry Pi 4, a serious amount of engineering was involved in creating it. The wireless networking, USB/Ethernet hub, on-board power supplies, and BCM2837 chip were all upgraded: together these represent almost all the circuitry on the board! Today, I’d like to tell you about the work that has gone into creating a custom power supply chip for our newest computer.

Raspberry Pi 3 Model B+, with custom power supply chip

The new Raspberry Pi 3B+, sporting a new, custom power supply chip (bottom left-hand corner)

Successful launch

The Raspberry Pi 3B+ has been well received, and we’ve enjoyed hearing feedback from the community as well as reading the various reviews and articles highlighting the solid improvements in wireless networking, Ethernet, CPU, and thermal performance of the new board. Gareth Halfacree’s post here has some particularly nice graphs showing the increased performance as well as how the Pi 3B+ keeps cool under load due to the new CPU package that incorporates a metal heat spreader. The Raspberry Pi production lines at the Sony UK Technology Centre are running at full speed, and it seems most people who want to get hold of the new board are able to find one in stock.

Powering your Pi

One of the most critical but often under-appreciated elements of any electronic product, particularly one such as Raspberry Pi with lots of complex on-board silicon (processor, networking, high-speed memory), is the power supply. In fact, the Raspberry Pi 3B+ has no fewer than six different voltage rails: two at 3.3V — one special ‘quiet’ one for audio, and one for everything else; 1.8V; 1.2V for the LPDDR2 memory; and 1.2V nominal for the CPU core. Note that the CPU voltage is actually raised and lowered on the fly as the speed of the CPU is increased and decreased, depending on how hard it is working. The sixth rail is 5V, which is the master supply that all the others are created from, and the output voltage for the four downstream USB ports; this is what the mains power adaptor is supplying through the micro USB power connector.
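Incidentally, you can watch this dynamic voltage and frequency scaling happen on a running board. The following minimal sketch (assuming a Pi running Raspberry Pi OS, where the vcgencmd utility is available) polls the reported core voltage and ARM clock:

```python
# Poll the CPU core voltage and ARM clock on a Raspberry Pi.
# Assumes Raspberry Pi OS with the `vcgencmd` utility on the PATH.
import subprocess
import time

def vcgencmd(*args):
    """Run vcgencmd and return its output, e.g. 'volt=1.2000V'."""
    return subprocess.check_output(["vcgencmd", *args], text=True).strip()

for _ in range(5):
    volts = vcgencmd("measure_volts", "core")   # e.g. volt=1.2000V
    clock = vcgencmd("measure_clock", "arm")    # e.g. frequency(45)=1400000000
    print(volts, clock)
    time.sleep(1)
```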

Power supply primer

There are two common classes of power supply circuits: linear regulators and switching regulators. Linear regulators work by creating a lower, regulated voltage from a higher one. In simple terms, they monitor the output voltage against an internally generated reference and continually change their own resistance to keep the output voltage constant. Switching regulators work in a different way: they ‘pump’ energy by first storing the energy coming from the source supply in a reactive component (usually an inductor, sometimes a capacitor) and then releasing it to the regulated output supply. The switches in switching regulators effect this energy transfer by first connecting the inductor (or capacitor) to store the source energy, and then switching the circuit so the energy is released to its destination.

Linear regulators produce smoother, less noisy output voltages, but they can only convert to a lower voltage, and have to dissipate energy to do so. The higher the output current and the voltage difference across them, the more energy is lost as heat. On the other hand, switching supplies can, depending on their design, convert any voltage to any other voltage and can be much more efficient (efficiencies of 90% and above are not uncommon). However, they are more complex and generate noisier output voltages.

Designers use both types of regulators depending on the needs of the downstream circuit: for low-voltage drops, low current, or low noise, linear regulators are usually the right choice, while switching regulators are used for higher power or when efficiency of conversion is required. One of the simplest switching-mode power supply circuits is the buck converter, used to create a lower voltage from a higher one, and this is what we use on the Pi.
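To make the trade-off concrete, here is a small back-of-the-envelope comparison; the figures are purely illustrative rather than measurements of the Pi. A linear regulator dropping 5V to 3.3V must burn the full voltage difference times the load current as heat, while an ideal buck converter simply runs its switch at a duty cycle of roughly Vout/Vin:

```python
# Back-of-the-envelope comparison of a linear regulator and an ideal
# buck converter dropping 5 V to 3.3 V at 1 A (illustrative numbers only).
V_IN, V_OUT, I_OUT = 5.0, 3.3, 1.0

# Linear regulator: everything not delivered to the load is burned as heat.
p_out = V_OUT * I_OUT                    # power delivered to the load (W)
p_loss_linear = (V_IN - V_OUT) * I_OUT   # power dissipated in the regulator (W)
eff_linear = p_out / (p_out + p_loss_linear)

# Ideal buck converter: lossless, switch duty cycle sets the output voltage.
duty_cycle = V_OUT / V_IN                # fraction of time the switch is 'on'

print(f"linear: {p_loss_linear:.2f} W lost, efficiency {eff_linear:.0%}")
print(f"ideal buck: duty cycle {duty_cycle:.0%} (real parts reach ~90%+)")
```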

A history lesson

The BCM2835 processor chip (found on the original Raspberry Pi Model B and B+, as well as on the Zero products) has on-chip power supplies: one switch-mode regulator for the core voltage, as well as a linear one for the LPDDR2 memory supply. This meant that in addition to 5V, we only had to provide 3.3V and 1.8V on the board, which was relatively simple to do using cheap, off-the-shelf parts.

Pi Zero sporting a BCM2835 processor which only needs 2 external switchers (the components clustered behind the camera port)

When we moved to the BCM2836 for Raspberry Pi Model 2 (and subsequently to the BCM2837A1 and B0 for Raspberry Pi 3B and 3B+), the core supply and the on-chip LPDDR2 memory supply were not up to the job of supplying the extra processor cores and larger memory, so we removed them. (We also used the recovered chip area to help fit in the new quad-core ARM processors.) The upshot of this was that we had to supply these power rails externally for the Raspberry Pi 2 and models thereafter. Moreover, we also had to provide circuitry to sequence them correctly in order to control exactly when they power up compared to the other supplies on the board.

Power supply design is tricky (but critical)

Raspberry Pi boards take in 5V from the micro USB socket and have to generate the other required supplies from this. When 5V is first connected, each of these other supplies must ‘start up’, meaning go from ‘off’, or 0V, to their correct voltage in some short period of time. The order of the supplies starting up is often important: commonly, there are structures inside a chip that form diodes between supply rails, and bringing supplies up in the wrong order can sometimes ‘turn on’ these diodes, causing them to conduct, with undesirable consequences. Silicon chips come with a data sheet specifying what supplies (voltages and currents) are needed and whether they need to be low-noise, in what order they must power up (and in some cases down), and sometimes even the rate at which the voltages must power up and down.
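To illustrate the idea of sequencing (this is a toy model, not how the Pi actually does it; on the real board the ordering is handled by dedicated hardware), imagine each regulator had an enable pin driven from a GPIO, and the rails were brought up in order with a short settling delay between each. The pin numbers and delays below are entirely hypothetical:

```python
# Hypothetical power-up sequencing sketch: each regulator's enable pin is
# driven from a GPIO, and rails are enabled in descending-voltage order.
# Pin numbers and delays are made up for illustration; the real Pi does
# this in dedicated hardware, not in software.
import time
from gpiozero import DigitalOutputDevice  # assumes a Raspberry Pi with gpiozero

RAILS = [          # (name, hypothetical BCM pin wired to the regulator's EN input)
    ("3V3", 17),
    ("1V8", 27),
    ("1V2_DDR", 22),
    ("VDD_CORE", 23),
]

enables = {name: DigitalOutputDevice(pin) for name, pin in RAILS}

for name, _pin in RAILS:
    enables[name].on()        # enable this rail
    time.sleep(0.002)         # let it settle before starting the next one
    print(f"{name} enabled")
```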

A Pi 3. Power supply components are clustered bottom left next to the micro USB, in the middle (above the LPDDR2 chip, which is on the bottom of the PCB), and above the A/V jack.

In designing the power chain for the Pi 2 and 3, the sequencing was fairly straightforward: power rails power up in order of voltage (5V, 3.3V, 1.8V, 1.2V). However, the supplies were all generated with individual, discrete devices. Therefore, I spent quite a lot of time designing circuitry to control the sequencing — even with some design tricks to reduce component count, quite a few sequencing components are required. More complex systems generally use a Power Management Integrated Circuit (PMIC) with multiple supplies on a single chip, and many different PMIC variants are made by various manufacturers. Since Raspberry Pi 2 days, I had been looking for a suitable PMIC to simplify the Pi design, but invariably (and somewhat counter-intuitively) these were always too expensive compared to my discrete solution, usually because they came with more features than needed.

One device to rule them all

It was way back in May 2015 when I first chatted to Peter Coyle of Exar (Exar were bought by MaxLinear in 2017) about power supply products for Raspberry Pi. We didn’t find a product match then, but in June 2016 Peter, along with Tuomas Hollman and Trevor Latham, visited to pitch the possibility of building a custom power management solution for us.

I was initially sceptical that it could be made cheap enough. However, our discussion indicated that if we could tailor the solution to just what we needed, it could be cost-effective. Over the coming weeks and months, we honed the initial sketches we’d made into a specification we all agreed on, and Exar thought they could build it for us at the target price.

The chip we designed would contain all the key supplies required for the Pi on one small device in a cheap QFN package, and it would also perform the required sequencing and voltage monitoring. Moreover, the chip would be flexible to allow adjustment of supply voltages from their default values via I2C; the largest supply would be capable of being adjusted quickly to perform the dynamic core voltage changes needed in order to reduce voltage to the processor when it is idling (to save power), and to boost voltage to the processor when running at maximum speed (1.4 GHz). The supplies on the chip would all be generously specified and could deliver significantly more power than those used on the Raspberry Pi 3. All in all, the chip would contain four switching-mode converters and one low-current linear regulator, this last one being low-noise for the audio circuitry.
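As a rough sketch of what ‘adjustable via I2C’ means in practice, the host can change a supply voltage by writing a register on the PMIC. The bus address, register number, and code-to-voltage mapping below are hypothetical placeholders, not values from the MXL7704 data sheet:

```python
# Hypothetical sketch of adjusting a PMIC output over I2C using smbus2.
# The device address, register number, and code-to-voltage mapping below
# are made up for illustration; consult the MXL7704 data sheet for the
# real register map.
from smbus2 import SMBus

PMIC_ADDR = 0x48        # hypothetical 7-bit I2C address
VOUT_CORE_REG = 0x10    # hypothetical register selecting the core voltage

def set_core_voltage(bus, millivolts, step_mv=25, base_mv=800):
    """Write a hypothetical voltage-select code: base_mv + code * step_mv."""
    code = (millivolts - base_mv) // step_mv
    bus.write_byte_data(PMIC_ADDR, VOUT_CORE_REG, code)

with SMBus(1) as bus:                 # I2C bus 1 on a Raspberry Pi
    set_core_voltage(bus, 1200)       # idle: drop the core rail to 1.2 V
    set_core_voltage(bus, 1350)       # full speed: raise it for 1.4 GHz
```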

The MXL7704 chip

The project was a great success: MaxLinear delivered working samples of first silicon at the end of May 2017 (almost exactly a year after we had kicked off the project), and followed through with production quantities in December 2017 in time for the Raspberry Pi 3B+ production ramp.

The team behind the power supply chip on the Raspberry Pi 3 Model B+ (group of six men, two of whom are holding Raspberry Pi boards)

Front row: Roger with the very first Pi 3B+ prototypes and James with a MXL7704 development board hacked to power a Pi 3. Back row left to right: Will Torgerson, Trevor Latham, Peter Coyle, Tuomas Hollman.

The MXL7704 device has been key to reducing Pi board complexity and therefore overall bill of materials cost. Furthermore, by being able to deliver more power when needed, it has also been essential to increasing the speed of the (newly packaged) BCM2837B0 processor on the 3B+ to 1.4GHz. The result is an improvement both in the continuous output current to the CPU (from 3A to 4A) and in the transient performance (the dip in supply voltage when the processor suddenly demands a large current within a few nanoseconds, as modern CPUs tend to do).

With the MXL7704, the power supply circuitry on the 3B+ is now a lot simpler than the Pi 3B design. This new supply also provides the LPDDR2 memory voltage directly from a switching regulator rather than using linear regulators like the Pi 3, thereby improving energy efficiency. This helps to somewhat offset the extra power that the faster Ethernet, wireless networking, and processor consume. A pleasing side effect of using the new chip is the symmetric board layout of the regulators — it’s easy to see the four switching-mode supplies, given away by four similar-looking blobs (three grey and one brownish), which are the inductors.

Close-up of the power supply chip on the Raspberry Pi 3 Model B+

The Pi 3B+ PMIC MXL7704 — pleasingly symmetric

Kudos

It takes a lot of effort to design a new chip from scratch and get it all the way through to production — we are very grateful to the team at MaxLinear for their hard work, dedication, and enthusiasm. We’re also proud to have created something that will not only power Raspberry Pis, but will also be useful for other product designs: it turns out when you have a low-cost and flexible device, it can be used for many things — something we’re fairly familiar with here at Raspberry Pi! For the curious, the product page (including the data sheet) for the MXL7704 chip is here. Particular thanks go to Peter Coyle, Tuomas Hollman, and Trevor Latham, and also to Jon Cronk, who has been our contact in the US and has had to get up early to attend all our conference calls!

The MXL7704 design team celebrating on Pi Day — it takes a lot of people to design a chip!

I hope you liked reading about some of the effort that has gone into creating the new Pi. It’s nice to finally have a chance to tell people about some of the (increasingly complex) technical work that makes building a $35 computer possible — we’re very pleased with the Raspberry Pi 3B+, and we hope you enjoy using it as much as we’ve enjoyed creating it!

The post More power to your Pi appeared first on Raspberry Pi.

Weekly roundup: Fortnite

Post Syndicated from Eevee original https://eev.ee/dev/2018/04/02/weekly-roundup-fortnite/

I skipped a week again because, surprise, I’ve been mostly working on the same game…

  • art: Actually been doing a bit of it! I painted a thing on a whim, and some misc sketches, a few of which I even felt like posting.

  • alice: Finally kind of hit my stride here and wrote, um, a pretty good chunk of stuff. Also played with extending the syntax a bit, and came up with a choice menu that hangs around while the dialogue continues. Kinda cool, though I’m not totally sure what we’ll use it for yet.

    Even with my figuring out how to accelerate, it’s looking like we’ll have to rush if we want to hit our promised date of June 9. So we might delay that a little… maybe even Kickstart some stretch goals? I dunno, I’m leaving that all up to glip and just writing stuff.

  • writing: While I’m at it, I actually picked up and worked on a Twine from ages ago. Cool.

  • idchoppers: Holy moly, it actually works. The basics actually work, at least. I can’t believe how much effort this hecking took.

    I also tried to start putting together an actual map API, with mixed results. And tried to figure out the maximum distance you can jump in Doom, which is surprisingly tricky? Doom physics are super goofy.

  • blog: I actually published a post, which is even tangentially about that idchoppers stuff! Wow! Maybe I’ll do it again, even!

Huh, that almost makes it sound like I’ve been busy.

Weekly roundup: Forwards

Post Syndicated from Eevee original https://eev.ee/dev/2018/03/14/weekly-roundup-forwards/

  • art: Did some doodles. Not as frequently as I’d like, and mostly not published, but I did some, and that’s nice.

  • alice: Continuin’ on, though mostly planning and tech stuff this week, not so much writing.

  • irl: I did my taxes oh boy!

  • blog: I made decent progress on last month’s posts, but, am still not done yet. Sorry. I only have so much energy I can pour into writing at a time, apparently, and working on a visual novel is eating up tons of it.

  • anise: We picked up progress on this game again, came up with a bunch more things to populate the world, and both did some sketches of them! Also I did some basic tile collision merging, which I’d been meaning to do for a while, and which had promising results.

  • idchoppers: I got arbitrary poly splitting mostly working, finally…! I can’t believe how much effort this is taking, but it doesn’t help that I’m only dedicating a couple hours at a time to it completely sporadically. Maybe I’ll have something to show for it soon.

The visual novel is eating most of my time lately, and I’m struggling to get back that writing momentum, and in the meantime it feels like it’s consuming all my time and not letting me do anything else! I’m getting there, though.

Physical computing blocks at Maker Faire New York

Post Syndicated from Matt Richardson original https://www.raspberrypi.org/blog/physical-computing-blocks/

At events like Maker Faire New York, we love offering visitors the chance to try out easy, inviting, and hands-on activities, so we teamed up with maker Ben Light to create interactive physical computing blocks.


Getting hands-on experience at events

At the Raspberry Pi Foundation, we often have the opportunity to engage with families and young people at events such as Maker Faires and STEAM festivals. When we set up a booth, it’s really important to us that we provide an educational, fun experience for everyone who visits us. But there are a few reasons why this can be a challenge.

Girls use the physical computing blocks at Maker Faire New York

For one, you have a broad audience of people with differing levels of experience with computers. Moreover, some people want to take the time to learn a lot, others just want to try something quick and move on. And on top of that, the environment is often loud, crowded, and chaotic…in a good way!

Creating our physical computing blocks

We were up against these challenges when we set out to create a new physical computing experience for our World Maker Faire New York booth. Our goal was to give people the opportunity to try a little bit of circuit making and a little bit of coding — and they should be able to get hands-on with the activity right away.




Inspired by Exploratorium’s Tinkering Studio, we sketched out physical computing blocks which let visitors use the Raspberry Pi’s GPIO pins without needing to work with tiny components or needing to understand how a breadboard works. We turned the sketches over to our friend Ben Light in New York City, and he brought the project to life.
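The code behind a GPIO activity like this can be only a few lines long. Here is a minimal sketch using the gpiozero library (the pin numbers are arbitrary, and this is not necessarily the code the exhibit itself ran): press a button block and an LED block lights up.

```python
# Minimal physical computing example with gpiozero: press a button block
# to light an LED block. The BCM pin numbers here are arbitrary examples.
from signal import pause
from gpiozero import LED, Button

led = LED(17)          # LED wired to BCM pin 17 (and ground)
button = Button(2)     # button wired between BCM pin 2 and ground

button.when_pressed = led.on
button.when_released = led.off

pause()                # keep the script running and responding to presses
```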

Father and infant child clip crocodile leads to the Raspberry Pi physical computing blocks at Maker Faire New York

As you can see, the activity turned out really well, so we hope to bring it to more events in the future. Thank you, Ben Light, for collaborating with us on it!

The post Physical computing blocks at Maker Faire New York appeared first on Raspberry Pi.

Growing up alongside tech

Post Syndicated from Eevee original https://eev.ee/blog/2017/08/09/growing-up-alongside-tech/

IndustrialRobot asks… or, uh, asked last month:

industrialrobot: How has your views on tech changed as you’ve got older?

This is so open-ended that it’s actually stumped me for a solid month. I’ve had a surprisingly hard time figuring out where to even start.


It’s not that my views of tech have changed too much — it’s that they’ve changed very gradually. Teasing out and explaining any one particular change is tricky when it happened invisibly over the course of 10+ years.

I think a better framework for this is to consider how my relationship to tech has changed. It’s gone through three pretty distinct phases, each of which has strongly colored how I feel and talk about technology.

Act I

In which I start from nothing.

Nothing is an interesting starting point. You only really get to start there once.

Learning something on my own as a kid was something of a magical experience, in a way that I don’t think I could replicate as an adult. I liked computers; I liked toying with computers; so I did that.

I don’t know how universal this is, but when I was a kid, I couldn’t even conceive of how incredible things were made. Buildings? Cars? Paintings? Operating systems? Where does any of that come from? Obviously someone made them, but it’s not the sort of philosophical point I lingered on when I was 10, so in the back of my head they basically just appeared fully-formed from the æther.

That meant that when I started trying out programming, I had no aspirations. I couldn’t imagine how far I would go, because all the examples of how far I would go were completely disconnected from any idea of human achievement. I started out with BASIC on a toy computer; how could I possibly envision a connection between that and something like a mainstream video game? Every new thing felt like a new form of magic, so I couldn’t conceive that I was even in the same ballpark as whatever process produced real software. (Even seeing the source code for GORILLAS.BAS, it didn’t quite click. I didn’t think to try reading any of it until years after I’d first encountered the game.)

This isn’t to say I didn’t have goals. I invented goals constantly, as I’ve always done; as soon as I learned about a new thing, I’d imagine some ways to use it, then try to build them. I produced a lot of little weird goofy toys, some of which entertained my tiny friend group for a couple days, some of which never saw the light of day. But none of it felt like steps along the way to some mountain peak of mastery, because I didn’t realize the mountain peak was even a place that could be gone to. It was pure, unadulterated (!) playing.

I contrast this to my art career, which started only a couple years ago. I was already in my late 20s, so I’d already spent decades seeing a very broad spectrum of art: everything from quick sketches up to painted masterpieces. And I’d seen the people who create that art, sometimes seen them create it in real-time. I’m even in a relationship with one of them! And of course I’d already had the experience of advancing through tech stuff and discovering first-hand that even the most amazing software is still just code someone wrote.

So from the very beginning, from the moment I touched pencil to paper, I knew the possibilities. I knew that the goddamn Sistine Chapel was something I could learn to do, if I were willing to put enough time in — and I knew that I’m not, so I’d have to settle somewhere a ways before that. I knew that I’d have to put an awful lot of work in before I’d be producing anything very impressive.

I did it anyway (though perhaps waited longer than necessary to start), but those aren’t things I can un-know, and so I can never truly explore art from a place of pure ignorance. On the other hand, I’ve probably learned to draw much more quickly and efficiently than if I’d done it as a kid, precisely because I know those things. Now I can decide I want to do something far beyond my current abilities, then go figure out how to do it. When I was just playing, that kind of ambition was impossible.


So, I played.

How did this affect my views on tech? Well, I didn’t… have any. Learning by playing tends to teach you things in an outward sprawl without many abrupt jumps to new areas, so you don’t tend to run up against conflicting information. The whole point of opinions is that they’re your own resolution to a conflict; without conflict, I can’t meaningfully say I had any opinions. I just accepted whatever I encountered at face value, because I didn’t even know enough to suspect there could be alternatives yet.

Act II

That started to seriously change around, I suppose, the end of high school and beginning of college. I was becoming aware of this whole “open source” concept. I took classes that used languages I wouldn’t otherwise have given a second thought. (One of them was Python!) I started to contribute to other people’s projects. Eventually I even got a job, where I had to work with other people. It probably also helped that I’d had to maintain my own old code a few times.

Now I was faced with conflicting subjective ideas, and I had to form opinions about them! And so I did. With gusto. Over time, I developed an idea of what was Right based on experience I’d accrued. And then I set out to always do things Right.

That’s served me decently well with some individual problems, but it also led me to inflict a lot of unnecessary pain on myself. Several endeavors languished for no other reason than my dissatisfaction with the architecture, long before the basic functionality was done. I started a number of “pure” projects around this time, generic tools like imaging libraries that I had no direct need for. I built them for the sake of them, I guess because I felt like I was improving some niche… but of course I never finished any. It was always in areas I didn’t know that well in the first place, which is a fine way to learn if you have a specific concrete goal in mind — but it turns out that building a generic library for editing images means you have to know everything about images. Perhaps that ambition went a little haywire.

I’ve said before that this sort of (self-inflicted!) work was unfulfilling, in part because the best outcome would be that a few distant programmers’ lives are slightly easier. I do still think that, but I think there’s a deeper point here too.

In forgetting how to play, I’d stopped putting any of myself in most of the work I was doing. Yes, building an imaging library is kind of a slog that someone has to do, but… I assume the people who work on software like PIL and ImageMagick are actually interested in it. The few domains I tried to enter and revolutionize weren’t passions of mine; I just happened to walk through the neighborhood one day and decided I could obviously do it better.

Not coincidentally, this was the same era of my life that led me to write stuff like that PHP post, which you may notice I am conspicuously not even linking to. I don’t think I would write anything like it nowadays. I could see myself approaching the same subject, but purely from the point of view of language design, with more contrasts and tradeoffs and less going for volume. I certainly wouldn’t lead off with inflammatory puffery like “PHP is a community of amateurs”.

Act III

I think I’ve mellowed out a good bit in the last few years.

It turns out that being Right is much less important than being Not Wrong — i.e., rather than trying to make something perfect that can be adapted to any future case, just avoid as many pitfalls as possible. Code that does something useful has much more practical value than unfinished code with some pristine architecture.

Nowhere is this more apparent than in game development, where all code is doomed to be crap and the best you can hope for is to stem the tide. But there’s also a fixed goal that’s completely unrelated to how the code looks: does the game work, and is it fun to play? Yes? Ship the damn thing and forget about it.

Games are also nice because it’s very easy to pour my own feelings into them and evoke feelings in the people who play them. They’re mine, something with my fingerprints on them — even the games I’ve built with glip have plenty of my own hallmarks, little touches I added on a whim or attention to specific details that I care about.

Maybe a better example is the Doom map parser I started writing. It sounds like a “pure” problem again, except that I actually know an awful lot about the subject already! I also cleverly (accidentally) released some useful results of the work I’ve done thus far — like statistics about Doom II maps and a few screenshots of flipped stock maps — even though I don’t think the parser itself is far enough along to release yet. The tool has served a purpose, one with my fingerprints on it, even without being released publicly. That keeps it fresh in my mind as something interesting I’d like to keep working on, eventually. (When I run into an architecture question, I step back for a while, or I do other work in the hopes that the solution will reveal itself.)

I also made two simple Pokémon ROM hacks this year, despite knowing nothing about Game Boy internals or assembly when I started. I just decided I wanted to do an open-ended thing beyond my reach, and I went to do it, not worrying about cleanliness and willing to accept a bumpy ride to get there. I played, but in a more experienced way, invoking the stuff I know (and the people I’ve met!) to help me get a running start in completely unfamiliar territory.


This feels like a really fine distinction that I’m not sure I’m doing justice. I don’t know if I could’ve appreciated it three or four years ago. But I missed making toys, and I’m glad I’m doing it again.

In short, I forgot how to have fun with programming for a little while, and I’ve finally started to figure it out again. And that’s far more important than whether you use PHP or not.

The CNC Wood Burner turning heads (and wood, obviously)

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/cnc-wood-burner/

Why stick to conventional laser cutters or CNC machines for creating images on wood, when you can build a device to do the job that is a beautiful piece of art in itself? Mechanical and Computer Science student and Imgur user Tucker Shannon has created a wonderful-looking CNC Wood Burner using a Raspberry Pi and stepper motors. His project has a great vinyl-turntable-like design.

Raspberry Pi CNC Wood Burner

Tucker’s somewhat hypnotic build burns images into wood using a Raspberry Pi and stepper motors
GIF c/o Tucker Shannon

A CNC Wood Burner?

Sure! Why not? Tucker had already put the knowledge he acquired while studying at Oregon State University to good use by catching a bike thief in action with the help of a Raspberry Pi, so it’s obvious he has the skills needed to incorporate our little computer into a project. Moreover, his Skittles portrait of Bill Nye is evidence of his artistic flair, so it’s not surprising that he wanted to make something a little different, and pretty, using code.

Tucker Shannon

“Bill Nye, the Skittles Guy”
Image c/o Tucker Shannon

With an idea in mind and sketches drawn, Tucker first considered using an old record player as the base of his build. Having a rotating deck and arm already in place would have made building his project easier. However, he reports on Imgur:

I thought about that! I couldn’t find any at local thrift shops though. Apparently, they’ve become pretty popular…

We can’t disagree with him. Since his search was unsuccessful, Tucker ended up creating the CNC Wood Burner from scratch.

Raspberry Pi CNC Wood Burner

Concept designs
Image c/o Tucker Shannon

Taking into consideration the lumps and bumps of the wood he would be using as a ‘canvas’, Tucker decided to incorporate a pivot to allow the arm to move smoothly over the rough surface.

The code for the make is currently in ‘spaghetti form’, though Tucker is set to release it, as well as full instructions for the build, in the near future.

The build

Tucker laser-cut the pieces for the wood burner’s box and gear out of birch and pine wood. As the motors require 12V power, the standard Raspberry Pi supply wasn’t going to be enough. Therefore, Tucker scavenged for old computer parts, and ended up rescuing a PSU (power supply unit). He then fitted the PSU and the Raspberry Pi within the box.

Raspberry Pi CNC Wood Burner

The cannibalised PSU, stepper motor controller, and Raspberry Pi fit nicely into Tucker’s handmade pine box.
Image c/o Tucker Shannon

Next, he got to work building runners for the stepper motor controlling the position of the ‘pen thing’ that would scorch the image into the wood.
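Tucker’s own code hasn’t been released yet, but driving a stepper through a typical step/direction driver board from a Raspberry Pi boils down to toggling two GPIO pins. Below is a rough, hypothetical sketch; the pin numbers, step counts, and timing are made up, and this is not the project’s actual code.

```python
# Hypothetical sketch of moving a stepper motor via a step/direction driver
# board from a Raspberry Pi. Pin numbers and timing are illustrative only;
# this is not the project's actual code.
import time
import RPi.GPIO as GPIO

STEP_PIN = 20   # pulse once per step (hypothetical BCM pin)
DIR_PIN = 21    # high = one direction, low = the other (hypothetical)

GPIO.setmode(GPIO.BCM)
GPIO.setup(STEP_PIN, GPIO.OUT)
GPIO.setup(DIR_PIN, GPIO.OUT)

def move(steps, forward=True, step_delay=0.001):
    """Issue `steps` pulses to the driver with a fixed delay between them."""
    GPIO.output(DIR_PIN, forward)
    for _ in range(steps):
        GPIO.output(STEP_PIN, True)
        time.sleep(step_delay)
        GPIO.output(STEP_PIN, False)
        time.sleep(step_delay)

move(200, forward=True)    # e.g. one revolution of a 200-step motor
GPIO.cleanup()
```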

Raspberry Pi CNC Wood Burner

Initial tests on paper help to align the pen
Image c/o Tucker Shannon

After a few test runs using paper, the CNC Wood Burner was good to go!

The results

Tucker has used his CNC Wood Burner to create some wonderful pieces of art. The few examples he’s shared on Imgur have impressed us with their precision. We’re looking forward to seeing what else he is going to make with it!

Raspberry Pi CNC Wood Burner

The build burns wonderfully clean-lined images into wood
Image c/o Tucker Shannon

Your turn

Image replication using Raspberry Pis and stepper motors isn’t a new thing – though doing it using a wood-burning device may be! We’ve seen some great builds in which makers set up motors and a marker pen to create massive works of art. Are you one of those makers? Or have you been planning a build similar to Tucker’s project, possibly with a new twist?

Share your project with us below, whether it is complete or still merely sketches in a notebook. We’d love to see what you’re getting up to!

The post The CNC Wood Burner turning heads (and wood, obviously) appeared first on Raspberry Pi.

Join Us at the 10th Annual Hadoop Summit / DataWorks Summit, San Jose (Jun 13-15)

Post Syndicated from mikesefanov original https://yahooeng.tumblr.com/post/160966148886

yahoohadoop:

image

We’re excited to co-host the 10th Annual Hadoop Summit, the leading conference for the Apache Hadoop community, taking place on June 13 – 15 at the San Jose Convention Center. In the last few years, the Hadoop Summit has expanded to cover all things data beyond just Apache Hadoop – such as data science, cloud and operations, IoT and applications – and has been aptly renamed the DataWorks Summit. The three-day program is bursting at the seams! Here are just a few of the reasons why you cannot miss this must-attend event:

  • Familiarize yourself with the cutting edge in Apache project developments from the committers
  • Learn from your peers and industry experts about innovative and real-world use cases, development and administration tips and tricks, success stories and best practices to leverage all your data – on-premise and in the cloud – to drive predictive analytics, distributed deep-learning and artificial intelligence initiatives
  • Attend one of our more than 170 technical deep dive breakout sessions from nearly 200 speakers across eight tracks
  • Check out our keynotes, meetups, trainings, technical crash courses, birds-of-a-feather sessions, Women in Big Data and more
  • Attend the community showcase where you can network with sponsors and industry experts, including a host of startups and large companies like Microsoft, IBM, Oracle, HP, Dell EMC and Teradata

Similar to previous years, we look forward to continuing Yahoo’s decade-long tradition of thought leadership at this year’s summit. Join us for an in-depth look at Yahoo’s Hadoop culture and for the latest in technologies such as Apache Tez, HBase, Hive, Data Highway Rainbow, Mail Data Warehouse and Distributed Deep Learning at the breakout sessions below. Or, stop by Yahoo kiosk #700 at the community showcase.

Also, as a co-host of the event, Yahoo is pleased to offer a 20% discount for the summit with the code MSPO20. Register here for Hadoop Summit, San Jose, California!


DAY 1. TUESDAY June 13, 2017


12:20 – 1:00 P.M. TensorFlowOnSpark – Scalable TensorFlow Learning On Spark Clusters

Andy Feng – VP Architecture, Big Data and Machine Learning

Lee Yang – Sr. Principal Engineer

In this talk, we will introduce a new framework, TensorFlowOnSpark, for scalable TensorFlow learning, which was open sourced in Q1 2017. This new framework enables easy experimentation for algorithm designs, and supports scalable training & inferencing on Spark clusters. It supports all TensorFlow functionalities including synchronous & asynchronous learning, model & data parallelism, and TensorBoard. It provides architectural flexibility for data ingestion to TensorFlow and network protocols for server-to-server communication. With a few lines of code changes, an existing TensorFlow algorithm can be transformed into a scalable application.

2:10 – 2:50 P.M. Handling Kernel Upgrades at Scale – The Dirty Cow Story

Samy Gawande – Sr. Operations Engineer

Savitha Ravikrishnan – Site Reliability Engineer

Apache Hadoop at Yahoo is a massive platform with 36 different clusters spread across YARN, Apache HBase, and Apache Storm deployments, totaling 60,000 servers made up of 100s of different hardware configurations accumulated over generations, presenting unique operational challenges and a variety of unforeseen corner cases. In this talk, we will share methods, tips and tricks to deal with large scale kernel upgrade on heterogeneous platforms within tight timeframes with 100% uptime and no service or data loss through the Dirty COW use case (privilege escalation vulnerability found in the Linux Kernel in late 2016).

5:00 – 5:40 P.M. Data Highway Rainbow – Petabyte Scale Event Collection, Transport, and Delivery at Yahoo

Nilam Sharma – Sr. Software Engineer

Huibing Yin – Sr. Software Engineer

This talk presents the architecture and features of Data Highway Rainbow, Yahoo’s hosted multi-tenant infrastructure which offers event collection, transport and aggregated delivery as a service. Data Highway supports collection from multiple data centers & aggregated delivery in primary Yahoo data centers which provide a big data computing cluster. From a delivery perspective, Data Highway supports endpoints/sinks such as HDFS, Storm and Kafka; with Storm & Kafka endpoints tailored towards latency sensitive consumers.


DAY 2. WEDNESDAY June 14, 2017


9:05 – 9:15 A.M. Yahoo General Session – Shaping Data Platform for Lasting Value

Sumeet Singh  – Sr. Director, Products

With a long history of open innovation with Hadoop, Yahoo continues to invest in and expand the platform capabilities by pushing the boundaries of what the platform can accomplish for the entire organization. In the last 11 years (yes, it is that old!), the Hadoop platform has shown no signs of giving up or giving in. In this talk, we explore what makes the shared multi-tenant Hadoop platform so special at Yahoo.

12:20 – 1:00 P.M. CaffeOnSpark Update – Recent Enhancements and Use Cases

Mridul Jain – Sr. Principal Engineer

Jun Shi – Principal Engineer

By combining salient features from the deep learning framework Caffe and the big-data frameworks Apache Spark and Apache Hadoop, CaffeOnSpark enables distributed deep learning on a cluster of GPU and CPU servers. We released CaffeOnSpark as an open source project in early 2016, and shared its architecture design and basic usage at Hadoop Summit 2016. In this talk, we will update audiences about the recent development of CaffeOnSpark. We will highlight new features and capabilities: a unified data layer which supports multi-label datasets, distributed LSTM training, interleaved testing with training, a monitoring/profiling framework, and Docker deployment.

12:20 – 1:00 P.M. Tez Shuffle Handler – Shuffling at Scale with Apache Hadoop

Jon Eagles – Principal Engineer  

Kuhu Shukla – Software Engineer

In this talk we introduce a new Shuffle Handler for Tez, a YARN Auxiliary Service, that addresses the shortcomings and performance bottlenecks of the legacy MapReduce Shuffle Handler, the default shuffle service in Apache Tez. The Apache Tez Shuffle Handler adds composite fetch which has support for multi-partition fetch to mitigate performance slow down and provides deletion APIs to reduce disk usage for long running Tez sessions. As an emerging technology we will outline future roadmap for the Apache Tez Shuffle Handler and provide performance evaluation results from real world jobs at scale.

2:10 – 2:50 P.M. Achieving HBase Multi-Tenancy with RegionServer Groups and Favored Nodes

Thiruvel Thirumoolan – Principal Engineer

Francis Liu – Sr. Principal Engineer

At Yahoo!, HBase has been running as a hosted multi-tenant service since 2013. In a single HBase cluster we have around 30 tenants running various types of workloads (i.e. batch, near real-time, ad hoc, etc.). We will walk through the multi-tenancy features, explaining our motivation and how they work, as well as our experiences running these multi-tenant clusters. These features will be available in Apache HBase 2.0.

2:10 – 2:50 P.M. Data Driving Yahoo Mail Growth and Evolution with a 50 PB Hadoop Warehouse

Nick Huang – Director, Data Engineering, Yahoo Mail  

Saurabh Dixit – Sr. Principal Engineer, Yahoo Mail

Since 2014, the Yahoo Mail Data Engineering team has taken on the task of revamping the Mail data warehouse and analytics infrastructure in order to drive the continued growth and evolution of Yahoo Mail. Along the way we have built a 50 PB Hadoop warehouse, and surrounding analytics and machine learning programs that have transformed the role data plays in Yahoo Mail. In this session we will share our experience from this 3-year journey, from the system architecture and the analytics systems built, to the learnings from development and the drive for adoption.

DAY3. THURSDAY June 15, 2017


2:10 – 2:50 P.M. OracleStore – A Highly Performant RawStore Implementation for Hive Metastore

Chris Drome – Sr. Principal Engineer  

Jin Sun – Principal Engineer

Today, Yahoo uses Hive in many different spaces, from ETL pipelines to ad hoc user queries. Increasingly, we are investigating the practicality of applying Hive to real-time queries, such as those generated by interactive BI reporting systems. In order for Hive to succeed in this space, it must be performant in all aspects of query execution, from query compilation to job execution. One such component is the interaction with the underlying database at the core of the Metastore. As an alternative to ObjectStore, we created OracleStore as a proof-of-concept. Freed of the restrictions imposed by DataNucleus, we were able to design a more performant database schema that better met our needs. Then, we implemented OracleStore with specific goals built-in from the start, such as ensuring the deduplication of data. In this talk we will discuss the details behind OracleStore and the gains that were realized with this alternative implementation. These include a reduction of 97%+ in the storage footprint of multiple tables, as well as query performance that is 13x faster than ObjectStore with DirectSQL and 46x faster than ObjectStore without DirectSQL.

3:00 P.M. – 3:40 P.M. Bullet – A Real Time Data Query Engine

Akshai Sarma – Sr. Software Engineer

Michael Natkovich – Director, Engineering

Bullet is an open sourced, lightweight, pluggable querying system for streaming data without a persistence layer implemented on top of Storm. It allows you to filter, project, and aggregate on data in transit. It includes a UI and WS. Instead of running queries on a finite set of data that arrived and was persisted or running a static query defined at the startup of the stream, our queries can be executed against an arbitrary set of data arriving after the query is submitted. In other words, it is a look-forward system. Bullet is a multi-tenant system that scales independently of the data consumed and the number of simultaneous queries. Bullet is pluggable into any streaming data source. It can be configured to read from systems such as Storm, Kafka, Spark, Flume, etc. Bullet leverages Sketches to perform its aggregate operations such as distinct, count distinct, sum, count, min, max, and average.

3:00 P.M. – 3:40 P.M. Yahoo – Moving Beyond Running 100% of Apache Pig Jobs on Apache Tez

Rohini Palaniswamy – Sr. Principal Engineer

Last year at Yahoo, we put great effort into scaling, stabilizing, and making Pig on Tez production-ready, and by the end of the year we had retired running Pig jobs on MapReduce. This talk will detail the performance and resource utilization improvements Yahoo achieved after migrating all Pig jobs to run on Tez. After the successful migration and the improved performance, we shifted our focus to addressing some of the bottlenecks we had identified and to new optimization ideas we came up with to make it go even faster. We will go over the new features and work done in Tez to make that happen, such as the custom YARN ShuffleHandler, reworking the DAG scheduling order, serialization changes, etc. We will also cover exciting new features that were added to Pig for performance, such as bloom join and bytecode generation.

4:10 P.M. – 4:50 P.M. Leveraging Docker for Hadoop Build Automation and Big Data Stack Provisioning

Evans Ye – Software Engineer

Apache Bigtop, an open source Hadoop distribution, focuses on developing packaging, testing, and deployment solutions that help infrastructure engineers build their own customized big data platform as easily as possible. However, packages deployed in production require a solid CI testing framework to ensure their quality, and the many Hadoop components must be verified to work together as well. In this presentation, we’ll talk about how Bigtop delivers its containerized CI framework, which can be directly replicated by Bigtop users. The core innovations here are the newly developed Docker Provisioner, which leverages Docker for Hadoop deployment, and the Docker Sandbox, which lets developers quickly start a big data stack. The content of this talk includes the containerized CI framework, the technical details of the Docker Provisioner and Docker Sandbox, the hierarchy of Docker images we designed, and several components we developed, such as the Bigtop Toolchain, to achieve build automation.

Register here for Hadoop Summit, San Jose, California with a 20% discount code MSPO20

Questions? Feel free to reach out to us at [email protected]. Hope to see you there!

Inktober

Post Syndicated from Eevee original https://eev.ee/blog/2016/10/23/inktober/

Inktober is an ancient and hallowed art tradition, dating all the way back to sometime, when it was started by someone. The idea is simple: draw something in ink every day. Real ink. You know. On paper.

I tried this last year. I quit after four days. Probably because I tried to do it without pencil sketches, and I’m really not very good at drawing things correctly the first time. I’d hoped that forcing myself to do it would spark some improvement, but all it really produced was half a week of frustration and bad artwork.

This year, I was convinced to try again without unnecessarily handicapping myself, so I did that. Three weeks and more than forty ink drawings later, here are some thoughts.

Some background

I’ve been drawing seriously since the beginning of 2015. I spent the first few months working primarily in pencil, until I was gifted a hand-me-down tablet in March; almost everything has been digital since then.

I’ve been fairly lax about learning to use color effectively — I have enough trouble just producing a sketch I like, so I’ve mostly been trying to improve there. Doesn’t feel worth the effort to color a sketch I’m not really happy with, and by the time I’m really happy with it, I’m itching to draw something else. Whoops. Until I get quicker or find some mental workaround, monochrome ink is a good direction to try.

I have an ongoing “daily” pokémon series, so I’ve been continuing that in ink. (Everyone else seems to be using some list of single-word prompts, but I didn’t even know about that until after I’d started, so, whoops.)

I’ve got a few things I want to get better at:

  • Detailing, whatever that means. Part of the problem is that I’m not sure what it means. My art is fairly simple and cartoony, and I know it’s possible to be more detailed without doing realistic shading, but I don’t have a grasp of how to think about that.

  • Better edges, which mostly means line weight. I mentally categorize this as a form of scale, which also includes tips like “don’t let parallel lines get too close together” and “don’t draw one or two very small details”.

  • Better backgrounds and environments. Or, let’s be honest, any backgrounds and environments — I draw an awful lot of single characters floating in an empty white void. My fixed-size canvas presents an obvious and simple challenge: fill the page!

  • More interesting poses, and relatedly, getting a better hang of anatomy. I started drawing the pokémon series partly for this reason: a great many pokémon have really unusual shapes I’ve tried drawing before. Dealing with weird anatomy and trying to map it to my existing understanding should hopefully flex some visualization muscles.

  • Lighting, probably? I’m aware that things not facing a light source are in shadow, but my understanding doesn’t extend very far beyond that. How does light affect a large outdoor area? How can you represent the complexity of light and shadow with only a single pen? Art, especially cartoony art, has an entire vocabulary of subtle indicators of shadow and volume that I don’t know much about.

Let’s see what exactly I’ve learned.

Analog materials are very different

I’ve drawn plenty of pencil sketches on paper, and I’ve done a few watercolors, but I’ve never done this volume of “serious” art on paper before.

All my inks so far are in a 3.5” × 5” sketchbook. I’ll run out of pages in a few days, at which point I’ll finish up the month in a bigger sketchbook. It’s been a mixed blessing: I have less page to fill, but details are smaller and more fiddly, so mistakes are more obvious. I also don’t have much room for error with the composition.

I started out drawing with a small black Faber–Castell “PITT artist pen”. Around day five, I borrowed C3 and C7 (light and dark cool greys) Copic sketch markers from Mel; later I got a C5 as well. A few days ago I bought a Lamy Safari fountain pen with Noodler’s Heart of Darkness ink.

Both the FC pen and the fountain pen are ultimately still pens, but they have some interesting differences in edge cases. Used very lightly at an extreme angle, the FC pen produces very scratchy-looking lines… sometimes. Sometimes it does nothing instead, and you must precariously tilt the pen until you find the magical angle, hoping you don’t suddenly get a solid line where you didn’t want it. The Lamy has been much more consistent: it’s a little more willing to draw thinner lines than it’s intended for, and it hasn’t created any unpleasant surprises. The Lamy feels much smoother overall, like it flows, which is appropriate since that’s how fountain pens work.

Markers are interesting. The last “serious” art I did on paper was watercolor, which is pretty fun — I can water a color down however much I want, and if I’m lucky and fast, I can push color around on the paper a bit before it dries. Markers, ah, not so much. Copics are supposed to be blendable, but I’ve yet to figure out how to make that happen. It might be that my sketchbook’s paper is too thin, but the ink seems to dry within seconds, too fast for me to switch markers and do much of anything. For the same reason, I have to color an area by… “flood-filling”? I can’t let the edge of the colored area dry, or when I go back to extend that edge, I’ll be putting down a second layer of ink and create an obvious dark band. I’ve learned to keep the edge wet as much as possible.

On the plus side, going over dry ink in the same color will darken it, and I’ve squeezed several different shades of gray out of just the light marker. The brush tip can be angled in several different ways to make different shapes; I’ve managed a grassy background and a fur texture just by holding the marker differently. Marker ink does bleed very slightly, but it tends to stop at pen ink, a feature I’ve wanted in digital art for at least a century. I can also kinda make strokes that fade out by moving the marker quickly and lifting it off the paper as I go; surely there are more clever things to be done here, but I’ve yet to figure them out.

The drawing of bergmite above was done as the light marker started to run dry, which is not a problem I was expecting. The marker still worked, but not very well. The strokes on the cave wall in the background aren’t a deliberate effect; those are the strokes the marker was making, and I tried to use them as best I could. I didn’t have the medium marker yet, and the dark marker is very dark — almost black. I’d already started laying down marker, so I couldn’t very well finish the picture with just the pen, and I had to improvise.

Ink is permanent

Well. Obviously.

I have to be pretty careful about what I draw, which creates a bit of a conflict. If I make smooth, confident strokes, I’m likely to fuck them up, and I can’t undo and try again. If I make a lot of short strokes, I get those tell-tale amateurish scratchy lines. If I trace my sketch very carefully and my hand isn’t perfectly steady, the resulting line will be visibly shaky.

I probably exacerbated the shaky lines with my choice of relatively small paper; there’s no buffer between those tiny wobbles and the smallest level of detail in the drawing itself. I can’t always even see where my tiny sketch is going, because my big fat fingers are in the way.

I’ve also had the problem that my sketch is such a mess that I can’t tell where a line is supposed to be going… until I’ve drawn it and it’s obviously wrong. Again, small paper exacerbates this by compressing sketches.

Since I can’t fix mistakes, I’ve had to be a little creative about papering over them.

  • I did one ink with very stark contrast: shadows were completely filled with ink, highlights were bare paper. No shading, hatching, or other middle ground. I’d been meaning to try the approach anyway, but I finally did it after making three or four glaring mistakes. In the final work, they’re all hidden in shadow, so you can’t really tell anything ever went wrong.

  • I’ve managed to disguise several mistakes of the “curved this line too early” variety just by adding some more parallel strokes and pretending I intended to hatch it all along.

  • One of the things I’ve been trying to figure out is varying line weight, and one way to vary it is to make edges thicker when in shadows. A clever hack has emerged here.

    You see, it’s much easier for me to draw an upwards arc than a downwards arc. (I think this is fairly universal?) I can of course just rotate the paper, but if I’m drawing a cylinder, it’s pretty obvious when the top was drawn with a slight bias in one direction and the bottom was drawn with a slight bias in the other direction.

    My lifehack is to draw the top and bottom with the paper oriented the same way, then gradually thicken the bottom, “carving” it into the right shape as I go. I can make a lot of small adjustments and still end up with a single smooth line that looks more or less deliberate.

  • As a last resort… leave it and hope no one notices. That’s what I did for the floatzel above, who has a big fat extra stroke across their lower stomach. It’s in one of the least interesting parts of the picture, though, so it doesn’t really stand out, even though it’s on one of the lightest surfaces.

Ink takes a while

Ink drawings feel like they’ve consumed my entire month. Sketching and then lining means drawing everything twice. Using physical ink means I have to nail the sketch — but I’m used to digital, where I can sketch sloppily and then fix up lines as I go. I also can’t rearrange the sketch, move it around on the paper if I started in the wrong place, or even erase precisely, so I’ve had to be much more careful and thoughtful even with pencil. That’s a good thing — I don’t put nearly enough conscious thought into what I’m drawing — but it definitely takes longer. In a few thorny cases I’ve even resorted to doing a very loose digital sketch, then drawing the pencil sketch based off of that.

All told, each one takes maybe two hours, and I’ve been doing two at a time… but wait, that’s still only four hours, right? How are they taking most of a day?

I suspect a bunch of factors are costing me more time than expected. If I can’t think of a scene idea, I’ll dawdle on Twitter for a while. Two “serious” attempts in a medium I’m not used to can be a little draining and require a refractory period. Fragments of time between or around two larger tasks are, of course, lost forever. And I guess there’s that whole thing where I spent half the month waking up in the middle of the night for no reason and then being exhausted by late evening.

Occasionally I’ve experimented with some approach that turns out to be incredibly tedious and time-consuming, like the early Gardevoir above. You would not believe how long that damn grass took. Or maybe you would, if you’d ever tried similar. Even the much lazier tree-covered mountain in the background seemed to take a while. And this is on a fairly small canvas!

I’m feeling a bit exhausted with ink work at this point, which is not the best place to be after buying a bunch of ink supplies. I definitely want to do more of it in the future, but maybe not daily. I also miss being able to undo. Sweet, sweet undo.

Precision is difficult, and I am bad at planning

These turn out to be largely the same problem.

I’m not a particularly patient person, so I like to jump from the sketch into the inking as soon as possible. Sometimes this means I overlook some details. Here’s that whole “not consciously thinking enough” thing again. Consider, in the above image,

  • The two buildings at the top right are next to each other, yet the angles of their roofs suggest they’re facing in slightly different directions, which doesn’t make a lot of sense for artificial structures.

  • The path leading from the dock doesn’t quite make sense, and the general scale of the start of the dock versus the shrubs and trees is nonsense. The trees themselves are pretty cool, but it looks like I plopped them down individually without really having a full coherent plan going in. Which is exactly what happened.

    Imagining spaces in enough detail to draw them is tough, and not something I’ve really had to do much before. It’s ultimately the same problem I have with game level design, though, so hopefully a breakthrough in one will help me with the other.

  • Phantump’s left eye has a clear white edge showing the depth of the hole in the trunk, but the right eye’s edge was mostly lost to some errant strokes and subsequent attempts to fix them. Also, even the left margin is nowhere near as thick as the trunk’s bottom edge.

  • The crosshatched top of phantump’s head blends into the noisy grassy background. The fix for this is to leave a thin white edge around the top of the head. I think I intended to do this, then completely forgot about it as I was drawing the grass. I suppose I’m not used to reasoning about negative space; I can’t mark or indicate it in any way, nor erase the ink if I later realize I laid down too much.

  • The pupils don’t quite match, but I’d already carved them down a good bit. Negative space problem again. Highlights on dark areas have been a recurring problem all month, especially with markers.

I have no idea how people make beautifully precise inkwork. At the same time, I’ve long had the suspicion that I worry too much about precision and should be a lot looser. I’m missing something here, and I don’t know what it is.

What even is pokémon anatomy

This is a wigglytuff. Wigglytuffs are tall blobs with ears.

I had such a hard time sketching this. (Probably why I rushed the background.)

It turns out that if you draw a wigglytuff even slightly off, the result is a tall blob with ears rather than a wigglytuff. That makes no sense, especially given that wigglytuffs are balloons. Surely, the shape shouldn’t be such a strong part of the wigglytuff identity, and yet it is.

Maybe half of the pokémon I’ve drawn have had some anatomical surprise, even ones I thought I was familiar with. Aerodactyl and huntail have a really pronounced lower jaw. Palpitoad has no arms at all. Pelipper is 70% mouth. Zangoose seems like a straightforward mammal at first glance, but the legs and body and head are all kind of a single blob. Numerous pokémon have no distinct neck, or no distinct shoulders, or a very round abdomen with legs kind of arbitrarily attached somewhere.

Progress, maybe

I don’t know what precisely I’ve gotten out of this experience. I can’t measure artistic progress from one day to the next. I do feel like I’ve gleaned some things, but they seem to be very abstract things. I’m out of the total beginner weeds and solidly into the intermediate hell of just picking up hundreds of little things no one really talks about. All I can do is cross my fingers and push forwards.

The crowd favorite so far is this mega rayquaza, which is kinda funny to me because I don’t feel like I did anything special here. I just copied a bunch of fiddly details. It looks cool, but it felt more like rote work than a struggle to do a new thing.

My own favorite is this much simpler qwilfish. It’s the culmination of several attempts to draw water that I liked, and it came out the best by far. The highlight is also definitely the best I’ve drawn this month. Interesting how that works out.

The rest are on Tumblr, or in this single Twitter thread.

Open Sourcing a Deep Learning Solution for Detecting NSFW Images

Post Syndicated from davglass original https://yahooeng.tumblr.com/post/151148689421

By Jay Mahadeokar and Gerry Pesavento

Automatically identifying that an image is not suitable/safe for work (NSFW), including offensive and adult images, is an important problem which researchers have been trying to tackle for decades. Since images and user-generated content dominate the Internet today, filtering NSFW images becomes an essential component of Web and mobile applications. With the evolution of computer vision, improved training data, and deep learning algorithms, computers are now able to automatically classify NSFW image content with greater precision.

Defining NSFW material is subjective and the task of identifying these images is non-trivial. Moreover, what may be objectionable in one context can be suitable in another. For this reason, the model we describe below focuses only on one type of NSFW content: pornographic images. The identification of NSFW sketches, cartoons, text, images of graphic violence, or other types of unsuitable content is not addressed with this model.

To the best of our knowledge, there is no open source model or algorithm for identifying NSFW images. In the spirit of collaboration and with the hope of advancing this endeavor, we are releasing our deep learning model that will allow developers to experiment with a classifier for NSFW detection, and provide feedback to us on ways to improve the classifier.

Our general purpose Caffe deep neural network model (GitHub code) takes an image as input and outputs a probability (i.e. a score between 0 and 1) which can be used to detect and filter NSFW images. Developers can use this score to filter out images whose score exceeds a suitable threshold, chosen from an ROC curve for their specific use case, or use this signal to rank images in search results.
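To make the scoring step concrete, here is a minimal sketch of how a developer might call the model from Python using the Caffe bindings. The file names, the ‘data’/‘prob’ blob names, and the 0.8 threshold are assumptions for illustration; check the GitHub repository for the exact files and recommended preprocessing.

```python
import numpy as np
import caffe

# File names below are assumptions; use the deploy prototxt and trained weights
# shipped in the open source repository.
net = caffe.Net('deploy.prototxt', 'resnet_50_1by2_nsfw.caffemodel', caffe.TEST)

# Standard Caffe preprocessing: channel-first layout, BGR order, mean subtraction.
transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
transformer.set_transpose('data', (2, 0, 1))
transformer.set_mean('data', np.array([104.0, 117.0, 123.0]))
transformer.set_raw_scale('data', 255)
transformer.set_channel_swap('data', (2, 1, 0))

image = caffe.io.load_image('example.jpg')   # HxWx3 RGB image, floats in [0, 1]
net.blobs['data'].data[...] = transformer.preprocess('data', image)
output = net.forward()

# Assuming a two-node softmax output named 'prob': index 0 = SFW, index 1 = NSFW.
nsfw_score = float(output['prob'][0][1])
print('NSFW score: %.3f' % nsfw_score)
if nsfw_score > 0.8:                          # threshold chosen from your own ROC curve
    print('likely NSFW')
```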

Convolutional Neural Network (CNN) architectures and tradeoffs

In recent years, CNNs have become very successful in image classification problems [1] [5] [6]. Since 2012, new CNN architectures have continuously improved the accuracy of the standard ImageNet classification challenge. Some of the major breakthroughs include AlexNet (2012) [6], GoogLeNet (2014) [5], VGG (2014) [2] and Residual Networks (2015) [1]. These networks have different tradeoffs in terms of runtime, memory requirements, and accuracy. The main indicators for runtime and memory requirements are:

  1. Flops or connections – The number of connections in a neural network determines the number of compute operations during a forward pass, which is proportional to the runtime of the network while classifying an image.
  2. Parameters – The number of parameters in a neural network determines the amount of memory needed to load the network.

Ideally we want a network with minimum flops and minimum parameters, which would achieve maximum accuracy.
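As a rough back-of-the-envelope illustration of where these two numbers come from (not part of the original analysis), here is a small sketch that counts parameters and multiply-accumulates for a single convolutional layer; the layer sizes are invented for the example.

```python
def conv_layer_cost(kernel, c_in, c_out, h_out, w_out):
    """Rough parameter and multiply-accumulate counts for one square conv layer."""
    params = kernel * kernel * c_in * c_out + c_out          # weights plus biases
    macs = kernel * kernel * c_in * c_out * h_out * w_out    # one MAC per weight per output position
    return params, macs

# Hypothetical layer: a 3x3 convolution mapping 64 to 128 channels over a 56x56 feature map.
params, macs = conv_layer_cost(3, 64, 128, 56, 56)
print('%d parameters, %d multiply-accumulates' % (params, macs))   # ~74k params, ~231M MACs
```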

Training a deep neural network for NSFW classification

We train the models using a dataset of positive (i.e. NSFW) images and negative (i.e. SFW – suitable/safe for work) images. We are not releasing the training images or other details due to the nature of the data, but instead we open source the output model which can be used for classification by a developer.

We use the Caffe deep learning library and CaffeOnSpark; the latter is a powerful open source framework for distributed learning that brings Caffe deep learning to Hadoop and Spark clusters for training models (Big shout out to Yahoo’s CaffeOnSpark team!).

While training, the images were resized to 256×256 pixels, horizontally flipped for data augmentation, randomly cropped to 224×224 pixels, and then fed to the network. For training residual networks, we used scale augmentation as described in the ResNet paper [1] to avoid overfitting. We evaluated various architectures to experiment with the tradeoff between runtime and accuracy:

  1. MS_CTC [4] – This architecture was proposed in Microsoft’s constrained time cost paper. It improves on AlexNet in terms of speed and accuracy while maintaining a combination of convolutional and fully-connected layers.
  2. SqueezeNet [3] – This architecture introduces the fire module, which contains layers that squeeze and then expand the input data blob. This greatly reduces the number of parameters while keeping ImageNet accuracy on par with AlexNet, with a memory requirement of only 6 MB.
  3. VGG [2] – This architecture has 13 conv layers and 3 FC layers.
  4. GoogLeNet [5] – GoogLeNet introduces inception modules and has 20 convolutional layer stages. It also uses auxiliary loss functions attached to intermediate layers to tackle the problem of diminishing gradients in deep networks.
  5. ResNet-50 [1] – ResNets use shortcut connections to solve the problem of diminishing gradients. We used the 50-layer residual network released by the authors.
  6. ResNet-50-thin – This model was generated using our pynetbuilder tool and replicates the Residual Network paper’s 50-layer network (with half the number of filters in each layer). You can find more details on how the model was generated and trained here.

Tradeoffs of different architectures: accuracy vs number of flops vs number of params in network.

The deep models were first pre-trained on the ImageNet 1000-class dataset. For each network, we replace the last layer (FC1000) with a 2-node fully-connected layer, then fine-tune the weights on the NSFW dataset. Note that we keep the learning rate multiplier for the last FC layer at 5 times the multiplier of the other layers being fine-tuned. We also tune the hyperparameters (step size, base learning rate) to optimize the performance.
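In Caffe, this kind of fine-tuning is usually wired up by renaming the final layer in the train prototxt (so it is re-initialised rather than copied) and loading the remaining weights from the ImageNet model. The sketch below shows that general pattern; the solver and weights file names are placeholders and this is not necessarily the exact script used for these experiments.

```python
import caffe

caffe.set_mode_gpu()

# 'nsfw_solver.prototxt' and 'resnet_50_imagenet.caffemodel' are placeholder names.
# The train net referenced by the solver is the ImageNet architecture with FC1000
# replaced by a renamed 2-output fully-connected layer whose lr_mult is 5x that of
# the layers being fine-tuned.
solver = caffe.SGDSolver('nsfw_solver.prototxt')

# Layers whose names match the pre-trained network are initialised from the
# ImageNet weights; the renamed final layer starts from a fresh random init.
solver.net.copy_from('resnet_50_imagenet.caffemodel')

solver.step(100000)   # iteration count, base_lr, and stepsize live in the solver file
```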

We observe that the performance of the models on the NSFW classification task is related to the performance of the pre-trained model on the ImageNet classification task: a better pre-trained model leads to a better fine-tuned classifier. The graph below shows the relative performance on our held-out NSFW evaluation set. Please note that the false positive rate (FPR) at a fixed false negative rate (FNR) shown in the graph is specific to our evaluation dataset, and is shown here for illustrative purposes. To use the models for NSFW filtering, we suggest that you plot the ROC curve using your own dataset and pick a suitable threshold.

Comparison of the performance of models on ImageNet and of their counterparts fine-tuned on the NSFW dataset.
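As a hypothetical illustration of that threshold-picking step, the snippet below chooses an operating threshold from an ROC curve computed on your own labelled evaluation set with scikit-learn. The labels, scores, and FPR budget are placeholders.

```python
import numpy as np
from sklearn.metrics import roc_curve

def pick_threshold(labels, scores, max_fpr=0.01):
    """Return the threshold with the highest true positive rate whose
    false positive rate stays at or below max_fpr."""
    fpr, tpr, thresholds = roc_curve(labels, scores)
    ok = np.where(fpr <= max_fpr)[0]
    best = ok[-1]                     # fpr is non-decreasing, so the last index maximises tpr
    return thresholds[best], tpr[best], fpr[best]

# Hypothetical labelled evaluation set (1 = NSFW, 0 = SFW) and model scores.
labels = np.array([0, 0, 1, 1, 0, 1, 1, 0])
scores = np.array([0.10, 0.40, 0.35, 0.80, 0.20, 0.90, 0.65, 0.05])
print(pick_threshold(labels, scores, max_fpr=0.25))
```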

We are releasing the thin ResNet-50 model, since it provides a good tradeoff in terms of accuracy, and the model is lightweight in terms of runtime (takes < 0.5 sec on CPU) and memory (~23 MB). Please refer to our GitHub repository for instructions on the usage of our model. We encourage developers to try the model for their NSFW filtering use cases. For any questions or feedback about the performance of the model, we encourage you to create an issue, and we will respond as soon as possible.

Results can be improved by fine-tuning the model for your dataset or use case. If you achieve improved performance, or if you have trained an NSFW model with a different architecture, we encourage you to contribute to the model or share a link on our description page.

Disclaimer: The definition of NSFW is subjective and contextual. This model is a general purpose reference model, which can be used for the preliminary filtering of pornographic images. We do not guarantee the accuracy of its output; rather, we make it available for developers to explore and enhance as an open source project.

We would like to thank Sachin Farfade, Amar Ramesh Kamat, Armin Kappeler, and Shraddha Advani for their contributions in this work.

References:

[1] He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. “Deep residual learning for image recognition.” arXiv preprint arXiv:1512.03385 (2015).

[2] Simonyan, Karen, and Andrew Zisserman. “Very deep convolutional networks for large-scale image recognition.” arXiv preprint arXiv:1409.1556 (2014).

[3] Iandola, Forrest N., Matthew W. Moskewicz, Khalid Ashraf, Song Han, William J. Dally, and Kurt Keutzer. “SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <1MB model size.” arXiv preprint arXiv:1602.07360 (2016).

[4] He, Kaiming, and Jian Sun. “Convolutional neural networks at constrained time cost.” In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5353-5360. 2015.

[5] Szegedy, Christian, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. “Going deeper with convolutions.” In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1-9. 2015.

[6] Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. “ImageNet classification with deep convolutional neural networks.” In Advances in neural information processing systems, pp. 1097-1105. 2012.

Weekly roundup: what even is sleep

Post Syndicated from Eevee original https://eev.ee/dev/2016/08/21/weekly-roundup-what-even-is-sleep/

August is loosely about video games, but really it’s about three big things in particular.

I accidentally went nocturnal again, which always leaves me exhausted for a couple days as atonement for skipping an entire day, so this week was a little less productive than the last two.

  • art: Drew, just, a ridiculous amount of stuff. Pages of sketches. Several birthday gifts. Even some serious attempts to learn from other artists’ work. Somehow I did zero “daily” Pokémon, though.

  • blog: Wrote and published a quick lament about attribution on the web. Also finally put the article “summary” (the bit before the read-more link) in Twitter cards, so, that’s nice.

  • book: Decent progress? Took a bunch more notes; converted a good chunk of my existing notes into real prose; rewrote the preface; cleaned up some aesthetic issues that had been bugging me.

  • hax: I took a day to make a ridiculous ROM hack for a friend’s birthday. Learned some neat things about the Game Boy along the way, though.

I count drawing as genuine useful work, since it’s practicing a thing I’m trying to learn, so I still did a decent amount of stuff this week. Just, ah, not too much of it was the Three Things. My sleep seems to be mostly un-fucked now, so this next week should go a little better. I think I can still have decent progress to show on the three things, though maybe not quite everything I wanted.

Combining Druid and DataSketches for Real-time, Robust Behavioral Analytics

Post Syndicated from mikesefanov original https://yahooeng.tumblr.com/post/147711922956

By Himanshu Gupta

Millions of users around the world interact with Yahoo through their web browsers and mobile devices, generating billions of events every day (e.g. clicking on ads, clicking on various pages of interest, and logging in). As Yahoo’s data grows larger and more complex, we are investing in new ways to better manage and make sense of it. Behavioral analytics is one important branch of analytics in which we are making significant advancements, and is helping us accomplish these tasks.

Beyond simply measuring how many times a user has performed a certain action, we also try to understand patterns in their actions. We do this in order to help us decide which of our features are impactful and might grow our user base, and to understand responses to ads that might help us improve users’ future experiences.

One example of behavioral analytics is measuring user retention rates for Yahoo properties such as Mail, News, and Finance, and breaking down these rates by different user demographics. Another example is to determine which ads perform well for various types of users (as measured by various signals), and to serve ads appropriately based on that implicit or explicit feedback.

The challenges we face in answering these questions mainly concern storing and interactively querying our user-generated events at massive scale. We make heavy use of distributed systems, and Druid is at the forefront of powering most of our real-time analytics at scale.

One of the features that makes Druid very useful is its ability to summarize data at storage time. This leads to greatly reduced storage requirements and, hence, faster queries. For example, consider the dataset below:

This data represents ad clicks for different website domains. We can see that there are many repeated attributes, which we call “dimensions,” in our data across different timestamps. Now, most of the time we don’t care that a certain ad was clicked at a precise millisecond in time; what is a lot more interesting to us is how many times an ad was clicked over the period of an hour. Thus, we can truncate the raw event timestamps and group all events with the same set of dimensions. When we group the dimensions, we also aggregate the raw event values for the “clicked” column.

This method is known as summarization, and in practice, we see summarization significantly reduce the amount of raw data we have to store. We’ve chosen to lose some information about the time an event occurred, but there is no loss of fidelity for the “clicked” metric that we really care about.
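Druid performs this roll-up at ingestion time, but the idea itself is simple. Here is a toy Python sketch of the same transformation; the columns and events are invented for illustration.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical raw click events: (timestamp, domain, gender, clicked).
raw_events = [
    ("2016-08-21T14:03:11Z", "news.yahoo.com",   "male",   1),
    ("2016-08-21T14:47:52Z", "news.yahoo.com",   "male",   1),
    ("2016-08-21T14:59:08Z", "sports.yahoo.com", "female", 1),
]

rollup = defaultdict(int)
for ts, domain, gender, clicked in raw_events:
    # Truncate the timestamp to the hour, then group by the remaining dimensions.
    hour = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").strftime("%Y-%m-%dT%H:00:00Z")
    rollup[(hour, domain, gender)] += clicked   # aggregate the "clicked" metric

for key, clicks in sorted(rollup.items()):
    print(key, clicks)
# ('2016-08-21T14:00:00Z', 'news.yahoo.com', 'male') 2
# ('2016-08-21T14:00:00Z', 'sports.yahoo.com', 'female') 1
```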

Let’s consider the same dataset again, but now with information about which user performed each click. When we go to summarize our data, the high-cardinality, mostly unique “user-id” column prevents our data from compacting very well.

The number of unique user-ids can be very high, given the number of users visiting Yahoo every day. So, in our “user-id” column, we end up effectively storing our raw data. Given that we are mostly interested in how many unique users performed certain actions, and we don’t really care about precisely which users did those actions, it would be nice if we could somehow lose some information about the individual users so that our data could still be summarized.

One approach to solving this problem is to create a “sketch” of the user-id dimension. Instead of storing every single unique user-id, we instead maintain a hash-based data structure – also known as a sketch – which has smaller storage requirements and gives estimates of user-id dimension cardinality with predictable accuracy.

Leveraging sketches, our summarized data for the user dimension looks something like this:

Sketch algorithms are highly desirable because they are very scalable, use predictable storage, work with real-time streams of data, and provide estimates with predictable accuracy. There are many different algorithms for constructing different types of sketches, and a lot of fancy mathematics goes into how sketch algorithms work and why they give very good estimates of the true result.
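As a toy illustration of the idea (and not the implementation DataSketches uses), here is a simple K-Minimum-Values sketch, a close relative of the Theta sketches in the library: it keeps only the k smallest hash values it has seen, and from the k-th smallest value it estimates how many distinct items there were.

```python
import hashlib

class KMVSketch:
    """Toy K-Minimum-Values sketch: keep the k smallest normalised hash values
    of the items seen, and estimate the distinct count from the k-th smallest."""

    def __init__(self, k=1024):
        self.k = k
        self.mins = []                       # sorted list of the k smallest hashes in [0, 1)

    def _hash(self, item):
        digest = hashlib.sha1(str(item).encode("utf-8")).hexdigest()
        return int(digest, 16) / 16.0 ** 40  # map the 160-bit digest to [0, 1)

    def update(self, item):
        h = self._hash(item)
        if len(self.mins) < self.k:
            if h not in self.mins:
                self.mins.append(h)
                self.mins.sort()
        elif h < self.mins[-1] and h not in self.mins:
            self.mins[-1] = h                # replace the current k-th smallest value
            self.mins.sort()

    def estimate(self):
        if len(self.mins) < self.k:
            return float(len(self.mins))     # exact while fewer than k distinct values seen
        return (self.k - 1) / self.mins[-1]  # classic KMV estimator

sketch = KMVSketch(k=1024)
for user_id in range(100000):
    sketch.update("user-%d" % user_id)
print(int(sketch.estimate()))                # close to 100000, typically within a few percent
```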

At Yahoo, we recently developed an open source library called DataSketches. DataSketches provides implementations of various approximate sketch-based algorithms that enable faster, cheaper analytics on large datasets. By combining DataSketches with an extremely low-latency data store, such as Druid, you bring sketches into practical use in a big data store. Embedding sketch algorithms in a data store and persisting the actual sketches is relatively novel in the industry, and is the future structure of big data analytics systems.

Druid’s flexible plugin architecture allows us to integrate it with DataSketches; as such, we’ve developed and open sourced an extension to Druid that allows DataSketches to be used as a Druid aggregation function. Druid applies the aggregation function on selected columns and stores aggregated values instead of raw data.

By leveraging the fast, approximate calculations of DataSketches, complex analytic queries such as cardinality estimation and retention analysis can be completed in less than one second in Druid. This allows developers to visualize the results in real-time, and to be able to slice and dice results across a variety of different filters. For example, we can quickly determine how many users visited our core products, including Yahoo News, Sports, and Finance, as well as see how many of those users returned some time later. We can also break down our results in real-time based on user demographics such as age and location.
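As a hypothetical example of what such a query can look like, the snippet below posts a Druid timeseries query that uses the thetaSketch aggregator from the DataSketches extension to estimate distinct users per day. The datasource, column name, interval, and broker URL are placeholders; consult the Druid docs for the exact aggregator options.

```python
import json
import requests   # assumes the requests library and a reachable Druid broker

# Hypothetical timeseries query over a sketch-aggregated user-id column.
query = {
    "queryType": "timeseries",
    "dataSource": "user_events",
    "granularity": "day",
    "intervals": ["2016-08-01/2016-09-01"],
    "aggregations": [
        {"type": "thetaSketch", "name": "unique_users", "fieldName": "user_id_sketch"}
    ],
}

response = requests.post(
    "http://broker.example.com:8082/druid/v2/",
    data=json.dumps(query),
    headers={"Content-Type": "application/json"},
)
print(json.dumps(response.json(), indent=2))
```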

If you have similar use cases to ours, we invite you to try out DataSketches and Druid for behavioral analytics. For more information about DataSketches, please visit the DataSketches website. For more information about Druid, please visit the project webpage. And finally, documents for the DataSketches and Druid integration can be found in the Druid docs.