Tag Archives: santa

Randomly generated, thermal-printed comics

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/random-comic-strip-generation-vomit-comic-robot/

Python code creates curious, wordless comic strips at random, spewing them from the thermal printer mouth of a laser-cut body reminiscent of Disney Pixar’s WALL-E: meet the Vomit Comic Robot!

The age of the thermal printer!

Thermal printers allow you to instantly print photos, data, and text using a few lines of code, with no need for ink. More and more makers are using this handy, low-maintenance bit of kit for truly creative projects, from Pierre Muth’s tiny PolaPi-Zero camera to the sound-printing Waves project by Eunice Lee, Matthew Zhang, and Bomani McClendon (and our own Secret Santa Babbage).

Vomiting robots

Interaction designer and developer Cadin Batrack, whose background is in game design and interactivity, has built the Vomit Comic Robot, which creates “one-of-a-kind comics on demand by processing hand-drawn images through a custom software algorithm.”

The robot is made up of a Raspberry Pi 3, a USB thermal printer, and a handful of LEDs.

Comic Vomit Robot Cadin Batrack's Raspberry Pi comic-generating thermal printer machine

At the press of a button, Processing code selects one of a set of Cadin’s hand-drawn empty comic grids and then randomly picks images from a library to fill in the gaps.

Vomit Comic Robot Cadin Batrack's Raspberry Pi comic-generating thermal printer machine

Each image is associated with data that allows the code to fit it correctly into the available panels. Cadin says about the concept behing his build:

Although images are selected and placed randomly, the comic panel format suggests relationships between elements. Our minds create a story where there is none in an attempt to explain visuals created by a non-intelligent machine.

The Raspberry Pi saves the final image as a high-resolution PNG file (so that Cadin can sell prints on thick paper via Etsy), and a Python script sends it to be vomited up by the thermal printer.

Comic Vomit Robot Cadin Batrack's Raspberry Pi comic-generating thermal printer machine

For more about the Vomit Comic Robot, check out Cadin’s blog. If you want to recreate it, you can find the info you need in the Imgur album he has put together.

We ❤ cute robots

We have a soft spot for cute robots here at Pi Towers, and of course we make no exception for the Vomit Comic Robot. If, like us, you’re a fan of adorable bots, check out Mira, the tiny interactive robot by Alonso Martinez, and Peeqo, the GIF bot by Abhishek Singh.

Mira Alfonso Martinez Raspberry Pi

The post Randomly generated, thermal-printed comics appeared first on Raspberry Pi.

Own your own working Pokémon Pokédex!

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/deep-learning-pokedex/

Squeal with delight as your inner Pokémon trainer witnesses the wonder of Adrian Rosebrock’s deep learning Pokédex.

Creating a real-life Pokedex with a Raspberry Pi, Python, and Deep Learning

This video demos a real-like Pokedex, complete with visual recognition, that I created using a Raspberry Pi, Python, and Deep Learning. You can find the entire blog post, including code, using this link: https://www.pyimagesearch.com/2018/04/30/a-fun-hands-on-deep-learning-project-for-beginners-students-and-hobbyists/ Music credit to YouTube user “No Copyright” for providing royalty free music: https://www.youtube.com/watch?v=PXpjqURczn8

The history of Pokémon in 30 seconds

The Pokémon franchise was created by video game designer Satoshi Tajiri in 1995. In the fictional world of Pokémon, Pokémon Trainers explore the vast landscape, catching and training small creatures called Pokémon. To date, there are 802 different types of Pokémon. They range from the ever recognisable Pikachu, a bright yellow electric Pokémon, to the highly sought-after Shiny Charizard, a metallic, playing-card-shaped Pokémon that your mate Alex claims she has in mint condition, but refuses to show you.

Pokemon GIF

In the world of Pokémon, children as young as ten-year-old protagonist and all-round annoyance Ash Ketchum are allowed to leave home and wander the wilderness. There, they hunt vicious, deadly creatures in the hope of becoming a Pokémon Master.

Adrian’s deep learning Pokédex

Adrian is a bit of a deep learning pro, as demonstrated by his Santa/Not Santa detector, which we wrote about last year. For that project, he also provided a great explanation of what deep learning actually is. In a nutshell:

…a subfield of machine learning, which is, in turn, a subfield of artificial intelligence (AI).While AI embodies a large, diverse set of techniques and algorithms related to automatic reasoning (inference, planning, heuristics, etc), the machine learning subfields are specifically interested in pattern recognition and learning from data.

As with his earlier Raspberry Pi project, Adrian uses the Keras deep learning model and the TensorFlow backend, plus a few other packages such as Adrian’s own imutils functions and OpenCV.

Adrian trained a Convolutional Neural Network using Keras on a dataset of 1191 Pokémon images, obtaining 96.84% accuracy. As Adrian explains, this model is able to identify Pokémon via still image and video. It’s perfect for creating a Pokédex – an interactive Pokémon catalogue that should, according to the franchise, be able to identify and read out information on any known Pokémon when captured by camera. More information on model training can be found on Adrian’s blog.

Adrian Rosebeck deep learning pokemon pokedex

For the physical build, a Raspberry Pi 3 with camera module is paired with the Raspberry Pi 7″ touch display to create a portable Pokédex. And while Adrian comments that the same result can be achieved using your home computer and a webcam, that’s not how Adrian rolls as a Raspberry Pi fan.

Adrian Rosebeck deep learning pokemon pokedex

Plus, the smaller size of the Pi is perfect for one of you to incorporate this deep learning model into a 3D-printed Pokédex for ultimate Pokémon glory, pretty please, thank you.

Adrian Rosebeck deep learning pokemon pokedex

Adrian has gone into impressive detail about how the project works and how you can create your own on his blog, pyimagesearch. So if you’re interested in learning more about deep learning, and making your own Pokédex, be sure to visit.

The post Own your own working Pokémon Pokédex! appeared first on Raspberry Pi.

Announcing Coolest Projects North America

Post Syndicated from Courtney Lentz original https://www.raspberrypi.org/blog/coolest-projects-north-america/

The Raspberry Pi Foundation loves to celebrate people who use technology to solve problems and express themselves creatively, so we’re proud to expand the incredibly successful event Coolest Projects to North America. This free event will be held on Sunday 23 September 2018 at the Discovery Cube Orange County in Santa Ana, California.

Coolest Projects North America logo Raspberry Pi CoderDojo

What is Coolest Projects?

Coolest Projects is a world-leading showcase that empowers and inspires the next generation of digital creators, innovators, changemakers, and entrepreneurs. The event is both a competition and an exhibition to give young digital makers aged 7 to 17 a platform to celebrate their successes, creativity, and ingenuity.

showcase crowd — Coolest Projects North America

In 2012, Coolest Projects was conceived as an opportunity for CoderDojo Ninjas to showcase their work and for supporters to acknowledge these achievements. Week after week, Ninjas would meet up to work diligently on their projects, hacks, and code; however, it can be difficult for them to see their long-term progress on a project when they’re concentrating on its details on a weekly basis. Coolest Projects became a dedicated time each year for Ninjas and supporters to reflect, celebrate, and share both the achievements and challenges of the maker’s journey.

three female coolest projects attendees — Coolest Projects North America

Coolest Projects North America

Not only is Coolest Projects expanding to North America, it’s also expanding its participant pool! Members of our team have met so many amazing young people creating in all areas of the world, that it simply made sense to widen our outreach to include Code Clubs, students of Raspberry Pi Certified Educators, and members of the Raspberry Jam community at large as well as CoderDojo attendees.

 a boy showing a technology project to an old man, with a girl playing on a laptop on the floor — Coolest Projects North America

Exhibit and attend Coolest Projects

Coolest Projects is a free, family- and educator-friendly event. Young people can apply to exhibit their projects, and the general public can register to attend this one-day event. Be sure to register today, because you make Coolest Projects what it is: the coolest.

The post Announcing Coolest Projects North America appeared first on Raspberry Pi.

Welcome Nathan – Our Solutions Engineer

Post Syndicated from Yev original https://www.backblaze.com/blog/welcome-nathan-our-solutions-engineer/

Backblaze is growing, and with it our need to cater to a lot of different use cases that our customers bring to us. We needed a Solutions Engineer to help out, and after a long search we’ve hired our first one! Lets learn a bit more about Nathan shall we?

What is your Backblaze Title?
Solutions Engineer. Our customers bring a thousand different use cases to both B1 and B2, and I’m here to help them figure out how best to make those use cases a reality. Also, any odd jobs that Nilay wants me to do.

Where are you originally from?
I am native to the San Francisco Bay Area, studying mathematics at UC Santa Cruz, and then computer science at California University of Hayward (which has since renamed itself California University of the East Hills. I observe that it’s still in Hayward).

What attracted you to Backblaze?
As a stable, growing company with huge growth and even bigger potential, the business model is attractive, and the team is outstanding. Add to that the strong commitment to transparency, and it’s a hard company to resist. We can store – and restore – data while offering superior reliability at an economic advantage to do-it-yourself, and that’s a great place to be.

What do you expect to learn while being at Backblaze?
Everything I need to, but principally how our customers choose to interact with web storage. Storage isn’t a solution per se, but it’s an important component of any persistent solution. I’m looking forward to working with all the different concepts our customers have to make use of storage.

Where else have you worked?
All sorts of places, but I’ll admit publicly to EMC, Gemalto, and my own little (failed, alas) startup, IC2N. I worked with low-level document imaging.

Where did you go to school?
UC Santa Cruz, BA Mathematics CU Hayward, Master of Science in Computer Science.

What’s your dream job?
Sipping tea in the California redwood forest. However, solutions engineer at Backblaze is a good second choice!

Favorite place you’ve traveled?
Ashland, Oregon, for the Oregon Shakespeare Festival and the marble caves (most caves form from limestone).

Favorite hobby?
Theater. Pathfinder. Writing. Baking cookies and cakes.

Of what achievement are you most proud?
Marrying the most wonderful man in the world.

Star Trek or Star Wars?
Star Trek’s utopian science fiction vision of humanity and science resonates a lot more strongly with me than the dystopian science fantasy of Star Wars.

Coke or Pepsi?
Neither. I’d much rather have a cup of jasmine tea.

Favorite food?
It varies, but I love Indian and Thai cuisine. Truly excellent Italian food is marvelous – wood fired pizza, if I had to pick only one, but the world would be a boring place with a single favorite food.

Why do you like certain things?
If I knew that, I’d be in marketing.

Anything else you’d like you’d like to tell us?
If you haven’t already encountered the amazing authors Patricia McKillip and Lois McMasters Bujold – go encounter them. Be happy.

There’s nothing wrong with a nice cup of tea and a long game of Pathfinder. Sign us up! Welcome to the team Nathan!

The post Welcome Nathan – Our Solutions Engineer appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Fake Santa Surveillance Camera

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2018/01/fake_santa_surv.html

Reka makes a “decorative Santa cam,” meaning that it’s not a real camera. Instead, it just gets children used to being under constant surveillance.

Our Santa Cam has a cute Father Christmas and mistletoe design, and a red, flashing LED light which will make the most logical kids suspend their disbelief and start to believe!

Thank you for my new Raspberry Pi, Santa! What next?

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/thank-you-for-my-new-raspberry-pi-santa-what-next/

Note: the Pi Towers team have peeled away from their desks to spend time with their families over the festive season, and this blog will be quiet for a while as a result. We’ll be back in the New Year with a bushel of amazing projects, awesome resources, and much merriment and fun times. Happy holidays to all!

Now back to the matter at hand. Your brand new Christmas Raspberry Pi.

Your new Raspberry Pi

Did you wake up this morning to find a new Raspberry Pi under the tree? Congratulations, and welcome to the Raspberry Pi community! You’re one of us now, and we’re happy to have you on board.

But what if you’ve never seen a Raspberry Pi before? What are you supposed to do with it? What’s all the fuss about, and why does your new computer look so naked?

Setting up your Raspberry Pi

Are you comfy? Good. Then let us begin.

Download our free operating system

First of all, you need to make sure you have an operating system on your micro SD card: we suggest Raspbian, the Raspberry Pi Foundation’s official supported operating system. If your Pi is part of a starter kit, you might find that it comes with a micro SD card that already has Raspbian preinstalled. If not, you can download Raspbian for free from our website.

An easy way to get Raspbian onto your SD card is to use a free tool called Etcher. Watch The MagPi’s Lucy Hattersley show you what you need to do. You can also use NOOBS to install Raspbian on your SD card, and our Getting Started guide explains how to do that.

Plug it in and turn it on

Your new Raspberry Pi 3 comes with four USB ports and an HDMI port. These allow you to plug in a keyboard, a mouse, and a television or monitor. If you have a Raspberry Pi Zero, you may need adapters to connect your devices to its micro USB and micro HDMI ports. Both the Raspberry Pi 3 and the Raspberry Pi Zero W have onboard wireless LAN, so you can connect to your home network, and you can also plug an Ethernet cable into the Pi 3.

Make sure to plug the power cable in last. There’s no ‘on’ switch, so your Pi will turn on as soon as you connect the power. Raspberry Pi uses a micro USB power supply, so you can use a phone charger if you didn’t receive one as part of a kit.

Learn with our free projects

If you’ve never used a Raspberry Pi before, or you’re new to the world of coding, the best place to start is our projects site. It’s packed with free projects that will guide you through the basics of coding and digital making. You can create projects right on your screen using Scratch and Python, connect a speaker to make music with Sonic Pi, and upgrade your skills to physical making using items from around your house.

Here’s James to show you how to build a whoopee cushion using a Raspberry Pi, paper plates, tin foil and a sponge:

Whoopee cushion PRANK with a Raspberry Pi: HOW-TO

Explore the world of Raspberry Pi physical computing with our free FutureLearn courses: http://rpf.io/futurelearn Free make your own Whoopi Cushion resource: http://rpf.io/whoopi For more information on Raspberry Pi and the charitable work of the Raspberry Pi Foundation, including Code Club and CoderDojo, visit http://rpf.io Our resources are free to use in schools, clubs, at home and at events.

Diving deeper

You’ve plundered our projects, you’ve successfully rigged every chair in the house to make rude noises, and now you want to dive deeper into digital making. Good! While you’re digesting your Christmas dinner, take a moment to skim through the Raspberry Pi blog for inspiration. You’ll find projects from across our worldwide community, with everything from home automation projects and retrofit upgrades, to robots, gaming systems, and cameras.

You’ll also find bucketloads of ideas in The MagPi magazine, the official monthly Raspberry Pi publication, available in both print and digital format. You can download every issue for free. If you subscribe, you’ll get a Raspberry Pi Zero W to add to your new collection. HackSpace magazine is another fantastic place to turn for Raspberry Pi projects, along with other maker projects and tutorials.

And, of course, simply typing “Raspberry Pi projects” into your preferred search engine will find thousands of ideas. Sites like Hackster, Hackaday, Instructables, Pimoroni, and Adafruit all have plenty of fab Raspberry Pi tutorials that they’ve devised themselves and that community members like you have created.

And finally

If you make something marvellous with your new Raspberry Pi – and we know you will – don’t forget to share it with us! Our Twitter, Facebook, Instagram and Google+ accounts are brimming with chatter, projects, and events. And our forums are a great place to visit if you have questions about your Raspberry Pi or if you need some help.

It’s good to get together with like-minded folks, so check out the growing Raspberry Jam movement. Raspberry Jams are community-run events where makers and enthusiasts can meet other makers, show off their projects, and join in with workshops and discussions. Find your nearest Jam here.

Have a great festive holiday and welcome to the community. We’ll see you in 2018!

The post Thank you for my new Raspberry Pi, Santa! What next? appeared first on Raspberry Pi.

The deep learning Santa/Not Santa detector

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/deep-learning-santa-detector/

Did you see Mommy kissing Santa Claus? Or was it simply an imposter? The Not Santa detector is here to help solve the mystery once and for all.

Building a “Not Santa” detector on the Raspberry Pi using deep learning, Keras, and Python

The video is a demo of my “Not Santa” detector that I deployed to the Raspberry Pi. I trained the detector using deep learning, Keras, and Python. You can find the full source code and tutorial here: https://www.pyimagesearch.com/2017/12/18/keras-deep-learning-raspberry-pi/

Ho-ho-how does it work?

Note: Adrian Rosebrock is not Santa. But he does a good enough impression of the jolly old fellow that his disguise can fool a Raspberry Pi into thinking otherwise.

Raspberry Pi 'Not Santa' detector

We jest, but has anyone seen Adrian and Santa in the same room together?
Image c/o Adrian Rosebrock

But how is the Raspberry Pi able to detect the Santa-ness or Not-Santa-ness of people who walk into the frame?

Two words: deep learning

If you’re not sure what deep learning is, you’re not alone. It’s a hefty topic, and one that Adrian has written a book about, so I grilled him for a bluffers’ guide. In his words, deep learning is:

…a subfield of machine learning, which is, in turn a subfield of artificial intelligence (AI). While AI embodies a large, diverse set of techniques and algorithms related to automatic reasoning (inference, planning, heuristics, etc), the machine learning subfields are specifically interested in pattern recognition and learning from data.

Artificial Neural Networks (ANNs) are a class of machine learning algorithms that can learn from data. We have been using ANNs successfully for over 60 years, but something special happened in the past 5 years — (1) we’ve been able to accumulate massive datasets, orders of magnitude larger than previous datasets, and (2) we have access to specialized hardware to train networks faster (i.e., GPUs).

Given these large datasets and specialized hardware, deeper neural networks can be trained, leading to the term “deep learning”.

So now we have a bird’s-eye view of deep learning, how does the detector detect?

Cameras and twinkly lights

Adrian used a model he had trained on two datasets to detect whether or not an image contains Santa. He deployed the Not Santa detector code to a Raspberry Pi, then attached a camera, speakers, and The Pi Hut’s 3D Xmas Tree.

Raspberry Pi 'Not Santa' detector

Components for Santa detection
Image c/o Adrian Rosebrock

The camera captures footage of Santa in the wild, while the Christmas tree add-on provides a twinkly notification, accompanied by a resonant ho, ho, ho from the speakers.

A deeper deep dive into deep learning

A full breakdown of the project and the workings of the Not Santa detector can be found on Adrian’s blog, PyImageSearch, which includes links to other deep learning and image classification tutorials using TensorFlow and Keras. It’s an excellent place to start if you’d like to understand more about deep learning.

Build your own Santa detector

Santa might catch on to Adrian’s clever detector and start avoiding the camera, and for that eventuality, we have our own Santa detector. It uses motion detection to notify you of his presence (and your presents!).

Raspberry Pi Santa detector

Check out our Santa Detector resource here and use a passive infrared sensor, Raspberry Pi, and Scratch to catch the big man in action.

The post The deep learning Santa/Not Santa detector appeared first on Raspberry Pi.

All the lights, all of the twinkly lights

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/all-of-the-lights/

Twinkly lights are to Christmas what pumpkins are to Halloween. And when you add a Raspberry Pi to your light show, the result instantly goes from “Meh, yeah.” to “OMG, wow!”

Here are some cool light-based Christmas projects to inspire you this weekend.

Raspberry Pi Christmas Lights

App-based light control

Christmas Tree Lights Demo

Project Code – https://github.com/eidolonFIRE/Christmas-Lights Raspberry Pi A+ ws2812b – https://smile.amazon.com/gp/product/B01H04YAIQ/ref=od_aui_detailpages00?ie=UTF8&psc=1 200w 5V supply – https://smile.amazon.com/gp/product/B01LZRIWZD/ref=od_aui_detailpages01?ie=UTF8&psc=1

In his Christmas lights project, Caleb Johnson uses an app as a control panel to switch between predefined displays. The full code is available on his GitHub, and it connects a Raspberry Pi A+ to a strip of programmable LEDs that change their pattern at the touch of a phone screen.

What’s great about this project, aside from the simplicity of its design, is the scope for extending it. Why not share the app with friends and family, allowing them to control your lights remotely? Or link the lights to social media so they are triggered by a specific hashtag, like in Alex Ellis’ #cheerlights project below.

Worldwide holiday #cheerlights

Holiday lights hack – 1$ Snowman + Raspberry Pi

Here we have a smart holiday light which will only run when it detects your presence in the room through a passive infrared PIR sensor. I’ve used hot glue for the fixings and an 8-LED NeoPixel strip connected to port 18.

Cheerlights, an online service created by Hans Scharler, allows makers to incorporate hashtag-controlled lighting into the projects. By tweeting the hashtag #cheerlights, followed by a colour, you can control a network of lights so that they are all displaying the same colour.

For his holiday light hack using Cheerlights, Alex incorporated the Pimoroni Blinkt! and a collection of cheap Christmas decorations to create cute light-up ornaments for the festive season.

To make your own, check out Alex’s blog post, and head to your local £1/$1 store for hackable decor. You could even link your Christmas tree and the trees of your family, syncing them all in one glorious, Santa-pleasing spectacular.

Outdoor decorations

DIY musical Xmas lights for beginners with raspberry pi

With just a few bucks of extra material, I walk you through converting your regular Christmas lights into a whole-house light show. The goal here is to go from scratch. Although this guide is intended for people who don’t know how to use linux at all and those who do alike, the focus is for people for whom linux and the raspberry pi are a complete mystery.

Looking to outdo your neighbours with your Christmas light show this year? YouTuber Makin’Things has created a beginners guide to setting up a Raspberry Pi–based musical light show for your facade, complete with information on soldering, wiring, and coding.

Once you’ve wrapped your house in metres and metres of lights and boosted your speakers so they can be heard for miles around, why not incorporate #cheerlights to make your outdoor decor interactive?

Still not enough? How about controlling your lights using a drum kit? Christian Kratky’s MIDI-Based Christmas Lights Animation system (or as I like to call it, House Rock) does exactly that.

Eye Of The Tiger (MIDI based christmas lights animation system prototype)

Project documentation and source code: https://www.hackster.io/cyborg-titanium-14/light-pi-1c88b0 The song is taken from: https://www.youtube.com/watch?v=G6r1dAire0Y

Any more?

We know these projects are just the tip of the iceberg when it comes to the Raspberry Pi–powered Christmas projects out there, and as always, we’d love you to share yours with us. So post a link in the comments below, or tag us on social media when posting your build photos, videos, and/or blog links. ‘Tis the season for sharing after all.

The post All the lights, all of the twinkly lights appeared first on Raspberry Pi.

The Raspberry Pi Christmas shopping list 2017

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/christmas-shopping-list-2017/

Looking for the perfect Christmas gift for a beloved maker in your life? Maybe you’d like to give a relative or friend a taste of the world of coding and Raspberry Pi? Whatever you’re looking for, the Raspberry Pi Christmas shopping list will point you in the right direction.

An ice-skating Raspberry Pi - The Raspberry Pi Christmas Shopping List 2017

For those getting started

Thinking about introducing someone special to the wonders of Raspberry Pi during the holidays? Although you can set up your Pi with peripherals from around your home, such as a mobile phone charger, your PC’s keyboard, and the old mouse dwelling in an office drawer, a starter kit is a nice all-in-one package for the budding coder.

Check out the starter kits from Raspberry Pi Approved Resellers such as Pimoroni, The Pi Hut, ModMyPi, Adafruit, CanaKit…the list is pretty long. Our products page will direct you to your closest reseller, or you can head to element14 to pick up the official Raspberry Pi Starter Kit.

You can also buy the Raspberry Pi Press’s brand-new Raspberry Pi Beginners Book, which includes a Raspberry Pi Zero W, a case, a ready-made SD card, and adapter cables.

Once you’ve presented a lucky person with their first Raspberry Pi, it’s time for them to spread their maker wings and learn some new skills.

MagPi Essentials books - The Raspberry Pi Christmas Shopping List 2017

To help them along, you could pick your favourite from among the Official Projects Book volume 3, The MagPi Essentials guides, and the brand-new third edition of Carrie Anne Philbin’s Adventures in Raspberry Pi. (She is super excited about this new edition!)

And you can always add a link to our free resources on the gift tag.

For the maker in your life

If you’re looking for something for a confident digital maker, you can’t go wrong with adding to their arsenal of electric and electronic bits and bobs that are no doubt cluttering drawers and boxes throughout their house.

Components such as servomotors, displays, and sensors are staples of the maker world. And when it comes to jumper wires, buttons, and LEDs, one can never have enough.

You could also consider getting your person a soldering iron, some helpings hands, or small tools such as a Dremel or screwdriver set.

And to make their life a little less messy, pop it all inside a Really Useful Box…because they’re really useful.

For kit makers

While some people like to dive into making head-first and to build whatever comes to mind, others enjoy working with kits.

The Naturebytes kit allows you to record the animal visitors of your garden with the help of a camera and a motion sensor. Footage of your local badgers, birds, deer, and more will be saved to an SD card, or tweeted or emailed to you if it’s in range of WiFi.

Cortec Tiny 4WD - The Raspberry Pi Christmas Shopping List 2017

Coretec’s Tiny 4WD is a kit for assembling a Pi Zero–powered remote-controlled robot at home. Not only is the robot adorable, building it also a great introduction to motors and wireless control.

Bare Conductive’s Touch Board Pro Kit offers everything you need to create interactive electronics projects using conductive paint.

Pi Hut Arcade Kit - The Raspberry Pi Christmas Shopping List 2017

Finally, why not help your favourite maker create their own gaming arcade using the Arcade Building Kit from The Pi Hut?

For the reader

For those who like to curl up with a good read, or spend too much of their day on public transport, a book or magazine subscription is the perfect treat.

For makers, hackers, and those interested in new technologies, our brand-new HackSpace magazine and the ever popular community magazine The MagPi are ideal. Both are available via a physical or digital subscription, and new subscribers to The MagPi also receive a free Raspberry Pi Zero W plus case.

Cover of CoderDojo Nano Make your own game

Marc Scott Beginner's Guide to Coding Book

You can also check out other publications from the Raspberry Pi family, including CoderDojo’s new CoderDojo Nano: Make Your Own Game, Eben Upton and Gareth Halfacree’s Raspberry Pi User Guide, and Marc Scott’s A Beginner’s Guide to Coding. And have I mentioned Carrie Anne’s Adventures in Raspberry Pi yet?

Stocking fillers for everyone

Looking for something small to keep your loved ones occupied on Christmas morning? Or do you have to buy a Secret Santa gift for the office tech? Here are some wonderful stocking fillers to fill your boots with this season.

Pi Hut 3D Christmas Tree - The Raspberry Pi Christmas Shopping List 2017

The Pi Hut 3D Xmas Tree: available as both a pre-soldered and a DIY version, this gadget will work with any 40-pin Raspberry Pi and allows you to create your own mini light show.

Google AIY Voice kit: build your own home assistant using a Raspberry Pi, the MagPi Essentials guide, and this brand-new kit. “Google, play Mariah Carey again…”

Pimoroni’s Raspberry Pi Zero W Project Kits offer everything you need, including the Pi, to make your own time-lapse cameras, music players, and more.

The official Raspberry Pi Sense HAT, Camera Module, and cases for the Pi 3 and Pi Zero will complete the collection of any Raspberry Pi owner, while also opening up exciting project opportunities.

STEAM gifts that everyone will love

Awesome Astronauts | Building LEGO’s Women of NASA!

LEGO Idea’s bought out this amazing ‘Women of NASA’ set, and I thought it would be fun to build, play and learn from these inspiring women! First up, let’s discover a little more about Sally Ride and Mae Jemison, two AWESOME ASTRONAUTS!

Treat the kids, and big kids, in your life to the newest LEGO Ideas set, the Women of NASA — starring Nancy Grace Roman, Margaret Hamilton, Sally Ride, and Mae Jemison!

Explore the world of wearables with Pimoroni’s sewable, hackable, wearable, adorable Bearables kits.

Add lights and motors to paper creations with the Activating Origami Kit, available from The Pi Hut.

We all loved Hidden Figures, and the STEAM enthusiast you know will do too. The film’s available on DVD, and you can also buy the original book, along with other fascinating non-fiction such as Rebecca Skloot’s The Immortal Life of Henrietta Lacks, Rachel Ignotofsky’s Women in Science, and Sydney Padua’s (mostly true) The Thrilling Adventures of Lovelace and Babbage.

Have we missed anything?

With so many amazing kits, HATs, and books available from members of the Raspberry Pi community, it’s hard to only pick a few. Have you found something splendid for the maker in your life? Maybe you’ve created your own kit that uses the Raspberry Pi? Share your favourites with us in the comments below or via our social media accounts.

The post The Raspberry Pi Christmas shopping list 2017 appeared first on Raspberry Pi.

The Pi Towers Secret Santa Babbage

Post Syndicated from Mark Calleja original https://www.raspberrypi.org/blog/secret-santa-babbage/

Tired of pulling names out of a hat for office Secret Santa? Upgrade your festive tradition with a Raspberry Pi, thermal printer, and everybody’s favourite microcomputer mascot, Babbage Bear.

Raspberry Pi Babbage Bear Secret Santa

The name’s Santa. Secret Santa.

It’s that time of year again, when the cosiness gets turned up to 11 and everyone starts thinking about jolly fat men, reindeer, toys, and benevolent home invasion. At Raspberry Pi, we’re running a Secret Santa pool: everyone buys a gift for someone else in the office. Obviously, the person you buy for has to be picked in secret and at random, or the whole thing wouldn’t work. With that in mind, I created Secret Santa Babbage to do the somewhat mundane task of choosing gift recipients. This could’ve just been done with some names in a hat, but we’re Raspberry Pi! If we don’t make a Python-based Babbage robot wearing a jaunty hat and programmed to spread Christmas cheer, who will?

Secret Santa Babbage

Ho ho ho!

Mecha-Babbage Xmas shenanigans

The script the robot runs is pretty basic: a list of names entered as comma-separated strings is shuffled at the press of a GPIO button, then a name is popped off the end and stored as a variable. The name is matched to a photo of the person stored on the Raspberry Pi, and a thermal printer pinched from Alex’s super awesome PastyCam (blog post forthcoming, maybe) prints out the picture and name of the person you will need to shower with gifts at the Christmas party. (Well, OK — with one gift. No more than five quid’s worth. Nothing untoward.) There’s also a redo function, just in case you pick yourself: press another button and the last picked name — still stored as a variable — is appended to the list again, which is shuffled once more, and a new name is popped off the end.

Secret Santa Babbage prototyping


As the build was a bit of a rush job undertaken at the request of our ‘Director of Vibe’ Emily, there are a few things I’d like to improve about this functionality that I didn’t get around to — more on that later. To add some extra holiday spirit to the project at the last minute, I used Pygame to play a WAV file of Santa’s jolly laugh while Babbage chooses a name for you. The file is included in the GitHub repo along with everything else, because ‘tis the season, etc., etc.

Secret Santa Babbage prototyping

Editor’s note: Considering these desk adornments, Mark’s Secret Santa gift-giver has a lot to go on.

Writing the code for Xmas Mecha-Babbage was fairly straightforward, though it uses some tricky bits for managing the thermal printer. You’ll need to install the drivers to make it go, as well as the CUPS package for managing the print hosting. You can find instructions for these things here, thanks to the wonderful Adafruit crew. Also, for reasons I couldn’t fathom, this will all only work on a Pi 2 and not a Pi 3, as there are some compatibility issues with the thermal printer otherwise. (I also tested the script on a Pi Zero W…no dice.)

Building a Christmassy throne

The hardest (well, fiddliest) parts of making the whole build were constructing the throne and wiring the bear. Using MakerCase, Inkscape, a bit of ingenuity, and a laser cutter, I was able to rig up a Christmassy plywood throne which has a hole through the seat so I could run the wires down from Babbage and to the Pi inside. I finished the throne by rubbing a couple of fingers of beeswax into it; as well as making the wood shine just a little bit and protecting it against getting wet, this had the added bonus of making it smell awesome.

Secret Santa Babbage inside

Next year’s iteration will be mulled wine–scented.

I next soldered two LEDs to some lengths of wire, and then ran the wires through holes at the top of the throne and down the back along a small channel I had carved with a narrow chisel to connect them to the Pi’s GPIO pins. The green LED will remain on as long as Babbage is running his program, and the red one will light up while he is processing your request. Once the red LED goes off again, the next person can have a go. I also laser-cut a final piece of wood to overlay the back of Babbage’s Xmas throne and cover the wiring a bit.

Creating a Xmas cyborg bear

Taking two 6 mm tactile buttons, I clipped the spiky metal legs off one side of each (the buttons were going into a stuffed christmas toy, after all) and soldered a length of wire to each of the remaining legs. Next, I made a small incision into Babbage with my trusty Swiss army knife (in a place that actually made me cringe a little) and fed the buttons up into his paws. At some point in this process I was standing in the office wrestling with the bear and muttering to myself, which elicited some very strange looks from my colleagues.

Secret Santa Babbage throne

Poor Babbage…

One thing to note here is to make sure the wires remain attached at the solder points while you push them up into Babbage’s paws. The first time I tried it, I snapped one of my connections and had to start again. It helped to remove some stuffing like a tunnel and then replace it afterward. Moreover, you can use your fingertip to support the joints as you poke the wire in. Finally, a couple of squirts of hot glue to keep Babbage’s furry cheeks firmly on the seat, and done!

Secret Santa Babbage

Next year: Game of Thrones–inspired candy cane throne

The Secret Santa Babbage masterpiece

The whole build process was the perfect holiday mix of cheerful and macabre, and while getting the thermal printer to work was a little time-consuming, the finished product definitely raised some smiles around the office and added a bit of interesting digital flavour to a staid office tradition. And it also helped people who are new to the office or from other branches of the Foundation to know for whom they will be buying a gift.

Secret Santa Babbage

Ready to dispense Christmas cheer!

There are a few ways in which I’ll polish this project before next year, such as having the script write the names to external text files to create a record that will persist in case of a reboot, and maybe having Secret Santa Babbage play you a random Christmas carol when you squeeze his paw instead of just laughing merrily every time. (I also thought about adding electric shocks for those people who are on the naughty list, but HR said no. Bah, humbug!)

Make your own

The code and laser cut plans for the whole build are available here. If you plan to make your own, let us know which stuffed toy you will be turning into a Secret Santa cyborg! And if you’ve been working on any other Christmas-themed Raspberry Pi projects, we’d like to see those too, so tag us on social media to share the festive maker cheer.

The post The Pi Towers Secret Santa Babbage appeared first on Raspberry Pi.

Our brand-new Christmas resources

Post Syndicated from Laura Sach original https://www.raspberrypi.org/blog/christmas-resources-2017/

It’s never too early for Christmas-themed resources — especially when you want to make the most of them in your school, Code Club or CoderDojo! So here’s the ever-wonderful Laura Sach with an introduction of our newest festive projects.

A cartoon of people singing Christmas carols - Raspberry Pi Christmas Resources

In the immortal words of Noddy Holder: “it’s Christmaaaaaaasssss!” Well, maybe it isn’t quite Christmas yet, but since the shops have been playing Mariah Carey on a loop since the last pumpkin lantern hit the bargain bin, you’re hopefully well prepared.

To get you in the mood with some festive fun, we’ve put together a selection of seasonal free resources for you. Each project has a difficulty level in line with our Digital Making Curriculum, so you can check which might suit you best. Why not try them out at your local Raspberry Jam, CoderDojo, or Code Club, at school, or even on a cold day at home with a big mug of hot chocolate?

Jazzy jumpers

A cartoon of someone remembering pairs of jumper designs - Raspberry Pi Christmas Resources

Jazzy jumpers (Creator level): as a child in the eighties, you’d always get an embarrassing and probably badly sized jazzy jumper at Christmas from some distant relative. Thank goodness the trend has gone hipster and dreadful jumpers are now cool!

This resource shows you how to build a memory game in Scratch where you must remember the colour and picture of a jazzy jumper before recreating it. How many jumpers can you successfully recall in a row?

Sense HAT advent calendar

A cartoon Sense HAT lit up in the design of a Christmas pudding - Raspberry Pi Christmas Resources

Sense HAT advent calendar (Builder level): put the lovely lights on your Sense HAT to festive use by creating an advent calendar you can open day by day. However, there’s strictly no cheating with this calendar — we teach you how to use Python to detect the current date and prevent would-be premature peekers!

Press the Enter key to open today’s door:

(Note: no chocolate will be dispensed from your Raspberry Pi. Sorry about that.)

Code a carol

A cartoon of people singing Christmas carols - Raspberry Pi Christmas Resources

Code a carol (Developer level): Have you ever noticed how much repetition there is in carols and other songs? This resource teaches you how to break down the Twelve days of Christmas tune into its component parts and code it up in Sonic Pi the lazy way: get the computer to do all the repetition for you!

No musical knowledge required — just follow our lead, and you’ll have yourself a rocking doorbell tune in no time!

Naughty and nice

A cartoon of Santa judging people by their tweets - Raspberry Pi Christmas Resources

Naughty and nice (Maker level): Have you been naughty or nice? Find out by using sentiment analysis on your tweets to see what sort of things you’ve been talking about throughout the year. For added fun, why not use your program on the Twitter account of your sibling/spouse/arch nemesis and report their level of naughtiness to Santa with an @ mention?

raspberry_pi is 65.5 percent NICE, with an accuracy of 0.9046692607003891

It’s Christmaaaaaasssss

With the festive season just around the corner, it’s time to get started on your Christmas projects! Whether you’re planning to run your Christmas lights via a phone app, install a home assistant inside an Elf on a Shelf, or work through our Christmas resources, we would like to see what you make. So do share your festive builds with us on social media, or by posting links in the comments.

The post Our brand-new Christmas resources appeared first on Raspberry Pi.

[$] Finding driver bugs with DR. CHECKER

Post Syndicated from jake original https://lwn.net/Articles/733056/rss

Drivers are a consistent source of kernel bugs, at least partly due to less
review, but also because drivers are typically harder for tools to
analyze. A team from the University of California, Santa Barbara has set
out to change that with a static-analysis tool called DR. CHECKER. In a paper
presented at the recent 26th USENIX
Security Symposium
, the team introduced the tool and the results of
running it on nine production Linux kernels. Those results were rather
correctly identified 158 critical zero-day
bugs with an overall precision of 78%

Hard Drive Stats for Q2 2017

Post Syndicated from Andy Klein original https://www.backblaze.com/blog/hard-drive-failure-stats-q2-2017/

Backblaze Drive Stats Q2 2017

In this update, we’ll review the Q2 2017 and lifetime hard drive failure rates for all our current drive models. We also look at how our drive migration strategy is changing the drives we use and we’ll check in on our enterprise class drives to see how they are doing. Along the way we’ll share our observations and insights and as always we welcome your comments and critiques.

Since our last report for Q1 2017, we have added 635 additional hard drives to bring us to the 83,151 drives we’ll focus on. In Q1 we added over 10,000 new drives to the mix, so adding just 635 in Q2 seems “odd.” In fact, we added 4,921 new drives and retired 4,286 old drives as we migrated from lower density drives to higher density drives. We cover more about migrations later on, but first let’s look at the Q2 quarterly stats.

Hard Drive Stats for Q2 2017

We’ll begin our review by looking at the statistics for the period of April 1, 2017 through June 30, 2017 (Q2 2017). This table includes 17 different 3 ½” drive models that were operational during the indicated period, ranging in size from 3 to 8 TB.

Quarterly Hard Drive Failure Rates for Q2 2017

When looking at the quarterly numbers, remember to look for those drives with at least 50,000 drive hours for the quarter. That works out to about 550 drives running the entire quarter. That’s a good sample size. If the sample size is below that, the failure rates can be skewed based on a small change in the number of drive failures.

As noted previously, we use the quarterly numbers to look for trends. So this time we’ve included a trend indicator in the table. The “Q2Q Trend” column is short for quarter-to-quarter trend, i.e. last quarter to this quarter. We can add, change, or delete trend columns depending on community interest. Let us know what you think in the comments.

Good Migrations

In Q2 we continued with our data migration program. For us, a drive migration means we intentionally remove a good drive from service and replace it with another drive. Drives that are removed via migrations are not counted as failed. Once they are removed they stop accumulating drive hours and other stats in our system.

There are three primary drivers for our migration program.

  1. Increase Storage Density – For example, in Q3 we replaced 3 TB drives with 8 TB drives, more than doubling the amount of storage in a given Storage Pod for the same footprint. The cost of electricity was nominally more with the 8 TB drives, but the increase in density more than offset the additional cost. For those interested you can read more about the cost of cloud storage here.
  2. Backblaze Vaults – Our Vault architecture has proven to be more cost effective over the past two years than using stand-alone Storage Pods. A major goal of the migration program is to have the entire Backblaze cloud deployed on the highly efficient and resilient Backblaze Vault architecture.
  3. Balancing the Load – With our Phoenix data center online and accepting data, we have migrated some systems to the Phoenix DC. Don’t worry, we didn’t put your data on a truck and drive it to Phoenix. We simply built new systems there and transferred the data from our Northern California DC. In the process, we are gaining valuable insights as we move towards being able to replicate data between the two data centers.
During Q2 we migrated nearly 30 Petabytes of data.

During Q2 we migrated the data on 155 systems, giving nearly 30 petabytes of data a new, more durable, place to call home. There are still 644 individual Storage Pods (Storage Pod Classics, as we call them) left to migrate to the Backblaze Vault architecture.

Just in case you don’t know, a Backblaze Vault is a logical collection of 20 beefy Storage Pods (not Classics). Using our own Reed-Solomon erasure coding library, data is spread out across the 20 Pods into 17 data shards and 3 parity shards. The data and parity shards of each arriving data blob can be stored on different Storage Pods in a given Backblaze Vault.

Lifetime Hard Drive Failure Rates for Current Drives

The table below shows the failure rates for the hard drive models we had in service as of June 30, 2017. This is over the period beginning in April 2013 and ending June 30, 2017. If you are interested in the hard drive failure rates for all the hard drives we’ve used over the years, please refer to our 2016 hard drive review.

Cumulative Hard Drive Failure Rates

Enterprise vs Consumer Drives

We added 3,595 enterprise class 8 TB drives in Q2 bringing our total to 6,054 drives. You may be tempted to compare the failure rates of the 8 TB enterprise drive (model: ST8000NM005) to the consumer 8 TB drive (model: ST8000DM002), and conclude the enterprise drives fail at a higher rate. Let’s not jump to that conclusion yet, as the average operational age of the enterprise drives is only 2.11 months.

There are some insights we can gain from the current data. The enterprise drives have 363,282 drives hours and an annualized failure rate of 1.61%. If we look back at our data, we find that as of Q3 2016, the 8 TB consumer drives had 422,263 drive hours with an annualized failure rate of 1.60%. That means that when both drive models had a similar number of drive hours, they had nearly the same annualized failure rate. There are no conclusions to be made here, but the observation is worth considering as we gather data for our comparison.

Next quarter, we should have enough data to compare the 8 TB drives, but by then the 8TB drives could be “antiques.” In the next week or so, we’ll be installing 12 TB hard drives in a Backblaze Vault. Each 60-drive Storage Pod in the Vault would have 720 TB of storage available and a 20-pod Backblaze Vault would have 14.4 petabytes of raw storage.

Better Late Than Never

Sorry for being a bit late with the hard drive stats report this quarter. We were ready to go last week, then this happened. Some folks here thought that was more important than our Q2 Hard Drive Stats. Go figure.

Drive Stats at the Storage Developers Conference

We will be presenting at the Storage Developers Conference in Santa Clara on Monday September 11th at 8:30am. We’ll be reviewing our drive stats along with some interesting observations from the SMART stats we also collect. The conference is the leading event for technical discussions and education on the latest storage technologies and standards. Come join us.

The Data For This Review

If you are interested in the data from the two tables in this review, you can download an Excel spreadsheet containing the two tables. Note: the domain for this download will be f001.backblazeb2.com.

You also can download the entire data set we use for these reports from our Hard Drive Test Data page. You can download and use this data for free for your own purposes. All we ask are three things: 1) you cite Backblaze as the source if you use the data, 2) you accept that you are solely responsible for how you use the data, and 3) you do not sell this data to anyone. It is free.

Good luck, and let us know if you find anything interesting.

The post Hard Drive Stats for Q2 2017 appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Affordable Raspberry Pi 3D Body Scanner

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/affordable-raspberry-pi-3d-body-scanner/

With a £1000 grant from Santander, Poppy Mosbacher set out to build a full-body 3D body scanner with the intention of creating an affordable setup for makespaces and similar community groups.

First Scan from DIY Raspberry Pi Scanner

Head and Shoulders Scan with 29 Raspberry Pi Cameras

Uses for full-body 3D scanning

Poppy herself wanted to use the scanner in her work as a fashion designer. With the help of 3D scans of her models, she would be able to create custom cardboard dressmakers dummy to ensure her designs fit perfectly. This is a brilliant way of incorporating digital tech into another industry – and it’s not the only application for this sort of build. Growing numbers of businesses use 3D body scanning, for example the stores around the world where customers can 3D scan and print themselves as action-figure-sized replicas.

Print your own family right on the high street!
image c/o Tom’s Guide and Shapify

We’ve also seen the same technology used in video games for more immersive virtual reality. Moreover, there are various uses for it in healthcare and fitness, such as monitoring the effect of exercise regimes or physiotherapy on body shape or posture.

Within a makespace environment, a 3D body scanner opens the door to including new groups of people in community make projects: imagine 3D printing miniatures of a theatrical cast to allow more realistic blocking of stage productions and better set design, or annually sending grandparents a print of their grandchild so they can compare the child’s year-on-year growth in a hands-on way.

Raspberry Pi 3d Body Scan

The Germany-based clothing business Outfittery uses full body scanners to take the stress out of finding clothes that fits well.
image c/o Outfittery

As cheesy as it sounds, the only limit for the use of 3D scanning is your imagination…and maybe storage space for miniature prints.

Poppy’s Raspberry Pi 3D Body Scanner

For her build, Poppy acquired 27 Raspberry Pi Zeros and 27 Raspberry Pi Camera Modules. With various other components, some 3D-printed or made of cardboard, Poppy got to work. She was helped by members of Build Brighton and by her friend Arthur Guy, who also wrote the code for the scanner.

Raspberry Pi 3D Body Scanner

The Pi Zeros run Raspbian Lite, and are connected to a main server running a node application. Each is fitted into its own laser-cut cardboard case, and secured to a structure of cardboard tubing and 3D-printed connectors.

Raspberry Pi 3D Body Scanner

In the finished build, the person to be scanned stands within the centre of the structure, and the press of a button sends the signal for all Pis to take a photo. The images are sent back to the server, and processed through Autocade ReMake, a freemium software available for the PC (Poppy discovered part-way through the project that the Mac version has recently lost support).

Build your own

Obviously there’s a lot more to the process of building this full-body 3D scanner than what I’ve reported in these few paragraphs. And since it was Poppy’s goal to make a readily available and affordable scanner that anyone can recreate, she’s provided all the instructions and code for it on her Instructables page.

Projects like this, in which people use the Raspberry Pi to create affordable and interesting tech for communities, are exactly the type of thing we love to see. Always make sure to share your Pi-based projects with us on social media, so we can boost their visibility!

If you’re a member of a makespace, run a workshop in a school or club, or simply love to tinker and create, this build could be the perfect addition to your workshop. And if you recreate Poppy’s scanner, or build something similar, we’d love to see the results in the comments below.

The post Affordable Raspberry Pi 3D Body Scanner appeared first on Raspberry Pi.

Event: AWS Serverless Roadshow – Hands-on Workshops

Post Syndicated from Tara Walker original https://aws.amazon.com/blogs/aws/event-aws-serverless-roadshow-hands-on-workshops/

Surely, some of you have contemplated how you would survive the possible Zombie apocalypse or how you would build your exciting new startup to disrupt the transportation industry when Unicorn haven is uncovered. Well, there is no need to worry; I know just the thing to get you prepared to handle both of those scenarios: the AWS Serverless Computing Workshop Roadshow.

With the roadshow’s serverless workshops, you can get hands-on experience building serverless applications and microservices so you can rebuild what remains of our great civilization after a widespread viral infection causes human corpses to reanimate around the world in the AWS Zombie Microservices Workshop. In addition, you can give your startup a jump on the competition with the Wild Rydes workshop in order to revolutionize the transportation industry; just in time for a pilot’s crash landing leading the way to the discovery of abundant Unicorn pastures found on the outskirts of the female Amazonian warrior inhabited island of Themyscira also known as Paradise Island.

These free, guided hands-on workshops will introduce the basics of building serverless applications and microservices for common and uncommon scenarios using services like AWS Lambda, Amazon API Gateway, Amazon DynamoDB, Amazon S3, Amazon Kinesis, AWS Step Functions, and more. Let me share some advice before you decide to tackle Zombies and mount Unicorns – don’t forget to bring your laptop to the workshop and make sure you have an AWS account established and available for use for the event.

Check out the schedule below and get prepared today by registering for an upcoming workshop in a city near you. Remember these are workshops are completely free, so participation is on a first come, first served basis. So register and get there early, we need Zombie hunters and Unicorn riders across the globe.  Learn more about AWS Serverless Computing Workshops here and register for your city using links below.

Event Location Date
Wild Rydes New York Thursday, June 8
Wild Rydes Austin Thursday, June 22
Wild Rydes Santa Monica Thursday, July 20
Zombie Apocalypse Chicago Thursday, July 20
Wild Rydes Atlanta Tuesday, September 12
Zombie Apocalypse Dallas Tuesday, September 19


I look forward to fighting zombies and riding unicorns with you all.


Hard Drive Stats for Q1 2017

Post Syndicated from Andy Klein original https://www.backblaze.com/blog/hard-drive-failure-rates-q1-2017/

2017 hard drive stats

In this update, we’ll review the Q1 2017 and lifetime hard drive failure rates for all our current drive models, and we’ll look at a relatively new class of drives for us – “enterprise”. We’ll share our observations and insights, and as always, you can download the hard drive statistics data we use to create these reports.

Our Hard Drive Data Set

Backblaze has now recorded and saved daily hard drive statistics from the drives in our data centers for over 4 years. This data includes the SMART attributes reported by each drive, along with related information such a the drive serial number and failure status. As of March 31, 2017 we had 84,469 operational hard drives. Of that there were 1,800 boot drives and 82,669 data drives. For our review, we remove drive models of which we have less than 45 drives, leaving us to analyze 82,516 hard drives for this report. There are currently 17 different hard drives models, ranging in size from 3 to 8 TB in size. All of these models are 3½” drives.

Hard Drive Reliability Statistics for Q1 2017

Since our last report in Q4 2016, we have added 10,577 additional hard drives to bring us to the 82,516 drives we’ll focus on. We’ll start by looking at the statistics for the period of January 1, 2017 through March 31, 2017 – Q1 2017. This is for the drives that were operational during that period, ranging in size from 3 to 8 TB as listed below.

hard drive failure rates by model

Observations and Notes on the Q1 Review

You’ll notice that some of the drive models have a failure rate of “0” (zero). Here a failure rate of zero means there were no drive failures for that model during Q1 2017. Later, we will cover how these same drive models faired over their lifetime. Why is the quarterly data important? We use it to look for anything unusual. For example, in Q1 the 4 TB Seagate drive model: ST4000DX000, has a high failure rate of 35.88%, while the lifetime annualized failure rate for this model is much lower, 7.50%. In this case, we only have a 170 drives of this particular drive model, so the failure rate is not statistically significant, but such information could be useful if we were using several thousand drives of this particular model.

There were a total 375 drive failures in Q1. A drive is considered failed if one or more of the following conditions are met:

  • The drive will not spin up or connect to the OS.
  • The drive will not sync, or stay synced, in a RAID Array (see note below).
  • The Smart Stats we use show values above our thresholds.
  • Note: Our stand-alone Storage Pods use RAID-6, our Backblaze Vaults use our own open-sourced implementation of Reed-Solomon erasure coding instead. Both techniques have a concept of a drive not syncing or staying synced with the other member drives in its group.

The annualized hard drive failure rate for Q1 in our current population of drives is 2.11%. That’s a bit higher than previous quarters, but might be a function of us adding 10,577 new drives to our count in Q1. We’ve found that there is a slightly higher rate of drive failures early on, before the drives “get comfortable” in their new surroundings. This is seen in the drive failure rate “bathtub curve” we covered in a previous post.

10,577 More Drives

The additional 10,577 drives are really a combination of 11,002 added drives, less 425 drives that were removed. The removed drives were in addition to the 375 drives marked as failed, as those were replaced 1 for 1. The 425 drives were primarily removed from service due to migrations to higher density drives.

The table below shows the breakdown of the drives added in Q1 2017 by drive size.

drive counts by size

Lifetime Hard Drive Failure Rates for Current Drives

The table below shows the failure rates for the hard drive models we had in service as of March 31, 2017. This is over the period beginning in April 2013 and ending March 31, 2017. If you are interested in the hard drive failure rates for all the hard drives we’ve used over the years, please refer to our 2016 hard drive review.

lifetime hard drive reliability rates

The annualized failure rate for the drive models listed above is 2.07%. This compares to 2.05% for the same collection of drive models as of the end of Q4 2016. The increase makes sense given the increase in Q1 2017 failure rate over previous quarters noted earlier. No new models were added during the current quarter and no old models exited the collection.

Backblaze is Using Enterprise Drives – Oh My!

Some of you may have noticed we now have a significant number of enterprise drives in our data center, namely 2,459 Seagate 8 TB drives, model: ST8000NM055. The HGST 8 TB drives were the first true enterprise drives we used as data drives in our data centers, but we only have 45 of them. So, why did we suddenly decide to purchase 2,400+ of the Seagate 8 TB enterprise drives? There was a very short period of time, as Seagate was introducing new and phasing out old drive models, that the cost per terabyte of the 8 TB enterprise drives fell within our budget. Previously we had purchased 60 of these drives to test in one Storage Pod and were satisfied they could work in our environment. When the opportunity arose to acquire the enterprise drives at a price we liked, we couldn’t resist.

Here’s a comparison of the 8 TB consumer drives versus the 8 TB enterprise drives to date:

enterprise vs. consumer hard drives

What have we learned so far…

  1. It is too early to compare failure rates – The oldest enterprise drives have only been in service for about 2 months, with most being placed into service just prior to the end of Q1. The Backblaze Vaults the enterprise drives reside in have yet to fill up with data. We’ll need at least 6 months before we could start comparing failure rates as the data is still too volatile. For example, if the current enterprise drives were to experience just 2 failures in Q2, their annualized failure rate would be about 0.57% lifetime.
  2. The enterprise drives load data faster – The Backblaze Vaults containing the enterprise drives, loaded data faster than the Backblaze Vaults containing consumer drives. The vaults with the enterprise drives loaded on average 140 TB per day, while the vaults with the consumer drives loaded on average 100 TB per day.
  3. The enterprise drives use more power – No surprise here as according to the Seagate specifications the enterprise drives use 9W average in idle and 10W average in operation. While the consumer drives use 7.2W average in idle and 9W average in operation. For a single drive this may seem insignificant, but when you put 60 drives in a 4U Storage Pod chassis and then 10 chassis in a rack, the difference adds up quickly.
  4. Enterprise drives have some nice features – The Seagate enterprise 8TB drives we used have PowerChoice™ technology that gives us the option to use less power. The data loading times noted above were recorded after we changed to a lower power mode. In short, the enterprise drive in a low power mode still stored 40% more data per day on average than the consumer drives.
  5. While it is great that the enterprise drives can load data faster, drive speed has never been a bottleneck in our system. A system that can load data faster will just “get in line” more often and fill up faster. There is always extra capacity when it comes to accepting data from customers.

    Wrapping Up

    We’ll continue to monitor the 8 TB enterprise drives and keep reporting our findings.

    If you’d like to hear more about our Hard Drive Stats, Backblaze will be presenting at the 33rd International Conference on Massive Storage Systems and Technology (MSST 2017) being held at Santa Clara University in Santa Clara California from May 15th – 19th. The conference will dedicate five days to computer-storage technology, including a day of tutorials, two days of invited papers, two days of peer-reviewed research papers, and a vendor exposition. Come join us.

    As a reminder, the hard drive data we use is available on our Hard Drive Test Data page. You can download and use this data for free for your own purpose, all we ask is three things 1) you cite Backblaze as the source if you use the data, 2) you accept that you are solely responsible for how you use the data, and 3) you do not sell this data to anyone, it is free.

    Good luck and let us know if you find anything interesting.

The post Hard Drive Stats for Q1 2017 appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

AWS Hot Startups – February 2017

Post Syndicated from Ana Visneski original https://aws.amazon.com/blogs/aws/aws-hot-startups-february-2017-2/

As we finish up the month of February, Tina Barr is back with some awesome startups.


This month we are bringing you five innovative hot startups:

  • GumGum – Creating and popularizing the field of in-image advertising.
  • Jiobit – Smart tags to help parents keep track of kids.
  • Parsec – Offers flexibility in hardware and location for PC gamers.
  • Peloton – Revolutionizing indoor cycling and fitness classes at home.
  • Tendril – Reducing energy consumption for homeowners.

If you missed any of our January startups, make sure to check them out here.

GumGum (Santa Monica, CA)
GumGum logo1GumGum is best known for inventing and popularizing the field of in-image advertising. Founded in 2008 by Ophir Tanz, the company is on a mission to unlock the value held within the vast content produced daily via social media, editorials, and broadcasts in a variety of industries. GumGum powers campaigns across more than 2,000 premium publishers, which are seen by over 400 million users.

In-image advertising was pioneered by GumGum and has given companies a platform to deliver highly visible ads to a place where the consumer’s attention is already focused. Using image recognition technology, GumGum delivers targeted placements as contextual overlays on related pictures, as banners that fit on all screen sizes, or as In-Feed placements that blend seamlessly into the surrounding content. Using Visual Intelligence, GumGum can scour social media and broadcast TV for all images and videos related to a brand, allowing companies to gain a stronger understanding of their audience and how they are relating to that brand on social media.

GumGum relies on AWS for its Image Processing and Ad Serving operations. Using AWS infrastructure, GumGum currently processes 13 million requests per minute across the globe and generates 30 TB of new data every day. The company uses a suite of services including but not limited to Amazon EC2, Amazon S3, Amazon Kinesis, Amazon EMR, AWS Data Pipeline, and Amazon SNS. AWS edge locations allow GumGum to serve its customers in the US, Europe, Australia, and Japan and the company has plans to expand its infrastructure to Australia and APAC regions in the future.

For a look inside GumGum’s startup culture, check out their first Hackathon!

Jiobit (Chicago, IL)
Jiobit Team1
Jiobit was inspired by a real event that took place in a crowded Chicago park. A couple of summers ago, John Renaldi experienced every parent’s worst nightmare – he lost track of his then 6-year-old son in a public park for almost 30 minutes. John knew he wasn’t the only parent with this problem. After months of research, he determined that over 50% of parents have had a similar experience and an even greater percentage are actively looking for a way to prevent it.

Jiobit is the world’s smallest and longest lasting smart tag that helps parents keep track of their kids in every location – indoors and outdoors. The small device is kid-proof: lightweight, durable, and waterproof. It acts as a virtual “safety harness” as it uses a combination of Bluetooth, Wi-Fi, Multiple Cellular Networks, GPS, and sensors to provide accurate locations in real-time. Jiobit can automatically learn routes and locations, and will send parents an alert if their child does not arrive at their destination on time. The talented team of experienced engineers, designers, marketers, and parents has over 150 patents and has shipped dozens of hardware and software products worldwide.

The Jiobit team is utilizing a number of AWS services in the development of their product. Security is critical to the overall product experience, and they are over-engineering security on both the hardware and software side with the help of AWS. Jiobit is also working towards being the first child monitoring device that will have implemented an Alexa Skill via the Amazon Echo device (see here for a demo!). The devices use AWS IoT to send and receive data from the Jio Cloud over the MQTT protocol. Once data is received, they use AWS Lambda to parse the received data and take appropriate actions, including storing relevant data using Amazon DynamoDB, and sending location data to Amazon Machine Learning processing jobs.

Visit the Jiobit blog for more information.

Parsec (New York, NY)
Parsec logo large1
Parsec operates under the notion that everyone should have access to the best computing in the world because access to technology creates endless opportunities. Founded in 2016 by Benjy Boxer and Chris Dickson, Parsec aims to eliminate the burden of hardware upgrades that users frequently experience by building the technology to make a computer in the cloud available anywhere, at any time. Today, they are using their technology to enable greater flexibility in the hardware and location that PC gamers choose to play their favorite games on. Check out this interview with Benjy and our Startups team for a look at how Parsec works.

Parsec built their first product to improve the gaming experience; gamers no longer have to purchase consoles or expensive PCs to access the entertainment they love. Their low latency video streaming and networking technologies allow gamers to remotely access their gaming rig and play on any Windows, Mac, Android, or Raspberry Pi device. With the global reach of AWS, Parsec is able to deliver cloud gaming to the median user in the US and Europe with less than 30 milliseconds of network latency.

Parsec users currently have two options available to start gaming with cloud resources. They can either set up their own machines with the Parsec AMI in their region or rely on Parsec to manage everything for a seamless experience. In either case, Parsec uses the g2.2xlarge EC2 instance type. Parsec is using Amazon Elastic Block Storage to store games, Amazon DynamoDB for scalability, and Amazon EC2 for its web servers and various APIs. They also deal with a high volume of logs and take advantage of the Amazon Elasticsearch Service to analyze the data.

Be sure to check out Parsec’s blog to keep up with the latest news.

Peloton (New York, NY)
Peloton image 3
The idea for Peloton was born in 2012 when John Foley, Founder and CEO, and his wife Jill started realizing the challenge of balancing work, raising young children, and keeping up with personal fitness. This is a common challenge people face – they want to work out, but there are a lot of obstacles that stand in their way. Peloton offers a solution that enables people to join indoor cycling and fitness classes anywhere, anytime.

Peloton has created a cutting-edge indoor bike that streams up to 14 hours of live classes daily and has over 4,000 on-demand classes. Users can access live classes from world-class instructors from the convenience of their home or gym. The bike tracks progress with in-depth ride metrics and allows people to compete in real-time with other users who have taken a specific ride. The live classes even feature top DJs that play current playlists to keep users motivated.

With an aggressive marketing campaign, which has included high-visibility TV advertising, Peloton made the decision to run its entire platform in the cloud. Most recently, they ran an ad during an NFL playoff game and their rate of requests per minute to their site increased from ~2k/min to ~32.2k/min within 60 seconds. As they continue to grow and diversify, they are utilizing services such as Amazon S3 for thousands of hours of archived on-demand video content, Amazon Redshift for data warehousing, and Application Load Balancer for intelligent request routing.

Learn more about Peloton’s engineering team here.

Tendril (Denver, CO)
Tendril logo1
Tendril was founded in 2004 with the goal of helping homeowners better manage and reduce their energy consumption. Today, electric and gas utilities use Tendril’s data analytics platform on more than 140 million homes to deliver a personalized energy experience for consumers around the world. Using the latest technology in decision science and analytics, Tendril can gain access to real-time, ever evolving data about energy consumers and their homes so they can improve customer acquisition, increase engagement, and orchestrate home energy experiences. In turn, Tendril helps its customers unlock the true value of energy interactions.

AWS helps Tendril run its services globally, while scaling capacity up and down as needed, and in real-time. This has been especially important in support of Tendril’s newest solution, Orchestrated Energy, a continuous demand management platform that calculates a home’s thermal mass, predicts consumer behavior, and integrates with smart thermostats and other connected home devices. This solution allows millions of consumers to create a personalized energy plan for their home based on their individual needs.

Tendril builds and maintains most of its infrastructure services with open sources tools running on Amazon EC2 instances, while also making use of AWS services such as Elastic Load Balancing, Amazon API Gateway, Amazon CloudFront, Amazon Route 53, Amazon Simple Queue Service, and Amazon RDS for PostgreSQL.

Visit the Tendril Blog for more information!

— Tina Barr

Amazon Redshift Engineering’s Advanced Table Design Playbook: Distribution Styles and Distribution Keys

Post Syndicated from AWS Big Data Blog original https://aws.amazon.com/blogs/big-data/amazon-redshift-engineerings-advanced-table-design-playbook-distribution-styles-and-distribution-keys/

Zach Christopherson is a Senior Database Engineer on the Amazon Redshift team.

Part 1: Preamble, Prerequisites, and Prioritization
Part 2: Distribution Styles and Distribution Keys
Part 3: Compound and Interleaved Sort Keys (December 6, 2016)
Part 4: Compression Encodings (December 7, 2016)
Part 5: Table Data Durability (December 8, 2016)

The first table and column properties we discuss in this blog series are table distribution styles (DISTSTYLE) and distribution keys (DISTKEY). This blog installment presents a methodology to guide you through the identification of optimal DISTSTYLEs and DISTKEYs for your unique workload.

When you load data into a table, Amazon Redshift distributes the rows to each of the compute nodes according to the table’s DISTSTYLE. Within each compute node, the rows are assigned to a cluster slice. Depending on node type, each compute node contains 2, 16, or 32 slices. You can think of a slice like a virtual compute node. During query execution, all slices process the rows that they’ve had assigned in parallel. The primary goal in selecting a table’s DISTSTYLE is to evenly distribute the data throughout the cluster for parallel processing.

When you execute a query, the query optimizer might redistribute or broadcast the intermediate tuples throughout the cluster to facilitate any join or aggregation operations. The secondary goal in selecting a table’s DISTSTYLE is to minimize the cost of data movement necessary for query processing. To achieve minimization, data should be located where it needs to be before the query is executed.

A table might be defined with a DISTSTYLE of EVEN, KEY, or ALL. If you’re unfamiliar with these table properties, you can watch my presentation at the 2016 AWS Santa Clara Summit, where I discussed the basics of distribution starting at the 17-minute mark. I summarize these here:

  • EVEN will do a round-robin distribution of data.
  • KEY requires a single column to be defined as a DISTKEY. On ingest, Amazon Redshift hashes each DISTKEY column value, and route hashes to the same slice consistently.
  • ALL distribution stores a full copy of the table on the first slice of each node.

Which style is most appropriate for your table is determined by several criteria. This post presents a two-phase flow chart that will guide you through questions to ask of your data profile to arrive at the ideal DISTSTYLE and DISTKEY for your scenario.

Phase 1: Identifying Appropriate DISTKEY Columns

Phase 1 seeks to determine if KEY distribution is appropriate. To do so, first determine if the table contains any columns that would appropriately distribute the table data if they were specified as a DISTKEY. If we find that no columns are acceptable DISTKEY columns, then we can eliminate DISTSTYLE KEY as a potential DISTSTYLE option for this table.



Does the column data have a uniformly distributed data profile?


If the hashed column values don’t enable uniform distribution of data to the cluster slices, then you’ll end with both data skew at rest and data skew in flight (during query processing)—which results in a performance hit due to an unevenly parallelized workload. A nonuniformly distributed data profile occurs in scenarios such as these:

  • Distributing on a column containing a significant percentage of NULL values
  • Distributing on a column, customer_id, where a minority of your customers are responsible for the majority of your data

You can easily identify columns that contain “heavy hitters” or introduce “hot spots” by using some simple SQL code to review the dataset. In the example following, l_orderkey stands out as a poor option that you can eliminate as a potential DISTKEY column:

[email protected]/dev=# SELECT l_orderkey, COUNT(*) 
FROM lineitem 
LIMIT 100;
 l_orderkey |   count
     [NULL] | 124993010
  260642439 |        80
  240404513 |        80
   56095490 |        72
  348088964 |        72
  466727011 |        72
  438870661 |        72 

When distributing on a given column, it is desirable to have a nearly consistent number of rows/blocks on each slice. Suppose that you think that you’ve identified a column that should result in uniform distribution but want to confirm this. Here, it’s much more efficient to materialize a single-column temporary table, rather than redistributing the entire table only to find out there was nonuniform distribution:

-- Materialize a single column to check distribution
CREATE TEMP TABLE lineitem_dk_l_partkey DISTKEY (l_partkey) AS 
SELECT l_partkey FROM lineitem;

-- Identify the table OID
[email protected]/tpch=# SELECT 'lineitem_dk_l_partkey'::regclass::oid;
(1 row) 

Now that the table exists, it’s trivial to review the distribution. In the following query results, we can assess the following characteristics for a given table with a defined DISTKEY:

  • skew_rows: A ratio of the number of table rows from the slice with the most rows compared to the slice with the fewest table rows. This value defaults to 100.00 if the table doesn’t populate every slice in the cluster. Closer to 1.00 is ideal.
  • storage_skew: A ratio of the number of blocks consumed by the slice with the most blocks compared to the slice with the fewest blocks. Closer to 1.00 is ideal.
  • pct_populated: Percentage of slices in the cluster that have at least 1 table row. Closer to 100 is ideal.
SELECT "table" tablename, skew_rows,
  ROUND(CAST(max_blocks_per_slice AS FLOAT) /
  GREATEST(NVL(min_blocks_per_slice,0)::int,1)::FLOAT,5) storage_skew,
  ROUND(CAST(100*dist_slice AS FLOAT) /
  (SELECT COUNT(DISTINCT slice) FROM stv_slices),2) pct_populated
FROM svv_table_info ti
  JOIN (SELECT tbl, MIN(c) min_blocks_per_slice,
          MAX(c) max_blocks_per_slice,
          COUNT(DISTINCT slice) dist_slice
        FROM (SELECT b.tbl, b.slice, COUNT(*) AS c
              FROM STV_BLOCKLIST b
              GROUP BY b.tbl, b.slice)
        WHERE tbl = 240791 GROUP BY tbl) iq ON iq.tbl = ti.table_id;
       tablename       | skew_rows | storage_skew | pct_populated
 lineitem_dk_l_partkey |      1.00 |      1.00259 |           100
(1 row)

Note: A small amount of data skew shouldn’t immediately discourage you from considering an otherwise appropriate distribution key. In many cases, the benefits of collocating large JOIN operations offset the cost of cluster slices processing a slightly uneven workload.

Does the column data have high cardinality?

Cardinality is a relative measure of how many distinct values exist within the column. It’s important to consider cardinality alongside the uniformity of data distribution. In some scenarios, a uniform distribution of data can result in low relative cardinality. Low relative cardinality leads to wasted compute capacity from lack of parallelization. For example, consider a cluster with 576 slices (36x DS2.8XLARGE) and the following table:

CREATE TABLE orders (                                            
  o_orderkey int8 NOT NULL			,
  o_custkey int8 NOT NULL			,
  o_orderstatus char(1) NOT NULL		,
  o_totalprice numeric(12,2) NOT NULL	,
  o_orderdate date NOT NULL DISTKEY ,
  o_orderpriority char(15) NOT NULL	,
  o_clerk char(15) NOT NULL			,
  o_shippriority int4 NOT NULL		,
  o_comment varchar(79) NOT NULL                  


Within this table, I retain a billion records representing 12 months of orders. Day to day, I expect that the number of orders remains more or less consistent. This consistency creates a uniformly distributed dataset:

[email protected]/tpch=# SELECT o_orderdate, count(*) 
 o_orderdate |  count
 1993-01-18  | 2651712
 1993-08-29  | 2646252
 1993-12-05  | 2644488
 1993-12-04  | 2642598
 1993-09-28  | 2593332
 1993-12-12  | 2593164
 1993-11-14  | 2593164
 1993-12-07  | 2592324
(365 rows)

However, the cardinality is relatively low when we compare the 365 distinct values of the o_orderdate DISTKEY column to the 576 cluster slices. If each day’s value were hashed and assigned to an empty slice, this data only populates 63% of the cluster at best. Over 37% of the cluster remains idle during scans against this table. In real-life scenarios, we’ll end up assigning multiple distinct values to already populated slices before we populate each empty slice with at least one value.

-- How many values are assigned to each slice
[email protected]/tpch=# SELECT rows/2592324 assigned_values, COUNT(*) number_of_slices FROM stv_tbl_perm WHERE name='orders' AND slice<6400 
 assigned_values | number_of_slices
               0 |              307
               1 |              192
               2 |               61
               3 |               13
               4 |                3
(5 rows)

So in this scenario, on one end of the spectrum we have 307 of 576 slices not populated with any day’s worth of data, and on the other end we have 3 slices populated with 4 days’ worth of data. Query execution is limited by the rate at which those 3 slices can process their data. At the same time, over half of the cluster remains idle.

Note: The pct_slices_populated column from the table_inspector.sql query result identifies tables that aren’t fully populating the slices within a cluster.

On the other hand, suppose the o_orderdate DISTKEY column was defined with the timestamp data type and actually stores true order timestamp data (not dates stored as timestamps). In this case, the granularity of the time dimension causes the cardinality of the column to increase from the order of hundreds to the order of millions of distinct values. This approach results in all 576 slices being much more evenly populated.

Note: A timestamp column isn’t usually an appropriate DISTKEY column, because it’s often not joined or aggregated on. However, this case illustrates how relative cardinality can be influenced by data granularity, and the significance it has in resulting in a uniform and complete distribution of table data throughout a cluster.

Do queries perform selective filters on the column?


Even if the DISTKEY column ensures a uniform distribution of data throughout the cluster, suboptimal parallelism can arise if that same column is also used to selectively filter records from the table. To illustrate this, the same orders table with a DISTKEY on o_orderdate is still populated with 1 billion records spanning 365 days of data:

CREATE TABLE orders (                                            
  o_orderkey int8 NOT NULL			,
  o_custkey int8 NOT NULL			,
  o_orderstatus char(1) NOT NULL		,
  o_totalprice numeric(12,2) NOT NULL	,
  o_orderdate date NOT NULL DISTKEY ,
  o_orderpriority char(15) NOT NULL	,
  o_clerk char(15) NOT NULL			,
  o_shippriority int4 NOT NULL		,
  o_comment varchar(79) NOT NULL                  

This time, consider the table on a smaller cluster with 80 slices (5x DS2.8XLARGE) instead of 576 slices. With a uniform data distribution and ~4-5x more distinct values than cluster slices, it’s likely that query execution is more evenly parallelized for full table scans of the table. This effect occurs because each slice is more likely to be populated and assigned an equivalent number of records.

However, in many use cases full table scans are uncommon. For example, with time series data it’s more typical for the workload to scan the past 1, 7, or 30 days of data than it is to repeatedly scan the entire table. Let’s assume I have one of these time series data workloads that performs analytics on orders from the last 7 days with SQL patterns, such as the following:

SELECT ... FROM orders 
JOIN ... 
JOIN ... 
AND o_orderdate between current_date-7 and current_date-1
GROUP BY ...;  

With a predicate such as this, we limit the relevant values to just 7 days. All of these days must reside on a maximum of 7 slices within the cluster. Due to consistent hashing, slices that contain one or more of these 7 values contain all of the records for those specific values:

[email protected]/tpch=# SELECT SLICE_NUM(), COUNT(*) FROM orders 
WHERE o_orderdate BETWEEN current_date-7 AND current_date-1 
 slice_num |  count
         3 | 2553840
        33 | 2553892
        40 | 2555232
        41 | 2553092
        54 | 2554296
        74 | 2552168
        76 | 2552224
(7 rows)  

With the dataset shown above, we have at best 7 slices, each fetching 2.5 million rows to perform further processing. For the scenario with EVEN distribution, we expect 80 slices to fetch ~240,000 records each (((109 records / 365 days) * 7 days) / 80 slices). The important comparison to consider is whether there is in having only 7 slices fetch and process 2.5 million records relative to all 80 slices fetching and processing ~240,000 records each.

If the overhead of having a subset of slices perform the majority of the work is significant, then you want to separate your distribution style from your selective filtering criteria. To do so, choose a different distribution key.

Use the following query to identify how frequently your scans include predicates which filter on the table’s various columns:

    ti."table", ti.diststyle, RTRIM(a.attname) column_name,
    COUNT(DISTINCT s.query ||'-'|| s.segment ||'-'|| s.step) as num_scans,
    COUNT(DISTINCT CASE WHEN TRANSLATE(TRANSLATE(info,')',' '),'(',' ') LIKE ('%'|| a.attname ||'%') THEN s.query ||'-'|| s.segment ||'-'|| s.step END) AS column_filters
FROM stl_explain p
JOIN stl_plan_info i ON ( i.userid=p.userid AND i.query=p.query AND i.nodeid=p.nodeid  )
JOIN stl_scan s ON (s.userid=i.userid AND s.query=i.query AND s.segment=i.segment AND s.step=i.step)
JOIN svv_table_info ti ON ti.table_id=s.tbl
JOIN pg_attribute a ON (a.attrelid=s.tbl AND a.attnum > 0)
WHERE s.tbl IN ([table_id]) 
GROUP BY 1,2,3,a.attnum
ORDER BY attnum;  

From this query result, if the potential DISTKEY column is frequently scanned, you can perform further investigation to identify if those filters are extremely selective or not using more complex SQL:

    ti.schemaname||'.'||ti.tablename AS "table", 
    AVG(r.s_rows_pre_filter) avg_s_rows_pre_filter,
    100*ROUND(1::float - AVG(r.s_rows_pre_filter)::float/ti.tbl_rows::float,6) avg_prune_pct,
    AVG(r.s_rows) avg_s_rows,
    100*ROUND(1::float - AVG(r.s_rows)::float/AVG(r.s_rows_pre_filter)::float,6) avg_filter_pct,
    COUNT(DISTINCT i.query) AS num,
    AVG(r.time) AS scan_time,
    MAX(i.query) AS query, TRIM(info) as filter
FROM stl_explain p
JOIN stl_plan_info i ON ( i.userid=p.userid AND i.query=p.query AND i.nodeid=p.nodeid  )
JOIN stl_scan s ON (s.userid=i.userid AND s.query=i.query AND s.segment=i.segment AND s.step=i.step)
JOIN (SELECT table_id,"table" tablename,schema schemaname,tbl_rows,unsorted,sortkey1,sortkey_num,diststyle FROM svv_table_info) ti ON ti.table_id=s.tbl
SELECT query, segment, step, DATEDIFF(s,MIN(starttime),MAX(endtime)) AS time, SUM(rows) s_rows, SUM(rows_pre_filter) s_rows_pre_filter, ROUND(SUM(rows)::float/SUM(rows_pre_filter)::float,6) filter_pct
FROM stl_scan
WHERE userid>1 AND type=2
AND starttime < endtime
GROUP BY 1,2,3
HAVING sum(rows_pre_filter) > 0
) r ON (r.query = i.query and r.segment = i.segment and r.step = i.step)
LEFT JOIN (SELECT attrelid,t.typname FROM pg_attribute a JOIN pg_type t ON t.oid=a.atttypid WHERE attsortkeyord IN (1,-1)) a ON a.attrelid=s.tbl
WHERE s.tbl IN ([table_id])
AND p.info LIKE 'Filter:%' AND p.nodeid > 0

The above SQL describes these items:

  • tbl_rows: Current number of rows in the table at this moment in time.
  • avg_s_rows_pre_filter: Number of rows that were actually scanned after the zone maps were leveraged to prune a number of blocks from being fetched.
  • avg_prune_pct: Percentage of rows that were pruned from the table just by leveraging the zone maps.
  • avg_s_rows: Number of rows remaining after applying the filter criteria defined in the SQL.
  • avg_filter_pct: Percentage of rows remaining, relative to avg_s_rows_pre_filter, after a user defined filter has been applied.
  • num: Number of queries that include this filter criteria.
  • scan_time: Average number of seconds it takes for the segment which includes that scan to complete.
  • query: Example query ID for the query that issued these filter criteria.
  • filter: Detailed filter criteria specified by user.

In the following query results, we can assess the selectivity for a given filter predicate. Your knowledge of the data profile, and how many distinct values exist within a given range constrained by the filter condition, lets you identify whether a filter should be considered selective or not. If you’re not sure of the data profile, you can always construct SQL code from the query results to get a count of distinct values within that range:

table                 | public.orders
tbl_rows              | 22751520
avg_s_rows_pre_filter | 12581124
avg_prune_pct         | 44.7021
avg_s_rows            | 5736106
avg_filter_pct        | 54.407
num                   | 2
scan_time             | 19
query                 | 1721037
filter                | Filter: ((o_orderdate < '1993-08-01'::date) AND (o_orderdate >= '1993-05-01'::date))

FROM public.orders 
WHERE o_orderdate < '1993-08-01' AND o_orderdate >= '1993-05-01';

We’d especially like to avoid columns that have query patterns with these characteristics:

  • Relative to tbl_rows:
    • A low value for avg_s_rows
    • A high value for avg_s_rows_pre_filter
  • A selective filter on the potential DISTKEY column
  • Limited distinct values within the returned range
  • High scan_time

 If such patterns exist for a column, it’s likely that this column is not a good DISTKEY candidate.

Is the column also a primary compound sortkey column?


Note: Sort keys are discussed in greater detail within Part 3 of this blog series.

As shown in the flow chart, even if we are using the column to selectively filter records (thereby potentially restricting post-scan processing to a portion of the slices), in some circumstances it still makes sense to use the column as the distribution key.

If we selectively filter on the column, we might also be using a sortkey on this column. This approach lets us use the column zone maps effectively on non-necessary slices, to quickly identify the relevant blocks to fetch. Doing this makes selective scanning less expensive by orders of magnitude than a full column scan on each slice. In turn, this lower cost helps to offset the cost of a reduced number of slices processing the bulk of the data after the scan.

You can use the following query to determine the primary sortkey for a table:

SELECT attname FROM pg_attribute 
WHERE attrelid = [table_id] AND attsortkeyord = 1;

Using the SQL code from the last step (used to check avg_s_rows, the number of distinct values in the returned range, and so on), we see the characteristics of a valid DISTKEY option include the following:

  • Relative to tbl_rows, a low value for avg_s_rows_pre_filter
  • Relative to avg_s_rows_pre_filter, a similar number for avg_s_rows
  • Selective filter on the potential DISTKEY column
  • Numerous distinct values within the returned range
  • Low or insignificant scan_time

If such patterns exist, it’s likely that this column is a good DISTKEY candidate.

Do the query patterns facilitate MERGE JOINs?


When the following criteria are met, you can use a MERGE JOIN operation, the fastest of the three join operations:

  1. Two tables are sorted (using a compound sort key) and distributed on the same columns.
  2. Both tables are over 80% sorted (svv_table_info.unsorted < 20%)
  3. These tables are joined using the DISTKEY and SORTKEY columns in the JOIN condition.

Because of these restrictive criteria, it’s unusual to encounter a MERGE JOIN operation by chance. Typically, an end user makes explicit design decisions to force this type of JOIN operation, usually because of a requirement for a particular query’s performance. If this JOIN pattern doesn’t exist in your workload, then you won’t benefit from this optimized JOIN operation.

The following query returns the number of statements that scanned your table, scanned another table that was sorted and distributed on the same column, and performed some type of JOIN operation:

SELECT COUNT(*) num_queries FROM stl_query
WHERE query IN (
  SELECT DISTINCT query FROM stl_scan 
  WHERE tbl = [table_id] AND type = 2 AND userid > 1
  SELECT DISTINCT query FROM stl_scan 
  WHERE tbl <> [table_id] AND type = 2 AND userid > 1
  AND tbl IN (
    SELECT DISTINCT attrelid FROM pg_attribute 
    WHERE attisdistkey = true AND attsortkeyord > 0
    SELECT DISTINCT attrelid FROM pg_attribute
    WHERE attsortkeyord = -1)
  (SELECT DISTINCT query FROM stl_hashjoin WHERE userid > 1
  SELECT DISTINCT query FROM stl_nestloop WHERE userid > 1
  SELECT DISTINCT query FROM stl_mergejoin WHERE userid > 1)

If this query returns any results, you potentially have an opportunity to enable a MERGE JOIN for existing queries without modifying any other tables. If this query returns no results, then you need to proactively tune multiple tables simultaneously to facilitate the performance of a single query.

Note: If a desired MERGE JOIN optimization requires reviewing and modifying multiple tables, you approach the problem in a different fashion than this straightforward approach. This more complex approach goes beyond the scope of this article. If you’re interested in implementing such an optimization, you can check our documentation on the JOIN operations and ask specific questions in the comments at the end of this blog post.

Phase One Recap

Throughout this phase, we answered questions to determine which columns in this table were potentially appropriate DISTKEY columns for our table. At the end of these steps, you might have identified zero to many potential columns for your specific table and dataset. We’ll be keeping these columns (or lack thereof) in mind as we move along to the next phase.

Phase 2: Deciding Distribution Style

Phase 2 dives deeper into the potential distribution styles to determine which is the best choice for your workload. Generally, it’s best to strive for a DISTSTYLE of KEY whenever appropriate. Choose ALL in the scenarios where it makes sense (and KEY doesn’t). Only choose EVEN when neither KEY nor ALL is appropriate.

We’ll work though the following flowchart to assist us with our decision. Because DISTSTYLE is a table property, we run through this analysis table by table, after having completed phase 1 preceding.


Does the table participate in JOINs?


DISTSTYLE ALL is only used to guarantee colocation of JOIN operations, regardless of the columns specified in the JOIN conditions. If the table doesn’t participate in JOIN operations, then DISTSTYLE ALL offers no performance benefits and should be eliminated from consideration.

JOIN operations that benefit from colocation span a robust set of database operations. WHERE clause and JOIN clause join operations (INNER, OUTER, and so on) are obviously included, and so are some not-as-obvious operations and syntax like IN, NOT IN, MINUS/EXCEPT, INTERSECT and EXISTS. When answering whether the table participates in JOINs, consider all of these operations.

This query confirms how many distinct queries have scanned this table and have included one or more JOIN operations at some point in the same query:

SELECT DISTINCT query FROM stl_scan 
WHERE tbl = [table_id] AND type = 2 AND userid > 1
(SELECT DISTINCT query FROM stl_hashjoin
SELECT DISTINCT query FROM stl_nestloop
SELECT DISTINCT query FROM stl_mergejoin));

If this query returns a count of 0, then the table isn’t participating in any type of JOIN, no matter what operations are in use.

Note: Certain uncommon query patterns can cause the preceding query to return false positives (such as if you have simple scan against your table that is later appended to a result set of a subquery that contains JOINs). If you’re not sure, you can always look at the queries specifically with this code:

SELECT userid, query, starttime, endtime, rtrim(querytxt) qtxt 
FROM stl_query WHERE query IN (
SELECT DISTINCT query FROM stl_scan 
WHERE tbl = [table_id] AND type = 2 AND userid > 1
(SELECT DISTINCT query FROM stl_hashjoin
SELECT DISTINCT query FROM stl_nestloop
SELECT DISTINCT query FROM stl_mergejoin))
ORDER BY starttime;


Does the table contain at least one potential DISTKEY column?


The process detailed in phase 1 helped us to identify a table’s appropriate DISTKEY columns. If no appropriate DISTKEY columns exist, then KEY DISTSTYLE is removed from consideration. If appropriate DISTKEY columns do exist, then EVEN distribution is removed from consideration.

With this simple rule, the decision is never between KEY, EVEN, and ALL—rather it’s between these:

  • KEY and ALL in cases where at least one valid DISTKEY column exists
  • EVEN and ALL in cases where no valid DISTKEY columns exist


Can you tolerate additional storage overhead?


To answer whether you can tolerate additional storage overhead, the questions are: How large is the table and how is it currently distributed? You can use the following query to answer these questions:

SELECT table_id, "table", diststyle, size, pct_used 
FROM svv_table_info WHERE table_id = [table_id];

The following example shows how many 1 MB blocks and the percentage of total cluster storage that are currently consumed by duplicate versions of the same orders table with different DISTSTYLEs:

[email protected]/tpch=# SELECT "table", diststyle, size, pct_used
FROM svv_table_info
WHERE "table" LIKE 'orders_diststyle_%';
         table         |    diststyle    | size  | pct_used
 orders_diststyle_even | EVEN            |  6740 |   1.1785
 orders_diststyle_key  | KEY(o_orderkey) |  6740 |   1.1785
 orders_diststyle_all  | ALL             | 19983 |   3.4941
(3 rows)

For DISTSTYLE EVEN or KEY, each node receives just a portion of total table data. However, with DISTSTYLE ALL we are storing a complete version of the table on each compute node. For ALL, as we add nodes to a cluster the amount of data per node remains unchanged. Whether this is significant or not depends on your table size, cluster configuration, and storage overhead. If you use a DS2.8XLARGE configuration with 16TB of storage per node, this increase might be a negligible amount of per-node storage. However, if you use a DC1.LARGE configuration with 160GB of storage per node, then the increase in total cluster storage might be an unacceptable increase.

You can multiply the number of nodes by the current size of your KEY or EVEN distributed table to get a rough estimate of the size of the table as DISTSTYLE ALL. This approach should be provide information to determine if ALL results in an unacceptable growth in table storage:

SELECT "table", size, pct_used, 
 CASE diststyle
  ELSE '< ' || size*(SELECT COUNT(DISTINCT node) FROM stv_slices)
 END est_distall_size,
 CASE diststyle
  WHEN 'ALL' THEN pct_used::TEXT
  ELSE '< ' || pct_used*(SELECT COUNT(DISTINCT node) FROM stv_slices)
 END est_distall_pct_used
FROM svv_table_info WHERE table_id = [table_id];

If the estimate is unacceptable, then DISTSTYLE ALL should be removed from consideration.

Do the query patterns tolerate reduced parallelism?


In MPP database systems, performance at scale is achieved by simultaneously processing portions of the complete dataset with several distributed resources. DISTSTYLE ALL means that you’re sacrificing some parallelism, for both read and write operations, to guarantee a colocation of data on each node.

At some point, the benefits of DISTSTYLE ALL tables are offset by the parallelism reduction. At this point, DISTSTYLE ALL is not a valid option. Where that threshold occurs is different for your write operations and your read operations.

Write operations

For a table with KEY or EVEN DISTSTYLE, database write operations are parallelized across each of the slices. This parallelism means that each slice needs to process only a portion of the complete write operation. For ALL distribution, the write operation doesn’t benefit from parallelism because the write needs to be performed in full on every single node to keep the full dataset synchronized on all nodes. This approach significantly reduces performance compared to the same type of write operation performed on a KEY or EVEN distributed table.

If your table is the target of frequent write operations and you find you can’t tolerate the performance hit, that eliminates DISTSTYLE ALL from consideration.

This query identifies how many write operations have modified a table:

SELECT '[table_id]' AS "table_id", 
(SELECT count(*) FROM 
(SELECT DISTINCT query FROM stl_insert WHERE tbl = [table_id]
SELECT DISTINCT query FROM stl_delete WHERE tbl = [table_id])) AS num_updates,
(SELECT count(*) FROM 
(SELECT DISTINCT query FROM stl_delete WHERE tbl = [table_id]
SELECT DISTINCT query FROM stl_insert WHERE tbl = [table_id])) AS num_deletes,
(SELECT DISTINCT query FROM stl_insert WHERE tbl = [table_id] 
SELECT distinct query FROM stl_s3client
SELECT DISTINCT query FROM stl_delete WHERE tbl = [table_id])) AS num_inserts,
(SELECT DISTINCT query FROM stl_insert WHERE tbl = [table_id]
SELECT distinct query FROM stl_s3client)) as num_copies,
(SELECT DISTINCT xid FROM stl_vacuum WHERE table_id = [table_id]
AND status NOT LIKE 'Skipped%')) AS num_vacuum;

If your table is rarely written to, or if you can tolerate the performance hit, then DISTSTYLE ALL is still a valid option.

Read operations

Reads that access DISTSTYLE ALL tables require slices to scan and process the same data multiple times for a single query operation. This approach seeks to improve query performance by avoiding the network I/O overhead of broadcasting or redistributing data to facilitate a join or aggregation. At the same time, it increases the necessary compute and disk I/O due to the excess work being performed over the same data multiple times.

Suppose that you access the table in many ways, sometimes joining, sometimes not. In this case, you’ll need to determine if the benefit of collocating JOINs with DISTSTYLE ALL is significant and desirable or if the cost of reduced parallelism impacts your queries more significantly.

Patterns and trends to avoid

DISTSTYLE ALL tables are most appropriate for smaller, slowly changing dimension tables. As a general set of guidelines, the patterns following typically suggest that DISTSTYLE ALL is a poor option for a given table:

  • Read operations:
    • Scans against large fact tables
    • Single table scans that are not participating in JOINs
    • Scans against tables with complex aggregations (for example, several windowing aggregates with different partitioning, ordering, and frame clauses)
  • Write operations:
    • A table that is frequently modified with DML statements
    • A table that is ingested with massive data loads
    • A table that requires frequent maintenance with VACUUM or VACUUM REINDEX operations

If your table is accessed in a way that meets these criteria, then DISTSTYLE ALL is unlikely to be a valid option.

Do the query patterns utilize potential DISTKEY columns in JOIN conditions?

If the table participates in JOIN operations and has appropriate DISTKEY columns, then we need to decide between KEY or ALL distribution styles. Considering only how the table participates in JOIN operations, and no other outside factors, these criteria apply:

  • ALL distribution is most appropriate when any of these are true:
  • KEY distribution is most appropriate when


Determining the best DISTKEY column

If you’ve determined that DISTSTYLE KEY is best for your table, the next step is to determine which column serves as the ideal DISTKEY column. Of the columns you’ve flagged as appropriate potential DISTKEY columns in phase 1, you’ll want to identify which has the largest impact on your particular workload.

For tables with only a single candidate column, or for workloads that only use one of the candidate columns in JOINs, the choice is obvious. For workloads with mixed JOIN conditions against the same table, the most optimal column is determined based on your business requirements.

For example, common scenarios to encounter and questions to ask yourself about how you want to distribute are the following:

  • My transformation SQL code and reporting workload benefit from different columns. Do I want to facilitate my transformation job or reporting performance?
  • My dashboard queries and structured reports leverage different JOIN conditions. Do I value interactive query end user experience over business-critical report SLAs?
  • Should I distribute on column_A that occurs in a JOIN condition thousands of times daily for less important analytics, or on column_B that is referenced only tens of times daily for more important analytics? Would I rather improve a 5 second query to 2 seconds 1,000 times per day, or improve a 60-minute query to 24 minutes twice per day?

Your business requirements and where you place value answer these questions, so there is no simple way to offer guidance that covers all scenarios. If you have a scenario with mixed JOIN conditions and no real winner in value, you can always test multiple distribution key options and measure what works best for you. Or you can materialize multiple copies of the table distributed on differing columns and route queries to disparate tables based on query requirements. If you end up attempting the latter approach, pgbouncer-rr is a great utility to simplify the routing of queries for your end users.

Next Steps

Choosing optimal DISTSTYLE and DISTKEY options for your table ensures that your data is distributed evenly for parallel processing, and that data redistribution during query execution is minimal—which ensures your complex analytical workloads perform well over multipetabyte datasets.

By following the process detailed preceding, you can identify the ideal DISTSTYLE and DISTKEY for your specific tables. The final step is to simply rebuild the tables to apply these optimizations. This rebuild can be performed at any time. However, if you intend to continue reading through parts 3, 4, and 5 of the Advanced Table Design Playbook, you might want to wait until the end before you issue the table rebuilds. Otherwise, you might find yourself rebuilding these tables multiple times to implement optimizations identified in later installments.

In Part 3 of our table design playbook, I’ll describe how to use table properties related to table sorting styles and sort keys for another significant performance gain.

Amazon Redshift Engineering’s Advanced Table Design Playbook

Part 1: Preamble, Prerequisites, and Prioritization
Part 2: Distribution Styles and Distribution Keys
Part 3: Compound and Interleaved Sort Keys (December 6, 2016)
Part 4: Compression Encodings (December 7, 2016)
Part 5: Table Data Durability (December 8, 2016)

About the author

christophersonZach Christopherson is a Palo Alto based Senior Database Engineer at AWS.
He assists Amazon Redshift users from all industries in fine-tuning their workloads for optimal performance. As a member of the Amazon Redshift service team, he also influences and contributes to the development of new and existing service features. In his spare time, he enjoys trying new restaurants with his wife, Mary, and caring for his newborn daughter, Sophia.



Top 10 Performance Tuning Techniques for Amazon Redshift (Updated Nov. 28, 2016)