Tag Archives: weather station

MagPi 71: Run Android on Raspberry Pi

Post Syndicated from Rob Zwetsloot original https://www.raspberrypi.org/blog/magpi-71-android-raspberry-pi/

Hey folks, Rob here with good news about the latest edition of The MagPi! Issue 71, out right now, is all about running Android on Raspberry Pi with the help of emteria.OS and Android Things.

Raspberry Pi The MagPi Magazine issue 71 - Android

Android and Raspberry Pi, two great tastes that go great together!

Android and Raspberry Pi

A big part of our main feature looks at emteria.OS, a version of Android that runs directly on the Raspberry Pi. By running it on a touchscreen setup, you can use your Pi just like an Android tablet — one that’s easily customisable and hackable for all your embedded computing needs. Inside the issue, we’ve got a special emteria.OS discount code for readers.

We also look at Android Things, the official Android release for Raspberry Pi that focuses on IoT applications, and we show you some of the amazing projects that have been built with it.

More in The MagPi

If Android’s not your thing, we also have a big feature on building a Raspberry Pi weather station in issue 71!


Build your own Raspberry Pi weather station

On top of that, we’ve included guides on how to get started with TensorFlow AI and on building an oscilloscope.


We really loved this card scanning project! Read all about it in issue 71.

All this, along with our usual varied selection of project showcases, excellent tutorials, and definitive reviews!

Get The MagPi 71

You can get The MagPi 71 today from WHSmith, Tesco, Sainsbury’s, and Asda. If you live in the US, head over to your local Barnes & Noble or Micro Center in the next few days for a print copy. You can also get the new issue online from our store, or digitally via our Android or iOS apps. And don’t forget, there’s always the free PDF as well.

New subscription offer!

Want to support the Raspberry Pi Foundation and the magazine? We’ve launched a new way to subscribe to the print version of The MagPi: you can now take out a monthly £4 subscription to the magazine, effectively creating a rolling pre-order system that saves you money on each issue.

The MagPi subscription offer — Run Android on Raspberry Pi

You can also take out a twelve-month print subscription and get a Pi Zero W plus case and adapter cables absolutely free! This offer does not currently have an end date.

That’s it, folks! See you at Raspberry Fields.

The post MagPi 71: Run Android on Raspberry Pi appeared first on Raspberry Pi.

Build your own weather station with our new guide!

Post Syndicated from Richard Hayler original https://www.raspberrypi.org/blog/build-your-own-weather-station/

One of the most common enquiries I receive at Pi Towers is “How can I get my hands on a Raspberry Pi Oracle Weather Station?” Now the answer is: “Why not build your own version using our guide?”

Build Your Own weather station kit assembled

Tadaaaa! The BYO weather station fully assembled.

Our Oracle Weather Station

In 2016 we sent out nearly 1000 Raspberry Pi Oracle Weather Station kits to schools around the world that had applied to be part of our weather station programme. The original kit included a special HAT that allows the Pi to collect weather data with a set of sensors.

The original Raspberry Pi Oracle Weather Station HAT – Build Your Own Raspberry Pi weather station

The original Raspberry Pi Oracle Weather Station HAT

We designed the HAT to enable students to create their own weather stations and mount them at their schools. As part of the programme, we also provide an ever-growing range of supporting resources. We’ve seen Oracle Weather Stations in great locations with huge differences in climate, and they’ve even recorded the effects of a solar eclipse.

Our new BYO weather station guide

We only had a single batch of HATs made, and unfortunately we’ve given nearly* all the Weather Station kits away. Not only are the kits really popular, we also receive lots of questions about how to add extra sensors or how to take more precise measurements of a particular weather phenomenon. So today, to satisfy your demand for a hackable weather station, we’re launching our Build your own weather station guide!

Build Your Own Raspberry Pi weather station

Fun with meteorological experiments!

Our guide suggests the use of many of the sensors from the Oracle Weather Station kit, so you can build a station that’s as close as possible to the original. As you know, the Raspberry Pi is incredibly versatile, and we’ve made it easy to hack the design in case you want to use different sensors.

Many other tutorials for Pi-powered weather stations don’t explain how the various sensors work or how to store your data. Ours goes into more detail. It shows you how to put together a breadboard prototype, it describes how to write Python code to take readings in different ways, and it guides you through recording these readings in a database.
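To give a flavour of that read-and-store step, here’s a minimal sketch. It uses SQLite so it runs anywhere (the guide itself targets a MySQL/MariaDB database), and the fixed temperature value stands in for a real sensor read:

```python
import sqlite3
import time

def read_temperature():
    # Stand-in for a real sensor read (e.g. a BME280 on the breadboard);
    # an actual station would query the hardware here.
    return 21.3  # degrees Celsius

def record_reading(conn):
    # Create the table on first use, then store one timestamped reading.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS readings (taken_at REAL, temperature REAL)"
    )
    conn.execute(
        "INSERT INTO readings VALUES (?, ?)", (time.time(), read_temperature())
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
record_reading(conn)
print(conn.execute("SELECT COUNT(*) FROM readings").fetchone()[0])  # → 1
```

Call `record_reading()` from a loop or a cron job and you have the skeleton of a logging station; swapping in a real sensor library and a MariaDB connection follows the same shape.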

Build Your Own Raspberry Pi weather station on a breadboard

There’s also a section on how to make your station weatherproof. And in case you want to move past the breadboard stage, we also help you with that. The guide shows you how to solder together all the components, similar to the original Oracle Weather Station HAT.

Who should try this build

We think this is a great project to tackle at home, at a STEM club, Scout group, or CoderDojo, and we’re sure that many of you will be chomping at the bit to get started. Before you do, please note that we’ve designed the build to be as straightforward as possible, but it’s still fairly advanced both in terms of electronics and programming. You should read through the whole guide before purchasing any components.

Build Your Own Raspberry Pi weather station – components

The sensors and components we’re suggesting balance cost, accuracy, and ease of use. Depending on what you want to use your station for, you may wish to use different components. Similarly, the final soldered design in the guide may not be the most elegant, but we think it is achievable for someone with modest soldering experience and basic equipment.

With our guide, you can build a functioning weather station without soldering, but the build will be more durable if you do solder it. If you’ve never tried soldering before, that’s OK: we have a Getting started with soldering resource plus video tutorial that will walk you through how it works step by step.

Prototyping HAT for Raspberry Pi weather station sensors

For those of you who are more experienced makers, there are plenty of different ways to put the final build together. We always like to hear about alternative builds, so please post your designs in the Weather Station forum.

Our plans for the guide

Our next step is publishing supplementary guides for adding extra functionality to your weather station. We’d love to hear which enhancements you would most like to see! Our current ideas under development include adding a webcam, making a tweeting weather station, adding a light/UV meter, and incorporating a lightning sensor. Let us know which of these is your favourite, or suggest your own amazing ideas in the comments!

*We do have a very small number of kits reserved for interesting projects or locations: a particularly cool experiment, a novel idea for how the Oracle Weather Station could be used, or places with specific weather phenomena. If you have such a project in mind, please send a brief outline to [email protected], and we’ll consider how we might be able to help you.

The post Build your own weather station with our new guide! appeared first on Raspberry Pi.

Protecting coral reefs with Nemo-Pi, the underwater monitor

Post Syndicated from Janina Ander original https://www.raspberrypi.org/blog/coral-reefs-nemo-pi/

The German charity Save Nemo works to protect coral reefs, and they are developing Nemo-Pi, an underwater “weather station” that monitors ocean conditions. Right now, you can vote for Save Nemo in the Google.org Impact Challenge.

Nemo-Pi — Save Nemo

Save Nemo

The organisation says there are two major threats to coral reefs: divers, and climate change. To make diving safer for reefs, Save Nemo installs buoy anchor points where diving tour boats can anchor without damaging corals in the process.

reef damaged by anchor
boat anchored at buoy

In addition, they provide dos and don’ts for how to behave on a reef dive.

The Nemo-Pi

To monitor the effects of climate change, and to help divers decide whether conditions are right at a reef while they’re still on shore, Save Nemo is also in the process of perfecting Nemo-Pi.

Nemo-Pi schematic — Nemo-Pi — Save Nemo

This Raspberry Pi-powered device is made up of a buoy, a solar panel, a GPS device, a Pi, and an array of sensors. Nemo-Pi measures water conditions such as current, visibility, temperature, carbon dioxide and nitrogen oxide concentrations, and pH. It also uploads its readings live to a public webserver.
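A sketch of what “uploading readings live” involves: bundling one set of measurements into JSON before sending it to the server. The field names and values below are invented for illustration — Save Nemo’s actual schema isn’t published in this post:

```python
import json
import time

def build_payload(lat, lon, readings):
    # Bundle one set of Nemo-Pi-style measurements for upload.
    return json.dumps({
        "timestamp": int(time.time()),
        "position": {"lat": lat, "lon": lon},
        "readings": readings,
    })

payload = build_payload(8.32, 98.30, {
    "water_temp_c": 29.1,
    "ph": 8.05,
    "visibility_m": 12.0,
})
# A real device would now POST this to the public webserver,
# e.g. with urllib.request.
print(payload)
```

Keeping the payload small and self-describing like this matters on a solar-powered buoy, where every transmitted byte costs energy.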

Inside the Nemo-Pi device — Save Nemo

The Save Nemo team is currently doing long-term tests of Nemo-Pi off the coast of Thailand and Indonesia. They are also working on improving the device’s power consumption and durability, and testing prototypes with the Raspberry Pi Zero W.

web dashboard — Nemo-Pi — Save Nemo

The web dashboard showing live Nemo-Pi data

Long-term goals

Save Nemo aims to install a network of Nemo-Pis at shallow reefs (up to 60 metres deep) in South East Asia. Then diving tour companies can check the live data online and decide day-to-day whether tours are feasible. This will lower the impact of humans on reefs and help the local flora and fauna survive.

Coral reefs with fishes

A healthy coral reef

Nemo-Pi data may also be useful for groups lobbying for reef conservation, and for scientists and activists who want to shine a spotlight on the awful effects of climate change on sea life, such as coral bleaching caused by rising water temperatures.

Bleached coral

A bleached coral reef

Vote now for Save Nemo

If you want to help Save Nemo in their mission today, vote for them to win the Google.org Impact Challenge:

  1. Head to the voting web page
  2. Click “Abstimmen” in the footer of the page to vote
  3. Click “JA” in the footer to confirm

Voting is open until 6 June. You can also follow Save Nemo on Facebook or Twitter. We think this organisation is doing valuable work, and that their projects could be expanded to reefs across the globe. It’s fantastic to see the Raspberry Pi being used to help protect ocean life.

The post Protecting coral reefs with Nemo-Pi, the underwater monitor appeared first on Raspberry Pi.

Tackling climate change and helping the community

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/fair-haven-weather-station/

In today’s guest post, seventh-grade students Evan Callas, Will Ross, Tyler Fallon, and Kyle Fugate share their story of using the Raspberry Pi Oracle Weather Station in their Innovation Lab class, headed by Raspberry Pi Certified Educator Chris Aviles.

Raspberry Pi Certified Educator Chris Aviles Innovation Lab Oracle Weather Station

United Nations Sustainable Goals

The past couple of weeks in our Innovation Lab class, our teacher, Mr Aviles, has challenged us students to design a project that helps solve one of the United Nations Sustainable Goals. We chose Climate Action. Innovation Lab is a class that gives students the opportunity to learn about where the crossroads of technology, the environment, and entrepreneurship meet. Everyone takes their own paths in innovation and learns about the environment using project-based learning.


Raspberry Pi Oracle Weather Station

For our climate change challenge, we decided to build a Raspberry Pi Oracle Weather Station. Tackling the issues of climate change in a way that helps our community stood out to us because we knew with the help of this weather station we can send the local data to farmers and fishermen in town. Recent changes in climate have been affecting farmers’ crops. Unexpected rain, heat, and other unusual weather patterns can completely destabilize the natural growth of the plants and destroy their crops altogether. The amount of labour needed by farmers has also significantly increased, forcing farmers to grow more food with fewer resources. By using our Raspberry Pi Oracle Weather Station to alert local farmers, we can help them be more prepared and aware of the weather, leading to better crops and safe boating.


Growing teamwork and coding skills

The process of setting up our weather station was fun and simple. Raspberry Pi made the instructions very easy to understand and read, which was very helpful for our team who had little experience in coding or physical computing. We enjoyed working together as a team and were happy to be growing our teamwork skills.

Once we constructed and coded the weather station, we learned that we needed to support the station with PVC pipes. After we completed these steps, we brought the weather station up to the roof of the school and began collecting data. Our information is currently being sent to the Initial State dashboard so that we can share the information with anyone interested. This information will also be recorded and seen by other schools, businesses, and others from around the world who are using the weather station. For example, we can see the weather in countries such as France, Greece and Italy.


Raspberry Pi allows us to build these amazing projects that help us to enjoy coding and physical computing in a fun, engaging, and impactful way. We picked climate change because we care about our community and would like to make a substantial contribution to our town, Fair Haven, New Jersey. It is not every day that kids are given these kinds of opportunities, and we are very lucky and grateful to go to a school and learn from a teacher where these opportunities are given to us. Thanks, Mr Aviles!

To see more awesome projects by Mr Aviles’ class, you can keep up with him on his blog and follow him on Twitter.

The post Tackling climate change and helping the community appeared first on Raspberry Pi.

Raspberry Pi aboard Pino, the smart sailboat

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/pino-smart-sailing-boat/

As they sail aboard their floating game design studio Pino, Rekka Bellum and Devine Lu Linvega are starting to explore the use of Raspberry Pis. As part of an experimental development tool and a weather station, Pis are now aiding them on their nautical adventures!

Mar 2018: A Smart Sailboat

Pino is on its way to becoming a smart sailboat! Raspberry Pi is the ideal device for sailors, we hope to make many more projects with it. Also the projects continue still, but we have windows now yay!

Barometer

Using a haul of Pimoroni tech including the Enviro pHAT, Scroll pHAT HD and Mini Black HAT Hack3r, Rekka and Devine have been experimenting with using a Raspberry Pi Zero as an onboard barometer for their sailboat. On their Hundred Rabbits YouTube channel and website, the pair has documented their experimental setups. They have also built another Raspberry Pi rig for distraction-free work and development.

Hundred Rabbits Pino onboard Raspberry Pi workstation and barometer

The official Raspberry Pi 7″ touch display, a Raspberry Pi 3B+, a Pimoroni Blinkt, and a Poker II Keyboard make up Pino’s experimental development station.

“The Pi computer is currently used only as an experimental development tool aboard Pino, but could readily be turned into a complete development platform, would our principal computers fail,” they explain, before going into the build process for the Raspberry Pi–powered barometer.

Hundred Rabbits Pino onboard Raspberry Pi workstation and barometer

The use of solderless headers makes this weather station an ideal build wherever space and tools are limited.

The barometer uses the sensing power of the Pimoroni Enviro pHAT to measure atmospheric pressure, and a Raspberry Pi Zero displays this data on the Scroll pHAT HD. It thus advises the two travellers of oncoming storms. By taking advantage of the solderless headers provided by the Sheffield-based pirates, the Hundred Rabbits team was able to put the device together with relative ease. They provide all information for the build here.
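The “advises of oncoming storms” part comes down to watching the pressure trend. Here’s a toy version of that logic — the thresholds are a common rule of thumb, not taken from the Hundred Rabbits build, and the readings are assumed to be hourly values in hPa:

```python
def storm_advisory(pressure_hpa):
    # Rough rule of thumb: a sustained fall of about 1 hPa/hour or more
    # over three hours suggests deteriorating weather.
    if len(pressure_hpa) < 4:
        return "need more data"
    drop = pressure_hpa[-4] - pressure_hpa[-1]  # fall over the last 3 hours
    if drop >= 3.0:
        return "storm warning"
    if drop >= 1.0:
        return "pressure falling"
    return "steady"

print(storm_advisory([1015.2, 1013.8, 1012.1, 1010.4]))  # → storm warning
```

On the actual rig, the returned string would then be scrolled across the Scroll pHAT HD for the crew to read.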

Hundred Rabbits Pino onboard Raspberry Pi workstation and barometer

All aboard Pino

If you’d like to follow the journey of Rekka Bellum and Devine Lu Linvega as they continue to travel the oceans aboard Pino, you can follow them on YouTube or Twitter, and via their website.

We are Hundred Rabbits

This is us, this is what we do, and these are our intentions! We live, and work from our sailboat Pino. Traveling helps us stay creative, and we feed what we see back into our work. We make games, art, books and music under the studio name ‘Hundred Rabbits.’

 

The post Raspberry Pi aboard Pino, the smart sailboat appeared first on Raspberry Pi.

2017 Weather Station round-up

Post Syndicated from Richard Hayler original https://www.raspberrypi.org/blog/2017-weather-station/

As we head into 2018 and start looking forward to longer days in the Northern hemisphere, I thought I’d take a look back at last year’s weather using data from Raspberry Pi Oracle Weather Stations. One of the great things about the kit is that as well as uploading all its readings to the shared online Oracle database, it stores them locally on the Pi in a MySQL or MariaDB database. This means you can use the power of SQL queries coupled with Python code to do automatic data analysis.

Soggy Surrey

My Weather Station has only been installed since May, so I didn’t have a full 52 weeks of my own data to investigate. Still, my station recorded more than 70,000 measurements. Living in England, the first thing I wanted to know was: which was the wettest month? Unsurprisingly, both in terms of average daily rainfall and total rainfall, the start of the summer period — exactly when I went on a staycation — was the soggiest.
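The wettest-month question is a one-line SQL aggregation. A self-contained sketch, with an in-memory SQLite database standing in for the station’s MySQL/MariaDB one and made-up rainfall figures:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE weather (created TEXT, rainfall REAL)")
conn.executemany("INSERT INTO weather VALUES (?, ?)", [
    ("2017-05-14 09:00:00", 0.25),
    ("2017-06-02 09:00:00", 2.5),
    ("2017-06-21 09:00:00", 1.75),
    ("2017-07-05 09:00:00", 0.5),
])

# Total rainfall per month, wettest month first
rows = conn.execute(
    "SELECT strftime('%m', created) AS month, SUM(rainfall) AS total "
    "FROM weather GROUP BY month ORDER BY total DESC"
).fetchall()
print(rows)  # → [('06', 4.25), ('07', 0.5), ('05', 0.25)]
```

Swap `SUM` for `AVG`, or group by week instead of month, and the same query answers most of the questions in this post.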

What about the global Weather Station community?

Even soggier Bavaria

Here things get slightly trickier. Although we have a shiny Oracle database full of all participating schools’ sensor readings, some of the data needs careful interpretation. Many kits are used as part of the school curriculum and do not always record genuine outdoor conditions. Nevertheless, it appears that Adalbert Stifter Gymnasium in Bavaria, Germany, had an even wetter 2017 than my home did:



Where the wind blows

The records from Robert-Dannemann Schule in Westerstede, Germany, are a good example of data which was most likely collected while testing and investigating the weather station sensors, rather than in genuine external conditions. Unless this school’s Weather Station was transported to a planet which suffers from extreme hurricanes, it wasn’t actually subjected to wind speeds above 1000km/h in November. Dismissing these and all similarly suspect records, I decided to award the ‘Windiest location of the year’ prize to CEIP Noalla-Telleiro, Spain.



This school is right on the coast, and is subject to some strong and squally weather systems.

Weather Station at CEIP Noalla - Telleiro

Weather Station at CEIP Noalla-Telleiro

They’ve mounted their wind vane and anemometer nice and high, so I can see how they were able to record such high wind velocities.

A couple of Weather Stations have recently been commissioned in equally exposed places — it will be interesting to see whether they will record even higher speeds during 2018.

Highs and lows

After careful analysis and a few disqualifications (a couple of Weather Stations in contention for this category were housed indoors), the ‘Hottest location’ award went to High School of Chalastra in Thessaloniki, Greece. There were a couple of Weather Stations (the one at The Marwadi Education Foundation in India, for example) that reported higher average temperatures than Chalastra’s 24.54 ºC. However, they had uploaded far fewer readings and their data coverage of 2017 was only partial.



At the other end of the thermometer, the location with the coldest average temperature is École de la Rose Sauvage in Calgary, Canada, with a very chilly 9.9 ºC.

Ecole de la Rose sauvage Weather Station

Weather Station at École de la Rose Sauvage

I suspect this school has a good chance of retaining the title: their lowest 2017 temperature of -24 ºC is likely to be beaten in 2018 due to extreme weather currently bringing a freezing start to the year in that part of the world.



Analyse your own Weather Station data

If you have an Oracle Raspberry Pi Weather Station and would like to perform an annual review of your local data, you can use this Python script as a starting point. It will display a monthly summary of the temperature and rainfall for 2017, and you should be able to customise the code to focus on other sensor data or on a particular time of year. We’d love to see your results, so please share your findings with [email protected], and we’ll send you some limited-edition Weather Station stickers.

The post 2017 Weather Station round-up appeared first on Raspberry Pi.

Visualising Weather Station data with Initial State

Post Syndicated from Richard Hayler original https://www.raspberrypi.org/blog/initial-state/

Since we launched the Oracle Weather Station project, we’ve collected more than six million records from our network of stations at schools and colleges around the world. Each one of these records contains data from ten separate sensors — that’s over 60 million individual weather measurements!

Weather station measurements in Oracle database - Initial State

Weather station measurements in Oracle database

Weather data collection

Having lots of data covering a long period of time is great for spotting trends, but to do so, you need some way of visualising your measurements. We’ve always had great resources like Graphing the weather to help anyone analyse their weather data.

And from now on it’s going to be even easier for our Oracle Weather Station owners to display and share their measurements. I’m pleased to announce a new partnership with our friends at Initial State: they are generously providing a white-label platform to which all Oracle Weather Station recipients can stream their data.

Using Initial State

Initial State makes it easy to create vibrant dashboards that show off local climate data. The service is perfect for having your Oracle Weather Station data on permanent display, for example in the school reception area or on the school’s website.

But that’s not all: the Initial State toolkit includes a whole range of easy-to-use analysis tools for extracting trends from your data. Distribution plots and statistics are just a few clicks away!

Humidity value distribution (May-Nov 2017) - Raspberry Pi Oracle Weather Station Initial State

Looks like Auntie Beryl is right — it has been a damp old year! (Humidity value distribution May–Nov 2017)

The wind direction data from my Weather Station supports my excuse as to why I’ve not managed a high-altitude balloon launch this year: to use my launch site, I need winds coming from the east, and those have been in short supply.

Chart showing wind direction over time - Raspberry Pi Oracle Weather Station Initial State

Chart showing wind direction over time

Initial State credentials

Every Raspberry Pi Oracle Weather Station school will shortly be receiving the credentials needed to start streaming their data to Initial State. If you’re super keen though, please email [email protected] with a photo of your Oracle Weather Station, and I’ll let you jump the queue!

The Initial State folks are big fans of Raspberry Pi and have a ton of Pi-related projects on their website. They even included shout-outs to us in the music video they made to celebrate the publication of their 50th tutorial. Can you spot their weather station?

Your home-brew weather station

If you’ve built your own Raspberry Pi–powered weather station and would like to dabble with the Initial State dashboards, you’re in luck! The team at Initial State is offering 14-day trials for everyone. For more information on Initial State, and to sign up for the trial, check out their website.

The post Visualising Weather Station data with Initial State appeared first on Raspberry Pi.

Using taxis to monitor air quality in Peru

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/air-quality-peru/

When James Puderer moved to Lima, Peru, his roadside runs left a rather nasty taste in his mouth. Hit by the pollution from old diesel cars in the area, he decided to monitor the air quality in his new city using Raspberry Pis and the abundant taxis as his tech carriers.

Taxi Datalogger – Assembly

How to assemble the enclosure for my Taxi Datalogger project: https://www.hackster.io/james-puderer/distributed-air-quality-monitoring-using-taxis-69647e

Sensing air quality in Lima

Luckily for James, almost all taxis in Lima are equipped with the standard hollow vinyl roof sign seen in the video above, which makes them ideal for hacking.

Using a Raspberry Pi alongside various Adafruit tech including the BME280 Temperature/Humidity/Pressure Sensor and GPS Antenna, James created a battery-powered retrofit setup that fits snugly into the vinyl sign.

The schematic of the air quality monitor tech inside the taxi sign

With the onboard tech, the device collects data on longitude, latitude, humidity, temperature, pressure, and airborne particle count, feeding it back to an Android Things datalogger. This data is then pushed to Google IoT Core, where it can be remotely accessed.

Next, the data is processed by Google Dataflow and turned into a BigQuery table. Users can then visualize the collected measurements. And while James uses Google Maps to analyse his data, there are many tools online that will allow you to organise and study your figures depending on what final result you’re hoping to achieve.
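At the heart of any such visualisation is binning GPS-tagged readings into grid cells and averaging them. Here’s a minimal, hardware-free stand-in for what Dataflow and BigQuery do at scale — the coordinates and particle counts are invented for illustration:

```python
from collections import defaultdict

def heatmap_cells(samples, cell_deg=0.01):
    # Average the particle count within each lat/lon grid cell.
    sums = defaultdict(lambda: [0.0, 0])
    for lat, lon, count in samples:
        # Snap each coordinate down to its grid cell corner.
        cell = (round(lat // cell_deg * cell_deg, 6),
                round(lon // cell_deg * cell_deg, 6))
        sums[cell][0] += count
        sums[cell][1] += 1
    return {cell: total / n for cell, (total, n) in sums.items()}

# Three fictional samples from around Lima; the first two share a cell.
cells = heatmap_cells([
    (-12.0453, -77.0311, 40.0),
    (-12.0457, -77.0316, 60.0),
    (-12.0561, -77.0420, 90.0),
])
print(cells)
```

Each resulting cell-to-average mapping is exactly what a heat map layer (in Google Maps or elsewhere) needs as input.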

A heat map of James' local area showing air quality

James hopped in a taxi and took his monitor on the road, collecting results throughout the journey

James has provided the complete build process, including all tech ingredients and code, on his Hackster.io project page, and urges makers to create their own air quality monitor for their local area. He also plans on building upon the existing design by adding a 12V power hookup for connecting to the taxi, functioning lights within the sign, and companion apps for drivers.

Sensing the world around you

We’ve seen a wide variety of Raspberry Pi projects using sensors to track the world around us, such as Kasia Molga’s Human Sensor costume series, which reacts to air pollution by lighting up, and Clodagh O’Mahony’s Social Interaction Dress, which she created to judge how conversation and physical human interaction can be scored and studied.

Human Sensor

Kasia Molga’s Human Sensor — a collection of hi-tech costumes that react to air pollution within the wearer’s environment.

Many people also build their own Pi-powered weather stations, or use the Raspberry Pi Oracle Weather Station, to measure and record conditions in their towns and cities from the roofs of schools, offices, and homes.

Have you incorporated sensors into your Raspberry Pi projects? Share your builds in the comments below or via social media by tagging us.

The post Using taxis to monitor air quality in Peru appeared first on Raspberry Pi.

The possibilities of the Sense HAT

Post Syndicated from Janina Ander original https://www.raspberrypi.org/blog/sense-hat-projects/

Did you realise the Sense HAT has been available for over two years now? Used by astronauts on the International Space Station, the exact same hardware is available to you on Earth. With a new Astro Pi challenge just launched, it’s time for a retrospective/roundup/inspiration post about this marvellous bit of kit.

Sense HAT attached to Pi and power cord

The Sense HAT on a Pi in full glory

The Sense HAT explained

We developed our scientific add-on board to be part of the Astro Pi computers we sent to the International Space Station with ESA astronaut Tim Peake. For a play-by-play of Astro Pi’s history, head to the blog archive.

Astro Pi logo with starry background

Just to remind you, this is all the cool stuff our engineers have managed to fit onto the HAT:

  • A gyroscope (sensing pitch, roll, and yaw)
  • An accelerometer
  • A magnetometer
  • Sensors for temperature, humidity, and barometric pressure
  • A joystick
  • An 8×8 LED matrix

You can find a roundup of the technical specs here on the blog.

How to Sense HAT

It’s easy to begin exploring this device: take a look at our free Getting started with the Sense HAT resource, or use one of our Code Club Sense HAT projects. You can also try out the emulator, available offline on Raspbian and online on Trinket.

Sense HAT emulator on Trinket

The Sense HAT emulator on trinket.io

Fun and games with the Sense HAT

Use the LED matrix and joystick to recreate games such as Pong or Flappy Bird. Of course, you could also add sensor input to your game: code an egg drop game or a Magic 8 Ball that reacts to how the device moves.
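The Magic 8 Ball idea boils down to “detect a shake, then show a random answer”. A hardware-free sketch of that logic — on a real Sense HAT you would feed in values from get_accelerometer_raw() instead of the hand-written tuples, and display the answer with show_message():

```python
import random

RESPONSES = ["Signs point to yes", "Ask again later", "Very doubtful"]

def is_shake(accel, threshold=1.8):
    # A combined acceleration well above 1 g counts as a shake.
    # accel is an (x, y, z) tuple in units of g.
    x, y, z = accel
    return (x * x + y * y + z * z) ** 0.5 > threshold

def magic_8_ball(accel):
    if is_shake(accel):
        return random.choice(RESPONSES)
    return None  # not shaken: keep showing the previous message

print(magic_8_ball((0.0, 0.1, 1.0)))  # resting on the desk → None
print(magic_8_ball((1.5, 1.2, 0.9)))  # shaken → one of RESPONSES
```

The 1.8 g threshold is an illustrative guess; tune it on real hardware so normal handling doesn’t trigger an answer.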

Sense HAT Random Sparkles

Create random sparkles on the Sense HAT

Once December rolls around, you could brighten up your home with a voice-controlled Christmas tree or an advent calendar on your Sense HAT.

If you like the great outdoors, you could also use your Sense HAT to recreate this Hiking Companion by Marcus Johnson. Take it with you on your next hike!

Art with the Sense HAT

The LED matrix is perfect for getting creative. To draw something basic without having to squint at a Python list, use this app by our very own Richard Hayler. Feeling more ambitious? The MagPi will teach you how to create magnificent pixel art. Ben Nuttall has created this neat little Python script for displaying a photo taken by the Raspberry Pi Camera Module on the Sense HAT.

Brett Haines Mathematica on the Sense HAT

It’s also possible to incorporate Sense HAT data into your digital art! The Python Turtle module and the Processing language are both useful tools for creating beautiful animations based on real-world information.

A Sense HAT project that also uses this principle is Giorgio Sancristoforo’s Tableau, a ‘generative music album’. This device creates music according to the sensor data:

Tableau Generative Album

“There is no doubt that, as music is removed by the phonograph record from the realm of live production and from the imperative of artistic activity and becomes petrified, it absorbs into itself, in this process of petrification, the very life that would otherwise vanish.”

Science with the Sense HAT

This free Essentials book from The MagPi team covers all the Sense HAT science basics. You can, for example, learn how to measure gravity.

Cropped cover of Experiment with the Sense HAT book

Our online resource shows you how to record the information your HAT picks up. Next you can analyse and graph your data using Mathematica, which is included for free on Raspbian. This resource walks you through how this software works.

If you’re seeking inspiration for experiments you can do on our Astro Pis Izzy and Ed on the ISS, check out the winning entries of previous rounds of the Astro Pi challenge.

Thomas Pesquet with Ed and Izzy

But you can also stick to terrestrial scientific investigations. For example, why not build a weather station and share its data on your own web server or via Weather Underground?

Your code in space!

If you’re a student or an educator in one of the 22 ESA member states, you can get a team together to enter our 2017-18 Astro Pi challenge. There are two missions to choose from, including Mission Zero: follow a few guidelines, and your code is guaranteed to run in space!

The post The possibilities of the Sense HAT appeared first on Raspberry Pi.

The Weather Station and the eclipse

Post Syndicated from Richard Hayler original https://www.raspberrypi.org/blog/weather-station-eclipse/

As everyone knows, one of the problems with the weather is that it can be difficult to predict a long time in advance. In the UK we’ve had stormy conditions for weeks but, of course, now that I’ve finished my lightning detector, everything has calmed down. If you’re planning to make scientific measurements of a particular phenomenon, patience is often required.

Oracle Weather Station

Wake STEM ECH get ready to safely observe the eclipse

In the path of the eclipse

Fortunately, this wasn’t a problem for Mr Burgess and his students at Wake STEM Early College High School in Raleigh, North Carolina, USA. They knew exactly when the event they were interested in studying was going to occur: they were going to use their Raspberry Pi Oracle Weather Station to monitor the progress of the 2017 solar eclipse.

Wake STEM EC HS on Twitter

Through the @Celestron telescope #Eclipse2017 @WCPSS via @stemburgess

Measuring the temperature drop

The Raspberry Pi Oracle Weather Stations are always active and recording data, so all the students needed to do was check that everything was connected and working. That left them free to enjoy the eclipse, and take some amazing pictures like the one above.

You can see from the data how the changes in temperature lag behind the solar events – this makes sense, as it takes a while for the air to cool down. When the sun starts to return, the temperature rise continues on its pre-eclipse trajectory.

Oracle Weather Station

Weather station data 21st Aug: the yellow bars mark the start and end of the eclipse, the red bar marks the maximum sun coverage.

Reading Mr Burgess’ description, I’m feeling rather jealous. Being in the path of the Eclipse sounds amazing: “In North Carolina we experienced 93% coverage, so a lot of sunlight was still shining, but the landscape took on an eerie look. And there was a cool wind like you’d experience at dusk, not at 2:30 pm on a hot summer day. I was amazed at the significant drop in temperature that occurred in a small time frame.”

Temperature drop during Eclipse Oracle Weather Station.

Close up of data showing temperature drop as recorded by the Raspberry Pi Oracle Weather Station. The yellow bars mark the start and end of the eclipse, the red bar marks the maximum sun coverage.

Weather Station in the classroom

“I’ve been preparing for the solar eclipse for almost two years, with the weather station arriving early last school year. I did not think about temperature data until I read about citizen scientists on a NASA website,” explains Mr Burgess, who is now in his second year of working with the Raspberry Pi Oracle Weather Station. Around 120 ninth-grade students (ages 14-15) have been involved with the project so far. “I’ve found that students who don’t have a strong interest in meteorology find it interesting to look at real data and figure out trends.”

Wake STEM EC Raspberry Pi Oracle Weather Station installation

As many schools have discovered, Mr Burgess found that the biggest challenge with the Weather Station project “was finding a suitable place to install the weather station in a place that could get power and Ethernet”. To help with this problem, we’ve recently added two new guides to help with installing the wind sensors outside and using WiFi to connect the kit to the Internet.

Raspberry Pi Oracle Weather Station

If you want to keep up to date with all the latest Raspberry Pi Oracle Weather Station activities undertaken by our network of schools around the world, make sure you regularly check our weather station forum. Meanwhile, everyone at Wake STEM ECH is already starting to plan for their next eclipse on Monday, April 8, 2024. I wonder if they’d like some help with their Weather Station?

The post The Weather Station and the eclipse appeared first on Raspberry Pi.

timeShift(GrafanaBuzz, 1w) Issue 1

Post Syndicated from Blogs on Grafana Labs Blog original https://grafana.com/blog/2017/06/23/timeshiftgrafanabuzz-1w-issue-1/

Introducing timeShift

TimeShift is a new blog series we’ve created to provide a weekly curated list of links and articles centered around Grafana and the growing Grafana community. Each week we come across great articles from people who have written about how they are using Grafana, how to build effective dashboards, and a lot of discussion about the state of open source monitoring. We want to collect this information in one place and post an article every Friday afternoon highlighting some of this great content.

From the Blogosphere

We see a lot of articles covering the devops side of monitoring, but it’s interesting to see how people are using Grafana for different use cases.

Plugins and Dashboards

We are excited that there have been over 100,000 plugin installations since we launched the new pluggable architecture in Grafana v3. You can discover and install plugins in your own on-premises or Hosted Grafana instance from our website. Below are some recent additions and updates.

Carpet plot: A variant of the heatmap graph panel with additional display options.

DalmatinerDB: A no-fluff, purpose-built metric database.

Gnocchi: This plugin was renamed. Users should uninstall the old version and install this new version.

This week’s MVC (Most Valuable Contributor)

Each week we’ll recognize a Grafana contributor and thank them for all of their PRs, bug reports and feedback. A majority of fixes and improvements come from our fantastic community!

thuck (Denis Doria)

Thank you for all of your PRs!

What do you think?

Anything in particular you’d like to see in this series of posts? Too long? Too short? Boring as shit? Let us know. Comment on this article below, or post something at our community forum. With your help, we can make this a worthwhile resource.

Follow us on Twitter, like us on Facebook, and join the Grafana Labs community.

NeoPixel Temperature Stair Lights

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/neopixel-temperature-stair-lights/

Following a post-Christmas decision to keep illuminated decorations on her stairway bannister throughout the year, Lorraine Underwood found a new purpose for a strip of NeoPixels she had lying around.

Lorraine Underwood on Twitter

Changed the stair lights from a string to a strip & they look awesome! #neopixel #raspberrypi https://t.co/dksLwy1SE1

Simply running the lights up the stairs, blinking and flashing to a random code, wasn’t enough for her. By using an API to check the outdoor weather, Lorraine’s lights went from decorative to informative: they now give an indication of outside weather conditions through their colour and the quantity illuminated.

“The idea is that more lights will light up as it gets warmer,” Lorraine explains. “The temperature is checked every five minutes (I think that may even be a little too often). I am looking forward to walking downstairs to a nice warm yellow light instead of the current blue!”

In total, Lorraine had 240 lights in the strip; she created a chart mapping ranges of outside temperature to the quantity of lights to illuminate for each value, as well as specifying the colour of those lights, running from chilly blue through to scorching red.
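A mapping like the one Lorraine describes can be sketched in a few lines of Python. The thresholds, colours, and scaling below are illustrative guesses, not her actual values (her real code is on her blog):

```python
# Illustrative sketch of a temperature-to-lights mapping for a 240-LED strip.
# The bands and scaling are invented for demonstration purposes.

NUM_LEDS = 240

# (upper temperature bound in Celsius, RGB colour), from cold to hot
BANDS = [
    (0,  (0, 0, 255)),    # freezing: blue
    (10, (0, 128, 255)),  # chilly: light blue
    (16, (0, 255, 128)),  # typical British summer: teal
    (22, (255, 200, 0)),  # warm: yellow
    (30, (255, 100, 0)),  # hot: orange
    (99, (255, 0, 0)),    # scorching: red
]

def lights_for_temperature(temp_c):
    """Return (number_of_lit_leds, colour) for a temperature reading.

    More LEDs light up as it gets warmer: the count scales linearly
    between -10 C (none lit) and 30 C (all 240 lit).
    """
    lit = int(round((temp_c + 10) / 40 * NUM_LEDS))
    lit = max(0, min(NUM_LEDS, lit))
    for upper, colour in BANDS:
        if temp_c <= upper:
            return lit, colour
    return NUM_LEDS, BANDS[-1][1]
```

On the Pi Zero, the returned count and colour would then be pushed to the NeoPixel strip every five minutes, for example with a library such as rpi_ws281x.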

Lorraine Underwood Neopixel stair way lights

Oh, Lorraine! We love your optimistic dreams of the British summer being more than its usual rainy 16 Celsius…

The lights are controlled by a Raspberry Pi Zero running code that can be found on Lorraine’s blog. The code dictates which lights are lit and when.

Lorraine Underwood Neopixel stair way lights

“Do I need a coat today? I’ll check the stairs.”

Lorraine is planning some future additions to the build, including a toddler-proof 3D housing, powering the Zero from the lights’ power supply, and gathering her own temperature data instead of relying on a third-party API.

While gathering the temperature data from outside her house, she may also want to look into building an entire weather station, collecting extra data on rain, humidity, and wind conditions. After all, this is the UK: just because it’s hot outside, it doesn’t mean it’s not also raining.

The post NeoPixel Temperature Stair Lights appeared first on Raspberry Pi.

2017: inspiring young makers and supporting educators

Post Syndicated from Philip Colligan original https://www.raspberrypi.org/blog/2017-inspiring-young-makers-educators/

By any measure, the Raspberry Pi Foundation had a fantastic 2016. We ended the year with over 11 million Raspberry Pi computers sold, millions of people using our learning resources, almost 1,000 Certified Educators in the UK and US, 75,000 children regularly attending over 5,000 Code Clubs in the UK, hundreds of Raspberry Jams taking place all over the world, code written by schoolkids running in space (yes, space), and much, much more.

Tim Peake on Twitter

Fantastic to see 5,000 active Code Clubs in the UK, helping over 75,000 young people learn to code. https://t.co/OyShrUzAhI @Raspberry_Pi https://t.co/luFj1qgzvQ

As I’ve said before, what we achieve is only possible thanks to the amazing community of makers, educators, volunteers, and young people all over the world who share our mission and support our work. You’re all awesome: thank you.

So here we are, just over a week into the New Year, and I thought it might be a good time to share with you some of what we’ve got planned for 2017.

Young digital makers

At the core of our mission is getting more young people excited about computing, and learning how to make things with computers. That was the original inspiration for the Raspberry Pi computer and it remains our number-one objective.

One of the ways we do that is through Code Club, a network of after-school clubs for 9- to 11-year-olds run by teachers and volunteers. It’s already one of the largest networks of after-school clubs in the world, and this year we’ll be working with our existing partners in Australia, Bangladesh, Brazil, Canada, Croatia, France, Hong Kong, New Zealand, and Ukraine, as well as finding more partners in more countries, to bring Code Club to many more children.

Code Club

This year also sees the launch of Pioneers, our new programme for teen digital makers. It’s built around a series of challenges that will inspire young people to make things with technology and share their makes with the world. Check out the first challenge here, and keep watching the hashtag #MakeYourIdeas across your favourite social media platforms.

This is Pioneers #MakeYourIdeas

UPDATE – The first challenge is now LIVE. Head here for more information https://www.youtube.com/watch?v=OCUzza7LJog Woohoo! Get together, get inspired, and get thinking. We’re looking for Pioneers to use technology to make something awesome. Get together in a team or on your own, post online to show us how you’re getting on, and then show the world your build when you’re done.

We’re also expanding our space programme Astro Pi, with 250 teams across Europe currently developing code that will be run on the ISS by French ESA astronaut Thomas Pesquet. And, building on our Weather Station project, we’re excited to be developing new ideas for citizen science programmes that get more young people involved in computing.

European Astro Pi Challenge – Code your experiment

British ESA astronaut Tim Peake is safely back on Earth now, but French ESA astronaut Thomas Pesquet is onboard the ISS, keen to see what students from all over Europe can do with the Astro Pi units too.

Supporting educators

Another big part of our work is supporting educators who are bringing computing and digital making into the classroom, and this year we’re going to be doing even more to help them.

Certified Educators

We’ll continue to grow our community of official Raspberry Pi Certified Educators, with Picademy training programmes in the UK and US. Watch out for those dates coming soon. We’re also opening up our educator training to a much wider audience through a series of online courses in partnership with FutureLearn. The first two courses are open for registration now, and we’ve got plans to develop and run more courses throughout the year, so if you’re an educator, let us know what you would find most useful.

We’re also really excited to be launching a brand-new free resource for educators later this month in partnership with CAS, the grass-roots network of computing educators. For now, it’s top-secret, but if you’re in the Bett Arena on 25 January, you’ll be the first to hear all about it.

Free educational resources

One of the most important things we do at Pi Towers is create the free educational resources that are used in Code Clubs, STEM clubs, CoderDojos, classrooms, libraries, makerspaces, and bedrooms by people of all ages learning about computing and digital making. We love making these resources and we know that you love using them. This year, we want to make them even more useful.

resources

As a first step, later this month we will share our digital making curriculum, which explains how we think about learning and progression, and which provides the structure for our educational resources and programmes. We’re publishing it so that we can get feedback to make it better, but we also hope that it will be used by other organisations creating educational resources.

We’re also working hard behind the scenes to improve the content and presentation of our learning resources. We want to include more diverse content like videos, make it easier for users to track their own progress, and generally make the experience more interactive and social. We’re looking forward to sharing that work and getting your feedback over the next few months.

Community

Last, but by no means least, we will continue to support and grow the community around our mission. We’ll be doing even more outreach, with ever more diverse groups, and doing much more to support the Raspberry Jam organisers and others who do so much to involve people in the digital making movement.

Birthday Bash

The other big community news is that we will be formally establishing ourselves as a charity in the US, which will provide the foundation (see what I did there?) for a serious expansion of our charitable activities and community in North America.


As you can see, we’ve got big plans for the year. Let me know what you think in the comments below and, if you’re excited about the mission, there’s lots of ways to get involved.

The post 2017: inspiring young makers and supporting educators appeared first on Raspberry Pi.

Computing and weather stations at Eastlea Community School

Post Syndicated from clive original https://www.raspberrypi.org/blog/eastlea-community-school/

In my day, you were lucky if you had some broken Clackers and a half-sucked, flocculent gobstopper in your trouser pockets. But here I am, half a century later, watching a swarm of school pupils running around the playground with entire computers attached to them.

Or microcontrollers, at least. This was Eastlea Community School’s Technology Day, and Steph and I had been invited along by ICT and computing teacher Mr Richards, a long-term Raspberry Pi forum member and Pi enthusiast. The day was a whole school activity, involving 930 pupils and 100 staff, showcasing how computing and technology can be used across the curriculum. In the playground, PE students had designed and coded micro:bits to measure all manner of sporting metrics. In physics, they were investigating g-forces. In the ICT and computing rooms, whole cohorts were learning to code. This was really innovative stuff.

shelves of awesome

All ICT classrooms should have shelves like this

A highlight of the tour was Mr Richards’ classroom, stuffed with electronics, robots, and hacking goodness, and pupils coming and going. It was a really creative space. Impressively, there are Raspberry Pis permanently installed on every desk, which is just how we envisaged it: a normal classroom tool for digital making.

pis on table

All this was amazing, and certainly the most impressive cross-curricular use of computing I’ve seen in a school. But having lived and breathed the Raspberry Pi Oracle weather station project for several months, I was really keen to see what they’d done with theirs. And it was a corker. Students from the computing club had built and set up the station in their lunch breaks, and installed it in a small garden area.

eastlea ws team

Mr Richards and the Eastlea Community School weather station team

Then they had hacked it, adding a solar panel, battery and WiFi. This gets round the problems of how to power the station and how to transfer data. The standard way is Power over Ethernet, which uses the same cable for power and data, but this is not always the optimal solution, depending on location. It’s not as simple as sticking a solar panel on a stick either. What happens when it’s cloudy? Will the battery recharge in winter? Mr Richards and his students have spent a lot of time investigating such questions, and it’s exactly the sort of problem-solving and engineering that we want to encourage. Also, we love hacking.

eastlea weather station garden

Not content with these achievements, they plan to add a camera to monitor wildlife and vegetation, perhaps tying it in with the weather data. They’re also hoping to install another weather station elsewhere, so that they can compare the data and investigate the school microclimate in more detail. The weather station itself will be used for teaching and learning this September.

Eastlea Community School’s weather station really is a showcase for the project, and we’d like to thank Mr Richards and his students for working so hard on it. If you want to learn more about solar panels and other hacks, then head over to our weather station forum.


Weather station update

The remaining weather station kits have started shipping to schools this week! We sent an email out recently for people to confirm delivery addresses, and if you’ve done this you should have yours soon. If you were offered a weather station last year and have not had an email from us in the last few weeks (early July), then please contact us immediately at [email protected].

The post Computing and weather stations at Eastlea Community School appeared first on Raspberry Pi.

Aquarium lighting and weather system

Post Syndicated from Liz Upton original https://www.raspberrypi.org/blog/aquarium-weather-system/

We spotted this aquarium project on YouTube, and were struck with searing pangs of fishy jealousy; imagine having a 2000-litre slice of the Cayman Islands, complete with the weather as it is right now, in your living room.

Aquarium

aMGee has equipped his (enormous) tropical fish tank, full of corals as well as fish, with an IoT Raspberry Pi weather system. It polls a weather station in the Cayman Islands every two minutes and duplicates that weather in the tank: clouds; wind speed and direction; exact sunset and sunrise times; and moon phase, including the direction the moon travels across the tank.
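The poll-and-replicate loop can be sketched like this. Note that `fetch_weather`, `set_tank_lighting`, and the cloud-cover-to-brightness rule are all hypothetical stand-ins for aMGee’s actual setup, which polls a real Cayman Islands station and drives the LEDs via an Arduino:

```python
# Hedged sketch of a weather-duplicating loop for the tank lighting.
# All names and the brightness rule here are invented for illustration.
import time

def fetch_weather():
    # Hypothetical stand-in for an HTTP request to the remote weather station.
    return {"cloud_cover": 0.4, "sun_up": True}

def set_tank_lighting(level):
    # Hypothetical stand-in: the real build hands brightness values to an
    # Arduino that drives the multi-chip LEDs.
    print("lighting at {:.0%}".format(level))

def brightness_from_weather(weather):
    """Map current conditions to an overall LED brightness (0.0-1.0)."""
    if not weather["sun_up"]:
        return 0.0                             # after sunset: main lights off
    return 1.0 - 0.8 * weather["cloud_cover"]  # clouds dim the tank

def run(poll_seconds=120, cycles=1):
    # The real system polls every two minutes, indefinitely.
    for _ in range(cycles):
        set_tank_lighting(brightness_from_weather(fetch_weather()))
        time.sleep(poll_seconds)
```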


The setup uses three 100 W and eighteen 20 W multi-chip LEDs, which are controlled separately by an Arduino that lives on top of the lamp. There’s also a web interface, just in case you feel like playing Thor.

DIY LED aquarium lighting with real time weather simulation

DIY LED aquarium lighting project for my reef tank. The 660-watt fixture simulates the weather from the Cayman Islands in real time: 3 x 100-watt and 18 x 20-watt multi-chip LEDs controlled separately by an Arduino sitting on the lamp.

If you want to learn more, aMGee answers questions about the build (which, sadly, doesn’t have a how-to attached) at the Reef Central forums.

It’s a beautiful project, considerably less expensive (and more satisfying) than any off-the-shelf equivalent; and a really lovely demonstration of meaningful IoT. Thanks aMGee!

The post Aquarium lighting and weather system appeared first on Raspberry Pi.

Anomaly Detection Using PySpark, Hive, and Hue on Amazon EMR

Post Syndicated from Veronika Megler original https://blogs.aws.amazon.com/bigdata/post/Tx2642DKK75JBP8/Anomaly-Detection-Using-PySpark-Hive-and-Hue-on-Amazon-EMR

Veronika Megler, Ph.D., is a Senior Consultant with AWS Professional Services

We are surrounded by more and more sensors – some of which we’re not even consciously aware. As sensors become cheaper and easier to connect, they create an increasing flood of data that’s getting cheaper and easier to store and process.

However, sensor readings are notoriously “noisy” or “dirty”. To produce meaningful analyses, we’d like to identify anomalies in the sensor data and remove them before we perform further analysis. Or we may wish to analyze the anomalies themselves, as they may help us understand how our system really works or how our system is changing. For example, throwing away (more and more) high temperature readings in the Arctic because they are “obviously bad data” would cause us to miss the warming that is happening there.

The use case for this post is from the domain of road traffic: freeway traffic sensors. These sensors report three measures (called “features”): speed, volume, and occupancy, each of which is sampled several times a minute (see “Appendix: Measurement Definitions” at the end of this post for details on measurement definitions). Each reading from the sensors is called an observation. Sensors of different types (radar, in-road, Bluetooth) are often mixed in a single network and may be installed in varied configurations. For in-road sensors, there’s often a separate sensor in each lane; in freeways with a “carpool” lane, that lane will have different traffic characteristics from the others. Different sections of the freeway may have different traffic characteristics, such as rush hour on the inbound vs. outbound side of the freeway.

Thus, anomaly detection is frequently an iterative process where the system, as represented by the data from the sensors, must first be segmented in some way and “normal” characterized for each part of the system, before variations from that “normal” can be detected. After these variations or anomalies are removed, we can perform various analyses of the cleaned data such as trend analysis, model creation, and predictions. This post describes how two popular and powerful open-source technologies, Spark and Hive, were used to detect anomalies in data from a network of traffic sensors. While it’s based on real usage (see "References" at the end of this post), here you’ll work with similar, anonymized data.

The same characteristics and challenges apply to many other sensor networks. Specific examples I’ve worked with include weather stations, such as Weather Underground (www.wunderground.com), that report temperature, air pressure, humidity, wind and rainfall, amongst other things; ocean observatories such as CMOP (http://www.stccmop.org/datamart/observation_network) that collect physical, geochemical and biological observations; and satellite data from NOAA (http://www.nodc.noaa.gov/).

Detecting anomalies

An anomaly in a sensor network may be a single variable with an unreasonable reading (speed = 250 m.p.h.; for a thermometer, air temperature = 200F). However, each traffic sensor reading has several features (speed, volume, occupancy). There can be situations where each reading itself has a reasonable value, but the combination itself is highly unlikely (an anomaly). For traffic sensors, a speed of more than 100 m.p.h. is possible during times of low congestion (that is, low occupancy and low volume) but extremely unlikely during a traffic jam.

Many of these “valid” or “invalid” combinations are situational, as is the case here. Common combinations often have descriptive terms, such as “traffic jam”, “congested traffic”, or “light traffic”. These terms are representative of a commonly seen combination of characteristics, which would be represented in the data as a cluster of observations.

So, to detect anomalies: first, identify the common situations (as represented by large clusters of similar combinations of features), and then identify observations that are sufficiently different from those clusters. You essentially apply two methods from basic statistics: cluster the data using the most common algorithm, k-means; then measure the distance from each observation to the closest cluster, and classify those “far away” as anomalies. (Note that other anomaly detection techniques exist, some of which could be used against the same data, but they would reflect a different model or understanding of the problem.)

This post walks through the three major steps:

Clustering the data.

Choosing the number of clusters.

Detecting probable anomalies.

For the project, you process the data using Spark, Hive, and Hue on an Amazon EMR cluster, reading input data from an Amazon S3 bucket.

Clustering the data

To perform k-means clustering, you first need to know how many clusters exist in the data. However, in most cases, as is true here, you don’t know the “right” number to use. A common solution is to repeatedly cluster the data, each time using a different number (“k”) of clusters. For each “k”,  calculate a metric: the sum of the squared distance of each point from its closest cluster center, known as the Within Set Sum of Squared Error (WSSSE). (My code extends this sample.) The smaller the WSSSE, the better your clustering is considered to be – within limits, as more clusters will almost always give a smaller WSSSE but having more clusters may distract rather than add to your analysis.
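To make the metric concrete, here is the WSSSE arithmetic on a toy dataset in plain Python. The real kmeanswsssey.py computes it with Spark MLlib’s KMeans over the full data; the observations and centres below are invented:

```python
# Minimal, pure-Python illustration of the WSSSE metric on toy data.

def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def wssse(points, centers):
    """Within Set Sum of Squared Error: for each point, the squared
    distance to its closest cluster centre, summed over all points."""
    return sum(min(squared_distance(p, c) for c in centers) for p in points)

# Toy observations: (volume, speed, occupancy)
points = [(10, 65, 5), (12, 63, 6), (40, 17, 38), (42, 15, 37)]

# WSSSE shrinks as k grows: one centre vs. the two obvious cluster centres.
one_center = [(26, 40, 21.5)]
two_centers = [(11, 64, 5.5), (41, 16, 37.5)]
assert wssse(points, two_centers) < wssse(points, one_center)
```

With k=2 the centres sit inside the “light traffic” and “congestion” groups, so the total squared error collapses; pushing k higher would keep shrinking it, which is why you look for the knee in the curve rather than the minimum.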

Here, the input data is a CSV format file stored in an S3 bucket. Each row contains a single observation taken by a specific sensor at a specific time, and consists of 9 numeric values. There are two versions of the input:   

s3://vmegler/traffic/sensorinput/, with 24M rows

s3://vmegler/traffic/sensorinputsmall/, an extract with 50,000 rows

In this post, I show how to run the programs here with the smaller input. However, the exact same code runs over the 24M row input. Here’s the Hive SQL definition for the input:

CREATE EXTERNAL TABLE sensorinput (
highway int,        -- highway id
sensorloc int,      -- one sensor location may have
                    -- multiple sensors, e.g. for different highway lanes
sensorid int,       -- sensor id
dayofyear bigint,   -- yyyyddd
dayofweek bigint,   -- 0=Sunday, 1=Monday, etc.
time decimal(10,2), -- seconds since midnight,
                    -- e.g. a value of 185.67 is 00:03:05.67 a.m.
volume int,         -- a count
speed int,          -- average, in m.p.h.
occupancy int       -- a count
) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'
LOCATION 's3://vmegler/traffic/sensorinput/';

Start an EMR cluster in us-west-2 (where this bucket is located), specifying Spark, Hue, Hive, and Ganglia. (For more information, see Getting Started: Analyzing Big Data with Amazon EMR.) I’ve run the same program on two different clusters: a small cluster with 1 master and 2 core nodes, all m3.xlarge; and a larger cluster, with 1 master and 8 core nodes, all m4.xlarge.

Spark has two interfaces that can be used to run a Spark/Python program: an interactive interface, pyspark, and batch submission via spark-submit. I generally begin my projects by reviewing my data and testing my approach interactively in pyspark, while logged on to the cluster master. Then, I run my completed program using spark-submit (see also Submitting User Applications with spark-submit). After the program is ready to operationalize, I start submitting the jobs as steps to a running cluster using the AWS CLI for EMR or from a script such as a Python script using Boto3 to interface to EMR, with appropriate parameterization.

I’ve written two PySpark programs: one to repeatedly cluster the data and calculate the WSSSE using different numbers of clusters (kmeanswsssey.py); and a second one (kmeansandey.py) to calculate the distances of each observation from the closest cluster. The other parts of the anomaly detection—choosing the number of clusters to use, and deciding which observations are the outliers—are performed interactively, using Hue and Hive. I also provide a file (traffic-hive.hql), with the table definitions and sample queries.

For simplicity, I’ll describe how to run the programs using spark-submit while logged on to the master instance console.

To prepare the cluster for executing your programs, install some Python packages:

sudo yum install python-numpy python-scipy -y 

Copy the programs from S3 onto the master node’s local disk; I often run this way while I’m still editing the programs and experimenting with slightly different variations:  

aws s3 cp s3://vmegler/traffic/code/kmeanswsssey.py /home/hadoop
aws s3 cp s3://vmegler/traffic/code/kmeansandey.py /home/hadoop
aws s3 cp s3://vmegler/traffic/code/traffic-hive.hql /home/hadoop

My first PySpark program (kmeanswsssey.py) calculates WSSSE repeatedly, starting with 1 cluster (k=1), then for 2 clusters, and so on, up to some maximum k that you define. It outputs a CSV file; for each k, it appends a set of lines containing the WSSSE and some statistics that describe each of the clusters. This program takes four arguments: the input file location, the maximum k to use, a run ID to prepend to the output for when I’m testing multiple variations, and the output location: <infile> <maxk> <runId> <outfile>. For example:

spark-submit /home/hadoop/kmeanswsssey.py s3://vmegler/traffic/sensorinputsmall/ 10 run1 s3:///sensorclusterssmall/

When run on the small cluster with the small input, this program took around 5 minutes. The same program, run on the 24M row input on the larger cluster, took 2.5 hours. Running the large input on the smaller cluster produces correct results, but takes over 24 hours to complete.

Choosing the number of clusters

Next, review the clustering results and choose the number of clusters to use for the actual anomaly detection.

A common and easy way to do that is to graph the WSSSE calculated for each k, and to choose “the knee in the curve”. That is, look for a point where the total distance has dropped sufficiently that increasing the number of clusters does not drop the WSSSE by much. If you’re very lucky, each cluster has characteristics that match your mental model for the problem domain such as low speed, high occupancy, and high volume, matching “congested traffic”.

Here you use Hue and Hive, conveniently selected when you started the cluster, for data exploration and simple graphing. In Hue’s Hive Query Editor, define a table that describes the output file you created in the previous step. Here, I’m pointing to a precomputed version calculated over the larger dataset:

CREATE EXTERNAL TABLE kcalcs (
run string,
wssse decimal(20,3),
k decimal,
clusterid decimal,
clustersize decimal,
volcntr decimal(10,1),
spdcntr decimal(10,1),
occcntr decimal(10,1),
volstddev decimal(10,1),
spdstddev decimal(10,1),
occstddev decimal(10,1)
) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'
LOCATION 's3://vmegler/traffic/sensorclusters/'
tblproperties ("skip.header.line.count"="2");

To decide how many clusters to use for the next step, use Hue to plot a line graph of the number of clusters versus WSSSE. First, select the information to display:

SELECT DISTINCT k, wssse FROM kcalcs ORDER BY k;

In the results panel, choose Chart. Choose the icon representing a line graph, choose “k” for X-Axis and “wssse” for Y-Axis, and Hue builds the following chart. Hover your cursor above a particular bar, and Hue shows the value of the X and Y axis for that bar. 

Chart built by Hue

For the “best number of clusters”, you’re looking for the “knee in the curve”: the place where going to a higher number of clusters does not significantly reduce the total distance function (WSSE). For this data, it looks as though around 4 is a good choice, as the gains of going to 5 or 6 clusters looks minimal.

You can explore the characteristics of the identified clusters with the following SELECT statement:

SELECT DISTINCT k, clusterid, clustersize, volcntr, spdcntr, occcntr, volstddev, spdstddev, occstddev
FROM kcalcs
ORDER BY k, spdcntr;

By looking at, for example, the lines for three clusters (k=3), you can see a “congestion” cluster (17.1 m.p.h., occupancy 37.7 cars), a “free-flowing heavy-traffic” cluster, and a “light traffic” cluster (65.2 m.p.h., occupancy 5.1). With k=4, you still see the “congestion” and “fast, light traffic” clusters, but the “free-flowing heavy-traffic” cluster from k=3 has been split into two distinct clusters with very different occupancy. Choose to stay with 4 clusters.

Detecting anomalies

Use the following method with these clusters to identify anomalies:

Assign each sensor reading to the closest cluster.

Calculate the distance (using some measure) for each reading to the assigned cluster center.

Filter for the entries with a greater distance than some chosen threshold.

I like to use Mahalanobis distance as the distance measure, as it compensates for differences in units (speed in m.p.h., while volume and occupancy are counts), averages, and scales of the several features I’m clustering across. 

Run the second PySpark program, kmeansandey.py (you copied this program onto local disk earlier, during setup). Give this program the number of clusters to use, decided in the previous step, and the input data. For each input observation, this program does the following:

Identifies the closest cluster center.

Calculates the Mahalanobis distance from this observation to the closest center.

Creates an output record consisting of the original observation, plus the cluster number, the cluster center, and the distance.

The program takes the following parameters: <infile> <k> <outfile>. The output is a CSV file, placed in an S3 bucket of your choice. To run the program, use spark-submit:

spark-submit /mnt/var/kmeansandey.py s3://vmegler/traffic/sensorinputsmall/ 4 s3://<your_bucket>/sensoroutputsmall/

On the small cluster with the small input, this job finished in under a minute; on the bigger cluster, the 24M dataset took around 17 minutes. In the next step, you review the observations that are “distant” from the closest cluster as calculated by that distance calculation. Because these observations are unlike the majority of the other observations, they are considered outliers, and probable anomalies.

Exploring identified anomalies

Now you’re ready to look at the probable anomalies and decide whether they really should be considered anomalies. In Hive, you define a table that describes the output file created in the previous step. Use an output file from the S3 bucket, which contains the original 7 columns (sensorid through occupancy) plus 5 new ones (clusterid through maldist). The smaller dataset’s output is less interesting to explore as it only contains data from one sensor, so I’ve precomputed the output over the large dataset for this exploration. Here is the modified table definition:

CREATE EXTERNAL TABLE sensoroutput (
highway int, — highway id etc. (as before)

occupancy int, — a count
clusterid int, — cluster identifier
volcntr decimal(10,2), — cluster center, volume
spdcntr decimal(10,2), — cluster center, speed
occcntr decimal(10,2), — cluster center, occupancy
maldist decimal(10,2) — Mahalanobis distance to this cluster
) ROW FORMAT DELIMITED FIELDS TERMINATED BY ‘,’ LINES TERMINATED BY ‘n’
LOCATION ‘s3://vmegler/traffic/sensoroutput/’;

Explore your results. To look at the number of observations assigned to each cluster for each sensor, try the following query:

SELECT sensorid, clusterid, concat(cast(sensorid AS string), ‘.’, cast(clusterid AS string)) AS senclust, count(*) AS howmany, max(maldist) AS dist
FROM sensoroutput
GROUP BY sensorid, clusterid
ORDER BY sensorid, clusterid;

The “concat” statement creates a compound column, senclust, that you can use in Hue’s built-in graphing tool to compare the clusters visually for each sensor. For this chart, choose a bar graph, choose the compound column “senclust” for X-Axis and “howmany” for Y-Axis, and Hue builds the following chart.

You can now easily compare the sizes, and the largest and average distances for each cluster across the different sensors. The smaller clusters probably bear investigation; they either represent unusual traffic conditions, or a cluster of bad readings. Note that an additional cluster of known bad readings (0 speed, volume, and occupancy) was identified using a similar process during a prior run; these observations are all assigned to a dummy clusterid of “-1” and have a high maldist.

SELECT clusterid, volcntr, spdcntr, occcntr, count(*) AS num, max(maldist) AS maxmaldist, avg(maldist) AS avgmaldist,
stddev_pop(maldist) AS stddevmal
FROM sensoroutput
GROUP BY clusterid, volcntr, spdcntr, occcntr
ORDER BY spdcntr;

How do you choose the threshold for defining an observation as an anomaly? This is another black art. I chose 2.5 by a combination of standard practice, discussing the graphs, and looking at how much and which data I’d be throwing away by using that assumption.  To explore the distribution of outliers across the sensors, use a query like the following:

SELECT sensorid, clusterid, count(*) AS num_outliers, avg(spdcntr) AS spdcntr, avg(maldist) AS avgdist
FROM sensoroutput
WHERE maldist > 2.5
GROUP BY sensorid, clusterid
ORDER BY sensorid, clusterid;

The number of outliers varies quite a bit by sensor and cluster. You can explore the 100 entries for sensor 44, cluster 2:

SELECT *
FROM sensoroutput
WHERE maldist > 2.5
AND sensorid = 44 AND clusterid = 0
ORDER BY maldist desc
LIMIT 100;

The query results show some entries that look reasonable (volume 6, occupancy 1), and others that look less so (volume of 3 and occupancy of 10). Depending on your intended use, you may decide that the number of observations that might not really be anomalies is small enough that you should just exclude all these entries – but perhaps you want to study these entries further to find a pattern, such as that this is a state that often occurs during transition times from one traffic pattern to another.

After you understand your clusters and the flagged “potential anomalies” sufficiently, you can choose which observations to exclude from further analysis.

Conclusion

This post describes anomaly detection for sensor data, and works through a case of identifying anomalies in traffic sensor data. You’ve dived into some of the complexities that comes with deciding which subset of sensor data is dirty or not, and the tools used to ask those questions. I showed how an iterative approach is often needed, with each analysis leading to further questions and further analyses.

In the real use case (see "References" below), we iteratively clustered subsets of the data: for different highways, days of the week, different sensor types, and so on, to understand the data and anomalies. We’ve seen here some of the challenges in deciding whether or not something is an anomaly in the data, or an anomaly in our approach. We used Amazon EMR, along with Apache Spark, Apache Hive and Hue to implement the approach and explore the results, allowing us to quickly experiment with a number of alternative clusters before settling on the combination that we felt best identified the real anomalies in our data.

Now, you can move forward: providing “clean data” to the business users; combining this data with weather, school holiday calendars, and sporting events to identify the causes of specific traffic patterns and pattern changes; and then using that model to predict future traffic conditions.

Appendix: Measurement Definitions

Volume measures how many vehicles have passed this sensor during the given time period. Occupancy measures the number of vehicles at the sensor at the measurement time. The combination of volume and occupancy gives a view of overall traffic density. For example: if the traffic is completely stopped, a sensor may have very high occupancy – many vehicles sitting at the sensor – but a volume close to 0, as very few vehicles have passed the sensor. This is a common circumstance for sensors at freeway entrances that limit freeway entry, often via lights that only permit one car from one lane to pass every few seconds.

Note that different sensor types may have different capabilities to detect these situations, such as radar vs. in-road sensors, and different sensors types or models may have different defaults for how they report various situations. For example, “0,0,0” may mean no traffic, or known bad data, or assumed bad data based on hard limits, such as traffic above a specific density (ouch!). Thus sensor type, capability, and context are all important factors in identifying “bad data”. In this study, the analysis of which sensors were “similar enough” for the data to be analyzed together was performed prior to data extract. The anomaly detection steps described here were performed separately for each set of similar sensors, as defined by the pre-analysis.

References

V. M. Megler, K. A. Tufte, and D. Maier, “Improving Data Quality in Intelligent Transportation Systems (Technical Report),” Portland  Or., Jul-2015 [Online]. Available: http://arxiv.org/abs/1602.03100

If you have questions or suggestions, please leave a comment below.

—————————–

Related

Large-Scale Machine Learning with Spark on Amazon EMR

 

Anomaly Detection Using PySpark, Hive, and Hue on Amazon EMR

Post Syndicated from Veronika Megler original https://blogs.aws.amazon.com/bigdata/post/Tx2642DKK75JBP8/Anomaly-Detection-Using-PySpark-Hive-and-Hue-on-Amazon-EMR

Veronika Megler, Ph.D., is a Senior Consultant with AWS Professional Services

We are surrounded by more and more sensors, some of which we’re not even consciously aware of. As sensors become cheaper and easier to connect, they create an increasing flood of data that’s getting cheaper and easier to store and process.

However, sensor readings are notoriously “noisy” or “dirty”. To produce meaningful analyses, we’d like to identify anomalies in the sensor data and remove them before we perform further analysis. Or we may wish to analyze the anomalies themselves, as they may help us understand how our system really works or how our system is changing. For example, throwing away (more and more) high temperature readings in the Arctic because they are “obviously bad data” would cause us to miss the warming that is happening there.

The use case for this post is from the domain of road traffic: freeway traffic sensors. These sensors report three measures (called “features”): speed, volume, and occupancy, each of which is sampled several times a minute (see “Appendix: Measurement Definitions” at the end of this post for details). Each reading from the sensors is called an observation. Sensors of different types (radar, in-road, Bluetooth) are often mixed in a single network and may be installed in varied configurations. For in-road sensors, there’s often a separate sensor in each lane; in freeways with a “carpool” lane, that lane will have different traffic characteristics from the others. Different sections of the freeway may have different traffic characteristics, such as rush hour on the inbound vs. outbound side of the freeway.

Thus, anomaly detection is frequently an iterative process where the system, as represented by the data from the sensors, must first be segmented in some way and “normal” characterized for each part of the system, before variations from that “normal” can be detected. After these variations or anomalies are removed, we can perform various analyses of the cleaned data such as trend analysis, model creation, and predictions. This post describes how two popular and powerful open-source technologies, Spark and Hive, were used to detect anomalies in data from a network of traffic sensors. While it’s based on real usage (see "References" at the end of this post), here you’ll work with similar, anonymized data.

The same characteristics and challenges apply to many other sensor networks. Specific examples I’ve worked with include weather stations, such as Weather Underground (www.wunderground.com), that report temperature, air pressure, humidity, wind and rainfall, amongst other things; ocean observatories such as CMOP (http://www.stccmop.org/datamart/observation_network) that collect physical, geochemical and biological observations; and satellite data from NOAA (http://www.nodc.noaa.gov/).

Detecting anomalies

An anomaly in a sensor network may be a single variable with an unreasonable reading (speed = 250 m.p.h.; for a thermometer, air temperature = 200F). However, each traffic sensor reading has several features (speed, volume, occupancy). There can be situations where each reading itself has a reasonable value, but the combination itself is highly unlikely (an anomaly). For traffic sensors, a speed of more than 100 m.p.h. is possible during times of low congestion (that is, low occupancy and low volume) but extremely unlikely during a traffic jam.

Many of these “valid” or “invalid” combinations are situational, as is the case here. Common combinations often have descriptive terms, such as “traffic jam”, “congested traffic”, or “light traffic”. These terms are representative of a commonly seen combination of characteristics, which would be represented in the data as a cluster of observations.

So, to detect anomalies: First, identify the common situations (as represented by a large cluster of similar combinations of features), and then identify observations that are sufficiently different from those clusters. You essentially apply two methods from basic statistics: clustering, using the most common algorithm, k-means; then measuring the distance from each observation to the closest cluster, and classifying those “far away” as anomalies. (Note that other anomaly detection techniques exist, some of which could be used against the same data, but would reflect a different model or understanding of the problem.)

This post walks through the three major steps:

Clustering the data.

Choosing the number of clusters.

Detecting probable anomalies.

For the project, you process the data using Spark, Hive, and Hue on an Amazon EMR cluster, reading input data from an Amazon S3 bucket.

Clustering the data

To perform k-means clustering, you first need to know how many clusters exist in the data. However, in most cases, as is true here, you don’t know the “right” number to use. A common solution is to repeatedly cluster the data, each time using a different number (“k”) of clusters. For each “k”,  calculate a metric: the sum of the squared distance of each point from its closest cluster center, known as the Within Set Sum of Squared Error (WSSSE). (My code extends this sample.) The smaller the WSSSE, the better your clustering is considered to be – within limits, as more clusters will almost always give a smaller WSSSE but having more clusters may distract rather than add to your analysis.
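The core of that loop can be sketched without Spark. The following is a minimal pure-NumPy stand-in for what kmeanswsssey.py does with MLlib's KMeans on the cluster; the helper names are illustrative, not the program's actual code:

```python
import numpy as np

def init_centers(data, k):
    """Greedy farthest-point initialization (deterministic)."""
    centers = [data[0]]
    for _ in range(1, k):
        # Distance from each point to its nearest chosen center so far.
        d = np.min([np.linalg.norm(data - c, axis=1) for c in centers], axis=0)
        centers.append(data[int(d.argmax())])
    return np.array(centers)

def kmeans(data, k, iters=20):
    """Plain Lloyd's algorithm; MLlib's KMeans.train plays this role in Spark."""
    centers = init_centers(data, k)
    for _ in range(iters):
        # Assign every observation to its nearest center.
        dists = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its assigned observations.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = data[labels == j].mean(axis=0)
    return centers

def wssse(data, centers):
    """Within Set Sum of Squared Errors: squared distance to the closest center."""
    dists = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
    return float((dists.min(axis=1) ** 2).sum())

def wssse_curve(data, maxk):
    """Cluster with k = 1..maxk and record WSSSE for each, as kmeanswsssey.py does."""
    return [(k, wssse(data, kmeans(data, k))) for k in range(1, maxk + 1)]
```

The Spark version distributes the assignment and averaging steps across the cluster, but the shape of the computation is the same.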

Here, the input data is a CSV format file stored in an S3 bucket. Each row contains a single observation taken by a specific sensor at a specific time, and consists of 9 numeric values. There are two versions of the input:   

s3://vmegler/traffic/sensorinput/, with 24M rows

s3://vmegler/traffic/sensorinputsmall/, an extract with 50,000 rows

In this post, I show how to run the programs here with the smaller input. However, the exact same code runs over the 24M row input. Here’s the Hive SQL definition for the input:

CREATE EXTERNAL TABLE sensorinput (
highway int, -- highway id
sensorloc int, -- one sensor location may have
-- multiple sensors, e.g. for different highway lanes
sensorid int, -- sensor id
dayofyear bigint, -- yyyyddd
dayofweek bigint, -- 0=Sunday, 1=Monday, etc.
time decimal(10,2), -- seconds since midnight
-- e.g. a value of 185.67 is 00:03:05.67 a.m.
volume int, -- a count
speed int, -- average, in m.p.h.
occupancy int -- a count
) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'
LOCATION 's3://vmegler/traffic/sensorinput/';

Start an EMR cluster in us-west-2 (where this bucket is located), specifying Spark, Hue, Hive, and Ganglia. (For more information, see Getting Started: Analyzing Big Data with Amazon EMR.) I’ve run the same program in two different clusters: a small cluster with 1 master and 2 core nodes, all m3.xlarge; and a larger cluster, with 1 master and 8 core nodes, all m4.xlarge.

Spark has two interfaces that can be used to run a Spark/Python program: an interactive interface, pyspark, and batch submission via spark-submit. I generally begin my projects by reviewing my data and testing my approach interactively in pyspark, while logged on to the cluster master. Then, I run my completed program using spark-submit (see also Submitting User Applications with spark-submit). After the program is ready to operationalize, I start submitting the jobs as steps to a running cluster using the AWS CLI for EMR or from a script such as a Python script using Boto3 to interface to EMR, with appropriate parameterization.
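As a sketch of that last, operationalized stage: submitting a run as an EMR step via Boto3 typically wraps spark-submit in command-runner.jar. The cluster ID, step name, and arguments below are placeholders, not values from this post:

```python
def spark_submit_step(name, script, args):
    """Build an EMR step definition that runs spark-submit via command-runner."""
    return {
        "Name": name,
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", script] + list(args),
        },
    }

# With a running cluster, submit it (the cluster ID is a placeholder):
#   import boto3
#   emr = boto3.client("emr", region_name="us-west-2")
#   emr.add_job_flow_steps(
#       JobFlowId="j-XXXXXXXXXXXXX",
#       Steps=[spark_submit_step(
#           "wssse-run1",
#           "/home/hadoop/kmeanswsssey.py",
#           ["s3://vmegler/traffic/sensorinputsmall/", "10", "run1",
#            "s3://<your_bucket>/sensorclusterssmall/"])])
```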

I’ve written two PySpark programs: one to repeatedly cluster the data and calculate the WSSSE using different numbers of clusters (kmeanswsssey.py); and a second one (kmeansandey.py) to calculate the distances of each observation from the closest cluster. The other parts of the anomaly detection—choosing the number of clusters to use, and deciding which observations are the outliers—are performed interactively, using Hue and Hive. I also provide a file (traffic-hive.hql), with the table definitions and sample queries.

For simplicity, I’ll describe how to run the programs using spark-submit while logged on to the master instance console.

To prepare the cluster for executing your programs, install some Python packages:

sudo yum install python-numpy python-scipy -y 

Copy the programs from S3 onto the master node’s local disk; I often run this way while I’m still editing the programs and experimenting with slightly different variations:  

aws s3 cp s3://vmegler/traffic/code/kmeanswsssey.py /home/hadoop
aws s3 cp s3://vmegler/traffic/code/kmeansandey.py /home/hadoop
aws s3 cp s3://vmegler/traffic/code/traffic-hive.hql /home/hadoop

My first PySpark program (kmeanswsssey.py) calculates WSSSE repeatedly, starting with 1 cluster (k=1), then for 2 clusters, and so on, up to some maximum k that you define. It outputs a CSV file; for each k, it appends a set of lines containing the WSSSE and some statistics that describe each of the clusters. This program takes 3 arguments: the input file location, the maximum k to use, and a prefix to prepend to the output file for this run for when I’m testing multiple variations:  <infile> <maxk> <runId> <outfile>. For example:

spark-submit /home/hadoop/kmeanswsssey.py s3://vmegler/traffic/sensorinputsmall/ 10 run1 s3://<your_bucket>/sensorclusterssmall/

When run on the small cluster with the small input, this program took around 5 minutes. The same program, run on the 24M row input on the larger cluster, took 2.5 hours. Running the large input on the smaller cluster produces correct results, but takes over 24 hours to complete.

Choosing the number of clusters

Next, review the clustering results and choose the number of clusters to use for the actual anomaly detection.

A common and easy way to do that is to graph the WSSSE calculated for each k, and to choose “the knee in the curve”. That is, look for a point where the total distance has dropped sufficiently that increasing the number of clusters does not drop the WSSSE by much. If you’re very lucky, each cluster has characteristics that match your mental model for the problem domain such as low speed, high occupancy, and high volume, matching “congested traffic”.

Here you use Hue and Hive, conveniently selected when you started the cluster, for data exploration and simple graphing. In Hue’s Hive Query Editor, define a table that describes the output file you created in the previous step. Here, I’m pointing to a precomputed version calculated over the larger dataset:

CREATE EXTERNAL TABLE kcalcs (
run string,
wssse decimal(20,3),
k decimal,
clusterid decimal,
clustersize decimal,
volcntr decimal(10,1),
spdcntr decimal(10,1),
occcntr decimal(10,1),
volstddev decimal(10,1),
spdstddev decimal(10,1),
occstddev decimal(10,1)
) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'
LOCATION 's3://vmegler/traffic/sensorclusters/'
tblproperties ("skip.header.line.count"="2");

To decide how many clusters to use for the next step, use Hue to plot a line graph of the number of clusters versus WSSSE. First, select the information to display:

SELECT DISTINCT k, wssse FROM kcalcs ORDER BY k;

In the results panel, choose Chart. Choose the icon representing a line graph, choose “k” for X-Axis and “wssse” for Y-Axis, and Hue builds the following chart. Hover your cursor over a particular data point, and Hue shows its X and Y values.

Chart built by Hue

For the “best number of clusters”, you’re looking for the “knee in the curve”: the place where going to a higher number of clusters does not significantly reduce the total distance function (WSSSE). For this data, around 4 looks like a good choice, as the gains from going to 5 or 6 clusters look minimal.
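Eyeballing the chart works fine here, but the same rule (stop when an extra cluster no longer buys much) can be automated. A hypothetical helper, not part of the post's code:

```python
def pick_k(curve, min_gain=0.10):
    """curve: list of (k, wssse) pairs sorted by k.
    Return the smallest k where moving to k+1 improves WSSSE by less than
    min_gain (10% by default) -- i.e. the 'knee in the curve'."""
    for (k, w), (_, w_next) in zip(curve, curve[1:]):
        if w > 0 and (w - w_next) / w < min_gain:
            return k
    return curve[-1][0]  # no knee found: take the largest k tried
```

Against a curve shaped like the chart above (big drops through k=4, tiny drops afterwards), this returns 4, matching the visual choice.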

You can explore the characteristics of the identified clusters with the following SELECT statement:

SELECT DISTINCT k, clusterid, clustersize, volcntr, spdcntr, occcntr, volstddev, spdstddev, occstddev
FROM kcalcs
ORDER BY k, spdcntr;

By looking at, for example, the lines for three clusters (k=3), you can see a “congestion” cluster (17.1 m.p.h., occupancy 37.7 cars), a “free-flowing heavy-traffic” cluster, and a “light traffic” cluster (65.2 m.p.h., occupancy 5.1). With k=4, you still see the “congestion” and “fast, light traffic” clusters, but the “free-flowing heavy-traffic” cluster from k=3 has been split into two distinct clusters with very different occupancy. Choose to stay with 4 clusters.

Detecting anomalies

Use the following method with these clusters to identify anomalies:

Assign each sensor reading to the closest cluster.

Calculate the distance (using some measure) for each reading to the assigned cluster center.

Filter for the entries with a greater distance than some chosen threshold.

I like to use Mahalanobis distance as the distance measure, as it compensates for differences in units (speed in m.p.h., while volume and occupancy are counts), averages, and scales of the several features I’m clustering across. 

Run the second PySpark program, kmeansandey.py (you copied this program onto local disk earlier, during setup). Give this program the number of clusters to use, decided in the previous step, and the input data. For each input observation, this program does the following:

Identifies the closest cluster center.

Calculates the Mahalanobis distance from this observation to the closest center.

Creates an output record consisting of the original observation, plus the cluster number, the cluster center, and the distance.
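The per-observation scoring steps above can be sketched as follows. This is a pure-NumPy stand-in for kmeansandey.py's core logic, and the use of a per-cluster inverse covariance matrix is an assumption about how the distance is parameterized:

```python
import numpy as np

def mahalanobis(x, center, cov_inv):
    """Mahalanobis distance between an observation and a cluster center."""
    d = x - center
    return float(np.sqrt(d @ cov_inv @ d))

def score(obs, centers, cov_invs):
    """Score one observation: closest cluster, its center, and the distance."""
    # 1. Identify the closest cluster center (plain k-means assignment).
    i = int(np.argmin([np.linalg.norm(obs - c) for c in centers]))
    # 2. Mahalanobis distance from the observation to that center.
    dist = mahalanobis(obs, centers[i], cov_invs[i])
    # 3. Output record: original observation + cluster id, center, distance.
    return list(obs) + [i] + list(centers[i]) + [dist]
```

With an identity covariance the distance reduces to plain Euclidean; the point of the covariance term is to rescale each feature (speed vs. counts) before measuring "far away".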

The program takes the following parameters: <infile> <k> <outfile>. The output is a CSV file, placed in an S3 bucket of your choice. To run the program, use spark-submit:

spark-submit /mnt/var/kmeansandey.py s3://vmegler/traffic/sensorinputsmall/ 4 s3://<your_bucket>/sensoroutputsmall/

On the small cluster with the small input, this job finished in under a minute; on the bigger cluster, the 24M row dataset took around 17 minutes. In the next step, you review the observations that are “distant” from their closest cluster, as measured by the Mahalanobis distance. Because these observations are unlike the majority of the other observations, they are considered outliers, and probable anomalies.

Exploring identified anomalies

Now you’re ready to look at the probable anomalies and decide whether they really should be considered anomalies. In Hive, you define a table that describes the output file created in the previous step. Use an output file from the S3 bucket, which contains the original 7 columns (sensorid through occupancy) plus 5 new ones (clusterid through maldist). The smaller dataset’s output is less interesting to explore as it only contains data from one sensor, so I’ve precomputed the output over the large dataset for this exploration. Here is the modified table definition:

CREATE EXTERNAL TABLE sensoroutput (
highway int, — highway id etc. (as before)

occupancy int, — a count
clusterid int, — cluster identifier
volcntr decimal(10,2), — cluster center, volume
spdcntr decimal(10,2), — cluster center, speed
occcntr decimal(10,2), — cluster center, occupancy
maldist decimal(10,2) — Mahalanobis distance to this cluster
) ROW FORMAT DELIMITED FIELDS TERMINATED BY ‘,’ LINES TERMINATED BY ‘n’
LOCATION ‘s3://vmegler/traffic/sensoroutput/’;

Explore your results. To look at the number of observations assigned to each cluster for each sensor, try the following query:

SELECT sensorid, clusterid, concat(cast(sensorid AS string), '.', cast(clusterid AS string)) AS senclust, count(*) AS howmany, max(maldist) AS dist
FROM sensoroutput
GROUP BY sensorid, clusterid
ORDER BY sensorid, clusterid;

The “concat” statement creates a compound column, senclust, that you can use in Hue’s built-in graphing tool to compare the clusters visually for each sensor. For this chart, choose a bar graph, choose the compound column “senclust” for X-Axis and “howmany” for Y-Axis, and Hue builds the following chart.

You can now easily compare the sizes, and the largest and average distances for each cluster across the different sensors. The smaller clusters probably bear investigation; they either represent unusual traffic conditions, or a cluster of bad readings. Note that an additional cluster of known bad readings (0 speed, volume, and occupancy) was identified using a similar process during a prior run; these observations are all assigned to a dummy clusterid of “-1” and have a high maldist. The following query summarizes the size and distance statistics for each cluster:

SELECT clusterid, volcntr, spdcntr, occcntr, count(*) AS num, max(maldist) AS maxmaldist, avg(maldist) AS avgmaldist,
stddev_pop(maldist) AS stddevmal
FROM sensoroutput
GROUP BY clusterid, volcntr, spdcntr, occcntr
ORDER BY spdcntr;

How do you choose the threshold for defining an observation as an anomaly? This is another black art. I chose 2.5 by a combination of standard practice, discussing the graphs, and looking at how much and which data I’d be throwing away by using that assumption.  To explore the distribution of outliers across the sensors, use a query like the following:

SELECT sensorid, clusterid, count(*) AS num_outliers, avg(spdcntr) AS spdcntr, avg(maldist) AS avgdist
FROM sensoroutput
WHERE maldist > 2.5
GROUP BY sensorid, clusterid
ORDER BY sensorid, clusterid;
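The “how much would I throw away” part of that threshold decision can also be checked outside Hive by sweeping candidate cutoffs over the exported maldist values. A small illustrative helper (the distances here are made up):

```python
import numpy as np

def flagged_fraction(maldist, thresholds):
    """For each candidate threshold, the fraction of observations flagged."""
    maldist = np.asarray(maldist)
    return {t: float((maldist > t).mean()) for t in thresholds}
```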

The number of outliers varies quite a bit by sensor and cluster. You can explore the top 100 entries for, say, sensor 44, cluster 0:

SELECT *
FROM sensoroutput
WHERE maldist > 2.5
AND sensorid = 44 AND clusterid = 0
ORDER BY maldist DESC
LIMIT 100;

The query results show some entries that look reasonable (volume 6, occupancy 1), and others that look less so (volume of 3 and occupancy of 10). Depending on your intended use, you may decide that the number of observations that might not really be anomalies is small enough that you should just exclude all these entries. Alternatively, you may want to study them further to find a pattern; for example, this may be a state that often occurs during transitions from one traffic pattern to another.

After you understand your clusters and the flagged “potential anomalies” sufficiently, you can choose which observations to exclude from further analysis.

Conclusion

This post describes anomaly detection for sensor data, and works through a case of identifying anomalies in traffic sensor data. You’ve dived into some of the complexities that come with deciding which subset of sensor data is dirty, and the tools used to ask those questions. I showed how an iterative approach is often needed, with each analysis leading to further questions and further analyses.

In the real use case (see "References" below), we iteratively clustered subsets of the data: for different highways, days of the week, different sensor types, and so on, to understand the data and anomalies. We’ve seen here some of the challenges in deciding whether or not something is an anomaly in the data, or an anomaly in our approach. We used Amazon EMR, along with Apache Spark, Apache Hive and Hue to implement the approach and explore the results, allowing us to quickly experiment with a number of alternative clusters before settling on the combination that we felt best identified the real anomalies in our data.

Now, you can move forward: providing “clean data” to the business users; combining this data with weather, school holiday calendars, and sporting events to identify the causes of specific traffic patterns and pattern changes; and then using that model to predict future traffic conditions.

Appendix: Measurement Definitions

Volume measures how many vehicles have passed this sensor during the given time period. Occupancy measures the number of vehicles at the sensor at the measurement time. The combination of volume and occupancy gives a view of overall traffic density. For example: if the traffic is completely stopped, a sensor may have very high occupancy – many vehicles sitting at the sensor – but a volume close to 0, as very few vehicles have passed the sensor. This is a common circumstance for sensors at freeway entrances that limit freeway entry, often via lights that only permit one car from one lane to pass every few seconds.

Note that different sensor types may have different capabilities to detect these situations, such as radar vs. in-road sensors, and different sensors types or models may have different defaults for how they report various situations. For example, “0,0,0” may mean no traffic, or known bad data, or assumed bad data based on hard limits, such as traffic above a specific density (ouch!). Thus sensor type, capability, and context are all important factors in identifying “bad data”. In this study, the analysis of which sensors were “similar enough” for the data to be analyzed together was performed prior to data extract. The anomaly detection steps described here were performed separately for each set of similar sensors, as defined by the pre-analysis.

References

V. M. Megler, K. A. Tufte, and D. Maier, “Improving Data Quality in Intelligent Transportation Systems (Technical Report),” Portland  Or., Jul-2015 [Online]. Available: http://arxiv.org/abs/1602.03100

If you have questions or suggestions, please leave a comment below.

—————————–

Related

Large-Scale Machine Learning with Spark on Amazon EMR

 

Raspberry Pi Oracle Weather Stations shipped

Post Syndicated from clive original https://www.raspberrypi.org/blog/weather-stations-shipped/

Big brown boxes
If this blog was an Ealing comedy, it would be a speeded-up montage of an increasingly flustered postman delivering huge numbers of huge boxes to school reception desks across the land. At the end, they’d push their cap up at a jaunty angle and wipe their brow with a large spotted handkerchief. With squeaky sound effects.
Over the past couple of days, huge brown boxes have indeed been dropping onto the counters of school receptions across the UK, and they contain something wonderful: a Raspberry Pi Oracle Weather Station.
DJCS on Twitter
Code club students building a weather station kindly donated by the @Raspberry_Pi foundation thanks @clivebeale pic.twitter.com/yGQP4BQ6SP

This week, we sent out the first batch of Weather Station kits to 150 UK schools. Yesterday – World Meteorological Day, of course! – they started to appear in the wild.
DHFS Computing Dept on Twitter
The next code club project has just arrived! Can’t wait to get stuck in! @Raspberry_Pi @clivebeale pic.twitter.com/axA7wJ1RMF

Pilot “lite”
We’re running the UK delivery as a short pilot scheme. With almost 1000 schools involved worldwide, it will give us a chance to tweak software and resources, and to get a feel for how we can best support schools. In the next few weeks, we’ll send out the remainder of the weather stations. We’ll have a good idea of when this will be next week, once the first kits have been in schools for a while.
Once all the stations are shipped, we’ll be extending and expanding our teaching and learning resources. In particular, we would like to develop resources for big data management and visualisation, and for non-computing subjects such as geography. And, of course, if you make any of your own we’d love to see them.
BWoodhead Primary on Twitter
Super exciting raspberry pi weather station arrived, very lucky to be one of the 150 uk schools @rasberrypi pic.twitter.com/ZER0RPKqIf

“Just” a milestone
This is a big milestone for the project, but it’s not the end by any means. In fact, it’s just the beginning as schools start to build their stations and use them to investigate the weather and to learn. We’re hoping to see and encourage lots of collaboration between schools. We started the project back in 2014, and when you’ve worked on something that long it’s easy to take it for granted, so it was brilliant to see the excitement of teachers and students when they received their kit.
Stackpole V.C School on Twitter
We were really excited to receive our @Raspberry_Pi weather station today. Indoor trial tomorrow. @clivebeale pic.twitter.com/7fsI7DYCYg

It’s been a fun two years, and if you’ve opened a big brown box this morning and found a weather station inside, we think you’ll agree that it’s been worth the wait.
Building and setting up your weather station
The weather station page has tutorials for building the hardware and setting up the software for your weather station, along with a scheme of work for teachers and other resources.
Getting involved
The community is hugely important to us and whether you’ve just received a weather station or not, we’d love to hear from you.  The best way to get involved is to come to the friendly Weather Station corner of our forums and say hi. This is also the place to get help and to share ideas. If you’re tweeting, then you can reach us @raspberry_pi or on the hashtag #weatherstation – thanks!
BA Science on Twitter
Our weather station has arrived! Thanks to @Raspberry_Pi now need some students to help us build it! @BromptonAcademy pic.twitter.com/8qZPG3JTaQ

Buying the kit
We’re often asked if we’ll be selling the kits. We’re currently looking into this and hope that they will be commercially available at some point. I’d love to see a Raspberry Pi Weather Station attached to every school – it’s a project that genuinely engages students across many subjects. In addition, the data gathered from thousands of weather stations, all sending data back to a central database, would be really useful.
That’s all for now
But now that the kits are shipped there’ll be lots going on, so expect more news soon. And do pop into the forums for a chat.
Thanks
As well as the talented and lovely folk at Pi Towers, we’ve only made it this far with the help of others. At risk of turning into a mawkish awards ceremony speech, a few shout-outs are needed:
Oracle for their generous funding and the database support, especially Nicole at Oracle Giving, Jane at Oracle Academy, and Jeff who built our Apex database.
Rachel, Kevin and Team @cpc_tweet for the kit build (each kit has around 80 parts!) and amazing logistics support.
@HackerJimbo for sterling software development and the disk image.
If I’ve missed you out, it doesn’t mean I don’t love you.
The post Raspberry Pi Oracle Weather Stations shipped appeared first on Raspberry Pi.