Tag Archives: sensors

Forums project: humidity-controlled cellar ventilation

Post Syndicated from Liz Upton original https://www.raspberrypi.org/blog/forums-project-humidity-controlled-cellar-ventilation/

The Raspberry Pi official forums are the central online meeting place for the Raspberry Pi community. They’re where you’ll find support from hundreds of thousands (141,183, as of this morning) of other Pi users, including people from our own engineering team; lots of inspiration for your own projects, and loads of advice. You can chat to a selection of those of us who work at Raspberry Pi there too – we’re usually poking around in there for part of the day.

Commence to poking


We found this rather brilliant hack to ventilate and maintain a cellar’s humidity on the forums. Forum member DasManul, from Frankfurt, put this together to measure temperature and relative humidity inside and outside his cellar, and to use those values to calculate absolute humidity. The setup then ventilates the space if the humidity inside is higher than it is outside.
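The comparison described above can be sketched in a few lines of Python. This is only an illustration, not DasManul's code: it assumes the widely used Magnus approximation for saturation vapour pressure, and the function names and the optional `margin` parameter are hypothetical.

```python
import math


def absolute_humidity(temp_c, rel_humidity_pct):
    """Approximate absolute humidity in g/m^3 from temperature (deg C) and relative humidity (%)."""
    # Magnus approximation for saturation vapour pressure over water, in hPa
    saturation_hpa = 6.112 * math.exp(17.62 * temp_c / (243.12 + temp_c))
    vapour_hpa = saturation_hpa * rel_humidity_pct / 100.0
    # Ideal gas law for water vapour (216.7 is the usual constant for hPa, kelvin and g/m^3)
    return 216.7 * vapour_hpa / (273.15 + temp_c)


def should_ventilate(inside, outside, margin=0.0):
    """inside and outside are (temp_c, rel_humidity_pct) tuples.

    margin (g/m^3) is an assumed, optional safety gap; 0.0 matches the
    plain "inside is more humid than outside" rule.
    """
    return absolute_humidity(*inside) > absolute_humidity(*outside) + margin


# A cool, damp cellar and a cold, drier day outside: worth running the fans
print(should_ventilate((12.0, 80.0), (5.0, 70.0)))
```

Comparing absolute rather than relative humidity matters because warm outside air can carry more water than cool cellar air even when its relative humidity reads lower.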

On reading what he was up to, I assumed DasManul was looking after a cellarful of wine. Then I saw his pictures. He’s actually tending bottles of fabric softener and yoghurt.


Nestled next to the nonalcoholic liquids, you’ll find a touchscreen controller for the system, along with a USB receiver. Here’s a closer look at the display:


(For non-German speakers, DasManul says: “Keller – cellar, Aussen – outside, TP (Taupunkt) – dew point, RF/AF (Relative/Absolute Feuchte) – relative/absolute humidity, Lüfter – fan, An/Aus – on/off”.)

The system also outputs more detailed graphs (daily, weekly, or monthly) to a website served by nginx, which allows you to control the system remotely if you don’t happen to be down in the cellar conditioning your fabric.


DasManul says that he’s not much for hardware tinkering, and didn’t want to start drilling into his house’s infrastructure, so he used off-the-shelf parts for sensing and controlling. Two inexpensive wireless sensors, one indoors and one outdoors, from elv.de, do all the work checking the humidity; they feed information to the USB receiver, and intake and exhaust fans are controlled with an Energenie plug strip. (These things are great – I use an Energenie plug strip to turn lamps on and off via a remote PIR sensor in my living room).

DasManul has made all the code available (with German and English documentation) over at BitBucket so you can replicate the project. There’s plenty more like this over at the Raspberry Pi Forums – get stuck in!


The post Forums project: humidity-controlled cellar ventilation appeared first on Raspberry Pi.

Month in Review: June 2016

Post Syndicated from Andy Werth original https://blogs.aws.amazon.com/bigdata/post/Tx2ZWGEI8MNGY51/Month-in-Review-June-2016

Lots to see on the Big Data Blog in June! Please take a look at the summaries below for something that catches your interest.

Use Sqoop to Transfer Data from Amazon EMR to Amazon RDS
Customers commonly process and transform vast amounts of data with EMR and then transfer and store summaries or aggregates of that data in relational databases such as MySQL or Oracle. In this post, learn how to transfer data using Apache Sqoop, a tool designed to transfer data between Hadoop and relational databases.

Analyze Realtime Data from Amazon Kinesis Streams Using Zeppelin and Spark Streaming
Streaming data is everywhere. This includes clickstream data, data from sensors, data emitted from billions of IoT devices, and more. Not surprisingly, data scientists want to analyze and explore these data streams in real time. This post shows you how you can use Spark Streaming to process data coming from Amazon Kinesis streams, build some graphs using Zeppelin, and then store the Zeppelin notebook in Amazon S3.

Processing Amazon DynamoDB Streams Using the Amazon Kinesis Client Library
This post demystifies the KCL by explaining some of its important configurable properties and estimating its resource consumption.

Apache Tez Now Available with Amazon EMR
Amazon EMR has added Apache Tez version 0.8.3 as a supported application in release 4.7.0. Tez is an extensible framework for building batch and interactive data processing applications on top of Hadoop YARN. This post helps you get started.

Use Apache Oozie Workflows to Automate Apache Spark Jobs (and more!) on Amazon EMR
The AWS Big Data Blog has a large community of authors who are passionate about Apache Spark and who regularly publish content that helps customers use Spark to build real-world solutions. You’ll see content on a variety of topics, including deep-dives on Spark’s internals, building Spark Streaming applications, creating machine learning pipelines using MLlib, and ways to apply Spark to various real-world use cases.

JOIN Amazon Redshift AND Amazon RDS PostgreSQL WITH dblink
In this post, you’ll learn how easy it is to create a master key in KMS, encrypt data either client-side or server-side, upload it to S3, and have EMR seamlessly read and write that encrypted data to and from S3 using the master key that you created.


Running R on AWS (July 2015) Learn how to install and run R, RStudio Server, and Shiny Server on AWS.


Want to learn more about Big Data or Streaming Data? Check out our Big Data and Streaming data educational pages.

Leave a comment below to let us know what big data topics you’d like to see next on the AWS Big Data Blog.

Analyze Realtime Data from Amazon Kinesis Streams Using Zeppelin and Spark Streaming

Post Syndicated from Manjeet Chayel original https://blogs.aws.amazon.com/bigdata/post/Tx3K805CZ8WFBRP/Analyze-Realtime-Data-from-Amazon-Kinesis-Streams-Using-Zeppelin-and-Spark-Strea

Manjeet Chayel is a Solutions Architect with AWS

There is streaming data everywhere. This includes clickstream data, data from sensors, data emitted from billions of IoT devices, and more. Not surprisingly, data scientists want to analyze and explore these data streams in real time. This post shows you how you can use Spark Streaming to process data coming from Amazon Kinesis streams, build some graphs using Zeppelin, and then store the Zeppelin notebook in Amazon S3.

Zeppelin overview

Apache Zeppelin is an open source GUI which creates interactive and collaborative notebooks for data exploration using Spark. You can use Scala, Python, SQL (using Spark SQL), or HiveQL to manipulate data and quickly visualize results.

Zeppelin notebooks can be shared among several users, and visualizations can be published to external dashboards. Zeppelin uses the Spark settings on your cluster and can use Spark’s dynamic allocation of executors to let YARN estimate the optimal resource consumption.

With the latest Zeppelin release (0.5.6) included on Amazon EMR 4.7.0, you can now import notes using links to S3 JSON files, raw file URLs in GitHub, or local files. You can also download a note as a JSON file. This new functionality makes it easier to save and share Zeppelin notes, and it allows you to version your notes during development. The import feature is located on the Zeppelin home screen, and the export feature is located on the toolbar for each note.

Additionally, you can still configure Zeppelin to store its entire notebook file in S3 by adding a configuration for zeppelin-env when creating your cluster (just make sure you have already created the bucket in S3 before creating your cluster).

Streaming data walkthrough

To use this post to play around with streaming data, you need an AWS account and the AWS CLI configured on your machine. The entire pattern can be implemented in a few simple steps:

  1. Create an Amazon Kinesis stream.
  2. Spin up an EMR cluster with Hadoop, Spark, and Zeppelin applications from advanced options.
  3. Use a simple Java producer to push random IoT event data into the Amazon Kinesis stream.
  4. Connect to the Zeppelin notebook.
  5. Import the Zeppelin notebook from GitHub.
  6. Analyze and visualize the streaming data.

We’ll look at each of these steps below.

Create a Kinesis stream

First, create a simple Amazon Kinesis stream, “spark-demo,” with two shards. For more information, see Creating a Stream.

Spin up an EMR cluster with Hadoop, Spark, and Zeppelin

Edit the software settings for Zeppelin by copying and pasting the configuration below. Replace the bucket name “demo-s3-bucket” with your S3 bucket name. Note: you do not have to specify S3://. This configuration sets S3 as the notebook storage location and adds the Amazon Kinesis Client Library (KCL) to the environment.

               "SPARK_SUBMIT_OPTIONS" : '"$SPARK_SUBMIT_OPTIONS --packages org.apache.spark:spark-streaming-kinesis-asl_2.10:1.6.0 --conf spark.executorEnv.PYTHONPATH=/usr/lib/spark/python/lib/py4j-0.9-src.zip:/usr/lib/spark/python/:<CPS>{{PWD}}/pyspark.zip<{{PWD}}>/py4j-0.9-src.zip --conf spark.yarn.isPython=true"'



It takes a few minutes for the cluster to start and change to the “Waiting” state.

While this is happening, you can configure your machine to view web interfaces on the cluster. For more information, see View Web Interfaces Hosted on Amazon EMR Clusters.

Use a simple Java producer to push random IoT events into the Amazon Kinesis stream

I have implemented a simple Java producer application, using the Kinesis Producer Library, which ingests random IoT sensor data into the “spark-demo” Amazon Kinesis stream.

Download the JAR and run it from your laptop or EC2 instance (this requires Java 8):

java -jar KinesisProducer.jar

Data is pushed in CSV format, with the fields device_id, temperature, and timestamp (the schema that the notebook code expects).


Note: If you are using an EC2 instance, make sure that it has the required permissions to push the data into the Amazon Kinesis stream.
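If you want to see what such records might look like without running the JAR, here is a minimal Python sketch. It is not the Java producer and does not talk to Amazon Kinesis; it only builds CSV lines in the device_id,temperature,timestamp order used by the notebook's schema. The device names and the temperature range are assumptions made for illustration.

```python
import random
import time


def make_event(device_ids, now=None, rng=random):
    """Build one CSV record in the order device_id,temperature,timestamp."""
    timestamp = int(now if now is not None else time.time())
    device_id = rng.choice(device_ids)
    temperature = rng.randint(10, 100)  # assumed value range, not from the original producer
    return f"{device_id},{temperature},{timestamp}"


# Hypothetical device names, used only for this illustration
for _ in range(3):
    print(make_event(["device-1", "device-2"]))
```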

Connect to the Zeppelin notebook

There are several ways to connect to the UI on the master node. One method is to use a proxy extension to the browser. To learn how, see Option 2, Part 2: Configure Proxy Settings to View Websites Hosted on the Master Node.

To reach the web interfaces, you must establish an SSH tunnel with the master node using either dynamic or local port forwarding. If you establish an SSH tunnel using dynamic port forwarding, you must also configure a proxy server to view the web interface.

The following command opens dynamic port forwarding on port 8157 to the EMR master node. After running it, enable FoxyProxy on your browser using the steps in Configure FoxyProxy for Firefox.

ssh -i <<YOUR-KEY-PAIR>> -ND 8157 hadoop@<<EMR-MASTER-DNS>>

Import the Zeppelin notebook from GitHub

In Zeppelin, choose Import note and Add from URL to import the notebook from the AWS Big Data blog GitHub repository.

Analyze and visualize streaming data

After you import the notebook, you’ll see a few lines of code and some sample SQL as paragraphs. The code in the notebook reads the data from your “spark-demo” Amazon Kinesis stream in batches of 5 seconds (this period can be modified) and stores the data into a temporary Spark table.

After the streaming context has started, Spark starts reading data from streams and populates the temporary table. You can run your SQL queries on this table.

import …   

val endpointUrl = "https://kinesis.us-east-1.amazonaws.com"
val credentials = new DefaultAWSCredentialsProviderChain().getCredentials()
    require(credentials != null,
      "No AWS credentials found. Please specify credentials using one of the methods specified " +
        "in http://docs.aws.amazon.com/AWSSdkDocsJava/latest/DeveloperGuide/credentials.html")
    val kinesisClient = new AmazonKinesisClient(credentials)
    val numShards = kinesisClient.describeStream("spark-demo").getStreamDescription().getShards().size

val numStreams = numShards

//Setting batch interval to 5 seconds
val batchInterval = Seconds(5)
val kinesisCheckpointInterval = batchInterval
val regionName = RegionUtils.getRegionByEndpoint(endpointUrl).getName()

val ssc = new StreamingContext(sc, batchInterval)

// Create one DStream per shard
val kinesisStreams = (0 until numStreams).map { i =>
  KinesisUtils.createStream(ssc, "app-spark-demo", "spark-demo", endpointUrl, regionName,
    InitialPositionInStream.LATEST, kinesisCheckpointInterval, StorageLevel.MEMORY_AND_DISK_2)
}

// Union all the streams
val unionStreams = ssc.union(kinesisStreams)

//Schema of the incoming data on the stream
val schemaString = "device_id,temperature,timestamp"

//Parse the data in DStreams
val tableSchema = StructType( schemaString.split(",").map(fieldName => StructField(fieldName, StringType, true)))

//Processing each RDD and registering it as the temporary table queried by the SQL paragraphs
unionStreams.foreachRDD ((rdd: RDD[Array[Byte]], time: Time) => {
  val rowRDD = rdd.map(w => Row.fromSeq(new String(w).split(",")))
  val wordsDF = sqlContext.createDataFrame(rowRDD, tableSchema)
  wordsDF.registerTempTable("realtimetable")
})

Example SQL:

SELECT device_id,timestamp, avg(temperature) AS avg_temp
FROM realtimetable  
GROUP BY device_id,timestamp 
ORDER BY timestamp
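
To check what this query computes without a cluster, here is a rough plain-Python equivalent that works on the raw CSV lines. It is a sketch for illustration only: the function name is hypothetical, and like the notebook schema it treats the timestamp as a string when sorting.

```python
from collections import defaultdict


def average_temperature(lines):
    """Return [(device_id, timestamp, avg_temp)] sorted by timestamp, like the example query."""
    groups = defaultdict(list)
    for line in lines:
        device_id, temperature, timestamp = line.split(",")
        groups[(device_id, timestamp)].append(float(temperature))
    rows = [(dev, ts, sum(vals) / len(vals)) for (dev, ts), vals in groups.items()]
    # ORDER BY timestamp (string comparison, because every field is a string in the schema)
    return sorted(rows, key=lambda row: row[1])


print(average_temperature(["a,10,2", "a,20,2", "b,30,1"]))
```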

You can also use pie charts.

To modify the processing logic in the foreachRDD block, gracefully stop the streaming context, re-run the foreach paragraph, and re-start the streaming context.


In this post, I’ve shown you how to use Spark Streaming from a Zeppelin notebook and directly analyze the incoming streaming data. After the analysis you can terminate the cluster; the data is available in the S3 bucket that you configured during cluster creation. I hope you’ve seen how easy it is to use Spark Streaming, Kinesis, and Zeppelin to uncover and share the business intelligence in your streaming data. Please give the process in this post a try and let us know in the comments what your results were!

If you have questions or suggestions, please comment below.






Semi-Autonomous Driving Using EC2 Spot Instances at Mapbox

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/semi-autonomous-driving-using-ec2-spot-instances-at-mapbox/

Will White of Mapbox shared the following guest post with me. In the post, Will describes how they use EC2 Spot Instances to economically process the billions of data points that they collect each day.

I do have one note to add to Will’s excellent post. We know that many AWS customers would like to create Spot Fleets that automatically scale up and down in response to changes in demand. This is on our near-term roadmap and I’ll have more to say about it before too long.


The largest automotive tech conference, TU-Automotive, kicked off in Detroit this morning with almost every conversation focused on strategies for processing the firehose of data coming off connected cars. The volume of data is staggering – last week alone we collected and processed over 100 million miles of sensor data into our maps.

Collecting Street Data
Rather than driving a fleet of cars down every street to make a map, we turn phones, cars, and other devices into a network of real-time sensors. EC2 Spot Instances process the billions of points we collect each day and let us see every street, analyze the speed of traffic, and connect the entire road network. This anonymized and aggregated data protects user privacy while allowing us to quickly detect road changes. The result is Mapbox Drive, the map built specifically for semi-autonomous driving, ride sharing, and connected cars.

Bidding for Spot Capacity
We use the Spot market to bid on spare EC2 instances, letting us scale our data collection and processing at 1/10th the cost. When you launch an EC2 Spot instance you set a bid price for how much you are willing to pay for the instance. The market price (the price you actually pay) constantly changes based on supply and demand in the market. If the market price ever exceeds your bid price, your EC2 Spot instance is terminated. Since spot instances can spontaneously terminate, they have become a popular cost-saving tool for non-critical environments like staging, QA, and R&D – services that don’t require high availability. However, if you can architect your application to handle this kind of sudden termination, it becomes possible to run extremely resource-intensive services on spot and save a massive amount of money while maintaining high availability.

The infrastructure that processes the 100 million miles of sensor data we collect each week is critical and must always be online, but it uses EC2 Spot Instances. We do it by running two Auto Scaling groups, a Spot group and an On-Demand group, that share a single Elastic Load Balancer. When Spot prices spike and instances get terminated, we simply fall back by automatically launching On-Demand instances to pick up the slack.

Handling Termination Notices
We use termination notices, which give us a two-minute warning before any EC2 Spot instance is terminated. When an instance receives a termination notice it immediately makes a call to the Auto Scaling API to scale up the On-Demand Auto Scaling group, seamlessly adding stable capacity to the Elastic Load Balancer. We have to pay On-Demand prices for the replacement EC2s, but only for as long as the Spot interruption lasts. When the Spot market price falls back below our bid price, our Spot Auto Scaling group will automatically launch new Spot instances. As the Spot capacity scales back up, an aggressive Auto Scaling policy scales down the On-Demand group, terminating the more expensive instances.
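
A minimal Python sketch of the check an instance could run is shown below. It is not Mapbox's code. It assumes the instance polls the EC2 instance metadata path for Spot termination notices, which returns nothing useful until a notice has been issued, and it leaves the actual Auto Scaling call to a hypothetical `scale_up_on_demand` callback that you would supply (for example, a wrapper that raises the On-Demand group's desired capacity).

```python
import urllib.error
import urllib.request

# Instance metadata path that carries the Spot termination notice
TERMINATION_URL = "http://169.254.169.254/latest/meta-data/spot/termination-time"


def fetch_termination_time(url=TERMINATION_URL, timeout=2):
    """Return the termination timestamp string, or None if no notice has been issued."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read().decode().strip() or None
    except (urllib.error.URLError, OSError):
        # A 404 (or no reachable metadata service) is treated as "no termination scheduled"
        return None


def check_and_fall_back(scale_up_on_demand, fetch=fetch_termination_time):
    """Call scale_up_on_demand(termination_time) if this instance has a termination notice."""
    termination_time = fetch()
    if termination_time is None:
        return False
    scale_up_on_demand(termination_time)
    return True
```

In practice a check like this would run every few seconds, since the notice only gives two minutes of warning. Passing `fetch` as a parameter keeps the logic testable away from EC2.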

Building our data processing pipeline on Spot worked so well that we have now moved nearly every Mapbox service over to Spot too. As the traffic generated by over 170 million unique users of apps like Foursquare, MapQuest, and Weather.com grows each month, our cost of goods sold (COGS) continues to fall. Spot interruptions are relatively rare for the instance types we use, so the fallback is only triggered 1-2 times per month. This means we are running on discounted Spot instances more than 98% of the time. On our maps service alone, this has resulted in a 90% savings on our EC2 costs each month.

Going Further with Spot
To further optimize our COGS we’re working on a “waterfall” approach to fallback, pushing traffic to other configurations of Spot Instances first and only using On-Demand as an absolute last resort. For example, an application that normally runs on c4.xlarge instances is often compatible with other instance sizes in the same family (c4.2xlarge, c4.4xlarge, etc.) and instance types in other families (m4.2xlarge, m4.4xlarge, etc.). When our Spot EC2s get terminated, we’ll bid on the next cheapest option on the Spot market. This will result in more Spot interruptions, but our COGS decrease further because we’ll fall back on Spot instances instead of paying full price for On-Demand EC2 instances. This maximizes our COGS savings while maintaining high availability for our enterprise customers.
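
The waterfall idea can be illustrated with a short Python sketch. This is a simplified, hypothetical model rather than Mapbox's implementation: it compares raw per-instance prices and ignores the fact that larger instances provide more capacity. The function name, prices, and bids are made up for the example.

```python
def choose_capacity(spot_prices, bids, compatible_types):
    """Pick the cheapest compatible Spot option whose market price is within our bid.

    spot_prices: {instance_type: current Spot market price}
    bids: {instance_type: maximum price we are willing to pay}
    compatible_types: instance types the application can run on, preferred type first
    Returns (instance_type, "spot"), or (preferred type, "on-demand") as the last resort.
    """
    candidates = [
        (spot_prices[t], t)
        for t in compatible_types
        if t in spot_prices and t in bids and spot_prices[t] <= bids[t]
    ]
    if candidates:
        _, instance_type = min(candidates)
        return instance_type, "spot"
    return compatible_types[0], "on-demand"


# Made-up prices: c4.xlarge has spiked above our bid, so we fall to the cheapest alternative
prices = {"c4.xlarge": 0.30, "c4.2xlarge": 0.12, "m4.2xlarge": 0.10}
bids = {"c4.xlarge": 0.20, "c4.2xlarge": 0.40, "m4.2xlarge": 0.40}
print(choose_capacity(prices, bids, ["c4.xlarge", "c4.2xlarge", "m4.2xlarge"]))
```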

It’s worth noting that similar fallback functionality is built into EC2 Spot Fleet, but we prefer Auto Scaling groups due to a few limitations with Spot Fleet (for example, there’s no support for Auto Scaling without implementing it yourself) and because Auto Scaling groups give us the most flexibility.

Over the last 12 months, data collection and processing have increased our consumption of EC2 compute hours by 1044%, but our COGS actually decreased. We used to see our costs increase linearly with consumption, but now see these hockey stick increases in consumption while costs stay basically flat for the same period.

If you’re building a resource-hungry application that requires high availability and the costs for On-Demand EC2 instances make it unsustainable to run, take a close look at EC2 Spot Instances. Combined with the right architecture and some creative orchestration, EC2 Spot Instances will allow you to run your application with extremely low COGS.

Will White, Development Engineering, Mapbox

Identifying People from their Driving Patterns

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2016/05/identifying_peo_7.html

People can be identified from their “driver fingerprint”:

…a group of researchers from the University of Washington and the University of California at San Diego found that they could “fingerprint” drivers based only on data they collected from internal computer network of the vehicle their test subjects were driving, what’s known as a car’s CAN bus. In fact, they found that the data collected from a car’s brake pedal alone could let them correctly distinguish the correct driver out of 15 individuals about nine times out of ten, after just 15 minutes of driving. With 90 minutes driving data or monitoring more car components, they could pick out the correct driver fully 100 percent of the time.

The paper: “Automobile Driver Fingerprinting,” by Miro Enev, Alex Takakuwa, Karl Koscher, and Tadayoshi Kohno.

Abstract: Today’s automobiles leverage powerful sensors and embedded computers to optimize efficiency, safety, and driver engagement. However the complexity of possible inferences using in-car sensor data is not well understood. While we do not know of attempts by automotive manufacturers or makers of after-market components (like insurance dongles) to violate privacy, a key question we ask is: could they (or their collection and later accidental leaks of data) violate a driver’s privacy? In the present study, we experimentally investigate the potential to identify individuals using sensor data snippets of their natural driving behavior. More specifically we record the in-vehicle sensor data on the controller area-network (CAN) of a typical modern vehicle (popular 2009 sedan) as each of 15 participants (a) performed a series of maneuvers in an isolated parking lot, and (b) drove the vehicle in traffic along a defined ~50 mile loop through the Seattle metropolitan area. We then split the data into training and testing sets, train an ensemble of classifiers, and evaluate identification accuracy of test data queries by looking at the highest voted candidate when considering all possible one-vs-one comparisons. Our results indicate that, at least among small sets, drivers are indeed distinguishable using only in car sensors. In particular, we find that it is possible to differentiate our 15 drivers with 100% accuracy when training with all of the available sensors using 90% of driving data from each person. Furthermore, it is possible to reach high identification rates using less than 8 minutes of training data. When more training data is available it is possible to reach very high identification using only a single sensor (e.g., the brake pedal). As an extension, we also demonstrate the feasibility of performing driver identification across multiple days of data collection.
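
The voting step mentioned in the abstract ("the highest voted candidate when considering all possible one-vs-one comparisons") can be sketched in Python as follows. This is a generic illustration of one-vs-one voting, not the authors' code; `pairwise_winner` is a hypothetical stand-in for a trained pairwise classifier, and the scores in the usage lines are invented.

```python
from collections import Counter
from itertools import combinations


def identify_driver(drivers, pairwise_winner):
    """Return the driver with the most votes across all one-vs-one comparisons.

    drivers: at least two candidate labels.
    pairwise_winner(a, b): returns a or b for the sample being classified.
    """
    votes = Counter()
    for a, b in combinations(drivers, 2):
        votes[pairwise_winner(a, b)] += 1
    return votes.most_common(1)[0][0]


# Invented similarity scores standing in for real classifier output
scores = {"alice": 0.2, "bob": 0.9, "carol": 0.5}
print(identify_driver(list(scores), lambda a, b: a if scores[a] > scores[b] else b))
```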

Repurposing Old Smartphones for Home Automation (Linux.com)

Post Syndicated from ris original http://lwn.net/Articles/688467/rss

Linux.com has an interview with Dietrich Ayala about using old smartphones for home automation.
“Ayala spent a lot of time studying the readouts from sensors, as well as from the phone’s microphone, camera, and radios, that would enable a remote user to draw conclusions about what was happening at home. This contextual information could then be codified into more useful notifications.

With ambient light, for example, if it suddenly goes dark in the daytime, maybe someone is standing over a device, explained Ayala. Feedback from the accelerometer can be analyzed to determine the difference between footsteps, an earthquake, or someone picking up the device. Scripts can use radio APIs to determine if a person moving around is carrying a phone with a potentially revealing Bluetooth signature.”

Arduino Web Editor and Cloud Platform – Powered by AWS

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/arduino-web-editor-and-cloud-platform-powered-by-aws/

Last night I spoke with Luca Cipriani from Arduino to learn more about the new AWS-powered Arduino Web Editor and Arduino Cloud Platform offerings. Luca was en route to the Bay Area Maker Faire and we had just a few minutes to speak, but that was enough time for me to learn a bit about what they have built.

If you have ever used an Arduino, you know that there are several steps involved. First you need to connect the board to your PC’s serial port using a special cable (you can also use Wi-Fi if you have the appropriate add-on “shield”), ensure that the port is properly configured, and establish basic communication. Then you need to install, configure, and launch your development environment, make sure that it can talk to your Arduino, tell it which make and model of Arduino that you are using, and select the libraries that you want to call from your code. With all of that taken care of, you are ready to write code, compile it, and then download it to the board for debugging and testing.

Arduino Code Editor
Luca told me that the Arduino Code Editor was designed to simplify and streamline the setup and development process. The editor runs within your browser and is hosted on AWS (although we did not have time to get into the details, I understand that they made good use of AWS Lambda and several other AWS services).

You can write and modify your code, save it to the cloud and optionally share it with your colleagues and/or friends. The editor can also detect your board (using a small native plugin) and configure itself accordingly; it even makes sure that you can only write code using libraries that are compatible with your board. All of your code is compiled in the cloud and then downloaded to your board for execution.

Here’s what the editor looks like (see Sneak Peek on the New, Web-Based Arduino Create for more):

Arduino Cloud Platform
Because Arduinos are small, easy to program, and consume very little power, they work well in IoT (Internet of Things) applications. Even better, it is easy to connect them to all sorts of sensors, displays, and actuators so that they can collect data and effect changes.

The new Arduino Cloud Platform is designed to simplify the task of building IoT applications that make use of Arduino technology. Connected devices will be able to connect to the Internet, upload information derived from sensors, and effect changes upon command from the cloud. Building upon the functionality provided by AWS IoT, this new platform will allow devices to communicate with the Internet and with each other. While the final details are still under wraps, I believe that this will pave the way for sensors to activate Lambda functions and for Lambda functions to take control of displays and actuators.

I look forward to learning more about this platform as the details become available!



Build your own Raspberry Pi terrarium controller

Post Syndicated from Helen Lynn original https://www.raspberrypi.org/blog/build-your-own-raspberry-pi-terrarium-controller/

Tom Bennet grows Nepenthes, tropical carnivorous plants that I know by the name of pitcher plants. To stay healthy they need a temperature- and humidity-controlled environment, and Tom ensures this by housing them in a terrarium controlled by a Raspberry Pi 3 and Energenie’s Pi-mote starter kit, which provides an easy way to control mains electrical sockets from a Pi. He has written step-by-step instructions to help you build your own terrarium controller, the first such guide we’ve seen for this particular application.

A terrarium in a cuboid glass tank with fluorescent lighting, containing six Nepenthes plants of various species

Nepenthes plants of various species in Tom Bennet’s Pi-controlled terrarium. Photo by Tom Bennet

Tom’s terrarium controller doesn’t only monitor and regulate temperature, humidity and light, three of the four main variables in a terrarium (the fourth, he explains, is water, and because terrariums tend to be nearly or completely sealed, this requires only infrequent intervention). It also logs data from its sensors to Internet-of-Things data platform ThingSpeak, which offers real-time data visualisation and alerts.
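
To give a flavour of what "regulate" can mean in a controller like this, here is a small Python sketch of hysteresis (dead-band) control for a heater socket. It is not taken from Tom's guide, which has its own scripts; the function name, the 25°C target, and the 1°C band are assumptions, and the code that reads the sensor and switches the Energenie socket is left out.

```python
def next_heater_state(temp_c, heater_on, target_c=25.0, band_c=1.0):
    """Hysteresis control: on below target - band, off above target + band, else unchanged."""
    if temp_c < target_c - band_c:
        return True
    if temp_c > target_c + band_c:
        return False
    # Inside the dead band: keep the current state to avoid rapid on/off switching
    return heater_on


print(next_heater_state(23.0, heater_on=False))
```

The dead band is the design choice worth noting: without it, a reading hovering around the target would toggle the mains socket every time the loop runs.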

Line chart plotting terrarium temperature and humidity over a 24-hour period

24 hours’ worth of temperature and humidity data for Tom’s terrarium

One of the appealing aspects of this project, as Tom observes, is its capacity for extension. You could quite easily add a soil moisture sensor or, particularly for a terrarium that houses reptiles rather than plants, a camera module, as well as using the online data logs in all kinds of ways.

The very clear instructions include a full and costed bill of materials consisting of off-the-shelf parts that come to less than £90/$125 including the Pi. There are helpful photographs and wiring diagrams, straightforward explanations, practical advice, and Python scripts that can easily be adapted to meet the demands of different habitats and ambient conditions. Thank you for writing such a useful guide, Tom; we’re certain it will help plenty of other people set up their own Pi-controlled terrariums!

The post Build your own Raspberry Pi terrarium controller appeared first on Raspberry Pi.