
Predict Billboard Top 10 Hits Using RStudio, H2O and Amazon Athena

Post Syndicated from Gopal Wunnava original https://aws.amazon.com/blogs/big-data/predict-billboard-top-10-hits-using-rstudio-h2o-and-amazon-athena/

Success in the popular music industry is typically measured in terms of the number of Top 10 hits artists have to their credit. The music industry is a highly competitive multi-billion dollar business, and record labels incur various costs in exchange for a percentage of the profits from sales and concert tickets.

Predicting the success of an artist’s release in the popular music industry can be difficult. One release may be extremely popular, resulting in widespread play on TV, radio and social media, while another single may turn out quite unpopular, and therefore unprofitable. Record labels need to be selective in their decision making, and predictive analytics can help them with decision making around the type of songs and artists they need to promote.

In this walkthrough, you leverage H2O.ai, Amazon Athena, and RStudio to make predictions on whether a song might make it to the Top 10 Billboard charts. You explore the GLM, GBM, and deep learning modeling techniques using H2O’s rapid, distributed and easy-to-use open source parallel processing engine. RStudio is a popular IDE, licensed either commercially or under AGPLv3, for working with R. This is ideal if you don’t want to connect to a server via SSH and use code editors such as vi to do analytics. RStudio is available in a desktop version, or a server version that allows you to access R via a web browser. RStudio’s Notebooks feature is used to demonstrate the execution of code and output. In addition, this post showcases how you can leverage Athena for query and interactive analysis during the modeling phase. A working knowledge of statistics and machine learning would be helpful to interpret the analysis being performed in this post.

Walkthrough

Your goal is to predict whether a song will make it to the Top 10 Billboard charts. For this purpose, you will use multiple modeling techniques, namely GLM, GBM, and deep learning, and choose the model that is the best fit.

This solution involves the following steps:

  • Install and configure RStudio with Athena
  • Log in to RStudio
  • Install R packages
  • Connect to Athena
  • Create a dataset
  • Create models

Install and configure RStudio with Athena

Use the following AWS CloudFormation stack to install, configure, and connect RStudio on an Amazon EC2 instance with Athena.

Launching this stack creates all required resources and prerequisites:

  • Amazon EC2 instance with Amazon Linux (minimum size of t2.large is recommended)
  • Provisioning of the EC2 instance in an existing VPC and public subnet
  • Installation of Java 8
  • Assignment of an IAM role to the EC2 instance with the required permissions for accessing Athena and Amazon S3
  • Security group allowing access to the RStudio and SSH ports from the internet (I recommend restricting access to these ports)
  • S3 staging bucket required for Athena (referenced within RStudio as ATHENABUCKET)
  • RStudio username and password
  • Setup logs in Amazon CloudWatch Logs (if needed for additional troubleshooting)
  • Amazon EC2 Systems Manager agent, which makes it easy to manage and patch the instance

All AWS resources are created in the US-East-1 Region. To avoid cross-region data transfer fees, launch the CloudFormation stack in the same region. To check the availability of Athena in other regions, see Region Table.

Log in to RStudio

The instance security group has been automatically configured to allow incoming connections on the RStudio port 8787 from any source internet address. You can edit the security group to restrict source IP access. If you have trouble connecting, ensure that port 8787 isn’t blocked by subnet network ACLs or by your outgoing proxy/firewall.

  1. In the CloudFormation stack, choose Outputs, Value, and then open the RStudio URL. You might need to wait for a few minutes until the instance has been launched.
  2. Log in to RStudio with the username and password you provided during setup.

Install R packages

Next, install the required R packages from the RStudio console. You can download the R notebook file containing just the code.

#install pacman – a handy package manager for managing installs
if("pacman" %in% rownames(installed.packages()) == FALSE)
{install.packages("pacman")}  
library(pacman)
p_load(h2o,rJava,RJDBC,awsjavasdk)
h2o.init(nthreads = -1)
##  Connection successful!
## 
## R is connected to the H2O cluster: 
##     H2O cluster uptime:         2 hours 42 minutes 
##     H2O cluster version:        3.10.4.6 
##     H2O cluster version age:    4 months and 4 days !!! 
##     H2O cluster name:           H2O_started_from_R_rstudio_hjx881 
##     H2O cluster total nodes:    1 
##     H2O cluster total memory:   3.30 GB 
##     H2O cluster total cores:    4 
##     H2O cluster allowed cores:  4 
##     H2O cluster healthy:        TRUE 
##     H2O Connection ip:          localhost 
##     H2O Connection port:        54321 
##     H2O Connection proxy:       NA 
##     H2O Internal Security:      FALSE 
##     R Version:                  R version 3.3.3 (2017-03-06)
## Warning in h2o.clusterInfo(): 
## Your H2O cluster version is too old (4 months and 4 days)!
## Please download and install the latest version from http://h2o.ai/download/
#install aws sdk if not present (pre-requisite for using Athena with an IAM role)
if (!aws_sdk_present()) {
  install_aws_sdk()
}

load_sdk()
## NULL

Connect to Athena

Next, establish a connection to Athena from RStudio, using an IAM role associated with your EC2 instance. Use ATHENABUCKET to specify the S3 staging directory.
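
If you provisioned RStudio yourself rather than through the CloudFormation stack, the ATHENABUCKET environment variable may not be set. A minimal sketch, using a placeholder bucket name that you would replace with your own staging bucket:

#set the Athena staging directory if the environment variable is absent
#(the bucket name below is a placeholder)
if (Sys.getenv("ATHENABUCKET") == "") {
  Sys.setenv(ATHENABUCKET = "s3://your-athena-staging-bucket/")
}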

URL <- 'https://s3.amazonaws.com/athena-downloads/drivers/AthenaJDBC41-1.0.1.jar'
fil <- basename(URL)
#download the file into current working directory
if (!file.exists(fil)) download.file(URL, fil)
#verify that the file has been downloaded successfully
list.files()
## [1] "AthenaJDBC41-1.0.1.jar"
drv <- JDBC(driverClass="com.amazonaws.athena.jdbc.AthenaDriver", fil, identifier.quote="'")

con <- dbConnect(drv, 'jdbc:awsathena://athena.us-east-1.amazonaws.com:443/',
                                   s3_staging_dir=Sys.getenv("ATHENABUCKET"),
                                   aws_credentials_provider_class="com.amazonaws.auth.DefaultAWSCredentialsProviderChain")

Verify the connection. The results returned depend on your specific Athena setup.

con
## <JDBCConnection>
dbListTables(con)
##  [1] "gdelt"               "wikistats"           "elb_logs_raw_native"
##  [4] "twitter"             "twitter2"            "usermovieratings"   
##  [7] "eventcodes"          "events"              "billboard"          
## [10] "billboardtop10"      "elb_logs"            "gdelthist"          
## [13] "gdeltmaster"         "twitter"             "twitter3"

Create a dataset

For this analysis, you use a sample dataset combining information from Billboard and Wikipedia with Echo Nest data in the Million Songs Dataset. Upload this dataset into your own S3 bucket. The table below provides a description of the fields used in this dataset.

Field – Description
year – Year that the song was released
songtitle – Title of the song
artistname – Name of the song artist
songid – Unique identifier for the song
artistid – Unique identifier for the song artist
timesignature – Variable estimating the time signature of the song
timesignature_confidence – Confidence in the estimate for timesignature
loudness – Continuous variable indicating the average amplitude of the audio in decibels
tempo – Variable indicating the estimated beats per minute of the song
tempo_confidence – Confidence in the estimate for tempo
key – Variable with twelve levels indicating the estimated key of the song (C, C#, …, B)
key_confidence – Confidence in the estimate for key
energy – Variable representing the overall acoustic energy of the song, using a mix of features such as loudness
pitch – Continuous variable indicating the pitch of the song
timbre_0_min through timbre_11_min – Variables indicating the minimum values over all segments for each of the twelve values in the timbre vector
timbre_0_max through timbre_11_max – Variables indicating the maximum values over all segments for each of the twelve values in the timbre vector
top10 – Indicator of whether the song made it to the Top 10 of the Billboard charts (1 if it was in the Top 10, 0 if not)

Create an Athena table based on the dataset

In the Athena console, select the default database, sampledb, or create a new database.

Run the following create table statement.

create external table if not exists billboard
(
year int,
songtitle string,
artistname string,
songID string,
artistID string,
timesignature int,
timesignature_confidence double,
loudness double,
tempo double,
tempo_confidence double,
key int,
key_confidence double,
energy double,
pitch double,
timbre_0_min double,
timbre_0_max double,
timbre_1_min double,
timbre_1_max double,
timbre_2_min double,
timbre_2_max double,
timbre_3_min double,
timbre_3_max double,
timbre_4_min double,
timbre_4_max double,
timbre_5_min double,
timbre_5_max double,
timbre_6_min double,
timbre_6_max double,
timbre_7_min double,
timbre_7_max double,
timbre_8_min double,
timbre_8_max double,
timbre_9_min double,
timbre_9_max double,
timbre_10_min double,
timbre_10_max double,
timbre_11_min double,
timbre_11_max double,
Top10 int
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 's3://aws-bigdata-blog/artifacts/predict-billboard/data'
;

Inspect the table definition for the ‘billboard’ table that you have created. If you chose a database other than sampledb, replace that value with your choice.

dbGetQuery(con, "show create table sampledb.billboard")
##                                      createtab_stmt
## 1       CREATE EXTERNAL TABLE `sampledb.billboard`(
## 2                                       `year` int,
## 3                               `songtitle` string,
## 4                              `artistname` string,
## 5                                  `songid` string,
## 6                                `artistid` string,
## 7                              `timesignature` int,
## 8                `timesignature_confidence` double,
## 9                                `loudness` double,
## 10                                  `tempo` double,
## 11                       `tempo_confidence` double,
## 12                                       `key` int,
## 13                         `key_confidence` double,
## 14                                 `energy` double,
## 15                                  `pitch` double,
## 16                           `timbre_0_min` double,
## 17                           `timbre_0_max` double,
## 18                           `timbre_1_min` double,
## 19                           `timbre_1_max` double,
## 20                           `timbre_2_min` double,
## 21                           `timbre_2_max` double,
## 22                           `timbre_3_min` double,
## 23                           `timbre_3_max` double,
## 24                           `timbre_4_min` double,
## 25                           `timbre_4_max` double,
## 26                           `timbre_5_min` double,
## 27                           `timbre_5_max` double,
## 28                           `timbre_6_min` double,
## 29                           `timbre_6_max` double,
## 30                           `timbre_7_min` double,
## 31                           `timbre_7_max` double,
## 32                           `timbre_8_min` double,
## 33                           `timbre_8_max` double,
## 34                           `timbre_9_min` double,
## 35                           `timbre_9_max` double,
## 36                          `timbre_10_min` double,
## 37                          `timbre_10_max` double,
## 38                          `timbre_11_min` double,
## 39                          `timbre_11_max` double,
## 40                                     `top10` int)
## 41                             ROW FORMAT DELIMITED 
## 42                         FIELDS TERMINATED BY ',' 
## 43                            STORED AS INPUTFORMAT 
## 44       'org.apache.hadoop.mapred.TextInputFormat' 
## 45                                     OUTPUTFORMAT 
## 46  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
## 47                                        LOCATION
## 48    's3://aws-bigdata-blog/artifacts/predict-billboard/data'
## 49                                  TBLPROPERTIES (
## 50            'transient_lastDdlTime'='1505484133')

Run a sample query

Next, run a sample query to obtain a list of all songs from Janet Jackson that made it to the Billboard Top 10 charts.

dbGetQuery(con, " SELECT songtitle,artistname,top10   FROM sampledb.billboard WHERE lower(artistname) =     'janet jackson' AND top10 = 1")
##                       songtitle    artistname top10
## 1                       Runaway Janet Jackson     1
## 2               Because Of Love Janet Jackson     1
## 3                         Again Janet Jackson     1
## 4                            If Janet Jackson     1
## 5  Love Will Never Do (Without You) Janet Jackson 1
## 6                     Black Cat Janet Jackson     1
## 7               Come Back To Me Janet Jackson     1
## 8                       Alright Janet Jackson     1
## 9                      Escapade Janet Jackson     1
## 10                Rhythm Nation Janet Jackson     1

Determine how many songs in this dataset are specifically from the year 2010.

dbGetQuery(con, " SELECT count(*)   FROM sampledb.billboard WHERE year = 2010")
##   _col0
## 1   373

The sample dataset provides certain song properties of interest that can be analyzed to gauge their impact on a song’s overall popularity. Look at one such property, timesignature, and determine which value is the most frequent among songs in the database. The time signature is a measure of the number of beats in each bar and the type of note that carries the beat.

Running the query directly may result in an error, as shown in the commented lines below. This error is a result of trying to retrieve a large result set over a JDBC connection, which can cause out-of-memory issues at the client level. To address this, reduce the fetch size and run again.

#t<-dbGetQuery(con, " SELECT timesignature FROM sampledb.billboard")
#Note:  Running the preceding query results in the following error: 
#Error in .jcall(rp, "I", "fetch", stride, block): java.sql.SQLException: The requested #fetchSize is more than the allowed value in Athena. Please reduce the fetchSize and try #again. Refer to the Athena documentation for valid fetchSize values.
# Use the dbSendQuery function, reduce the fetch size, and run again
r <- dbSendQuery(con, " SELECT timesignature     FROM sampledb.billboard")
dftimesignature<- fetch(r, n=-1, block=100)
dbClearResult(r)
## [1] TRUE
table(dftimesignature)
## dftimesignature
##    0    1    3    4    5    7 
##   10  143  503 6787  112   19
nrow(dftimesignature)
## [1] 7574

From the results, observe that 6787 songs have a timesignature of 4.
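
The mode can also be read off programmatically from the same frequency table; a one-line sketch:

#name of the most frequent timesignature value in the table above
names(which.max(table(dftimesignature)))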

Next, determine the song with the highest tempo.

dbGetQuery(con, " SELECT songtitle,artistname,tempo   FROM sampledb.billboard WHERE tempo = (SELECT max(tempo) FROM sampledb.billboard) ")
##                   songtitle      artistname   tempo
## 1 Wanna Be Startin' Somethin' Michael Jackson 244.307

Create the training dataset

Your model needs to be trained so that it can learn and make accurate predictions. Split the data into training and test datasets, and create the training dataset first. This dataset contains all observations from the year 2009 and earlier. You may face the same JDBC connection issue pointed out earlier, so this query also uses a reduced fetch size.

#BillboardTrain <- dbGetQuery(con, "SELECT * FROM sampledb.billboard WHERE year <= 2009")
#Running the preceding query results in the following error:-
#Error in .verify.JDBC.result(r, "Unable to retrieve JDBC result set for ", : Unable to retrieve #JDBC result set for SELECT * FROM sampledb.billboard WHERE year <= 2009 (Internal error)
#Follow the same approach as before to address this issue.

r <- dbSendQuery(con, "SELECT * FROM sampledb.billboard WHERE year <= 2009")
BillboardTrain <- fetch(r, n=-1, block=100)
dbClearResult(r)
## [1] TRUE
BillboardTrain[1:2,c(1:3,6:10)]
##   year           songtitle artistname timesignature
## 1 2009 The Awkward Goodbye    Athlete             3
## 2 2009        Rubik's Cube    Athlete             3
##   timesignature_confidence loudness   tempo tempo_confidence
## 1                    0.732   -6.320  89.614   0.652
## 2                    0.906   -9.541 117.742   0.542
nrow(BillboardTrain)
## [1] 7201

Create the test dataset

BillboardTest <- dbGetQuery(con, "SELECT * FROM sampledb.billboard where year = 2010")
BillboardTest[1:2,c(1:3,11:15)]
##   year              songtitle        artistname key
## 1 2010 This Is the House That Doubt Built A Day to Remember  11
## 2 2010        Sticks & Bricks A Day to Remember  10
##   key_confidence    energy pitch timbre_0_min
## 1          0.453 0.9666556 0.024        0.002
## 2          0.469 0.9847095 0.025        0.000
nrow(BillboardTest)
## [1] 373

Convert the training and test datasets into H2O dataframes

train.h2o <- as.h2o(BillboardTrain)
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |=================================================================| 100%
test.h2o <- as.h2o(BillboardTest)
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |=================================================================| 100%

Inspect the column names in your H2O dataframes.

colnames(train.h2o)
##  [1] "year"                     "songtitle"               
##  [3] "artistname"               "songid"                  
##  [5] "artistid"                 "timesignature"           
##  [7] "timesignature_confidence" "loudness"                
##  [9] "tempo"                    "tempo_confidence"        
## [11] "key"                      "key_confidence"          
## [13] "energy"                   "pitch"                   
## [15] "timbre_0_min"             "timbre_0_max"            
## [17] "timbre_1_min"             "timbre_1_max"            
## [19] "timbre_2_min"             "timbre_2_max"            
## [21] "timbre_3_min"             "timbre_3_max"            
## [23] "timbre_4_min"             "timbre_4_max"            
## [25] "timbre_5_min"             "timbre_5_max"            
## [27] "timbre_6_min"             "timbre_6_max"            
## [29] "timbre_7_min"             "timbre_7_max"            
## [31] "timbre_8_min"             "timbre_8_max"            
## [33] "timbre_9_min"             "timbre_9_max"            
## [35] "timbre_10_min"            "timbre_10_max"           
## [37] "timbre_11_min"            "timbre_11_max"           
## [39] "top10"

Create models

You need to designate the independent and dependent variables prior to applying your modeling algorithms. Because you’re trying to predict the ‘top10’ field, this would be your dependent variable and everything else would be independent.

Create your first model using GLM. Because GLM works best with numeric data, you create your model by dropping non-numeric variables. You only use the variables in the dataset that describe the numerical attributes of the song in the logistic regression model. You won’t use these variables:  “year”, “songtitle”, “artistname”, “songid”, or “artistid”.

y.dep <- 39
x.indep <- c(6:38)
x.indep
##  [1]  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28
## [24] 29 30 31 32 33 34 35 36 37 38
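
As a quick sanity check, you can map these indices back to column names in the H2O frame; the result should list only the 33 numeric song attributes, excluding the five identifier fields:

#confirm that indices 6:38 cover only the numeric song attributes
colnames(train.h2o)[x.indep]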

Create Model 1: All numeric variables

Create Model 1 with the training dataset, using GLM as the modeling algorithm and H2O’s built-in h2o.glm function.

modelh1 <- h2o.glm( y = y.dep, x = x.indep, training_frame = train.h2o, family = "binomial")
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |=====                                                            |   8%
  |                                                                       
  |=================================================================| 100%

Measure the performance of Model 1, using H2O’s built-in performance function.

h2o.performance(model=modelh1,newdata=test.h2o)
## H2OBinomialMetrics: glm
## 
## MSE:  0.09924684
## RMSE:  0.3150347
## LogLoss:  0.3220267
## Mean Per-Class Error:  0.2380168
## AUC:  0.8431394
## Gini:  0.6862787
## R^2:  0.254663
## Null Deviance:  326.0801
## Residual Deviance:  240.2319
## AIC:  308.2319
## 
## Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:
##          0   1    Error     Rate
## 0      255  59 0.187898  =59/314
## 1       17  42 0.288136   =17/59
## Totals 272 101 0.203753  =76/373
## 
## Maximum Metrics: Maximum metrics at their respective thresholds
##                         metric threshold    value idx
## 1                       max f1  0.192772 0.525000 100
## 2                       max f2  0.124912 0.650510 155
## 3                 max f0point5  0.416258 0.612903  23
## 4                 max accuracy  0.416258 0.879357  23
## 5                max precision  0.813396 1.000000   0
## 6                   max recall  0.037579 1.000000 282
## 7              max specificity  0.813396 1.000000   0
## 8             max absolute_mcc  0.416258 0.455251  23
## 9   max min_per_class_accuracy  0.161402 0.738854 125
## 10 max mean_per_class_accuracy  0.124912 0.765006 155
## 
## Gains/Lift Table: Extract with `h2o.gainsLift(<model>, <data>)` or ` 
h2o.auc(h2o.performance(modelh1,test.h2o)) 
## [1] 0.8431394

The AUC metric provides insight into how well the classifier is able to separate the two classes. In this case, the value of 0.8431394 indicates that the classification is good. (A value of 0.5 indicates a worthless test, while a value of 1.0 indicates a perfect test.)
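
You can also inspect this separation visually. A short sketch, assuming your H2O version supports plotting ROC curves directly from a performance object:

#plot the ROC curve for Model 1 on the test data
perf1 <- h2o.performance(modelh1, test.h2o)
plot(perf1, type = "roc")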

Next, inspect the coefficients of the variables in the dataset.

dfmodelh1 <- as.data.frame(h2o.varimp(modelh1))
dfmodelh1
##                       names coefficients sign
## 1              timbre_0_max  1.290938663  NEG
## 2                  loudness  1.262941934  POS
## 3                     pitch  0.616995941  NEG
## 4              timbre_1_min  0.422323735  POS
## 5              timbre_6_min  0.349016024  NEG
## 6                    energy  0.348092062  NEG
## 7             timbre_11_min  0.307331997  NEG
## 8              timbre_3_max  0.302225619  NEG
## 9             timbre_11_max  0.243632060  POS
## 10             timbre_4_min  0.224233951  POS
## 11             timbre_4_max  0.204134342  POS
## 12             timbre_5_min  0.199149324  NEG
## 13             timbre_0_min  0.195147119  POS
## 14 timesignature_confidence  0.179973904  POS
## 15         tempo_confidence  0.144242598  POS
## 16            timbre_10_max  0.137644568  POS
## 17             timbre_7_min  0.126995955  NEG
## 18            timbre_10_min  0.123851179  POS
## 19             timbre_7_max  0.100031481  NEG
## 20             timbre_2_min  0.096127636  NEG
## 21           key_confidence  0.083115820  POS
## 22             timbre_6_max  0.073712419  POS
## 23            timesignature  0.067241917  POS
## 24             timbre_8_min  0.061301881  POS
## 25             timbre_8_max  0.060041698  POS
## 26                      key  0.056158445  POS
## 27             timbre_3_min  0.050825116  POS
## 28             timbre_9_max  0.033733561  POS
## 29             timbre_2_max  0.030939072  POS
## 30             timbre_9_min  0.020708113  POS
## 31             timbre_1_max  0.014228818  NEG
## 32                    tempo  0.008199861  POS
## 33             timbre_5_max  0.004837870  POS
## 34                                    NA <NA>

Typically, songs with heavier instrumentation tend to be louder (have higher values in the variable “loudness”) and more energetic (have higher values in the variable “energy”). This knowledge is helpful for interpreting the modeling results.

You can make the following observations from the results:

  • The coefficient estimates for the confidence values associated with the time signature, key, and tempo variables are positive. This suggests that higher confidence leads to a higher predicted probability of a Top 10 hit.
  • The coefficient estimate for loudness is positive, meaning that mainstream listeners prefer louder songs with heavier instrumentation.
  • The coefficient estimate for energy is negative, meaning that mainstream listeners prefer songs that are less energetic, which are those songs with light instrumentation.

These coefficients lead to contradictory conclusions for Model 1. This could be due to multicollinearity issues. Inspect the correlation between the variables “loudness” and “energy” in the training set.

cor(train.h2o$loudness,train.h2o$energy)
## [1] 0.7399067

This number indicates that these two variables are highly correlated, and Model 1 does indeed suffer from multicollinearity. Typically, correlation values from -1.0 to -0.5 or from 0.5 to 1.0 indicate strong correlation, while values from -0.1 to 0.1 indicate weak correlation. To avoid this correlation issue, omit one of these two variables and re-create the models.
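
Before deciding which variable to drop, it can be worth scanning all numeric predictors for other strongly correlated pairs. A sketch using base R on a local copy of the training predictors, with a 0.5 cutoff that mirrors the rule of thumb above:

#pull the numeric predictors into a local data frame
predictors <- as.data.frame(train.h2o[, x.indep])
cormat <- cor(predictors)
#list variable pairs whose absolute correlation exceeds 0.5
high <- which(abs(cormat) > 0.5 & upper.tri(cormat), arr.ind = TRUE)
data.frame(var1 = rownames(cormat)[high[, 1]],
           var2 = colnames(cormat)[high[, 2]],
           correlation = cormat[high])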

You build two variations of the original model:

  • Model 2, in which you keep “energy” and omit “loudness”
  • Model 3, in which you keep “loudness” and omit “energy”

You compare these two models and choose the model with a better fit for this use case.

Create Model 2: Keep energy and omit loudness

colnames(train.h2o)
##  [1] "year"                     "songtitle"               
##  [3] "artistname"               "songid"                  
##  [5] "artistid"                 "timesignature"           
##  [7] "timesignature_confidence" "loudness"                
##  [9] "tempo"                    "tempo_confidence"        
## [11] "key"                      "key_confidence"          
## [13] "energy"                   "pitch"                   
## [15] "timbre_0_min"             "timbre_0_max"            
## [17] "timbre_1_min"             "timbre_1_max"            
## [19] "timbre_2_min"             "timbre_2_max"            
## [21] "timbre_3_min"             "timbre_3_max"            
## [23] "timbre_4_min"             "timbre_4_max"            
## [25] "timbre_5_min"             "timbre_5_max"            
## [27] "timbre_6_min"             "timbre_6_max"            
## [29] "timbre_7_min"             "timbre_7_max"            
## [31] "timbre_8_min"             "timbre_8_max"            
## [33] "timbre_9_min"             "timbre_9_max"            
## [35] "timbre_10_min"            "timbre_10_max"           
## [37] "timbre_11_min"            "timbre_11_max"           
## [39] "top10"
y.dep <- 39
x.indep <- c(6:7,9:38)
x.indep
##  [1]  6  7  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29
## [24] 30 31 32 33 34 35 36 37 38
modelh2 <- h2o.glm( y = y.dep, x = x.indep, training_frame = train.h2o, family = "binomial")
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |=======                                                          |  10%
  |                                                                       
  |=================================================================| 100%

Measure the performance of Model 2.

h2o.performance(model=modelh2,newdata=test.h2o)
## H2OBinomialMetrics: glm
## 
## MSE:  0.09922606
## RMSE:  0.3150017
## LogLoss:  0.3228213
## Mean Per-Class Error:  0.2490554
## AUC:  0.8431933
## Gini:  0.6863867
## R^2:  0.2548191
## Null Deviance:  326.0801
## Residual Deviance:  240.8247
## AIC:  306.8247
## 
## Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:
##          0  1    Error     Rate
## 0      280 34 0.108280  =34/314
## 1       23 36 0.389831   =23/59
## Totals 303 70 0.152815  =57/373
## 
## Maximum Metrics: Maximum metrics at their respective thresholds
##                         metric threshold    value idx
## 1                       max f1  0.254391 0.558140  69
## 2                       max f2  0.113031 0.647208 157
## 3                 max f0point5  0.413999 0.596026  22
## 4                 max accuracy  0.446250 0.876676  18
## 5                max precision  0.811739 1.000000   0
## 6                   max recall  0.037682 1.000000 283
## 7              max specificity  0.811739 1.000000   0
## 8             max absolute_mcc  0.254391 0.469060  69
## 9   max min_per_class_accuracy  0.141051 0.716561 131
## 10 max mean_per_class_accuracy  0.113031 0.761821 157
## 
## Gains/Lift Table: Extract with `h2o.gainsLift(<model>, <data>)` or `h2o.gainsLift(<model>, valid=<T/F>, xval=<T/F>)`
dfmodelh2 <- as.data.frame(h2o.varimp(modelh2))
dfmodelh2
##                       names coefficients sign
## 1                     pitch  0.700331511  NEG
## 2              timbre_1_min  0.510270513  POS
## 3              timbre_0_max  0.402059546  NEG
## 4              timbre_6_min  0.333316236  NEG
## 5             timbre_11_min  0.331647383  NEG
## 6              timbre_3_max  0.252425901  NEG
## 7             timbre_11_max  0.227500308  POS
## 8              timbre_4_max  0.210663865  POS
## 9              timbre_0_min  0.208516163  POS
## 10             timbre_5_min  0.202748055  NEG
## 11             timbre_4_min  0.197246582  POS
## 12            timbre_10_max  0.172729619  POS
## 13         tempo_confidence  0.167523934  POS
## 14 timesignature_confidence  0.167398830  POS
## 15             timbre_7_min  0.142450727  NEG
## 16             timbre_8_max  0.093377516  POS
## 17            timbre_10_min  0.090333426  POS
## 18            timesignature  0.085851625  POS
## 19             timbre_7_max  0.083948442  NEG
## 20           key_confidence  0.079657073  POS
## 21             timbre_6_max  0.076426046  POS
## 22             timbre_2_min  0.071957831  NEG
## 23             timbre_9_max  0.071393189  POS
## 24             timbre_8_min  0.070225578  POS
## 25                      key  0.061394702  POS
## 26             timbre_3_min  0.048384697  POS
## 27             timbre_1_max  0.044721121  NEG
## 28                   energy  0.039698433  POS
## 29             timbre_5_max  0.039469064  POS
## 30             timbre_2_max  0.018461133  POS
## 31                    tempo  0.013279926  POS
## 32             timbre_9_min  0.005282143  NEG
## 33                                    NA <NA>

h2o.auc(h2o.performance(modelh2,test.h2o)) 
## [1] 0.8431933

You can make the following observations:

  • The AUC metric is 0.8431933.
  • Inspecting the coefficient of the variable energy, Model 2 suggests that songs with high energy levels tend to be more popular. This matches expectations.
  • However, because H2O orders variables by significance, the low ranking of energy shows that it is not significant in this model.

You can conclude that Model 2 is not ideal for this use case, as energy is not significant.

Create Model 3: Keep loudness but omit energy

colnames(train.h2o)
##  [1] "year"                     "songtitle"               
##  [3] "artistname"               "songid"                  
##  [5] "artistid"                 "timesignature"           
##  [7] "timesignature_confidence" "loudness"                
##  [9] "tempo"                    "tempo_confidence"        
## [11] "key"                      "key_confidence"          
## [13] "energy"                   "pitch"                   
## [15] "timbre_0_min"             "timbre_0_max"            
## [17] "timbre_1_min"             "timbre_1_max"            
## [19] "timbre_2_min"             "timbre_2_max"            
## [21] "timbre_3_min"             "timbre_3_max"            
## [23] "timbre_4_min"             "timbre_4_max"            
## [25] "timbre_5_min"             "timbre_5_max"            
## [27] "timbre_6_min"             "timbre_6_max"            
## [29] "timbre_7_min"             "timbre_7_max"            
## [31] "timbre_8_min"             "timbre_8_max"            
## [33] "timbre_9_min"             "timbre_9_max"            
## [35] "timbre_10_min"            "timbre_10_max"           
## [37] "timbre_11_min"            "timbre_11_max"           
## [39] "top10"
y.dep <- 39
x.indep <- c(6:12,14:38)
x.indep
##  [1]  6  7  8  9 10 11 12 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29
## [24] 30 31 32 33 34 35 36 37 38
modelh3 <- h2o.glm( y = y.dep, x = x.indep, training_frame = train.h2o, family = "binomial")
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |========                                                         |  12%
  |                                                                       
  |=================================================================| 100%
perfh3<-h2o.performance(model=modelh3,newdata=test.h2o)
perfh3
## H2OBinomialMetrics: glm
## 
## MSE:  0.0978859
## RMSE:  0.3128672
## LogLoss:  0.3178367
## Mean Per-Class Error:  0.264925
## AUC:  0.8492389
## Gini:  0.6984778
## R^2:  0.2648836
## Null Deviance:  326.0801
## Residual Deviance:  237.1062
## AIC:  303.1062
## 
## Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:
##          0  1    Error     Rate
## 0      286 28 0.089172  =28/314
## 1       26 33 0.440678   =26/59
## Totals 312 61 0.144772  =54/373
## 
## Maximum Metrics: Maximum metrics at their respective thresholds
##                         metric threshold    value idx
## 1                       max f1  0.273799 0.550000  60
## 2                       max f2  0.125503 0.663265 155
## 3                 max f0point5  0.435479 0.628931  24
## 4                 max accuracy  0.435479 0.882038  24
## 5                max precision  0.821606 1.000000   0
## 6                   max recall  0.038328 1.000000 280
## 7              max specificity  0.821606 1.000000   0
## 8             max absolute_mcc  0.435479 0.471426  24
## 9   max min_per_class_accuracy  0.173693 0.745763 120
## 10 max mean_per_class_accuracy  0.125503 0.775073 155
## 
## Gains/Lift Table: Extract with `h2o.gainsLift(<model>, <data>)` or `h2o.gainsLift(<model>, valid=<T/F>, xval=<T/F>)`
dfmodelh3 <- as.data.frame(h2o.varimp(modelh3))
dfmodelh3
##                       names coefficients sign
## 1              timbre_0_max 1.216621e+00  NEG
## 2                  loudness 9.780973e-01  POS
## 3                     pitch 7.249788e-01  NEG
## 4              timbre_1_min 3.891197e-01  POS
## 5              timbre_6_min 3.689193e-01  NEG
## 6             timbre_11_min 3.086673e-01  NEG
## 7              timbre_3_max 3.025593e-01  NEG
## 8             timbre_11_max 2.459081e-01  POS
## 9              timbre_4_min 2.379749e-01  POS
## 10             timbre_4_max 2.157627e-01  POS
## 11             timbre_0_min 1.859531e-01  POS
## 12             timbre_5_min 1.846128e-01  NEG
## 13 timesignature_confidence 1.729658e-01  POS
## 14             timbre_7_min 1.431871e-01  NEG
## 15            timbre_10_max 1.366703e-01  POS
## 16            timbre_10_min 1.215954e-01  POS
## 17         tempo_confidence 1.183698e-01  POS
## 18             timbre_2_min 1.019149e-01  NEG
## 19           key_confidence 9.109701e-02  POS
## 20             timbre_7_max 8.987908e-02  NEG
## 21             timbre_6_max 6.935132e-02  POS
## 22             timbre_8_max 6.878241e-02  POS
## 23            timesignature 6.120105e-02  POS
## 24                      key 5.814805e-02  POS
## 25             timbre_8_min 5.759228e-02  POS
## 26             timbre_1_max 2.930285e-02  NEG
## 27             timbre_9_max 2.843755e-02  POS
## 28             timbre_3_min 2.380245e-02  POS
## 29             timbre_2_max 1.917035e-02  POS
## 30             timbre_5_max 1.715813e-02  POS
## 31                    tempo 1.364418e-02  NEG
## 32             timbre_9_min 8.463143e-05  NEG
## 33                                    NA <NA>
h2o.sensitivity(perfh3,0.5)
## Warning in h2o.find_row_by_threshold(object, t): Could not find exact
## threshold: 0.5 for this set of metrics; using closest threshold found:
## 0.501855569251422. Run `h2o.predict` and apply your desired threshold on a
## probability column.
## [[1]]
## [1] 0.2033898
h2o.auc(perfh3)
## [1] 0.8492389

You can make the following observations:

  • The AUC metric is 0.8492389.
  • From the confusion matrix, the model correctly predicts that 33 songs will be Top 10 hits (true positives). However, it misclassifies 26 actual Top 10 hits as non-hits (false negatives), and flags 28 songs as hits that did not reach the Top 10 (false positives).
  • Loudness has a positive coefficient estimate, meaning that this model predicts that songs with heavier instrumentation tend to be more popular. This is the same conclusion from Model 2.
  • Loudness is significant in this model.

Overall, Model 3 predicts a higher number of Top 10 hits with an acceptable accuracy rate. To choose the best fit for production runs, record labels should consider the following factors (the sketch after this list shows how to inspect metrics at a custom threshold):

  • Desired model accuracy at a given threshold
  • Number of correct predictions for top10 hits
  • Tolerable number of false positives or false negatives
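
As a sketch of evaluating the first factor, H2O’s metric accessors let you inspect any candidate threshold directly; the 0.3 cutoff below is illustrative, not a recommendation:

#confusion matrix and accuracy for Model 3 at a custom threshold
h2o.confusionMatrix(perfh3, thresholds = 0.3)
h2o.accuracy(perfh3, thresholds = 0.3)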

Next, make predictions using Model 3 on the test dataset.

predict.regh <- h2o.predict(modelh3, test.h2o)
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |=================================================================| 100%
print(predict.regh)
##   predict        p0          p1
## 1       0 0.9654739 0.034526052
## 2       0 0.9654748 0.034525236
## 3       0 0.9635547 0.036445318
## 4       0 0.9343579 0.065642149
## 5       0 0.9978334 0.002166601
## 6       0 0.9779949 0.022005078
## 
## [373 rows x 3 columns]
predict.regh$predict
##   predict
## 1       0
## 2       0
## 3       0
## 4       0
## 5       0
## 6       0
## 
## [373 rows x 1 column]
dpr<-as.data.frame(predict.regh)
#Rename the predicted column 
colnames(dpr)[colnames(dpr) == 'predict'] <- 'predict_top10'
table(dpr$predict_top10)
## 
##   0   1 
## 312  61

The first set of output results specifies the probabilities associated with each predicted observation. For example, observation 1 is 96.54739% likely not to be a Top 10 hit and 3.4526052% likely to be one (predict=1 indicates a Top 10 hit; predict=0 indicates not a Top 10 hit). The second set of results lists the actual predictions made. From the third set of results, this model predicts that 61 songs will be Top 10 hits.

Compute the baseline accuracy by assuming that a baseline model always predicts the most frequent outcome, which is that a song is not a Top 10 hit.

table(BillboardTest$top10)
## 
##   0   1 
## 314  59

Now observe that the baseline model would get 314 observations correct, and 59 wrong, for an accuracy of 314/(314+59) = 0.8418231.

It seems that Model 3, with an accuracy of 0.8552, provides you with a small improvement over the baseline model. But is this model useful for record labels?
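
The 0.8552 figure can be reproduced from the confusion counts; a short sketch using the data frames already in hand:

#accuracy of Model 3: correct predictions divided by total observations
confusion <- table(actual = BillboardTest$top10, predicted = dpr$predict_top10)
confusion
sum(diag(confusion)) / sum(confusion)   #(286 + 33) / 373 = 0.8552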

View the two models from an investment perspective:

  • A production company is interested in investing in songs that are more likely to make it to the Top 10. The company’s objective is to minimize the risk of financial losses attributed to investing in songs that end up unpopular.
  • How many songs does Model 3 correctly predict as Top 10 hits in 2010? Looking at the confusion matrix, you see that it correctly predicts 33 Top 10 hits at the F1-optimal threshold, which is more than half of the 59 songs that actually reached the Top 10.
  • It will be more useful to the record label if you can provide the production company with a list of songs that are highly likely to end up in the Top 10.
  • The baseline model is not useful, as it simply does not label any song as a hit.

Considering the three models built so far, you can conclude that Model 3 proves to be the best investment choice for the record label.

GBM model

H2O provides you with the ability to explore other learning models, such as GBM and deep learning. Explore building a model using the GBM technique, using the built-in h2o.gbm function.

Before you do this, you need to convert the target variable to a factor for multinomial classification techniques.

train.h2o$top10=as.factor(train.h2o$top10)
gbm.modelh <- h2o.gbm(y=y.dep, x=x.indep, training_frame = train.h2o, ntrees = 500, max_depth = 4, learn_rate = 0.01, seed = 1122,distribution="multinomial")
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |===                                                              |   5%
  |                                                                       
  |=====                                                            |   7%
  |                                                                       
  |======                                                           |   9%
  |                                                                       
  |=======                                                          |  10%
  |                                                                       
  |======================                                           |  33%
  |                                                                       
  |=====================================                            |  56%
  |                                                                       
  |====================================================             |  79%
  |                                                                       
  |================================================================ |  98%
  |                                                                       
  |=================================================================| 100%
perf.gbmh<-h2o.performance(gbm.modelh,test.h2o)
perf.gbmh
## H2OBinomialMetrics: gbm
## 
## MSE:  0.09860778
## RMSE:  0.3140188
## LogLoss:  0.3206876
## Mean Per-Class Error:  0.2120263
## AUC:  0.8630573
## Gini:  0.7261146
## 
## Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:
##          0  1    Error     Rate
## 0      266 48 0.152866  =48/314
## 1       16 43 0.271186   =16/59
## Totals 282 91 0.171582  =64/373
## 
## Maximum Metrics: Maximum metrics at their respective thresholds
##                       metric threshold    value idx
## 1                     max f1  0.189757 0.573333  90
## 2                     max f2  0.130895 0.693717 145
## 3               max f0point5  0.327346 0.598802  26
## 4               max accuracy  0.442757 0.876676  14
## 5              max precision  0.802184 1.000000   0
## 6                 max recall  0.049990 1.000000 284
## 7            max specificity  0.802184 1.000000   0
## 8           max absolute_mcc  0.169135 0.496486 104
## 9 max min_per_class_accuracy  0.169135 0.796610 104
## 10 max mean_per_class_accuracy  0.169135 0.805948 104
## 
## Gains/Lift Table: Extract with `h2o.gainsLift(<model>, <data>)` or `
h2o.sensitivity(perf.gbmh,0.5)
## Warning in h2o.find_row_by_threshold(object, t): Could not find exact
## threshold: 0.5 for this set of metrics; using closest threshold found:
## 0.501205344484314. Run `h2o.predict` and apply your desired threshold on a
## probability column.
## [[1]]
## [1] 0.1355932
h2o.auc(perf.gbmh)
## [1] 0.8630573

This model correctly predicts 43 top 10 hits, which is 10 more than the number predicted by Model 3. Moreover, the AUC metric is higher than the one obtained from Model 3.

As seen above, H2O’s API provides the ability to obtain key statistical measures required to analyze the models easily, using several built-in functions. The record label can experiment with different parameters to arrive at the model that predicts the maximum number of Top 10 hits at the desired level of accuracy and threshold.
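
One way to run such experiments systematically is H2O’s built-in grid search. A sketch with illustrative, untuned parameter values; because no validation frame is supplied, the AUC ranking below reflects training metrics:

#small grid search over GBM hyperparameters
gbm.grid <- h2o.grid("gbm",
                     y = y.dep, x = x.indep,
                     training_frame = train.h2o,
                     hyper_params = list(ntrees = c(250, 500),
                                         max_depth = c(3, 4, 5),
                                         learn_rate = c(0.01, 0.05)),
                     seed = 1122)
#rank the resulting models by AUC
h2o.getGrid(gbm.grid@grid_id, sort_by = "auc", decreasing = TRUE)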

H2O also allows you to experiment with deep learning models. Deep learning models have the ability to learn features implicitly, but can be more expensive computationally.

Now, create a deep learning model with the h2o.deeplearning function, using the same training and test datasets created before. The time taken to run this model depends on the type of EC2 instance chosen for this purpose.  For models that require more computation, consider using accelerated computing instances such as the P2 instance type.

system.time(
  dlearning.modelh <- h2o.deeplearning(y = y.dep,
                                      x = x.indep,
                                      training_frame = train.h2o,
                                      epoch = 250,
                                      hidden = c(250,250),
                                      activation = "Rectifier",
                                      seed = 1122,
                                      distribution="multinomial"
  )
)
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |===                                                              |   4%
  |                                                                       
  |=====                                                            |   8%
  |                                                                       
  |========                                                         |  12%
  |                                                                       
  |==========                                                       |  16%
  |                                                                       
  |=============                                                    |  20%
  |                                                                       
  |================                                                 |  24%
  |                                                                       
  |==================                                               |  28%
  |                                                                       
  |=====================                                            |  32%
  |                                                                       
  |=======================                                          |  36%
  |                                                                       
  |==========================                                       |  40%
  |                                                                       
  |=============================                                    |  44%
  |                                                                       
  |===============================                                  |  48%
  |                                                                       
  |==================================                               |  52%
  |                                                                       
  |====================================                             |  56%
  |                                                                       
  |=======================================                          |  60%
  |                                                                       
  |==========================================                       |  64%
  |                                                                       
  |============================================                     |  68%
  |                                                                       
  |===============================================                  |  72%
  |                                                                       
  |=================================================                |  76%
  |                                                                       
  |====================================================             |  80%
  |                                                                       
  |=======================================================          |  84%
  |                                                                       
  |=========================================================        |  88%
  |                                                                       
  |============================================================     |  92%
  |                                                                       
  |==============================================================   |  96%
  |                                                                       
  |=================================================================| 100%
##    user  system elapsed 
##   1.216   0.020 166.508
perf.dl<-h2o.performance(model=dlearning.modelh,newdata=test.h2o)
perf.dl
## H2OBinomialMetrics: deeplearning
## 
## MSE:  0.1678359
## RMSE:  0.4096778
## LogLoss:  1.86509
## Mean Per-Class Error:  0.3433013
## AUC:  0.7568822
## Gini:  0.5137644
## 
## Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:
##          0  1    Error     Rate
## 0      290 24 0.076433  =24/314
## 1       36 23 0.610169   =36/59
## Totals 326 47 0.160858  =60/373
## 
## Maximum Metrics: Maximum metrics at their respective thresholds
##                       metric threshold    value idx
## 1                     max f1  0.826267 0.433962  46
## 2                     max f2  0.000000 0.588235 239
## 3               max f0point5  0.999929 0.511811  16
## 4               max accuracy  0.999999 0.865952  10
## 5              max precision  1.000000 1.000000   0
## 6                 max recall  0.000000 1.000000 326
## 7            max specificity  1.000000 1.000000   0
## 8           max absolute_mcc  0.999929 0.363219  16
## 9 max min_per_class_accuracy  0.000004 0.662420 145
## 10 max mean_per_class_accuracy  0.000000 0.685334 224
## 
## Gains/Lift Table: Extract with `h2o.gainsLift(<model>, <data>)` or `h2o.gainsLift(<model>, valid=<T/F>, xval=<T/F>)`
h2o.sensitivity(perf.dl,0.5)
## Warning in h2o.find_row_by_threshold(object, t): Could not find exact
## threshold: 0.5 for this set of metrics; using closest threshold found:
## 0.496293348880151. Run `h2o.predict` and apply your desired threshold on a
## probability column.
## [[1]]
## [1] 0.3898305
h2o.auc(perf.dl)
## [1] 0.7568822

The AUC metric for this model is 0.7568822, which is lower than what you got from the earlier models. I recommend further experimentation using different hyperparameters, such as the learning rate, the number of epochs, or the number of hidden layers.
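
A sketch of one such variation, with illustrative values; dropout and mild L1 regularization are common ways to curb the overfitting suggested by the high LogLoss above:

#deep learning variant with dropout and mild L1 regularization
dlearning.v2 <- h2o.deeplearning(y = y.dep, x = x.indep,
                                 training_frame = train.h2o,
                                 epochs = 100,
                                 hidden = c(100, 100, 100),
                                 activation = "RectifierWithDropout",
                                 hidden_dropout_ratios = c(0.2, 0.2, 0.2),
                                 l1 = 1e-5,
                                 seed = 1122)
h2o.auc(h2o.performance(dlearning.v2, test.h2o))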

H2O’s built-in functions provide many key statistical measures that can help measure model performance. Here are some of these key terms.

Metric – Description
Sensitivity – The proportion of positives that are correctly identified. Also called the true positive rate, or recall.
Specificity – The proportion of negatives that are correctly identified. Also called the true negative rate.
Threshold – The cutoff point that maximizes both specificity and sensitivity. While the model may not produce its highest accuracy at this point, it is not biased towards positives or negatives.
Precision – The fraction of retrieved instances that are relevant; that is, how many of the positively classified cases are truly positive.
AUC – Provides insight into how well the classifier separates the two classes. The implicit goal is to handle situations where the sample distribution is highly skewed, with a tendency to overfit to a single class. A common grading scale:

0.9 – 1.0 = excellent (A)
0.8 – 0.9 = good (B)
0.7 – 0.8 = fair (C)
0.6 – 0.7 = poor (D)
0.5 – 0.6 = fail (F)
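
All of these measures can be pulled from a performance object with H2O’s thresholded metric accessors (a sketch for Model 3, assuming the accessors available in recent h2o releases):

#key metric accessors applied to Model 3's performance object
h2o.auc(perfh3)
h2o.sensitivity(perfh3, 0.5)   #true positive rate at threshold 0.5
h2o.specificity(perfh3, 0.5)   #true negative rate at threshold 0.5
h2o.precision(perfh3, 0.5)
h2o.accuracy(perfh3, 0.5)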

Here’s a summary of the metrics generated from H2O’s built-in functions for the three models that produced useful results.

Metric              | Model 3               | GBM Model             | Deep Learning Model
Accuracy (max)      | 0.882038 (t=0.435479) | 0.876676 (t=0.442757) | 0.865952 (t=0.999999)
Precision (max)     | 1.0 (t=0.821606)      | 1.0 (t=0.802184)      | 1.0 (t=1.0)
Recall (max)        | 1.0                   | 1.0                   | 1.0 (t=0)
Specificity (max)   | 1.0                   | 1.0                   | 1.0 (t=1)
Sensitivity (t=0.5) | 0.2033898             | 0.1355932             | 0.3898305
AUC                 | 0.8492389             | 0.8630573             | 0.7568822

Note: ‘t’ denotes the threshold at which the metric value is achieved.

Your options at this point could be narrowed down to Model 3 and the GBM model, based on the AUC and accuracy metrics observed earlier.  If the slightly lower accuracy of the GBM model is deemed acceptable, the record label can choose to go to production with the GBM model, as it can predict a higher number of Top 10 hits.  The AUC metric for the GBM model is also higher than that of Model 3.

Record labels can experiment with different learning techniques and parameters before arriving at a model that proves to be the best fit for their business. Because deep learning models can be computationally expensive, record labels can choose more powerful EC2 instances on AWS to run their experiments faster.

Conclusion

In this post, I showed how the popular music industry can use analytics to predict the type of songs that make the Top 10 Billboard charts. By running H2O’s scalable machine learning platform on AWS, data scientists can easily experiment with multiple modeling techniques and interactively query the data using Amazon Athena, without having to manage the underlying infrastructure. This helps record labels make critical decisions on the type of artists and songs to promote in a timely fashion, thereby increasing sales and revenue.

If you have questions or suggestions, please comment below.


Additional Reading

Learn how to build and explore a simple GEOINT application using SparkR.


About the Authors

Gopal Wunnava is a Partner Solution Architect with the AWS GSI Team. He works with partners and customers on big data engagements, and is passionate about building analytical solutions that drive business capabilities and decision making. In his spare time, he loves all things sports- and movie-related, and is fond of old classics like Asterix and Obelix comics and Hitchcock movies.

Bob Strahan, a Senior Consultant with AWS Professional Services, contributed to this post.

The CoderDojo Girls Initiative

Post Syndicated from Nuala McHale original https://www.raspberrypi.org/blog/coderdojo-girls-initiative/

In March, the CoderDojo Foundation launched their Girls Initiative, which aims to increase the average proportion of girls attending CoderDojo clubs from 29% to at least 40% over the next three years.

Six months on, we wanted to highlight what we’ve done so far and what’s next for our initiative.

What we’ve done so far

To date, we have focussed our efforts on four key areas:

  • Developing and improving content
  • Conducting and learning from research
  • Highlighting role models
  • Developing a guide of tried and tested best practices for encouraging and sustaining girls in a Dojo setting (Empowering the Future)

Content

We’ve taken measures to ensure our resources are as friendly to girls as they are to boys, and we are improving them based on feedback from girls. For example, we have developed beginner-level content (Sushi Cards) for working with wearables and for building apps using App Inventor. In response to girls’ feedback, we are exploring more creative, goal-orientated content.

Moreover, as part of our Empowering the Future guide, we have developed three short ‘Mini-Sushi’ projects which provide a taster of different programming languages, such as Scratch, HTML, and App Inventor.

What’s next?

We are currently finalising our intermediate-level wearables Sushi Cards. These are resources for learners to further explore wearables and integrate them with other coding skills they are developing. The Cards will enable young people to program LEDs which can be sewn into clothing with conductive thread. We are also planning another series of Sushi Cards focused on using coding skills to solve problems Ninjas have reported as important to them.

Research

In June 2017 we conducted the first Ninja survey. It was sent to all young people registered on the CoderDojo community platform, Zen. Hundreds of young people involved in Dojos around the world responded and shared their experiences.

We are currently examining these results to identify areas in which girls feel most or least confident, as well as the motivations and influencing factors that cause them to continue with coding.

What’s next?

Over the coming months we will delve deeper into the findings of this research, and decide how we can improve our content and Dojo support to adapt accordingly. Additionally, as part of sending out our Empowering the Future guide, we’re asking Dojos to provide insights into their current proportions of girls and female Mentors.

We will follow up with recipients of the guide to document the impact of the recommended approaches they try at their Dojo. Thus, we will find out which approaches are most effective in different regional contexts, which will help us improve our support for Dojos wanting to increase their proportion of attending girls.

Role models

Many Dojos, Champions, and Mentors are doing amazing work to support and encourage girls at their Dojos. Female Mentors not only help by supporting attending girls, but they also act as vital role models in an environment which is often male-dominated. A number of blogs by female Mentors and Ninjas have already been featured on our website.

What’s next?

We recognise the importance of female role models, and over the coming months we will continue to encourage community members to share their stories so that we can bring them to the wider CoderDojo community. Do you know a female Mentor or Ninja you would like to shine a spotlight on? Get in touch with us at [email protected]. You can also use #CoderDojoGirls on social media.

Empowering the Future guide

Ahead of Ada Lovelace Day and International Day of the Girl Child, the CoderDojo Foundation has released Empowering the Future, a comprehensive guide of practical approaches which Dojos have tested to engage and sustain girls.

Some topics covered in the guide are:

  • Approaches to improve the Dojo environment and layout
  • Language and images used to describe and promote Dojos
  • Content considerations, and suggested resources
  • The importance of female Mentors, and ways to increase access to role models

For the next month, Dojos that want to improve their proportion of girls can still sign up to have the printed guide sent to them for free! From today, Dojos and anyone else can also download a PDF file of the guide.

We would like to say a massive thank you to all community members who have shared their insights with us to make our Empowering the Future guide as comprehensive and beneficial as possible for other Dojos.

Tell us what you think

Have you found an approach, or used content, which girls find particularly engaging? Do you have questions about our Girls Initiative? We would love to hear your ideas, insights, and experiences in relation to supporting CoderDojo girls! Feel free to use our forums to share with the global CoderDojo community, and email us at [email protected]

The post The CoderDojo Girls Initiative appeared first on Raspberry Pi.

Spooktacular Halloween Haunted Portrait

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/spooktacular-halloween-haunted-portrait/

October has come at last, and with it, the joy of Halloween is now upon us. So while I spend the next 30 days quoting Hocus Pocus at every opportunity, here’s Adafruit’s latest spooky build … the spooktacular Haunted Portrait.

Haunted Portraits

If you’ve visited a haunted house such as Disney’s Haunted Mansion, or walked the halls of Hogwarts at Universal Studios, you will have seen a ‘moving portrait’. Whether it’s the classic ‘did that painting just blink?’ approach, or occupants moving in and out of frame, they’re an effective piece of spooky decoration – and now you can make your own!

Adafruit’s AdaBox

John Park, maker extraordinaire, recently posted a live make video where he used the contents of the Raspberry Pi-themed AdaBox 005 to create a blinking portrait.

The AdaBox is Adafruit’s own maker subscription service through which plucky makers receive a mystery parcel containing exciting tech and inspirational builds. The most recent delivery, AdaBox 005, contains a Raspberry Pi Zero, their own Joy Bonnet, a case, and peripherals, including Pimoroni’s no-solder Hammer Headers.

While you can purchase the AdaBoxes as one-off buys, subscribers get extra goodies. With AdaBox 005, they received bonus content including Raspberry Pi swag in the form of stickers, and a copy of The MagPi Magazine.

The contents of AdaBox 005 allow makers to build their own Raspberry Pi Zero tiny gaming machine. But the ever-working minds of the Adafruit team didn’t want to settle there, so they decided to create more tutorials based on the box’s contents, such as John Park’s Haunted Portrait.

Bringing a portrait to life

Alongside the AdaBox 005 content, all of which can be purchased from Adafruit directly, you’ll need a flat-screen monitor and a fancy frame. The former could be an old TV or computer screen while the latter, unless you happen to have an ornate frame that perfectly fits your monitor, can be made from cardboard, CNC-cut wood or gold-painted macaroni and tape … probably.

You’ll need to attach headers to your Raspberry Pi Zero. For those of you who fear the soldering iron, the Hammer Headers can be hammered into place without the need for melty hot metal. If you’d like to give soldering a go, you can follow Laura’s Getting Started With Soldering tutorial video.

In his tutorial, John goes on to explain how to set up the Joy Bonnet (if you wish to use it as an added controller), set your Raspberry Pi to display in portrait mode, and manipulate an image in Photoshop or GIMP to create the blinking effect.

Blinking eyes are just the start of the possibilities for this project. This is your moment to show off your image manipulation skills! Why not have the entire head flash to show the skull within? Or have an ethereal image appear in the background of an otherwise unexceptional painting of a bowl of fruit?

In the final stages of the tutorial, John explains how to set an image slideshow running on the Pi, and how to complete the look with the aforementioned ornate frame. He also goes into detail about the importance of using a matte effect screen or transparent gels to give a more realistic ‘painted’ feel.
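
If you fancy experimenting before following John’s full tutorial, the slideshow part is only a few lines of Python. Here’s a minimal sketch of our own (not John’s code), assuming pygame and a ./frames folder of pre-sized images in which the first image is the resting portrait and the rest are spooky variants:

    # slideshow.py - a minimal haunted-portrait loop (illustrative sketch only)
    import glob
    import random
    import time

    import pygame

    pygame.init()
    screen = pygame.display.set_mode((0, 0), pygame.FULLSCREEN)
    pygame.mouse.set_visible(False)

    # Assumes at least two images in ./frames, sized to match the monitor
    frames = [pygame.image.load(f).convert() for f in sorted(glob.glob("frames/*.jpg"))]

    while True:
        pygame.event.pump()
        screen.blit(frames[0], (0, 0))                   # resting portrait
        pygame.display.flip()
        time.sleep(random.uniform(3, 10))                # wait an unsettling while
        screen.blit(random.choice(frames[1:]), (0, 0))   # blink/apparition frame
        pygame.display.flip()
        time.sleep(0.2)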

You’ll find everything you need to make your own haunted portrait here, including a link to John’s entire live stream.

Get spooky!

We’re going to make this for Pi Towers. In fact, I’m wondering whether I could create an entire gallery of portraits specifically for our reception area and see how long it takes people to notice …

… though I possibly shouldn’t have given my idea away on this rather public blog post.

If you make the Haunted Portrait, or any other Halloween-themed Pi build, make sure you share it with us via social media, or in the comments below.

The post Spooktacular Halloween Haunted Portrait appeared first on Raspberry Pi.

RaspiReader: build your own fingerprint reader

Post Syndicated from Janina Ander original https://www.raspberrypi.org/blog/raspireader-fingerprint-scanner/

Three researchers from Michigan State University have developed a low-cost, open-source fingerprint reader which can detect fake prints. They call it RaspiReader, and they’ve built it using a Raspberry Pi 3 and two Camera Modules. Joshua and his colleagues have just uploaded all the info you need to build your own version — let’s go!

Sadly not the real output of the RaspiReader

Falsified fingerprints

We’ve probably all seen a movie in which a burglar crosses a room full of laser tripwires and then enters the safe full of loot by tricking the fingerprint-secured lock with a fake print. Turns out, the second part is not that unrealistic: you can fake fingerprints using a range of materials, such as glue or latex.

The RaspiReader team collected live and fake fingerprints to test the device

If the spoof print layer capping the spoofer’s finger is thin enough, it can even fool readers that detect blood flow, pulse, or temperature. This is becoming a significant security risk, not least for anyone who unlocks their smartphone using a fingerprint.

The RaspiReader

This is where Anil K. Jain comes in: Professor Jain leads a biometrics research group. Under his guidance, Joshua J. Engelsma and Kai Cao set out to develop a fingerprint reader with improved spoof-print detection. Ultimately, they aim to help the development of more secure commercial technologies. With their project, the team has also created an amazing resource for anyone who wants to build their own fingerprint reader.

So that replicating their device would be easy, they wanted to make it using inexpensive, readily available components, which is why they turned to Raspberry Pi technology.

The RaspiReader and its output

Inside the RaspiReader’s 3D-printed housing, LEDs shine light through an acrylic prism, on top of which the user rests their finger. The prism refracts the light so that the two Camera Modules can take images from different angles. The Pi receives these images via a Multi Camera Adapter Module feeding into the CSI port. Collecting two images means the researchers’ spoof detection algorithm has more information to work with.

Real on the left, fake on the right

RaspiReader software

The Camera Adaptor uses the RPi.GPIO Python package. The RaspiReader performs image processing, and its spoof detection takes image colour and 3D friction ridge patterns into account. The detection algorithm extracts colour local binary patterns … please don’t ask me to explain! You can have a look at the researchers’ manuscript if you want to get stuck into the fine details of their project.
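
To give a feel for the capture side, here’s a rough sketch of our own (not the team’s published code). Multi camera adapters are typically driven by toggling a couple of GPIO selection pins between shots, so grabbing the two prism views could look something like this; the pin numbers are placeholders you’d replace per your adapter’s documentation.

    # Illustrative two-view capture via a camera multiplexer (sketch only)
    import time

    import RPi.GPIO as GPIO
    from picamera import PiCamera

    SELECT_PINS = [11, 12]  # hypothetical selection pins - check your adapter

    GPIO.setmode(GPIO.BOARD)
    GPIO.setup(SELECT_PINS, GPIO.OUT)

    def select_camera(index):
        """Drive the selection pins so camera <index> is routed to the CSI port."""
        GPIO.output(SELECT_PINS[0], index & 1)
        GPIO.output(SELECT_PINS[1], (index >> 1) & 1)
        time.sleep(0.1)  # give the multiplexer a moment to settle

    with PiCamera() as camera:
        for i in (0, 1):
            select_camera(i)
            camera.capture('finger_view_%d.jpg' % i)

    GPIO.cleanup()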

Build your own fingerprint reader

I’ve had my eyes glued to my inbox waiting for Josh to send me links to instructions and files for this build, and here they are (thanks, Josh)! Check out the video tutorial, which walks you through how to assemble the RaspiReader:

RaspiReader: Cost-Effective Open-Source Fingerprint Reader

Building a cost-effective, open-source, and spoof-resilient fingerprint reader for $160* in under an hour. Code: https://github.com/engelsjo/RaspiReader
Links to parts:
  • PRISM – https://www.amazon.com/gp/product/B00WL3OBK4/ref=oh_aui_detailpage_o05_s00?ie=UTF8&psc=1 (Better fit) https://www.thorlabs.com/thorproduct.cfm?partnumber=PS611
  • RaspiCams – https://www.amazon.com/gp/product/B012V1HEP4/ref=oh_aui_detailpage_o00_s00?ie=UTF8&psc=1
  • Camera Multiplexer – https://www.amazon.com/gp/product/B012UQWOOQ/ref=oh_aui_detailpage_o04_s01?ie=UTF8&psc=1
  • Raspberry Pi Kit – https://www.amazon.com/CanaKit-Raspberry-Clear-Power-Supply/dp/B01C6EQNNK/ref=sr_1_6?ie=UTF8&qid=1507058509&sr=8-6&keywords=raspberry+pi+3b
Whitepaper: https://arxiv.org/abs/1708.07887
* Prices can vary based on Amazon’s pricing.

You can find a parts list with links to suppliers in the video description — the whole build costs around $160. All the STL files for the housing and the Python scripts you need to run on the Pi are available on Josh’s GitHub.

Enhance your home security

The RaspiReader is a great resource for researchers, and it would also be a terrific project to build at home! Is there a more impressive way to protect a treasured possession, or secure access to your computer, than with a DIY fingerprint scanner?

Check out this James-Bond-themed blog post for Raspberry Pi resources to help you build a high-security lair. If you want even more inspiration, watch this video about a laser-secured cookie jar which Estefannie made for us. And be sure to share your successful fingerprint scanner builds with us via social media!

The post RaspiReader: build your own fingerprint reader appeared first on Raspberry Pi.

Adafruit’s read-only Raspberry Pi

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/adafruits-read-only/

For passive projects such as point-of-sale displays, video loopers, and your upcoming Halloween builds, Adafruit have come up with a read-only solution for powering down your Raspberry Pi without endangering your SD card.

Pulling the plug

At home, at a coding club, or at a Jam, you rarely need to pull the plug on your Raspberry Pi without going through the correct shutdown procedure. To ensure a long life for your SD card and its contents, you should always turn off your Pi by selecting the shutdown option from the menu. This way, the Pi saves any temporary files to the card before relinquishing power.

Dramatic reconstruction

By pulling the plug while your OS is still running, you might corrupt these files, which could result in the Pi failing to boot up again. The only fix? Wipe the SD card clean and start over, waving goodbye to all files you didn’t back up.

Passive projects

But what if it’s not as easy as selecting shutdown, because your Raspberry Pi is embedded deep inside the belly of a project? Maybe you’ve hot-glued your Zero W into a pumpkin which is now screwed to the roof of your porch, or your store has a bank of Pi-powered monitors playing ads and the power is set to shut off every evening. Without the ability to shut down your Pi via the menu, you risk the SD card’s contents every time you power down your project.

Read-only

Just in time for the plethora of Halloween projects we’re looking forward to this month, the clever folk at Adafruit have designed a solution for this issue. They’ve shared a script which forces the Raspberry Pi to run in read-only mode, so that powering it down via a plug pull will not corrupt the SD card.

But how?

The script makes the Pi save temporary files to the RAM instead of the SD card. Of course, this means that no files or new software can be written to the card. However, if that’s not necessary for your Pi project, you might be happy to make the trade-off. Note that you can only use Adafruit’s script on Raspbian Lite.

Find out more about the read-only Raspberry Pi solution, including the script and optional GPIO-halt utility, on the Adafruit Learn page. And be aware that making your Pi read-only is irreversible, so be sure to back up the contents of your SD card before you implement the script.

Halloween!

It’s October, and we’re now allowed to get excited about Halloween and all of the wonderful projects you plan on making for the big night.

Adafruit’s animated snake eyes

We’ll be covering some of our favourite spooky builds on social media throughout the month — make sure to share yours with us, either in the comments below or on Facebook, Twitter, Instagram, or G+.

The post Adafruit’s read-only Raspberry Pi appeared first on Raspberry Pi.

Algo-rhythmic PianoAI

Post Syndicated from Janina Ander original https://www.raspberrypi.org/blog/pianoai/

It’s no secret that we love music projects at Pi Towers. On the contrary, we often shout it from the rooftops like we’re in Moulin Rouge! But the PianoAI project by Zack left us slack-jawed: he built an AI on a Raspberry Pi that listens to his piano playing, and then produces improvised, real-time accompaniment.

Jamming with PIanoAI (clip #1) (Version 1.0)

Another example of a short teaching and then jamming with piano with a version I’m more happy with. I have to play for the Pi for a little while before the Pi has enough data to make its own music.

The PianoAI

Inspired by a story about jazz musician Dan Tepfer, Zack set out to create an AI able to imitate his piano-playing style in real time. He began programming the AI in Python, before starting over in the open-source programming language Go.

The Go mascot is a gopher. Why not?

Zack has published an excellent write-up of how he built PianoAI. It’s a very readable account of the progress he made and the obstacles he had to overcome while writing PianoAI, and it includes more example videos. It’s hard to add anything to Zack’s own words, so I shan’t try.

Some of Zack’s notes for his AI

If you just want to try out PianoAI, head over to his GitHub. He provides a detailed guide that talks you through how to implement and use it.

Music to our ears

The Raspberry Pi community never fails to amaze us with their wonderful builds, not least when it comes to musical ones. Check out this cool-looking synth by Toby Hendricks, this geometric instrument by David Sharples, and this pyrite-disc-reading music player by Dmitry Morozov. Aren’t they all splendid? And the list goes on and on…

Which instrument do you play? The recorder? The ocarina? The jaw harp? Could you create an AI like Zack’s for it? Let us know in the comments below, and share your builds with us via social media.

The post Algo-rhythmic PianoAI appeared first on Raspberry Pi.

Department of Homeland Security to Collect Social Media of Immigrants and Citizens

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2017/09/department_of_h_3.html

New rules give the DHS permission to collect “social media handles, aliases, associated identifiable information, and search results” as part of people’s immigration file. The Federal Register has the details, which seem to also cover US citizens who communicate with immigrants.

This is part of the general trend to scrutinize people coming into the US more, but it’s hard to get too worked up about the DHS accessing publicly available information. More disturbing is the trend of occasionally asking for social media passwords at the border.

Dialekt-o-maten vending machine

Post Syndicated from Janina Ander original https://www.raspberrypi.org/blog/dialekt-o-maten-vending-machine/

At some point, many of you will have become exasperated with your AI personal assistant for not understanding you due to your accent – or worse, your fantastic regional dialect! A vending machine from Coca-Cola Sweden turns this issue inside out: the Dialekt-o-maten rewards users with a free soft drink for speaking in a Swedish regional dialect.

The world’s first vending machine where you pay with a dialect!

Thirsty fans along with journalists were invited to try the Dialekt-o-maten at Stureplan in central Stockholm. Depending on how well they could pronounce different phrases in assorted Swedish dialects, they were rewarded with an ice-cold Coke with that destination on the label.

The Dialekt-o-maten

The machine, which uses a Raspberry Pi, was set up in Stureplan Square in Stockholm. A person presses one of six buttons to choose the regional dialect they want to try out. They then hit ‘record’, and speak into the microphone. The recording is compared to a library of dialect samples, and, if it matches closely enough, voila! — the Dialekt-o-maten dispenses a soft drink for free.

Code for the Dialekt-o-maten

The team of developers used the dejavu Python library, as well as custom-written code which responded to new recordings. Carl-Anders Svedberg, one of the developers, said:

Testing the voices and fine-tuning the right level of difficulty for the users was quite tricky. And we really should have had more voice samples. Filtering out noise from the surroundings, like cars and music, was also a small hurdle.

While they wrote the initial software on macOS, the team transferred it to a Raspberry Pi so they could install the hardware inside the Dialekt-o-maten.
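
The team haven’t published their code, but dejavu’s API is compact enough to sketch the matching core. The following is a hedged illustration only: the database credentials, file names, and confidence threshold are placeholders, not the team’s values.

    # Sketch of dialect matching with the dejavu library (not the team's code)
    from dejavu import Dejavu
    from dejavu.recognize import FileRecognizer

    config = {
        "database": {
            "host": "127.0.0.1",
            "user": "dejavu",
            "passwd": "secret",  # placeholder credentials
            "db": "dejavu",
        }
    }

    djv = Dejavu(config)
    djv.fingerprint_directory("dialect_samples", [".wav"])  # index reference clips

    match = djv.recognize(FileRecognizer, "attempt.wav")    # the user's recording
    if match and match["confidence"] > 50:                  # threshold is a guess
        print("Close enough - have a Coke! Matched:", match["song_name"])
    else:
        print("Try again!")

Run against a handful of reference recordings per dialect, something along these lines would give the simple “close enough” yes/no the machine needs.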

Regional dialects

Even though Sweden has only ten million inhabitants, there are more than 100 Swedish dialects. In some areas of Sweden, the local dialect still resembles Old Norse. The Dialekt-o-maten recorded how well people spoke the six dialects it used. Apparently, the hardest one to imitate is spoken in Vadstena, and the easiest is spoken in Smögen.

Speech recognition with the Pi

Because of its audio input capabilities, the Raspberry Pi is very useful for building devices that use speech recognition software. One of our favourite projects in this vein is of course Allen Pan’s Real-Life Wizard Duel. We also think this pronunciation training machine by Japanese makers HomeMadeGarbage is really neat. Ideas from these projects and the Dialekt-o-maten could potentially be combined to make a fully fledged language-learning tool!

How about you? Have you used a Raspberry Pi to help you become multilingual? If so, do share your project with us in the comments or via social media.

The post Dialekt-o-maten vending machine appeared first on Raspberry Pi.

FRED-209 Nerf gun tank

Post Syndicated from Janina Ander original https://www.raspberrypi.org/blog/nerf-gun-tank-fred-209/

David Pride, known to many of you as an active member of our maker community, has done it again! His FRED-209 build combines a Nerf gun, 3D printing, a Raspberry Pi Zero, and robotics to make one neat remotely controlled Nerf tank.

FRED-209 – 3D printed Raspberry Pi Nerf Tank

Uploaded by David Pride on 2017-09-17.

A Nerf gun for FRED-209

David says he worked on FRED-209 over the summer in order to have some fun with Nerf guns, which weren’t around when he was a kid. He purchased an Elite Stryfe model at a car boot sale, and took it apart to see what made it tick. Then he set about figuring out how to power it with motors and a servo.

To control the motors, David used a ZeroBorg add-on board for the Pi Zero, and he set up a PlayStation 3 controller to pilot his tank. These components were also part of a robot that David entered into the Pi Wars competition, so he had already written code for them.
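
David’s code isn’t published yet (see below), but the shape of such a control loop is well established. Here’s a hedged sketch using PiBorg’s ZeroBorg Python library, with pygame reading the PS3 pad; the axis numbers and motor polarity are assumptions you’d tune on the real robot.

    # Hedged sketch of tank-style driving with a ZeroBorg and a PS3 controller
    import pygame
    import ZeroBorg  # PiBorg's library, distributed alongside their examples

    ZB = ZeroBorg.ZeroBorg()
    ZB.Init()
    ZB.ResetEpo()

    pygame.init()
    pygame.joystick.init()
    pad = pygame.joystick.Joystick(0)
    pad.init()

    def clamp(value):
        return max(-1.0, min(1.0, value))

    try:
        while True:
            pygame.event.pump()
            drive = -pad.get_axis(1)  # left stick up/down (axis index assumed)
            steer = pad.get_axis(2)   # right stick left/right (axis index assumed)
            ZB.SetMotor1(clamp(drive + steer))  # left track
            ZB.SetMotor2(clamp(drive - steer))  # right track
    finally:
        ZB.MotorsOff()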

3D printing for FRED-209

During prototyping for his Nerf tank, which David named after ED-209 from RoboCop, he used lots of eBay loot and several 3D-printed parts. He used the free OpenSCAD software package to design the parts he wanted to print. If you’re a novice at 3D printing, you might find the printing advice he shares in the write-up on his blog very useful.

David found the 3D printing of the 24cm-long lid of FRED-209 tricky

On eBay, David found some cool-looking chunky wheels, but these turned out to be too heavy for the motors. In the end, he decided to use a Rover 5 chassis, which changed the look of FRED-209 from ‘monster truck’ to ‘tank’.

Next step: teach it to use stairs

The final result looks awesome, and David’s video demonstrates that it shoots very accurately as well. A make like this might be a great defensive project for our new apocalypse-themed Pioneers challenge!

Taking FRED-209 further

David will be uploading code and STL files for FRED-209 soon, so keep an eye on his blog or Twitter for updates. He’s also bringing the Nerf tank to the Cotswold Raspberry Jam this weekend. If you’re attending the event, make sure you catch him and try FRED-209 out yourself.

Never one to rest on his laurels, David is already working on taking his build to the next level. He wants to include a web interface controller and a camera, and is working on implementing OpenCV to give the Nerf tank the ability to autonomously detect targets.
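
As a flavour of what the target detection might involve (a sketch of our own, not David’s plan), OpenCV makes a basic colour-blob tracker only a screenful of Python, which would be a plausible first pass at finding a brightly coloured target and deciding which way to turn:

    # Minimal colour-blob target finder with OpenCV (illustrative sketch)
    import cv2
    import numpy as np

    cap = cv2.VideoCapture(0)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        # Threshold for a bright orange target; this HSV range is an assumption
        mask = cv2.inRange(hsv, np.array([5, 150, 150]), np.array([15, 255, 255]))
        m = cv2.moments(mask)
        if m["m00"] > 10000:  # enough matching pixels to call it a target
            cx = int(m["m10"] / m["m00"])
            width = frame.shape[1]
            if cx < width // 3:
                print("target left")
            elif cx > 2 * width // 3:
                print("target right")
            else:
                print("target centred - fire!")
    cap.release()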

Pi Wars 2018

I have a feeling we might get to see an advanced version of David’s project at next year’s Pi Wars!

The 2018 Pi Wars have just been announced. They will take place on 21-22 April at the Cambridge Computer Laboratory, and you have until 3 October to apply to enter the competition. What are you waiting for? Get making! And as always, do share your robot builds with us via social media.

The post FRED-209 Nerf gun tank appeared first on Raspberry Pi.

Laser Cookies: a YouTube collaboration

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/laser-cookies/

Lasers! Cookies! Raspberry Pi! We’re buzzing with excitement about sharing our latest YouTube video with you, which comes directly from the kitchen of maker Estefannie Explains It All!

Laser-guarded cookies feat. Estefannie Explains It All

Uploaded by Raspberry Pi on 2017-09-18.

Estefannie Explains It All + Raspberry Pi

When Estefannie visited Pi Towers earlier this year, we introduced her to the Raspberry Pi Digital Curriculum and the free resources on our website. We’d already chatted to her via email about the idea of creating a collab video for the Raspberry Pi channel. Once she’d met members of the Raspberry Pi Foundation team and listened to them wax lyrical about the work we do here, she was even more keen to collaborate with us.

Estefannie on Twitter

Ahhhh!!! I still can’t believe I got to hang out and make stuff at the @Raspberry_Pi towers!! Thank you thank you!!

Estefannie returned to the US filled with inspiration for a video for our channel, and we’re so pleased with how awesome her final result is. The video is a super addition to our Raspberry Pi YouTube channel, it shows what our resources can help you achieve, and it’s great fun. You might also have noticed that the project fits in perfectly with this season’s Pioneers challenge. A win all around!

So yeah, we’re really chuffed about this video, and we hope you all like it too!

Estefannie’s Laser Cookies guide

For those of you wanting to try your hand at building your own Cookie Jar Laser Surveillance Security System, Estefannie has provided a complete guide to talk you through it. Here she goes:

First off, you’ll need:

  • 10 lasers
  • 10 photoresistors
  • 10 capacitors
  • 1 Raspberry Pi Zero W
  • 1 buzzer
  • 1 Raspberry Pi Camera Module
  • 12 ft PVC pipes + 4 corners
  • 1 acrylic panel
  • 1 battery pack
  • 8 zip ties
  • tons of cookies

I used the Raspberry Pi Foundation’s Laser trip wire and the Tweeting Babbage resources to get one laser working and to set up the camera and Twitter API. This took me less than an hour, and it was easy, breezy, beautiful, Raspberry Pi.

I soldered ten lasers in parallel and connected ten photoresistors to their own GPIO pins. I didn’t wire them up in series because of sensitivity reasons and to make debugging easier.
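
Estefannie’s actual code is on her GitHub (linked below), but as a hedged taste of the idea: gpiozero’s LightSensor class does the photoresistor-and-capacitor timing for you, so a multi-beam alarm can stay very compact. The pin numbers here are placeholders.

    # Hedged sketch of a ten-beam laser grid with gpiozero (placeholder pins)
    from signal import pause

    from gpiozero import Buzzer, LightSensor

    BEAM_PINS = [4, 17, 27, 22, 5, 6, 13, 19, 26, 21]  # one GPIO per photoresistor
    buzzer = Buzzer(18)

    def tripped():
        buzzer.beep(0.1, 0.1, n=5)  # someone's after the cookies
        # ...trigger the camera capture and the tweet here...

    sensors = [LightSensor(pin) for pin in BEAM_PINS]
    for sensor in sensors:
        sensor.when_dark = tripped  # beam broken -> sensor reads dark

    pause()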

Building the frame took a few tries: I actually started with a wood frame, then tried a clear case, and finally realized the best and cleaner solution would be pipes. All the wires go inside the pipes and come out in a small window on the top to wire up to the Zero W.

Using pipes also made the build cheaper, since they were about $3 for 12 ft. Wiring inside the pipes was tricky, and to finish the circuit, I soldered some of the wires after they were already in the pipes.

I tried glueing the lasers to the frame, but the lasers melted the glue and became decalibrated. Next I tried tape, and then I found picture mounting putty. The putty worked perfectly — it was easy to mold a putty base for the lasers and to calibrate and re-calibrate them if needed. Moreover, the lasers stayed in place no matter how hot they got.

Although the lasers were not very strong, I still strained my eyes after long hours of calibrating — hence the sunglasses! Working indoors with lasers, sunglasses, and code was weird. But now I can say I’ve done that…in my kitchen.

Using all the knowledge I have shared, this project should take a couple of hours. The code you need lives on my GitHub!

“The cookie recipe is my grandma’s, and I am not allowed to share it.”

Estefannie on YouTube

Estefannie made this video for us as a gift, and we’re so grateful for the time and effort she put into it! If you enjoyed it and would like to also show your gratitude, subscribe to her channel on YouTube and follow her on Instagram and Twitter. And if you make something similar, or build anything with our free resources, make sure to share it with us in the comments below or via our social media channels.

The post Laser Cookies: a YouTube collaboration appeared first on Raspberry Pi.

A Million ‘Pirate’ Boxes Sold in the UK During The Last Two Years

Post Syndicated from Andy original https://torrentfreak.com/a-million-pirate-boxes-sold-in-the-uk-during-the-last-two-years-170919/

With the devices hitting the headlines on an almost weekly basis, it probably comes as no surprise that ‘pirate’ set-top boxes are quickly becoming public enemy number one with video rightsholders.

Typically loaded with the legal Kodi software but augmented with third-party addons, these often Android-based pieces of hardware drag piracy out of the realm of the computer savvy and into the living rooms of millions.

One of the countries reportedly most affected by this boom is the UK. The consumption of these devices among the general public is said to have reached epidemic proportions, and anecdotal evidence suggests that names like Kodi and Showbox are now household terms.

Today we have another report to digest, this time from the Federation Against Copyright Theft, or FACT as they’re often known. Titled ‘Cracking Down on Digital Piracy,’ the report provides a general overview of the piracy scene, tackling well-worn topics such as how release groups and site operators work, among others.

The report is produced by FACT after consultation with the Police Intellectual Property Crime Unit, Intellectual Property Office, Police Scotland, and anti-piracy outfit Entura International. It begins by noting that the vast majority of the British public aren’t involved in the consumption of infringing content.

“The most recent stats show that 75% of Brits who look at content online abide by the law and don’t download or stream it illegally – up from 70% in 2013. However, that still leaves 25% who do access material illegally,” the report reads.

The report quickly heads to the topic of ‘pirate’ set-top boxes which is unsurprising, not least due to FACT’s current focus as a business entity.

While it often positions itself alongside government bodies (which no doubt boosts its status with the general public), FACT is a private limited company serving The Premier League, another company desperate to stamp out the use of infringing devices.

Nevertheless, it’s difficult to argue with some of the figures cited in the report.

“At a conservative estimate, we believe a million set-top boxes with software added to them to facilitate illegal downloads have been sold in the UK in the last couple of years,” the Intellectual Property Office reveals.

Interestingly, given a growing tech-savvy public, FACT’s report notes that ready-configured boxes are increasingly coming into the country.

“Historically, individuals and organized gangs have added illegal apps and add-ons onto the boxes once they have been imported, to allow illegal access to premium channels. However more recently, more boxes are coming into the UK complete with illegal access to copyrighted content via apps and add-ons already installed,” FACT notes.

“Boxes are often stored in ‘fulfillment houses’ along with other illegal electrical items and sold on social media. The boxes are either sold as one-off purchases, or with a monthly subscription to access paid-for channels.”

While FACT press releases regularly blur the lines when people are prosecuted for supplying set-top boxes in general, it’s important to note that there are essentially two kinds of products on offer to the public.

The first relies on Kodi-type devices which provide ongoing free access to infringing content. The second involves premium IPTV subscriptions, which are a whole different level of criminality. Separating the two when reading news reports can be extremely difficult, but it’s hugely important to recognize the difference when assessing the kinds of sentences set-top box suppliers are receiving in the UK.

Nevertheless, FACT correctly highlights that the supply of both kinds of product is on the increase, with various parties recognizing the commercial opportunities.

“A significant number of home-grown British criminals are now involved in this type of crime. Some of them import the boxes wholesale through entirely legal channels, and modify them with illegal software at home. Others work with sophisticated criminal networks across Europe to bring the boxes into the UK.

“They then sell these boxes online, for example through eBay or Facebook, sometimes managing to sell hundreds or thousands of boxes before being caught,” the company adds.

The report notes that in some cases the sale of infringing set-top boxes is a cottage industry, with suppliers often working on their own or with small groups of friends and family. Inevitably, perhaps, larger-scale operations are reported to be part of networks with connections to other kinds of crime, such as dealing in drugs.

“In contrast to drugs, streaming devices provide a relatively steady and predictable revenue stream for these criminals – while still being lucrative, often generating hundreds of thousands of pounds a year, they are seen as a lower risk activity with less likelihood of leading to arrest or imprisonment,” FACT reports.

While there’s certainly the potential to earn large sums from ‘pirate’ boxes and premium IPTV services, operating on the “hundreds of thousands of pounds a year” scale in the UK would attract a lot of unwanted attention. That’s not to say it isn’t happening already, however.

Noting that digital piracy has evolved hugely over the past three or four years, the report says that the cases investigated so far are just the “tip of the iceberg” and that many other cases are in the early stages and will only become known to the public in the months and years ahead.

Indeed, the Intellectual Property Office hints that some kind of large-scale enforcement action may be on the horizon.

“We have identified a significant criminal business model which we have discussed and shared with key law enforcement partners. I can’t go into detail on this, but as investigations take their course, you will see the scale,” an IPO spokesperson reveals.

While details are necessarily scarce, a source familiar with this area told TF that he would be very surprised if the targets aren’t the growing handful of commercial UK-based IPTV re-sellers who offer full subscription TV services for a few pounds per month.

“They’re brazen. Watch this space,” he said.

FACT’s full report, Cracking Down on Digital Piracy, can be downloaded here (pdf)

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

Pioneers: only you can save us

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/pioneers-third-challenge/

Pioneers, we just received this message through our network — have you seen it?

Can you see me? Only YOU can save us!

Uploaded by Raspberry Pi on 2017-09-14.

Only you can save us

We have no choice – we must help her! If things are as bad as she says they are, our only hope of survival is to work together.

We know you have the skills and imagination required to make something. We’ve seen that in previous Pioneers challenges. That’s why we’re coming directly to you with this: we know you won’t let her down.

What you need to do

We’ve watched back through the recording and pulled out as much information as we can:

  • To save us, you have ten weeks to create something using tech. This means you need to be done by 1 December, or it will be too late!
  • The build you will create needs to help her in the treacherous situation she’s in. What you decide to make is completely up to you.
  • Her call is for those of you aged between 11 and 16 who are based in the UK or Republic of Ireland. You need to work in groups of up to five, and you need to find someone aged 18 or over to act as a mentor and support your project.
  • Any tech will do. We work for the Raspberry Pi Foundation, but this doesn’t mean you need to use a Raspberry Pi. Use anything at all — from microcontrollers to repurposed devices such as laptops and cameras.

To keep in contact with you, it looks like she’s created a form for you to fill in and share your team name and details with her. In return she will trade some items with you — things that will help inspire you in your mission. We’ve managed to find the link to the form: you can fill it in here.

In order to help her (and any others who might still be out there!) to recreate your project, you need to make sure you record your working process. Take photos and footage to document how you build your make, and put together a video to send to her when you’re done making.

If you manage to access social media, you could also share your progress as you go along! Make sure to use #MakeYourIdeas, so that other survivors can see your work.

We’ve assembled some more information on the Pioneers website to create a port of call for you. Check it out, and let us know if you have any questions. We will do whatever we can to help you protect the world.

Good luck, everybody! It’s up to you now.

Only you can save us.

The post Pioneers: only you can save us appeared first on Raspberry Pi.

Turtle, the earthbound crowdfunded rover

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/turtle-rover/

With ten days to go until the end of their crowdfunding campaign, the team behind the Turtle Rover are waiting eagerly for their project to become a reality for earthbound explorers across the globe.

Turtle Rover

Turtle is the product of the Mars Rover prototype engineers at Wroclaw University of Technology, Poland. Their waterproof land rover can be controlled via your tablet or smartphone, and allows you to explore hidden worlds too small or dangerous for humans. The team says this about their project:

NASA and ESA plan to send another rover to Mars in 2020. SpaceX wants to send one million people to Mars in the next 100 years. However, before anyone sends a rover to another planet, we designed Turtle — a robot to remind you about how beautiful the Earth is.

With a Raspberry Pi at its core, Turtle is an open-source, modular device to which you can attach new, interesting features such as extra cameras, lights, and a DSLR adapter. Depending on the level at which you back the Kickstarter, you might also receive a robotic arm as a reward for your support.

The Turtle can capture photos and video, and even live-stream video to your device. Moreover, its emergency stop button offers peace of mind whenever your explorations take your Turtle to cliff edges or other unsafe locations.

Constructed of aerospace-grade aluminium, plastics, and stainless steel, its robust form, watertight and dust-proof body, and 4-hour battery life make the Turtle a great tool for education and development, as well as a wonderful addition to recreational activities such as Airsoft.

Back the Turtle

If you want to join in the Turtle Rover revolution, you have ten days left to back the team on Kickstarter. Pledge €1497 for an unassembled kit (you’ll need your own Raspberry Pi, battery, and servos), or €1549 for a complete rover. The team plan to send your Turtle to you by June 2018 — so get ready to explore!

For more information on the build, including all crowdfunding rewards, check out their Kickstarter page. And if you’d like to follow their journey, be sure to follow them on Twitter.

Your Projects

Are you running a Raspberry Pi-based crowdfunding campaign? Or maybe you’ve got your idea, and you’re soon going to unleash it on the world? Whatever your plans, we’d love to see what you’re up to, so make sure to let us know via our social media channels or an email to [email protected]

The post Turtle, the earthbound crowdfunded rover appeared first on Raspberry Pi.

The Things Pirates Do To Hinder Anti-Piracy Investigations

Post Syndicated from Andy original https://torrentfreak.com/the-things-pirates-do-to-hinder-anti-piracy-outfits-170909/

Dedicated Internet pirates dealing in fresh content or operating at any significant scale can be pretty sure that rightsholders and their anti-piracy colleagues are interested in their activities at some level.

With this in mind, most pirates these days are aware of things they can do to enhance their security, with products like VPNs often getting discussed on the consumer side.

This week, in a report detailing the challenges social media poses to intellectual property rights, UK anti-piracy outfit Federation Against Copyright Theft published a list of techniques deployed by pirates that hinder their investigations.

Fake/hidden website registration details

“Website registration details are often fake or hidden, which provides no further links to the person controlling the domain and its illegal activities,” the group reveals.

Protected WHOIS records are nothing new and can sometimes be uncloaked by a determined adversary via court procedures. However, in the early stages of an investigation, open records provide leads that can be extremely useful in building an early picture about who might be involved in the operation of a website.

Having them hidden is a definite plus for pirate site operators, especially when the underlying details are also fake, which is particularly common practice. And, with companies like Peter Sunde’s Njalla entering the market, hiding registrations is easier than ever.

Overseas servers

“Investigating servers located offshore cause some specific problems for FACT’s law-enforcement partners. In order to complete a full investigation into an offshore server, a law-enforcement agency must liaise with its counterpart in the country where the server is located. The difficulties of obtaining evidence from other countries are well known,” FACT notes.

While FACT no doubt corresponds with entities overseas, the anti-piracy outfit has a history of targeting UK citizens who are reportedly infringing copyright. It regularly involves UK police in its investigations (FACT itself employs former police officers) but jurisdiction is necessarily limited to the UK.

It is possible to get overseas law enforcement entities involved to seize a server, for example, but they have to be convinced of the need to do so by the police, which isn’t easy and is usually reserved for more serious cases. The bottom line is that by placing a server a long way away from a pirate’s home territory, things can be made much more difficult for local investigators.

Torrent websites and DMCA compliance

“Some torrent website operators who maintain a high DMCA compliance rate will often use this to try to appease the law, while continuing to provide infringing links,” FACT says.

This is an interesting one. Under law in both the United States and Europe, service providers are required to remove infringing content from their systems when they are notified of its existence by a rightsholder or its agent. Not doing so can render them liable, if the content is indeed infringing.

What FACT appears to be saying is that sites that comply with the law, by removing infringing content when asked to, become more difficult targets for legal action. It sounds very obvious but the underlying suggestion is that compliance on the surface is used as a protective mechanism. No example sites are mentioned but the strategy has clearly hindered FACT.

Current legislation too vague to remove infringing live sports streams

“Current legislation is insufficient to effectively tackle the issue of websites illegally offering coverage of live sports events. Section 512 (c) of the Digital Millennium Copyright Act (DMCA) states that: upon notification of claimed infringement, the service provider should ‘respond expeditiously’ to remove or disable access to the copyright-infringing material. Most live sports events are under two hours long, so such non-specific timeframes for required action are inadequate,” FACT complains.

Since government reports like these can take a long time to prepare, it appears that FACT and its partners may have already found a solution to this particular problem. Major FACT client the Premier League now has a High Court injunction in place which allows it to block infringing streams on a real-time basis. It doesn’t remove the content at its source, but it still renders it largely inaccessible in the UK.

Nevertheless, FACT calls for takedowns to be actioned more swiftly, noting that “the law needs to reflect this narrow timeframe with a specified required response period for websites offering such live feeds.”

Camming content directly from cinema screen to the cloud

“Recent advancements in technology have made this a viable option to ‘cammers’ to avoid detection. Attempts to curtail and delete illicitly recorded film footage may become increasingly difficult with the emergence of streaming apps that automatically upload recorded video to cloud services,” FACT reports.

Over the years, FACT has been involved in numerous operations to hinder those who record movies with cameras in theaters and then upload them to the Internet. Once the perpetrator has exited the theater, FACT has effectively lost the battle, but the possibility that a live upload can now take place is certainly an interesting proposition.

“While enforcing officers may delete the footage held on the device, the footage has potentially already been stored remotely on a cloud system,” FACT warns.

Equally, this could also prove a problem for those seeking to secure evidence. With a cloud upload, the person doing the recording could safely delete the footage from the local device. That could be an obstacle to proving that an offense had even been committed when a suspect is confronted in situ.

Virtual currencies

“There is great potential in virtual currencies for money launderers and illicit traders. Government and law enforcement have raised concerns on how virtual currencies can be sent anonymously, leaving little or no trail for regulators or law-enforcement agencies,” FACT writes.

For many years, pirates of all kinds have relied on systems like PayPal, Mastercard, and Visa, to shift money around. However, these payment systems are now more difficult to deploy on pirate services and are more easily traced, even when operators manage to squeeze them through the gaps.

The same cannot be said of bitcoin and similar currencies that are gaining in popularity all the time. They are harder to use, of course, but there’s little doubt accessibility issues will be innovated out of the equation at some point. Once that happens, these currencies will be a force to be reckoned with.

The UK government’s Share and Share Alike report, which examines the challenges social media poses to intellectual property rights, can be downloaded here (pdf)

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

Chinese Man Jailed For Nine Months For Selling VPN Software

Post Syndicated from Andy original https://torrentfreak.com/chinese-man-jailed-for-nine-months-for-selling-vpn-software-170904/

Back in January, China’s Ministry of Industry and Information Technology announced that due to Internet technologies and services expanding in a “disorderly” fashion, regulation would be needed to restore order.

The government said that it would take measures to “strengthen network information security management” and would embark on a “nationwide Internet network access services clean-up.”

One of the initial targets was reported as censorship-busting VPNs, which allow citizens to evade the so-called Great Firewall of China. Operating such a service without a corresponding telecommunications business license would constitute an offense, the government said.

The news was met with hostility, with media and citizens alike bemoaning Chinese censorship. Then early July, a further report suggested that the government would go a step further by ordering ISPs to block VPNs altogether. This elicited an immediate response from local authorities, who quickly denied the reports, blaming “foreign media” for false reporting.

But it was clear something was amiss in China. Later that month, it was revealed that Apple had banned VPN software and services from its app store.

“We are writing to notify you that your application will be removed from the China App Store because it includes content that is illegal in China, which is not in compliance with the App Store Review Guidelines,” Apple informed developers.

With an effort clearly underway to target VPNs, news today from China suggests that the government is indeed determined to tackle the anti-censorship threat presented by such tools. According to local media, Chinese man Deng Mouwei, who ran a small website through which he sold VPN software, has been sentenced to prison.

The 26-year-old, from the city of Dongguan in the Guangdong province, was first arrested in October 2016 after setting up a website to sell VPNs. Just two products were on offer, but this was enough to spur the authorities into action.

A prosecution notice, published by Chinese publication Whatsonweibo, reveals the university educated man was arrested “on suspicion of providing tools for illegal control of a computer information system.”

It’s alleged that the man used several phrases to market the VPNs, including “VPN over the wall” and “Shadow shuttle cloud”. The business wasn’t particularly profitable though, generating just 13,957 yuan ($2,133) since October 2015.

“The court held that the defendant Deng Mouwei disregarded state law, by providing tools specifically for the invasion and illegal control of computer information systems procedures,” the Guangdong Province’s First People’s Court said in its ruling, handed down earlier this year but only just made public.

“The circumstances are serious and the behavior violated the ‘Criminal Law of the People’s Republic of China Article 285.”

Article 285 – don’t interfere with the state

“The facts of the crime are clear, the evidence is true and sufficient. In accordance with the provisions of Article 172 of the Criminal Procedure Law of the People’s Republic of China, the defendant shall be sentenced according to law.”

Under Chinese law, Article 172 references stolen goods, noting that people who “conceal or act as distributors” shall be sentenced to not more than three years of fixed-term imprisonment, or fined, depending on circumstances. Where VPNs fit into that isn’t clear, but things didn’t end well for the defendant.

For offering tools that enable people to “visit foreign websites that can not be accessed via a domestic (mainland) IP address,” Deng Mouwei received a nine-month prison sentence.

News of the sentencing appeared on Chinese social media over the weekend, prompting fear and confusion among local users. While many struggled to see the sense of the prosecution, some expressed fear that even people who simply use VPN software to evade China’s Great Firewall could be subjected to prosecution in the future.

Whatever the outcome, it’s now abundantly clear that China is the midst of a VPN crackdown across the board and is serious about stamping out efforts to bypass its censorship. With the Internet’s ability to treat censorship as damage and route round it, it’s a battle that won’t be easily won.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

Director of Kim Dotcom Documentary Speaks Out on Piracy

Post Syndicated from Ernesto original https://torrentfreak.com/director-of-kim-dotcom-documentary-speaks-out-on-piracy-170902/

When you make a documentary about Kim Dotcom, someone who’s caught up in one of the largest criminal copyright infringement cases in history, the piracy issue is unavoidable.

And indeed, the topic is discussed in depth in “Kim Dotcom: Caught in the Web,” which enjoyed its digital release early last week.

As happens with most digital releases, a pirated copy soon followed. While no filmmaker would actively encourage people not to pay for their work, director Annie Goldson wasn’t surprised at all when she saw the first unauthorized copies appear online.

The documentary highlights that piracy is partly triggered by a lack of availability, so it was a little ironic that the film itself wasn’t released worldwide on all services. However, Goldson had no direct influence on the distribution process.

“It was inevitable really. We have tried to adopt a distribution model that we hope will encourage viewers to buy legal copies making it available as widely as possible,” Goldson informs TorrentFreak.

“We had sold the rights, so didn’t have complete control over reach or pricing which I think are two critical variables that do impact on the degree of piracy. Although I think our sales agent did make good strides towards a worldwide release.”

Now that millions of pirates have access to her work for free, it will be interesting to see how this impacts sales. For now, however, there’s still plenty of legitimate interest, with the film now appearing in the iTunes top ten of independent films.

In any case, Goldson doesn’t subscribe to the ‘one instance of piracy is a lost sale’ theory and notes that views about piracy are sharply polarized.

“Some claim financial devastation while others argue that infringement leads to ‘buzz,’ that this can generate further sales – so we shall see. At one level, watching this unfold is quite an interesting research exercise into distribution, which ironically is one of the big themes of the film of course,” Goldson notes.

Piracy overall doesn’t help move the industry forward though, she says, as it hurts the development of better distribution models.

“I’m opposed to copyright infringement and piracy as it muddies the waters when it comes to devising a better model for distribution, one that would nurture and support artists and creatives, those that do the hard yards.”

Kim Dotcom: Caught in the Web trailer

The director has no issues with copyright enforcement either. Not just to safeguard financial incentives, but also because the author does have moral and ethical rights about how their works are distributed. That said, instead of pouring money into enforcement, it might be better spent on finding a better business model.

“I’m with Wikipedia founder Jimmy Wales who says [in the documentary] that the problem is primarily with the existing business model. If you make films genuinely available at prices people can afford, at the same time throughout the world, piracy would drop to low levels.

“I think most people would prefer to access their choice of entertainment legally rather than delving into dark corners of the Internet. I might be wrong of course,” Goldson adds.

In any case, ‘simply’ enforcing piracy into oblivion seems to be an unworkable prospect – not without massive censorship, or the shutdown of the entire Internet.

“I feel the risk is that anti-piracy efforts will step up and erode important freedoms. Or we have to close down the Internet altogether. After all, the unwieldy beast is a giant copying machine – making copies is what it does well,” Goldson says.

The problem is that the industry is keeping piracy alive through its own business model. When people can’t get what they want, when and where they want it, they often turn to pirate sites.

“One problem is that the industry has been slow to change and hence we now have generations of viewers who have had to regularly infringe to be part of a global conversation.

“I do feel if the industry is promoting and advertising works internationally, using globalized communication and social media, then denying viewers from easily accessing works, either through geo-blocking or price points, obviously, digitally-savvy viewers will find them regardless,” Goldson adds.

And yes, this ironically also applies to her own documentary.

The solution is to keep improving the legal options. As Goldson and her team found, this is easier said than done, so it won't happen overnight. Still, universal access at a decent price would seem to be the future.

Unless the movie industry prefers to shut down the Internet entirely, of course.

For those who haven't seen “Kim Dotcom: Caught in the Web” yet, the film is available globally on Vimeo OnDemand, and in many territories on iTunes, the PlayStation Store, Amazon, Google Play, and the Microsoft/Xbox Store. In the US, it's also available on Vudu, Fandango Now, and Verizon.

If that doesn’t work, then…

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

The Pronunciation Training Machine

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/pronunciation-training-machine/

Using a Raspberry Pi, an Arduino, an Adafruit NeoPixel Ring and a servomotor, Japanese makers HomeMadeGarbage produced this Pronunciation Training Machine to help their parents distinguish ‘L’s and ‘R’s when speaking English.

Instagram video from Home Made Garbage (@homemadegarbage): “L R 発音矯正ギブス お母ちゃん編” (“L/R pronunciation training brace: Mum edition”) #right #light #raspberrypi #arduino #neopixel

How does the Pronunciation Training Machine work?

As you can see in the video above, the machine utilises the Google Cloud Speech API to recognise their parents' pronunciation of the words ‘right’ and ‘light’. Correctly pronounce the former, and the servo-mounted arrow points to the right. Pronounce the latter, and the NeoPixel Ring illuminates because, well, you just said “light”.

An image showing how the project works – English Pronunciation Training

You can find the full code for the project on its Hackster page.
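As a rough illustration only (and not the makers' actual code, which pairs the Pi with an Arduino to drive the servo and the NeoPixel Ring), here's a minimal Python sketch of the idea. It assumes a short WAV clip has already been recorded to disk, Google Cloud credentials are configured, and a servo plus a single LED stand in for the arrow and the ring, wired directly to the Pi; the pin numbers are hypothetical.

```python
# Minimal sketch: send a recorded clip to the Google Cloud Speech-to-Text
# API, then move the arrow for "right" or switch on the light for "light".
import io

from google.cloud import speech
from gpiozero import LED, Servo

client = speech.SpeechClient()
arrow = Servo(17)  # hypothetical pin for the servo-mounted arrow
light = LED(27)    # hypothetical pin, standing in for the NeoPixel Ring

def transcribe(wav_path: str) -> str:
    """Return the API's best transcript for a short 16 kHz WAV clip."""
    with io.open(wav_path, "rb") as f:
        audio = speech.RecognitionAudio(content=f.read())
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=16000,
        language_code="en-US",
    )
    response = client.recognize(config=config, audio=audio)
    if not response.results:
        return ""
    return response.results[0].alternatives[0].transcript.lower()

attempt = transcribe("attempt.wav")
if "right" in attempt:
    arrow.max()   # swing the arrow to the right
elif "light" in attempt:
    light.on()    # well, you just said "light"
```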

Variations on the idea

It's a super-cute project with great potential, and the concept could easily be adapted for other training purposes. How about using motion sensors to help someone learn their left from their right?

A photo of hands with left and right written on them - English Pronunciation Training

Wait…your left or my left?
image c/o tattly

Or use random.choice to switch on LEDs over certain images, and speech recognition to reward a correct answer? Light up a picture of a cat, for example, and when the player says “cat”, they receive a ‘purr’ or a treat?
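Here's a hedged sketch of that variation, not a tested build: random.choice picks one of several LED-lit picture cards, and the player's answer is checked. The wiring is made up, and the keyboard stands in for speech recognition; a real build would call the Speech API as in the sketch above.

```python
# A sketch of the picture-card game: light one card at random, then
# reward a correct answer. input() is a placeholder for real speech
# recognition, and the pin assignments are hypothetical.
import random
import time

from gpiozero import LED

# One LED mounted above each picture card.
cards = {
    "cat": LED(17),
    "dog": LED(27),
    "bus": LED(22),
}

while True:
    word, led = random.choice(list(cards.items()))
    led.on()  # light up one picture at random
    answer = input("What do you see? ").strip().lower()
    if word in answer:
        print("Correct! Trigger the purr (or the treat dispenser) here.")
    led.off()
    time.sleep(1)
```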

A photo of a kitten - English Pronunciation Training

Obligatory kitten picture
image c/o somewhere on the internet!

Raspberry Pi-based educational aids do not have to be elaborate builds. They can use components as simple as a servo and an LED, and still have the potential to make great improvements in people’s day-to-day lives.

Your own projects

If you’ve created an educational tool using a Raspberry Pi, we’d love to see it. The Raspberry Pi itself is an educational tool, so you’re helping it to fulfil its destiny! Make sure you share your projects with us on social media, or pop a link in the comments below. We’d also love to see people using the Pronunciation Training Machine (or similar projects), so make sure you share those too!

A massive shout out to Artie at hackster.io for this heads-up, and for all the other Raspberry Pi projects he sends my way. What a star!

The post The Pronunciation Training Machine appeared first on Raspberry Pi.

Affordable Raspberry Pi 3D Body Scanner

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/affordable-raspberry-pi-3d-body-scanner/

With a £1000 grant from Santander, Poppy Mosbacher set out to build a full-body 3D scanner with the intention of creating an affordable setup for makespaces and similar community groups.

First Scan from DIY Raspberry Pi Scanner

Head and Shoulders Scan with 29 Raspberry Pi Cameras

Uses for full-body 3D scanning

Poppy herself wanted to use the scanner in her work as a fashion designer. With the help of 3D scans of her models, she would be able to create custom cardboard dressmaker's dummies to ensure her designs fit perfectly. This is a brilliant way of incorporating digital tech into another industry – and it's not the only application for this sort of build. A growing number of businesses use 3D body scanning; for example, there are stores around the world where customers can have themselves 3D scanned and printed as action-figure-sized replicas.

Print your own family right on the high street!
image c/o Tom’s Guide and Shapify

We’ve also seen the same technology used in video games for more immersive virtual reality. Moreover, there are various uses for it in healthcare and fitness, such as monitoring the effect of exercise regimes or physiotherapy on body shape or posture.

Within a makespace environment, a 3D body scanner opens the door to including new groups of people in community make projects: imagine 3D printing miniatures of a theatrical cast to allow more realistic blocking of stage productions and better set design, or annually sending grandparents a print of their grandchild so they can compare the child’s year-on-year growth in a hands-on way.

Raspberry Pi 3d Body Scan

The Germany-based clothing business Outfittery uses full-body scanners to take the stress out of finding clothes that fit well.
image c/o Outfittery

As cheesy as it sounds, the only limit to the uses of 3D scanning is your imagination…and maybe storage space for miniature prints.

Poppy’s Raspberry Pi 3D Body Scanner

For her build, Poppy acquired 27 Raspberry Pi Zeros and 27 Raspberry Pi Camera Modules. With various other components, some 3D-printed or made of cardboard, Poppy got to work. She was helped by members of Build Brighton and by her friend Arthur Guy, who also wrote the code for the scanner.

Raspberry Pi 3D Body Scanner

The Pi Zeros run Raspbian Lite and are connected to a main server running a Node application. Each is fitted into its own laser-cut cardboard case, and secured to a structure of cardboard tubing and 3D-printed connectors.

Raspberry Pi 3D Body Scanner

In the finished build, the person to be scanned stands in the centre of the structure, and the press of a button sends the signal for all Pis to take a photo. The images are sent back to the server and processed in Autodesk ReMake, freemium photogrammetry software available for PC (Poppy discovered part-way through the project that the Mac version had recently lost support).
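Poppy's actual code, written by Arthur Guy, is a Node application, and you'll find it on her Instructables page (linked below). Purely as an illustration of the pattern, here's a hedged Python sketch of the controller side: one machine asks every Pi Zero for a frame at once, then collects the JPEGs for the photogrammetry step. The hostnames and the /capture endpoint are made up for the example.

```python
# A sketch only, not Poppy's code: fan out one capture request to every
# Pi Zero and save the returned JPEGs. Hostnames and the /capture
# endpoint on each Pi's (hypothetical) camera service are assumptions.
import concurrent.futures

import requests

PI_HOSTS = [f"scanner-pi-{n:02d}.local" for n in range(27)]

def fetch_frame(host: str) -> bytes:
    """Ask one Pi Zero's camera service for a single JPEG frame."""
    resp = requests.get(f"http://{host}:8000/capture", timeout=10)
    resp.raise_for_status()
    return resp.content

# Fire all 27 requests together so the exposures are as close to
# simultaneous as possible; a moving subject would otherwise smear
# the reconstruction.
with concurrent.futures.ThreadPoolExecutor(max_workers=27) as pool:
    for host, frame in zip(PI_HOSTS, pool.map(fetch_frame, PI_HOSTS)):
        with open(f"{host.split('.')[0]}.jpg", "wb") as f:
            f.write(frame)
```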

Build your own

Obviously there's a lot more to building this full-body 3D scanner than I've covered in these few paragraphs. And since it was Poppy's goal to make a readily available and affordable scanner that anyone can recreate, she's provided all the instructions and code for it on her Instructables page.

Projects like this, in which people use the Raspberry Pi to create affordable and interesting tech for communities, are exactly the type of thing we love to see. Always make sure to share your Pi-based projects with us on social media, so we can boost their visibility!

If you’re a member of a makespace, run a workshop in a school or club, or simply love to tinker and create, this build could be the perfect addition to your workshop. And if you recreate Poppy’s scanner, or build something similar, we’d love to see the results in the comments below.

The post Affordable Raspberry Pi 3D Body Scanner appeared first on Raspberry Pi.