Tag Archives: social

Netflix Expands Content Protection Team to Reduce Piracy

Post Syndicated from Ernesto original https://torrentfreak.com/netflix-expands-content-protection-team-to-reduce-piracy-171015/

There is little doubt that, in the United States and many other countries, Netflix has become the standard for watching movies on the Internet.

Despite the widespread availability, however, Netflix originals are widely pirated. Episodes from House of Cards, Narcos, and Orange is the New Black are downloaded and streamed millions of times through unauthorized platforms.

The streaming giant is obviously not happy with this situation and has ramped up its anti-piracy efforts in recent years. Since last year the company has sent out over a million takedown requests to Google alone and this volume continues to expand.

This growth coincides with an expansion of the company’s internal anti-piracy division. A new job posting shows that Netflix is expanding this team with a Copyright and Content Protection Coordinator. The ultimate goal is to reduce piracy to a fringe activity.

“The growing Global Copyright & Content Protection Group is looking to expand its team with the addition of a coordinator,” the job listing reads.

“He or she will be tasked with supporting the Netflix Global Copyright & Content Protection Group in its internal tactical take down efforts with the goal of reducing online piracy to a socially unacceptable fringe activity.”

Among other things, the new coordinator will evaluate new technological solutions to tackle piracy online.

More old-fashioned takedown efforts are also part of the job. This includes monitoring well-known content platforms, search engines and social network sites for pirated content.

“Day to day scanning of Facebook, YouTube, Twitter, Periscope, Google Search, Bing Search, VK, DailyMotion and all other platforms (including live platforms) used for piracy,” is listed as one of the main responsibilities.

Netflix’ Copyright and Content Protection Coordinator Job

The coordinator is further tasked with managing Facebook’s Rights Manager and YouTube’s Content-ID system, to prevent circumvention of these piracy filters. Experience with fingerprinting technologies and other anti-piracy tools will be helpful in this regard.

Netflix doesn’t do all the copyright enforcement on its own though. The company works together with other media giants in the recently launched “Alliance for Creativity and Entertainment” that is spearheaded by the MPAA.

In addition, the company also uses the takedown services of external anti-piracy outfits to target more traditional infringement sources, such as cyberlockers and piracy streaming sites. The coordinator has to keep an eye on these as well.

“Liaise with our vendors on manual takedown requests on linking sites and hosting sites and gathering data on pirate streaming sites, cyberlockers and usenet platforms.”

The above shows that Netflix is doing its best to prevent piracy from getting out of hand. It’s definitely taking the issue more seriously than a few years ago when the company didn’t have much original content.

The switch from being merely a distribution platform to becoming a major content producer and copyright holder has changed the stakes. Netflix hasn’t won the war on piracy, it’s just getting started.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

Predict Billboard Top 10 Hits Using RStudio, H2O and Amazon Athena

Post Syndicated from Gopal Wunnava original https://aws.amazon.com/blogs/big-data/predict-billboard-top-10-hits-using-rstudio-h2o-and-amazon-athena/

Success in the popular music industry is typically measured in terms of the number of Top 10 hits artists have to their credit. The music industry is a highly competitive multi-billion dollar business, and record labels incur various costs in exchange for a percentage of the profits from sales and concert tickets.

Predicting the success of an artist’s release in the popular music industry can be difficult. One release may be extremely popular, resulting in widespread play on TV, radio and social media, while another single may turn out quite unpopular, and therefore unprofitable. Record labels need to be selective in their decision making, and predictive analytics can help them with decision making around the type of songs and artists they need to promote.

In this walkthrough, you leverage H2O.ai, Amazon Athena, and RStudio to make predictions on whether a song might make it to the Top 10 Billboard charts. You explore the GLM, GBM, and deep learning modeling techniques using H2O’s rapid, distributed and easy-to-use open source parallel processing engine. RStudio is a popular IDE, licensed either commercially or under AGPLv3, for working with R. This is ideal if you don’t want to connect to a server via SSH and use code editors such as vi to do analytics. RStudio is available in a desktop version, or a server version that allows you to access R via a web browser. RStudio’s Notebooks feature is used to demonstrate the execution of code and output. In addition, this post showcases how you can leverage Athena for query and interactive analysis during the modeling phase. A working knowledge of statistics and machine learning would be helpful to interpret the analysis being performed in this post.

Walkthrough

Your goal is to predict whether a song will make it to the Top 10 Billboard charts. For this purpose, you will be using multiple modeling techniques―namely GLM, GBM and deep learning―and choose the model that is the best fit.

This solution involves the following steps:

  • Install and configure RStudio with Athena
  • Log in to RStudio
  • Install R packages
  • Connect to Athena
  • Create a dataset
  • Create models

Install and configure RStudio with Athena

Use the following AWS CloudFormation stack to install, configure, and connect RStudio on an Amazon EC2 instance with Athena.

Launching this stack creates all required resources and prerequisites:

  • Amazon EC2 instance with Amazon Linux (minimum size of t2.large is recommended)
  • Provisioning of the EC2 instance in an existing VPC and public subnet
  • Installation of Java 8
  • Assignment of an IAM role to the EC2 instance with the required permissions for accessing Athena and Amazon S3
  • Security group allowing access to the RStudio and SSH ports from the internet (I recommend restricting access to these ports)
  • S3 staging bucket required for Athena (referenced within RStudio as ATHENABUCKET)
  • RStudio username and password
  • Setup logs in Amazon CloudWatch Logs (if needed for additional troubleshooting)
  • Amazon EC2 Systems Manager agent, which makes it easy to manage and patch

All AWS resources are created in the US-East-1 Region. To avoid cross-region data transfer fees, launch the CloudFormation stack in the same region. To check the availability of Athena in other regions, see Region Table.

Log in to RStudio

The instance security group has been automatically configured to allow incoming connections on the RStudio port 8787 from any source internet address. You can edit the security group to restrict source IP access. If you have trouble connecting, ensure that port 8787 isn’t blocked by subnet network ACLS or by your outgoing proxy/firewall.

  1. In the CloudFormation stack, choose Outputs, Value, and then open the RStudio URL. You might need to wait for a few minutes until the instance has been launched.
  2. Log in to RStudio with the and password you provided during setup.

Install R packages

Next, install the required R packages from the RStudio console. You can download the R notebook file containing just the code.

#install pacman – a handy package manager for managing installs
if("pacman" %in% rownames(installed.packages()) == FALSE)
{install.packages("pacman")}  
library(pacman)
p_load(h2o,rJava,RJDBC,awsjavasdk)
h2o.init(nthreads = -1)
##  Connection successful!
## 
## R is connected to the H2O cluster: 
##     H2O cluster uptime:         2 hours 42 minutes 
##     H2O cluster version:        3.10.4.6 
##     H2O cluster version age:    4 months and 4 days !!! 
##     H2O cluster name:           H2O_started_from_R_rstudio_hjx881 
##     H2O cluster total nodes:    1 
##     H2O cluster total memory:   3.30 GB 
##     H2O cluster total cores:    4 
##     H2O cluster allowed cores:  4 
##     H2O cluster healthy:        TRUE 
##     H2O Connection ip:          localhost 
##     H2O Connection port:        54321 
##     H2O Connection proxy:       NA 
##     H2O Internal Security:      FALSE 
##     R Version:                  R version 3.3.3 (2017-03-06)
## Warning in h2o.clusterInfo(): 
## Your H2O cluster version is too old (4 months and 4 days)!
## Please download and install the latest version from http://h2o.ai/download/
#install aws sdk if not present (pre-requisite for using Athena with an IAM role)
if (!aws_sdk_present()) {
  install_aws_sdk()
}

load_sdk()
## NULL

Connect to Athena

Next, establish a connection to Athena from RStudio, using an IAM role associated with your EC2 instance. Use ATHENABUCKET to specify the S3 staging directory.

URL <- 'https://s3.amazonaws.com/athena-downloads/drivers/AthenaJDBC41-1.0.1.jar'
fil <- basename(URL)
#download the file into current working directory
if (!file.exists(fil)) download.file(URL, fil)
#verify that the file has been downloaded successfully
list.files()
## [1] "AthenaJDBC41-1.0.1.jar"
drv <- JDBC(driverClass="com.amazonaws.athena.jdbc.AthenaDriver", fil, identifier.quote="'")

con <- jdbcConnection <- dbConnect(drv, 'jdbc:awsathena://athena.us-east-1.amazonaws.com:443/',
                                   s3_staging_dir=Sys.getenv("ATHENABUCKET"),
                                   aws_credentials_provider_class="com.amazonaws.auth.DefaultAWSCredentialsProviderChain")

Verify the connection. The results returned depend on your specific Athena setup.

con
## <JDBCConnection>
dbListTables(con)
##  [1] "gdelt"               "wikistats"           "elb_logs_raw_native"
##  [4] "twitter"             "twitter2"            "usermovieratings"   
##  [7] "eventcodes"          "events"              "billboard"          
## [10] "billboardtop10"      "elb_logs"            "gdelthist"          
## [13] "gdeltmaster"         "twitter"             "twitter3"

Create a dataset

For this analysis, you use a sample dataset combining information from Billboard and Wikipedia with Echo Nest data in the Million Songs Dataset. Upload this dataset into your own S3 bucket. The table below provides a description of the fields used in this dataset.

Field Description
year Year that song was released
songtitle Title of the song
artistname Name of the song artist
songid Unique identifier for the song
artistid Unique identifier for the song artist
timesignature Variable estimating the time signature of the song
timesignature_confidence Confidence in the estimate for the timesignature
loudness Continuous variable indicating the average amplitude of the audio in decibels
tempo Variable indicating the estimated beats per minute of the song
tempo_confidence Confidence in the estimate for tempo
key Variable with twelve levels indicating the estimated key of the song (C, C#, B)
key_confidence Confidence in the estimate for key
energy Variable that represents the overall acoustic energy of the song, using a mix of features such as loudness
pitch Continuous variable that indicates the pitch of the song
timbre_0_min thru timbre_11_min Variables that indicate the minimum values over all segments for each of the twelve values in the timbre vector
timbre_0_max thru timbre_11_max Variables that indicate the maximum values over all segments for each of the twelve values in the timbre vector
top10 Indicator for whether or not the song made it to the Top 10 of the Billboard charts (1 if it was in the top 10, and 0 if not)

Create an Athena table based on the dataset

In the Athena console, select the default database, sampled, or create a new database.

Run the following create table statement.

create external table if not exists billboard
(
year int,
songtitle string,
artistname string,
songID string,
artistID string,
timesignature int,
timesignature_confidence double,
loudness double,
tempo double,
tempo_confidence double,
key int,
key_confidence double,
energy double,
pitch double,
timbre_0_min double,
timbre_0_max double,
timbre_1_min double,
timbre_1_max double,
timbre_2_min double,
timbre_2_max double,
timbre_3_min double,
timbre_3_max double,
timbre_4_min double,
timbre_4_max double,
timbre_5_min double,
timbre_5_max double,
timbre_6_min double,
timbre_6_max double,
timbre_7_min double,
timbre_7_max double,
timbre_8_min double,
timbre_8_max double,
timbre_9_min double,
timbre_9_max double,
timbre_10_min double,
timbre_10_max double,
timbre_11_min double,
timbre_11_max double,
Top10 int
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 's3://aws-bigdata-blog/artifacts/predict-billboard/data'
;

Inspect the table definition for the ‘billboard’ table that you have created. If you chose a database other than sampledb, replace that value with your choice.

dbGetQuery(con, "show create table sampledb.billboard")
##                                      createtab_stmt
## 1       CREATE EXTERNAL TABLE `sampledb.billboard`(
## 2                                       `year` int,
## 3                               `songtitle` string,
## 4                              `artistname` string,
## 5                                  `songid` string,
## 6                                `artistid` string,
## 7                              `timesignature` int,
## 8                `timesignature_confidence` double,
## 9                                `loudness` double,
## 10                                  `tempo` double,
## 11                       `tempo_confidence` double,
## 12                                       `key` int,
## 13                         `key_confidence` double,
## 14                                 `energy` double,
## 15                                  `pitch` double,
## 16                           `timbre_0_min` double,
## 17                           `timbre_0_max` double,
## 18                           `timbre_1_min` double,
## 19                           `timbre_1_max` double,
## 20                           `timbre_2_min` double,
## 21                           `timbre_2_max` double,
## 22                           `timbre_3_min` double,
## 23                           `timbre_3_max` double,
## 24                           `timbre_4_min` double,
## 25                           `timbre_4_max` double,
## 26                           `timbre_5_min` double,
## 27                           `timbre_5_max` double,
## 28                           `timbre_6_min` double,
## 29                           `timbre_6_max` double,
## 30                           `timbre_7_min` double,
## 31                           `timbre_7_max` double,
## 32                           `timbre_8_min` double,
## 33                           `timbre_8_max` double,
## 34                           `timbre_9_min` double,
## 35                           `timbre_9_max` double,
## 36                          `timbre_10_min` double,
## 37                          `timbre_10_max` double,
## 38                          `timbre_11_min` double,
## 39                          `timbre_11_max` double,
## 40                                     `top10` int)
## 41                             ROW FORMAT DELIMITED 
## 42                         FIELDS TERMINATED BY ',' 
## 43                            STORED AS INPUTFORMAT 
## 44       'org.apache.hadoop.mapred.TextInputFormat' 
## 45                                     OUTPUTFORMAT 
## 46  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
## 47                                        LOCATION
## 48    's3://aws-bigdata-blog/artifacts/predict-billboard/data'
## 49                                  TBLPROPERTIES (
## 50            'transient_lastDdlTime'='1505484133')

Run a sample query

Next, run a sample query to obtain a list of all songs from Janet Jackson that made it to the Billboard Top 10 charts.

dbGetQuery(con, " SELECT songtitle,artistname,top10   FROM sampledb.billboard WHERE lower(artistname) =     'janet jackson' AND top10 = 1")
##                       songtitle    artistname top10
## 1                       Runaway Janet Jackson     1
## 2               Because Of Love Janet Jackson     1
## 3                         Again Janet Jackson     1
## 4                            If Janet Jackson     1
## 5  Love Will Never Do (Without You) Janet Jackson 1
## 6                     Black Cat Janet Jackson     1
## 7               Come Back To Me Janet Jackson     1
## 8                       Alright Janet Jackson     1
## 9                      Escapade Janet Jackson     1
## 10                Rhythm Nation Janet Jackson     1

Determine how many songs in this dataset are specifically from the year 2010.

dbGetQuery(con, " SELECT count(*)   FROM sampledb.billboard WHERE year = 2010")
##   _col0
## 1   373

The sample dataset provides certain song properties of interest that can be analyzed to gauge the impact to the song’s overall popularity. Look at one such property, timesignature, and determine the value that is the most frequent among songs in the database. Timesignature is a measure of the number of beats and the type of note involved.

Running the query directly may result in an error, as shown in the commented lines below. This error is a result of trying to retrieve a large result set over a JDBC connection, which can cause out-of-memory issues at the client level. To address this, reduce the fetch size and run again.

#t<-dbGetQuery(con, " SELECT timesignature FROM sampledb.billboard")
#Note:  Running the preceding query results in the following error: 
#Error in .jcall(rp, "I", "fetch", stride, block): java.sql.SQLException: The requested #fetchSize is more than the allowed value in Athena. Please reduce the fetchSize and try #again. Refer to the Athena documentation for valid fetchSize values.
# Use the dbSendQuery function, reduce the fetch size, and run again
r <- dbSendQuery(con, " SELECT timesignature     FROM sampledb.billboard")
dftimesignature<- fetch(r, n=-1, block=100)
dbClearResult(r)
## [1] TRUE
table(dftimesignature)
## dftimesignature
##    0    1    3    4    5    7 
##   10  143  503 6787  112   19
nrow(dftimesignature)
## [1] 7574

From the results, observe that 6787 songs have a timesignature of 4.

Next, determine the song with the highest tempo.

dbGetQuery(con, " SELECT songtitle,artistname,tempo   FROM sampledb.billboard WHERE tempo = (SELECT max(tempo) FROM sampledb.billboard) ")
##                   songtitle      artistname   tempo
## 1 Wanna Be Startin' Somethin' Michael Jackson 244.307

Create the training dataset

Your model needs to be trained such that it can learn and make accurate predictions. Split the data into training and test datasets, and create the training dataset first.  This dataset contains all observations from the year 2009 and earlier. You may face the same JDBC connection issue pointed out earlier, so this query uses a fetch size.

#BillboardTrain <- dbGetQuery(con, "SELECT * FROM sampledb.billboard WHERE year <= 2009")
#Running the preceding query results in the following error:-
#Error in .verify.JDBC.result(r, "Unable to retrieve JDBC result set for ", : Unable to retrieve #JDBC result set for SELECT * FROM sampledb.billboard WHERE year <= 2009 (Internal error)
#Follow the same approach as before to address this issue.

r <- dbSendQuery(con, "SELECT * FROM sampledb.billboard WHERE year <= 2009")
BillboardTrain <- fetch(r, n=-1, block=100)
dbClearResult(r)
## [1] TRUE
BillboardTrain[1:2,c(1:3,6:10)]
##   year           songtitle artistname timesignature
## 1 2009 The Awkward Goodbye    Athlete             3
## 2 2009        Rubik's Cube    Athlete             3
##   timesignature_confidence loudness   tempo tempo_confidence
## 1                    0.732   -6.320  89.614   0.652
## 2                    0.906   -9.541 117.742   0.542
nrow(BillboardTrain)
## [1] 7201

Create the test dataset

BillboardTest <- dbGetQuery(con, "SELECT * FROM sampledb.billboard where year = 2010")
BillboardTest[1:2,c(1:3,11:15)]
##   year              songtitle        artistname key
## 1 2010 This Is the House That Doubt Built A Day to Remember  11
## 2 2010        Sticks & Bricks A Day to Remember  10
##   key_confidence    energy pitch timbre_0_min
## 1          0.453 0.9666556 0.024        0.002
## 2          0.469 0.9847095 0.025        0.000
nrow(BillboardTest)
## [1] 373

Convert the training and test datasets into H2O dataframes

train.h2o <- as.h2o(BillboardTrain)
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |=================================================================| 100%
test.h2o <- as.h2o(BillboardTest)
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |=================================================================| 100%

Inspect the column names in your H2O dataframes.

colnames(train.h2o)
##  [1] "year"                     "songtitle"               
##  [3] "artistname"               "songid"                  
##  [5] "artistid"                 "timesignature"           
##  [7] "timesignature_confidence" "loudness"                
##  [9] "tempo"                    "tempo_confidence"        
## [11] "key"                      "key_confidence"          
## [13] "energy"                   "pitch"                   
## [15] "timbre_0_min"             "timbre_0_max"            
## [17] "timbre_1_min"             "timbre_1_max"            
## [19] "timbre_2_min"             "timbre_2_max"            
## [21] "timbre_3_min"             "timbre_3_max"            
## [23] "timbre_4_min"             "timbre_4_max"            
## [25] "timbre_5_min"             "timbre_5_max"            
## [27] "timbre_6_min"             "timbre_6_max"            
## [29] "timbre_7_min"             "timbre_7_max"            
## [31] "timbre_8_min"             "timbre_8_max"            
## [33] "timbre_9_min"             "timbre_9_max"            
## [35] "timbre_10_min"            "timbre_10_max"           
## [37] "timbre_11_min"            "timbre_11_max"           
## [39] "top10"

Create models

You need to designate the independent and dependent variables prior to applying your modeling algorithms. Because you’re trying to predict the ‘top10’ field, this would be your dependent variable and everything else would be independent.

Create your first model using GLM. Because GLM works best with numeric data, you create your model by dropping non-numeric variables. You only use the variables in the dataset that describe the numerical attributes of the song in the logistic regression model. You won’t use these variables:  “year”, “songtitle”, “artistname”, “songid”, or “artistid”.

y.dep <- 39
x.indep <- c(6:38)
x.indep
##  [1]  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28
## [24] 29 30 31 32 33 34 35 36 37 38

Create Model 1: All numeric variables

Create Model 1 with the training dataset, using GLM as the modeling algorithm and H2O’s built-in h2o.glm function.

modelh1 <- h2o.glm( y = y.dep, x = x.indep, training_frame = train.h2o, family = "binomial")
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |=====                                                            |   8%
  |                                                                       
  |=================================================================| 100%

Measure the performance of Model 1, using H2O’s built-in performance function.

h2o.performance(model=modelh1,newdata=test.h2o)
## H2OBinomialMetrics: glm
## 
## MSE:  0.09924684
## RMSE:  0.3150347
## LogLoss:  0.3220267
## Mean Per-Class Error:  0.2380168
## AUC:  0.8431394
## Gini:  0.6862787
## R^2:  0.254663
## Null Deviance:  326.0801
## Residual Deviance:  240.2319
## AIC:  308.2319
## 
## Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:
##          0   1    Error     Rate
## 0      255  59 0.187898  =59/314
## 1       17  42 0.288136   =17/59
## Totals 272 101 0.203753  =76/373
## 
## Maximum Metrics: Maximum metrics at their respective thresholds
##                         metric threshold    value idx
## 1                       max f1  0.192772 0.525000 100
## 2                       max f2  0.124912 0.650510 155
## 3                 max f0point5  0.416258 0.612903  23
## 4                 max accuracy  0.416258 0.879357  23
## 5                max precision  0.813396 1.000000   0
## 6                   max recall  0.037579 1.000000 282
## 7              max specificity  0.813396 1.000000   0
## 8             max absolute_mcc  0.416258 0.455251  23
## 9   max min_per_class_accuracy  0.161402 0.738854 125
## 10 max mean_per_class_accuracy  0.124912 0.765006 155
## 
## Gains/Lift Table: Extract with `h2o.gainsLift(<model>, <data>)` or ` 
h2o.auc(h2o.performance(modelh1,test.h2o)) 
## [1] 0.8431394

The AUC metric provides insight into how well the classifier is able to separate the two classes. In this case, the value of 0.8431394 indicates that the classification is good. (A value of 0.5 indicates a worthless test, while a value of 1.0 indicates a perfect test.)

Next, inspect the coefficients of the variables in the dataset.

dfmodelh1 <- as.data.frame(h2o.varimp(modelh1))
dfmodelh1
##                       names coefficients sign
## 1              timbre_0_max  1.290938663  NEG
## 2                  loudness  1.262941934  POS
## 3                     pitch  0.616995941  NEG
## 4              timbre_1_min  0.422323735  POS
## 5              timbre_6_min  0.349016024  NEG
## 6                    energy  0.348092062  NEG
## 7             timbre_11_min  0.307331997  NEG
## 8              timbre_3_max  0.302225619  NEG
## 9             timbre_11_max  0.243632060  POS
## 10             timbre_4_min  0.224233951  POS
## 11             timbre_4_max  0.204134342  POS
## 12             timbre_5_min  0.199149324  NEG
## 13             timbre_0_min  0.195147119  POS
## 14 timesignature_confidence  0.179973904  POS
## 15         tempo_confidence  0.144242598  POS
## 16            timbre_10_max  0.137644568  POS
## 17             timbre_7_min  0.126995955  NEG
## 18            timbre_10_min  0.123851179  POS
## 19             timbre_7_max  0.100031481  NEG
## 20             timbre_2_min  0.096127636  NEG
## 21           key_confidence  0.083115820  POS
## 22             timbre_6_max  0.073712419  POS
## 23            timesignature  0.067241917  POS
## 24             timbre_8_min  0.061301881  POS
## 25             timbre_8_max  0.060041698  POS
## 26                      key  0.056158445  POS
## 27             timbre_3_min  0.050825116  POS
## 28             timbre_9_max  0.033733561  POS
## 29             timbre_2_max  0.030939072  POS
## 30             timbre_9_min  0.020708113  POS
## 31             timbre_1_max  0.014228818  NEG
## 32                    tempo  0.008199861  POS
## 33             timbre_5_max  0.004837870  POS
## 34                                    NA <NA>

Typically, songs with heavier instrumentation tend to be louder (have higher values in the variable “loudness”) and more energetic (have higher values in the variable “energy”). This knowledge is helpful for interpreting the modeling results.

You can make the following observations from the results:

  • The coefficient estimates for the confidence values associated with the time signature, key, and tempo variables are positive. This suggests that higher confidence leads to a higher predicted probability of a Top 10 hit.
  • The coefficient estimate for loudness is positive, meaning that mainstream listeners prefer louder songs with heavier instrumentation.
  • The coefficient estimate for energy is negative, meaning that mainstream listeners prefer songs that are less energetic, which are those songs with light instrumentation.

These coefficients lead to contradictory conclusions for Model 1. This could be due to multicollinearity issues. Inspect the correlation between the variables “loudness” and “energy” in the training set.

cor(train.h2o$loudness,train.h2o$energy)
## [1] 0.7399067

This number indicates that these two variables are highly correlated, and Model 1 does indeed suffer from multicollinearity. Typically, you associate a value of -1.0 to -0.5 or 1.0 to 0.5 to indicate strong correlation, and a value of 0.1 to 0.1 to indicate weak correlation. To avoid this correlation issue, omit one of these two variables and re-create the models.

You build two variations of the original model:

  • Model 2, in which you keep “energy” and omit “loudness”
  • Model 3, in which you keep “loudness” and omit “energy”

You compare these two models and choose the model with a better fit for this use case.

Create Model 2: Keep energy and omit loudness

colnames(train.h2o)
##  [1] "year"                     "songtitle"               
##  [3] "artistname"               "songid"                  
##  [5] "artistid"                 "timesignature"           
##  [7] "timesignature_confidence" "loudness"                
##  [9] "tempo"                    "tempo_confidence"        
## [11] "key"                      "key_confidence"          
## [13] "energy"                   "pitch"                   
## [15] "timbre_0_min"             "timbre_0_max"            
## [17] "timbre_1_min"             "timbre_1_max"            
## [19] "timbre_2_min"             "timbre_2_max"            
## [21] "timbre_3_min"             "timbre_3_max"            
## [23] "timbre_4_min"             "timbre_4_max"            
## [25] "timbre_5_min"             "timbre_5_max"            
## [27] "timbre_6_min"             "timbre_6_max"            
## [29] "timbre_7_min"             "timbre_7_max"            
## [31] "timbre_8_min"             "timbre_8_max"            
## [33] "timbre_9_min"             "timbre_9_max"            
## [35] "timbre_10_min"            "timbre_10_max"           
## [37] "timbre_11_min"            "timbre_11_max"           
## [39] "top10"
y.dep <- 39
x.indep <- c(6:7,9:38)
x.indep
##  [1]  6  7  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29
## [24] 30 31 32 33 34 35 36 37 38
modelh2 <- h2o.glm( y = y.dep, x = x.indep, training_frame = train.h2o, family = "binomial")
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |=======                                                          |  10%
  |                                                                       
  |=================================================================| 100%

Measure the performance of Model 2.

h2o.performance(model=modelh2,newdata=test.h2o)
## H2OBinomialMetrics: glm
## 
## MSE:  0.09922606
## RMSE:  0.3150017
## LogLoss:  0.3228213
## Mean Per-Class Error:  0.2490554
## AUC:  0.8431933
## Gini:  0.6863867
## R^2:  0.2548191
## Null Deviance:  326.0801
## Residual Deviance:  240.8247
## AIC:  306.8247
## 
## Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:
##          0  1    Error     Rate
## 0      280 34 0.108280  =34/314
## 1       23 36 0.389831   =23/59
## Totals 303 70 0.152815  =57/373
## 
## Maximum Metrics: Maximum metrics at their respective thresholds
##                         metric threshold    value idx
## 1                       max f1  0.254391 0.558140  69
## 2                       max f2  0.113031 0.647208 157
## 3                 max f0point5  0.413999 0.596026  22
## 4                 max accuracy  0.446250 0.876676  18
## 5                max precision  0.811739 1.000000   0
## 6                   max recall  0.037682 1.000000 283
## 7              max specificity  0.811739 1.000000   0
## 8             max absolute_mcc  0.254391 0.469060  69
## 9   max min_per_class_accuracy  0.141051 0.716561 131
## 10 max mean_per_class_accuracy  0.113031 0.761821 157
## 
## Gains/Lift Table: Extract with `h2o.gainsLift(<model>, <data>)` or `h2o.gainsLift(<model>, valid=<T/F>, xval=<T/F>)`
dfmodelh2 <- as.data.frame(h2o.varimp(modelh2))
dfmodelh2
##                       names coefficients sign
## 1                     pitch  0.700331511  NEG
## 2              timbre_1_min  0.510270513  POS
## 3              timbre_0_max  0.402059546  NEG
## 4              timbre_6_min  0.333316236  NEG
## 5             timbre_11_min  0.331647383  NEG
## 6              timbre_3_max  0.252425901  NEG
## 7             timbre_11_max  0.227500308  POS
## 8              timbre_4_max  0.210663865  POS
## 9              timbre_0_min  0.208516163  POS
## 10             timbre_5_min  0.202748055  NEG
## 11             timbre_4_min  0.197246582  POS
## 12            timbre_10_max  0.172729619  POS
## 13         tempo_confidence  0.167523934  POS
## 14 timesignature_confidence  0.167398830  POS
## 15             timbre_7_min  0.142450727  NEG
## 16             timbre_8_max  0.093377516  POS
## 17            timbre_10_min  0.090333426  POS
## 18            timesignature  0.085851625  POS
## 19             timbre_7_max  0.083948442  NEG
## 20           key_confidence  0.079657073  POS
## 21             timbre_6_max  0.076426046  POS
## 22             timbre_2_min  0.071957831  NEG
## 23             timbre_9_max  0.071393189  POS
## 24             timbre_8_min  0.070225578  POS
## 25                      key  0.061394702  POS
## 26             timbre_3_min  0.048384697  POS
## 27             timbre_1_max  0.044721121  NEG
## 28                   energy  0.039698433  POS
## 29             timbre_5_max  0.039469064  POS
## 30             timbre_2_max  0.018461133  POS
## 31                    tempo  0.013279926  POS
## 32             timbre_9_min  0.005282143  NEG
## 33                                    NA <NA>

h2o.auc(h2o.performance(modelh2,test.h2o)) 
## [1] 0.8431933

You can make the following observations:

  • The AUC metric is 0.8431933.
  • Inspecting the coefficient of the variable energy, Model 2 suggests that songs with high energy levels tend to be more popular. This is as per expectation.
  • As H2O orders variables by significance, the variable energy is not significant in this model.

You can conclude that Model 2 is not ideal for this use , as energy is not significant.

CreateModel 3: Keep loudness but omit energy

colnames(train.h2o)
##  [1] "year"                     "songtitle"               
##  [3] "artistname"               "songid"                  
##  [5] "artistid"                 "timesignature"           
##  [7] "timesignature_confidence" "loudness"                
##  [9] "tempo"                    "tempo_confidence"        
## [11] "key"                      "key_confidence"          
## [13] "energy"                   "pitch"                   
## [15] "timbre_0_min"             "timbre_0_max"            
## [17] "timbre_1_min"             "timbre_1_max"            
## [19] "timbre_2_min"             "timbre_2_max"            
## [21] "timbre_3_min"             "timbre_3_max"            
## [23] "timbre_4_min"             "timbre_4_max"            
## [25] "timbre_5_min"             "timbre_5_max"            
## [27] "timbre_6_min"             "timbre_6_max"            
## [29] "timbre_7_min"             "timbre_7_max"            
## [31] "timbre_8_min"             "timbre_8_max"            
## [33] "timbre_9_min"             "timbre_9_max"            
## [35] "timbre_10_min"            "timbre_10_max"           
## [37] "timbre_11_min"            "timbre_11_max"           
## [39] "top10"
y.dep <- 39
x.indep <- c(6:12,14:38)
x.indep
##  [1]  6  7  8  9 10 11 12 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29
## [24] 30 31 32 33 34 35 36 37 38
modelh3 <- h2o.glm( y = y.dep, x = x.indep, training_frame = train.h2o, family = "binomial")
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |========                                                         |  12%
  |                                                                       
  |=================================================================| 100%
perfh3<-h2o.performance(model=modelh3,newdata=test.h2o)
perfh3
## H2OBinomialMetrics: glm
## 
## MSE:  0.0978859
## RMSE:  0.3128672
## LogLoss:  0.3178367
## Mean Per-Class Error:  0.264925
## AUC:  0.8492389
## Gini:  0.6984778
## R^2:  0.2648836
## Null Deviance:  326.0801
## Residual Deviance:  237.1062
## AIC:  303.1062
## 
## Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:
##          0  1    Error     Rate
## 0      286 28 0.089172  =28/314
## 1       26 33 0.440678   =26/59
## Totals 312 61 0.144772  =54/373
## 
## Maximum Metrics: Maximum metrics at their respective thresholds
##                         metric threshold    value idx
## 1                       max f1  0.273799 0.550000  60
## 2                       max f2  0.125503 0.663265 155
## 3                 max f0point5  0.435479 0.628931  24
## 4                 max accuracy  0.435479 0.882038  24
## 5                max precision  0.821606 1.000000   0
## 6                   max recall  0.038328 1.000000 280
## 7              max specificity  0.821606 1.000000   0
## 8             max absolute_mcc  0.435479 0.471426  24
## 9   max min_per_class_accuracy  0.173693 0.745763 120
## 10 max mean_per_class_accuracy  0.125503 0.775073 155
## 
## Gains/Lift Table: Extract with `h2o.gainsLift(<model>, <data>)` or `h2o.gainsLift(<model>, valid=<T/F>, xval=<T/F>)`
dfmodelh3 <- as.data.frame(h2o.varimp(modelh3))
dfmodelh3
##                       names coefficients sign
## 1              timbre_0_max 1.216621e+00  NEG
## 2                  loudness 9.780973e-01  POS
## 3                     pitch 7.249788e-01  NEG
## 4              timbre_1_min 3.891197e-01  POS
## 5              timbre_6_min 3.689193e-01  NEG
## 6             timbre_11_min 3.086673e-01  NEG
## 7              timbre_3_max 3.025593e-01  NEG
## 8             timbre_11_max 2.459081e-01  POS
## 9              timbre_4_min 2.379749e-01  POS
## 10             timbre_4_max 2.157627e-01  POS
## 11             timbre_0_min 1.859531e-01  POS
## 12             timbre_5_min 1.846128e-01  NEG
## 13 timesignature_confidence 1.729658e-01  POS
## 14             timbre_7_min 1.431871e-01  NEG
## 15            timbre_10_max 1.366703e-01  POS
## 16            timbre_10_min 1.215954e-01  POS
## 17         tempo_confidence 1.183698e-01  POS
## 18             timbre_2_min 1.019149e-01  NEG
## 19           key_confidence 9.109701e-02  POS
## 20             timbre_7_max 8.987908e-02  NEG
## 21             timbre_6_max 6.935132e-02  POS
## 22             timbre_8_max 6.878241e-02  POS
## 23            timesignature 6.120105e-02  POS
## 24                      key 5.814805e-02  POS
## 25             timbre_8_min 5.759228e-02  POS
## 26             timbre_1_max 2.930285e-02  NEG
## 27             timbre_9_max 2.843755e-02  POS
## 28             timbre_3_min 2.380245e-02  POS
## 29             timbre_2_max 1.917035e-02  POS
## 30             timbre_5_max 1.715813e-02  POS
## 31                    tempo 1.364418e-02  NEG
## 32             timbre_9_min 8.463143e-05  NEG
## 33                                    NA <NA>
h2o.sensitivity(perfh3,0.5)
## Warning in h2o.find_row_by_threshold(object, t): Could not find exact
## threshold: 0.5 for this set of metrics; using closest threshold found:
## 0.501855569251422. Run `h2o.predict` and apply your desired threshold on a
## probability column.
## [[1]]
## [1] 0.2033898
h2o.auc(perfh3)
## [1] 0.8492389

You can make the following observations:

  • The AUC metric is 0.8492389.
  • From the confusion matrix, the model correctly predicts that 33 songs will be top 10 hits (true positives). However, it has 26 false positives (songs that the model predicted would be Top 10 hits, but ended up not being Top 10 hits).
  • Loudness has a positive coefficient estimate, meaning that this model predicts that songs with heavier instrumentation tend to be more popular. This is the same conclusion from Model 2.
  • Loudness is significant in this model.

Overall, Model 3 predicts a higher number of top 10 hits with an accuracy rate that is acceptable. To choose the best fit for production runs, record labels should consider the following factors:

  • Desired model accuracy at a given threshold
  • Number of correct predictions for top10 hits
  • Tolerable number of false positives or false negatives

Next, make predictions using Model 3 on the test dataset.

predict.regh <- h2o.predict(modelh3, test.h2o)
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |=================================================================| 100%
print(predict.regh)
##   predict        p0          p1
## 1       0 0.9654739 0.034526052
## 2       0 0.9654748 0.034525236
## 3       0 0.9635547 0.036445318
## 4       0 0.9343579 0.065642149
## 5       0 0.9978334 0.002166601
## 6       0 0.9779949 0.022005078
## 
## [373 rows x 3 columns]
predict.regh$predict
##   predict
## 1       0
## 2       0
## 3       0
## 4       0
## 5       0
## 6       0
## 
## [373 rows x 1 column]
dpr<-as.data.frame(predict.regh)
#Rename the predicted column 
colnames(dpr)[colnames(dpr) == 'predict'] <- 'predict_top10'
table(dpr$predict_top10)
## 
##   0   1 
## 312  61

The first set of output results specifies the probabilities associated with each predicted observation.  For example, observation 1 is 96.54739% likely to not be a Top 10 hit, and 3.4526052% likely to be a Top 10 hit (predict=1 indicates Top 10 hit and predict=0 indicates not a Top 10 hit).  The second set of results list the actual predictions made.  From the third set of results, this model predicts that 61 songs will be top 10 hits.

Compute the baseline accuracy, by assuming that the baseline predicts the most frequent outcome, which is that most songs are not Top 10 hits.

table(BillboardTest$top10)
## 
##   0   1 
## 314  59

Now observe that the baseline model would get 314 observations correct, and 59 wrong, for an accuracy of 314/(314+59) = 0.8418231.

It seems that Model 3, with an accuracy of 0.8552, provides you with a small improvement over the baseline model. But is this model useful for record labels?

View the two models from an investment perspective:

  • A production company is interested in investing in songs that are more likely to make it to the Top 10. The company’s objective is to minimize the risk of financial losses attributed to investing in songs that end up unpopular.
  • How many songs does Model 3 correctly predict as a Top 10 hit in 2010? Looking at the confusion matrix, you see that it predicts 33 top 10 hits correctly at an optimal threshold, which is more than half the number
  • It will be more useful to the record label if you can provide the production company with a list of songs that are highly likely to end up in the Top 10.
  • The baseline model is not useful, as it simply does not label any song as a hit.

Considering the three models built so far, you can conclude that Model 3 proves to be the best investment choice for the record label.

GBM model

H2O provides you with the ability to explore other learning models, such as GBM and deep learning. Explore building a model using the GBM technique, using the built-in h2o.gbm function.

Before you do this, you need to convert the target variable to a factor for multinomial classification techniques.

train.h2o$top10=as.factor(train.h2o$top10)
gbm.modelh <- h2o.gbm(y=y.dep, x=x.indep, training_frame = train.h2o, ntrees = 500, max_depth = 4, learn_rate = 0.01, seed = 1122,distribution="multinomial")
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |===                                                              |   5%
  |                                                                       
  |=====                                                            |   7%
  |                                                                       
  |======                                                           |   9%
  |                                                                       
  |=======                                                          |  10%
  |                                                                       
  |======================                                           |  33%
  |                                                                       
  |=====================================                            |  56%
  |                                                                       
  |====================================================             |  79%
  |                                                                       
  |================================================================ |  98%
  |                                                                       
  |=================================================================| 100%
perf.gbmh<-h2o.performance(gbm.modelh,test.h2o)
perf.gbmh
## H2OBinomialMetrics: gbm
## 
## MSE:  0.09860778
## RMSE:  0.3140188
## LogLoss:  0.3206876
## Mean Per-Class Error:  0.2120263
## AUC:  0.8630573
## Gini:  0.7261146
## 
## Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:
##          0  1    Error     Rate
## 0      266 48 0.152866  =48/314
## 1       16 43 0.271186   =16/59
## Totals 282 91 0.171582  =64/373
## 
## Maximum Metrics: Maximum metrics at their respective thresholds
##                       metric threshold    value idx
## 1                     max f1  0.189757 0.573333  90
## 2                     max f2  0.130895 0.693717 145
## 3               max f0point5  0.327346 0.598802  26
## 4               max accuracy  0.442757 0.876676  14
## 5              max precision  0.802184 1.000000   0
## 6                 max recall  0.049990 1.000000 284
## 7            max specificity  0.802184 1.000000   0
## 8           max absolute_mcc  0.169135 0.496486 104
## 9 max min_per_class_accuracy  0.169135 0.796610 104
## 10 max mean_per_class_accuracy  0.169135 0.805948 104
## 
## Gains/Lift Table: Extract with `h2o.gainsLift(<model>, <data>)` or `
h2o.sensitivity(perf.gbmh,0.5)
## Warning in h2o.find_row_by_threshold(object, t): Could not find exact
## threshold: 0.5 for this set of metrics; using closest threshold found:
## 0.501205344484314. Run `h2o.predict` and apply your desired threshold on a
## probability column.
## [[1]]
## [1] 0.1355932
h2o.auc(perf.gbmh)
## [1] 0.8630573

This model correctly predicts 43 top 10 hits, which is 10 more than the number predicted by Model 3. Moreover, the AUC metric is higher than the one obtained from Model 3.

As seen above, H2O’s API provides the ability to obtain key statistical measures required to analyze the models easily, using several built-in functions. The record label can experiment with different parameters to arrive at the model that predicts the maximum number of Top 10 hits at the desired level of accuracy and threshold.

H2O also allows you to experiment with deep learning models. Deep learning models have the ability to learn features implicitly, but can be more expensive computationally.

Now, create a deep learning model with the h2o.deeplearning function, using the same training and test datasets created before. The time taken to run this model depends on the type of EC2 instance chosen for this purpose.  For models that require more computation, consider using accelerated computing instances such as the P2 instance type.

system.time(
  dlearning.modelh <- h2o.deeplearning(y = y.dep,
                                      x = x.indep,
                                      training_frame = train.h2o,
                                      epoch = 250,
                                      hidden = c(250,250),
                                      activation = "Rectifier",
                                      seed = 1122,
                                      distribution="multinomial"
  )
)
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |===                                                              |   4%
  |                                                                       
  |=====                                                            |   8%
  |                                                                       
  |========                                                         |  12%
  |                                                                       
  |==========                                                       |  16%
  |                                                                       
  |=============                                                    |  20%
  |                                                                       
  |================                                                 |  24%
  |                                                                       
  |==================                                               |  28%
  |                                                                       
  |=====================                                            |  32%
  |                                                                       
  |=======================                                          |  36%
  |                                                                       
  |==========================                                       |  40%
  |                                                                       
  |=============================                                    |  44%
  |                                                                       
  |===============================                                  |  48%
  |                                                                       
  |==================================                               |  52%
  |                                                                       
  |====================================                             |  56%
  |                                                                       
  |=======================================                          |  60%
  |                                                                       
  |==========================================                       |  64%
  |                                                                       
  |============================================                     |  68%
  |                                                                       
  |===============================================                  |  72%
  |                                                                       
  |=================================================                |  76%
  |                                                                       
  |====================================================             |  80%
  |                                                                       
  |=======================================================          |  84%
  |                                                                       
  |=========================================================        |  88%
  |                                                                       
  |============================================================     |  92%
  |                                                                       
  |==============================================================   |  96%
  |                                                                       
  |=================================================================| 100%
##    user  system elapsed 
##   1.216   0.020 166.508
perf.dl<-h2o.performance(model=dlearning.modelh,newdata=test.h2o)
perf.dl
## H2OBinomialMetrics: deeplearning
## 
## MSE:  0.1678359
## RMSE:  0.4096778
## LogLoss:  1.86509
## Mean Per-Class Error:  0.3433013
## AUC:  0.7568822
## Gini:  0.5137644
## 
## Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:
##          0  1    Error     Rate
## 0      290 24 0.076433  =24/314
## 1       36 23 0.610169   =36/59
## Totals 326 47 0.160858  =60/373
## 
## Maximum Metrics: Maximum metrics at their respective thresholds
##                       metric threshold    value idx
## 1                     max f1  0.826267 0.433962  46
## 2                     max f2  0.000000 0.588235 239
## 3               max f0point5  0.999929 0.511811  16
## 4               max accuracy  0.999999 0.865952  10
## 5              max precision  1.000000 1.000000   0
## 6                 max recall  0.000000 1.000000 326
## 7            max specificity  1.000000 1.000000   0
## 8           max absolute_mcc  0.999929 0.363219  16
## 9 max min_per_class_accuracy  0.000004 0.662420 145
## 10 max mean_per_class_accuracy  0.000000 0.685334 224
## 
## Gains/Lift Table: Extract with `h2o.gainsLift(<model>, <data>)` or `h2o.gainsLift(<model>, valid=<T/F>, xval=<T/F>)`
h2o.sensitivity(perf.dl,0.5)
## Warning in h2o.find_row_by_threshold(object, t): Could not find exact
## threshold: 0.5 for this set of metrics; using closest threshold found:
## 0.496293348880151. Run `h2o.predict` and apply your desired threshold on a
## probability column.
## [[1]]
## [1] 0.3898305
h2o.auc(perf.dl)
## [1] 0.7568822

The AUC metric for this model is 0.7568822, which is less than what you got from the earlier models. I recommend further experimentation using different hyper parameters, such as the learning rate, epoch or the number of hidden layers.

H2O’s built-in functions provide many key statistical measures that can help measure model performance. Here are some of these key terms.

Metric Description
Sensitivity Measures the proportion of positives that have been correctly identified. It is also called the true positive rate, or recall.
Specificity Measures the proportion of negatives that have been correctly identified. It is also called the true negative rate.
Threshold Cutoff point that maximizes specificity and sensitivity. While the model may not provide the highest prediction at this point, it would not be biased towards positives or negatives.
Precision The fraction of the documents retrieved that are relevant to the information needed, for example, how many of the positively classified are relevant
AUC

Provides insight into how well the classifier is able to separate the two classes. The implicit goal is to deal with situations where the sample distribution is highly skewed, with a tendency to overfit to a single class.

0.90 – 1 = excellent (A)

0.8 – 0.9 = good (B)

0.7 – 0.8 = fair (C)

.6 – 0.7 = poor (D)

0.5 – 0.5 = fail (F)

Here’s a summary of the metrics generated from H2O’s built-in functions for the three models that produced useful results.

Metric Model 3 GBM Model Deep Learning Model

Accuracy

(max)

0.882038

(t=0.435479)

0.876676

(t=0.442757)

0.865952

(t=0.999999)

Precision

(max)

1.0

(t=0.821606)

1.0

(t=0802184)

1.0

(t=1.0)

Recall

(max)

1.0 1.0

1.0

(t=0)

Specificity

(max)

1.0 1.0

1.0

(t=1)

Sensitivity

 

0.2033898 0.1355932

0.3898305

(t=0.5)

AUC 0.8492389 0.8630573 0.756882

Note: ‘t’ denotes threshold.

Your options at this point could be narrowed down to Model 3 and the GBM model, based on the AUC and accuracy metrics observed earlier.  If the slightly lower accuracy of the GBM model is deemed acceptable, the record label can choose to go to production with the GBM model, as it can predict a higher number of Top 10 hits.  The AUC metric for the GBM model is also higher than that of Model 3.

Record labels can experiment with different learning techniques and parameters before arriving at a model that proves to be the best fit for their business. Because deep learning models can be computationally expensive, record labels can choose more powerful EC2 instances on AWS to run their experiments faster.

Conclusion

In this post, I showed how the popular music industry can use analytics to predict the type of songs that make the Top 10 Billboard charts. By running H2O’s scalable machine learning platform on AWS, data scientists can easily experiment with multiple modeling techniques and interactively query the data using Amazon Athena, without having to manage the underlying infrastructure. This helps record labels make critical decisions on the type of artists and songs to promote in a timely fashion, thereby increasing sales and revenue.

If you have questions or suggestions, please comment below.


Additional Reading

Learn how to build and explore a simple geospita simple GEOINT application using SparkR.


About the Authors

gopalGopal Wunnava is a Partner Solution Architect with the AWS GSI Team. He works with partners and customers on big data engagements, and is passionate about building analytical solutions that drive business capabilities and decision making. In his spare time, he loves all things sports and movies related and is fond of old classics like Asterix, Obelix comics and Hitchcock movies.

 

 

Bob Strahan, a Senior Consultant with AWS Professional Services, contributed to this post.

 

 

Popcorn Time Creator Readies BitTorrent & Blockchain-Powered Video Platform

Post Syndicated from Andy original https://torrentfreak.com/popcorn-time-creator-readies-bittorrent-blockchain-powered-youtube-competitor-171012/

Without a doubt, YouTube is one of the most important websites available on the Internet today.

Its massive archive of videos brings pleasure to millions on a daily basis but its centralized nature means that owner Google always exercises control.

Over the years, people have looked to decentralize the YouTube concept and the latest project hoping to shake up the market has a particularly interesting player onboard.

Until 2015, only insiders knew that Argentinian designer Federico Abad was actually ‘Sebastian’, the shadowy figure behind notorious content sharing platform Popcorn Time.

Now he’s part of the team behind Flixxo, a BitTorrent and blockchain-powered startup hoping to wrestle a share of the video market from YouTube. Here’s how the team, which features blockchain startup RSK Labs, hope things will play out.

The Flixxo network will have no centralized storage of data, eliminating the need for expensive hosting along with associated costs. Instead, transfers will take place between peers using BitTorrent, meaning video content will be stored on the machines of Flixxo users. In practice, the content will be downloaded and uploaded in much the same way as users do on The Pirate Bay or indeed Abad’s baby, Popcorn Time.

However, there’s a twist to the system that envisions content creators, content consumers, and network participants (seeders) making revenue from their efforts.

At the heart of the Flixxo system are digital tokens (think virtual currency), called Flixx. These Flixx ‘coins’, which will go on sale in 12 days, can be used to buy access to content. Creators can also opt to pay consumers when those people help to distribute their content to others.

“Free from structural costs, producers can share the earnings from their content with the network that supports them,” the team explains.

“This way you get paid for helping us improve Flixxo, and you earn credits (in the form of digital tokens called Flixx) for watching higher quality content. Having no intermediaries means that the price you pay for watching the content that you actually want to watch is lower and fairer.”

The Flixxo team

In addition to earning tokens from helping to distribute content, people in the Flixxo ecosystem can also earn currency by watching sponsored content, i.e advertisements. While in a traditional system adverts are often considered a nuisance, Flixx tokens have real value, with a promise that users will be able to trade their Flixx not only for videos, but also for tangible and semi-tangible goods.

“Use your Flixx to reward the producers you follow, encouraging them to create more awesome content. Or keep your Flixx in your wallet and use them to buy a movie ticket, a pair of shoes from an online retailer, a chest of coins in your favourite game or even convert them to old-fashioned cash or up-and-coming digital assets, like Bitcoin,” the team explains.

The Flixxo team have big plans. After foundation in early 2016, the second quarter of 2017 saw the completion of a functional alpha release. In a little under two weeks, the project will begin its token generation event, with new offices in Los Angeles planned for the first half of 2018 alongside a premiere of the Flixxo platform.

“A total of 1,000,000,000 (one billion) Flixx tokens will be issued. A maximum of 300,000,000 (three hundred million) tokens will be sold. Some of these tokens (not more than 33% or 100,000,000 Flixx) may be sold with anticipation of the token allocation event to strategic investors,” Flixxo states.

Like all content platforms, Flixxo will live or die by the quality of the content it provides and whether, at least in the first instance, it can persuade people to part with their hard-earned cash. Only time will tell whether its content will be worth a premium over readily accessible YouTube content but with much-reduced costs, it may tempt creators seeking a bigger piece of the pie.

“Flixxo will also educate its community, teaching its users that in this new internet era value can be held and transferred online without intermediaries, a value that can be earned back by participating in a community, by contributing, being rewarded for every single social interaction,” the team explains.

Of course, the elephant in the room is what will happen when people begin sharing copyrighted content via Flixxo. Certainly, the fact that Popcorn Time’s founder is a key player and rival streaming platform Stremio is listed as a partner means that things could get a bit spicy later on.

Nevertheless, the team suggests that piracy and spam content distribution will be limited by mechanisms already built into the system.

“[A]uthors have to time-block tokens in a smart contract (set as a warranty) in order to upload content. This contract will also handle and block their earnings for a certain period of time, so that in the case of a dispute the unfair-uploader may lose those tokens,” they explain.

That being said, Flixxo also says that “there is no way” for third parties to censor content “which means that anyone has the chance of making any piece of media available on the network.” However, Flixxo says it will develop tools for filtering what it describes as “inappropriate content.”

At this point, things start to become a little unclear. On the one hand Flixxo says it could become a “revolutionary tool for uncensorable and untraceable media” yet on the other it says that it’s necessary to ensure that adult content, for example, isn’t seen by kids.

“We know there is a thin line between filtering or curating content and censorship, and it is a fact that we have an open network for everyone to upload any content. However, Flixxo as a platform will apply certain filtering based on clear rules – there should be a behavior-code for uploaders in order to offer the right content to the right user,” Flixxo explains.

To this end, Flixxo says it will deploy a centralized curation function, carried out by 101 delegates elected by the community, which will become progressively decentralized over time.

“This curation will have a cost, paid in Flixx, and will be collected from the warranty blocked by the content uploaders,” they add.

There can be little doubt that if Flixxo begins ‘curating’ unsuitable content, copyright holders will call on it to do the same for their content too. And, if the platform really takes off, 101 curators probably won’t scratch the surface. There’s also the not inconsiderable issue of what might happen to curators’ judgment when they’re incentivized to block curate content.

Finally, for those sick of “not available in your region” messages, there’s good and bad news. Flixxo insists there will be no geo-blocking of content on its part but individual creators will still have that feature available to them, should they choose.

The Flixx whitepaper can be downloaded here (pdf)

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

The CoderDojo Girls Initiative

Post Syndicated from Nuala McHale original https://www.raspberrypi.org/blog/coderdojo-girls-initiative/

In March, the CoderDojo Foundation launched their Girls Initiative, which aims to increase the average proportion of girls attending CoderDojo clubs from 29% to at least 40% over the next three years.

The CoderDojo Girls Initiative

Six months on, we wanted to highlight what we’ve done so far and what’s next for our initiative.

What we’ve done so far

To date, we have focussed our efforts on four key areas:

  • Developing and improving content
  • Conducting and learning from research
  • Highlighting role models
  • Developing a guide of tried and tested best practices for encouraging and sustaining girls in a Dojo setting (Empowering the Future)

Content

We’ve taken measures to ensure our resources are as friendly to girls as well as boys, and we are improving them based on feedback from girls. For example, we have developed beginner-level content (Sushi Cards) for working with wearables and for building apps using App Inventor. In response to girls’ feedback, we are exploring more creative goal-orientated content.

The CoderDojo Girls Initiative

Moreover, as part of our Empowering the Future guide, we have developed three short ‘Mini-Sushi’ projects which provide a taster of different programming languages, such as Scratch, HTML, and App Inventor.

What’s next?

We are currently finalising our intermediate-level wearables Sushi Cards. These are resources for learners to further explore wearables and integrate them with other coding skills they are developing. The Cards will enable young people to program LEDs which can be sewn into clothing with conductive thread. We are also planning another series of Sushi Cards focused on using coding skills to solve problems Ninjas have reported as important to them.

Research

In June 2017 we conducted the first Ninja survey. It was sent to all young people registered on the CoderDojo community platform, Zen. Hundreds of young people involved in Dojos around the world responded and shared their experiences.

The CoderDojo Girls Initiative

We are currently examining these results to identify areas in which girls feel most or least confident, as well as the motivations and influencing factors that cause them to continue with coding.

What’s next?

Over the coming months we will delve deeper into the findings of this research, and decide how we can improve our content and Dojo support to adapt accordingly. Additionally, as part of sending out our Empowering the Future guide, we’re asking Dojos to provide insights into their current proportions of girls and female Mentors.

The CoderDojo Girls Initiative

We will follow up with recipients of the guide to document the impact of the recommended approaches they try at their Dojo. Thus, we will find out which approaches are most effective in different regional contexts, which will help us improve our support for Dojos wanting to increase their proportion of attending girls.

Role models

Many Dojos, Champions, and Mentors are doing amazing work to support and encourage girls at their Dojos. Female Mentors not only help by supporting attending girls, but they also act as vital role models in an environment which is often male-dominated. Blogs by female Mentors and Ninjas which have already featured on our website include:

What’s next?

We recognise the importance of female role models, and over the coming months we will continue to encourage community members to share their stories so that we bring them to the wider CoderDojo community. Do you know a female Mentor or Ninja you would like to shine a spotline on? Get in touch with us at [email protected] You can also use #CoderDojoGirls on social media.

The CoderDojo Girls Initiative

Empowering the Future guide

Ahead of Ada Lovelace Day and International Day of the Girl Child, the CoderDojo Foundation has released Empowering the Future, a comprehensive guide of practical approaches which Dojos have tested to engage and sustain girls.

Some topics covered in the guide are:

  • Approaches to improve the Dojo environment and layout
  • Language and images used to describe and promote Dojos
  • Content considerations, and suggested resources
  • The importance of female Mentors, and ways to increase access to role models

For the next month, Dojos that want to improve their proportion of girls can still sign up to have the guide book sent to them for free! From today, Dojos and anyone else can also download a PDF file of the guide.

The CoderDojo Girls Initiative

We would like to say a massive thank you to all community members who have shared their insights with us to make our Empowering the Future guide as comprehensive and beneficial as possible for other Dojos.

Tell us what you think

Have you found an approach, or used content, which girls find particularly engaging? Do you have questions about our Girls Initiative? We would love to hear your ideas, insights, and experiences in relation to supporting CoderDojo girls! Feel free to use our forums to share with the global CoderDojo community, and email us at [email protected]

The post The CoderDojo Girls Initiative appeared first on Raspberry Pi.

Spooktacular Halloween Haunted Portrait

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/spooktacular-halloween-haunted-portrait/

October has come at last, and with it, the joy of Halloween is now upon us. So while I spend the next 30 days quoting Hocus Pocus at every opportunity, here’s Adafruit’s latest spooky build … the spooktacular Haunted Portrait.

Adafruit Raspberry Pi Haunted Portrait

Haunted Portraits

If you’ve visited a haunted house such as Disney’s Haunted Mansion, or walked the halls of Hogwarts at Universal Studios, you will have seen a ‘moving portrait’. Whether it’s the classic ‘did that painting just blink?’ approach, or occupants moving in and out of frame, they’re an effective piece of spooky decoration – and now you can make your own!

Adafruit’s AdaBox

John Park, maker extraordinaire, recently posted a live make video where he used the contents of the Raspberry Pi-themed AdaBox 005 to create a blinking portrait.

AdaBox 005 Raspberry Pi Haunted Portrait

The Adabox is Adafruit’s own maker subscription service where plucky makers receive a mystery parcel containing exciting tech and inspirational builds. Their more recent delivery, the AdaBox 005, contains a Raspberry Pi Zero, their own Joy Bonnet, a case, and peripherals, including Pimoroni’s no-solder Hammer Headers.

AdaBox 005 Raspberry Pi Haunted Portrait

While you can purchase the AdaBoxes as one-off buys, subscribers get extra goodies. With AdaBox 005, they received bonus content including Raspberry Pi swag in the form of stickers, and a copy of The MagPi Magazine.

AdaBox 005 Raspberry Pi Haunted Portrait

The contents of AdaBox 005 allows makers to build their own Raspberry Pi Zero tiny gaming machine. But the ever-working minds of the Adafruit team didn’t want to settle there, so they decided to create more tutorials based on the box’s contents, such as John Park’s Haunted Portrait.

Bringing a portrait to life

Alongside the AdaBox 005 content, all of which can be purchased from Adafruit directly, you’ll need a flat-screen monitor and a fancy frame. The former could be an old TV or computer screen while the latter, unless you happen to have an ornate frame that perfectly fits your monitor, can be made from cardboard, CNC-cut wood or gold-painted macaroni and tape … probably.

Adafruit Raspberry Pi Haunted Portrait

You’ll need to attach headers to your Raspberry Pi Zero. For those of you who fear the soldering iron, the Hammer Headers can be hammered into place without the need for melty hot metal. If you’d like to give soldering a go, you can follow Laura’s Getting Started With Soldering tutorial video.

Adafruit Raspberry Pi Haunted Portrait Hammer Header

In his tutorial, John goes on to explain how to set up the Joy Bonnet (if you wish to use it as an added controller), set your Raspberry Pi to display in portrait mode, and manipulate an image in Photoshop or GIMP to create the blinking effect.

Adafruit Raspberry Pi Haunted Portrait

Blinking eyes are just the start of the possibilities for this project. This is your moment to show off your image manipulation skills! Why not have the entire head flash to show the skull within? Or have an ethereal image appear in the background of an otherwise unexceptional painting of a bowl of fruit?

In the final stages of the tutorial, John explains how to set an image slideshow running on the Pi, and how to complete the look with the aforementioned ornate frame. He also goes into detail about the importance of using a matte effect screen or transparent gels to give a more realistic ‘painted’ feel.

You’ll find everything you need to make your own haunted portrait here, including a link to John’s entire live stream.

Get spooky!

We’re going to make this for Pi Towers. In fact, I’m wondering whether I could create an entire gallery of portraits specifically for our reception area and see how long it takes people to notice …

… though I possibly shouldn’t have given my idea away on this rather public blog post.

If you make the Haunted Portrait, or any other Halloween-themed Pi build, make sure you share it with us via social media, or in the comments below.

The post Spooktacular Halloween Haunted Portrait appeared first on Raspberry Pi.

RaspiReader: build your own fingerprint reader

Post Syndicated from Janina Ander original https://www.raspberrypi.org/blog/raspireader-fingerprint-scanner/

Three researchers from Michigan State University have developed a low-cost, open-source fingerprint reader which can detect fake prints. They call it RaspiReader, and they’ve built it using a Raspberry Pi 3 and two Camera Modules. Joshua and his colleagues have just uploaded all the info you need to build your own version — let’s go!

GIF of fingerprint match points being aligned on fingerprint, not real output of RaspiReader software

Sadly not the real output of the RaspiReader

Falsified fingerprints

We’ve probably all seen a movie in which a burglar crosses a room full of laser tripwires and then enters the safe full of loot by tricking the fingerprint-secured lock with a fake print. Turns out, the second part is not that unrealistic: you can fake fingerprints using a range of materials, such as glue or latex.

Examples of live and fake fingerprints collected by the RaspiReader team

The RaspiReader team collected live and fake fingerprints to test the device

If the spoof print layer capping the spoofer’s finger is thin enough, it can even fool readers that detect blood flow, pulse, or temperature. This is becoming a significant security risk, not least for anyone who unlocks their smartphone using a fingerprint.

The RaspiReader

This is where Anil K. Jain comes in: Professor Jain leads a biometrics research group. Under his guidance, Joshua J. Engelsma and Kai Cao set out to develop a fingerprint reader with improved spoof-print detection. Ultimately, they aim to help the development of more secure commercial technologies. With their project, the team has also created an amazing resource for anyone who wants to build their own fingerprint reader.

So that replicating their device would be easy, they wanted to make it using inexpensive, readily available components, which is why they turned to Raspberry Pi technology.

RaspiReader fingerprint scanner by PRIP lab

The Raspireader and its output

Inside the RaspiReader’s 3D-printed housing, LEDs shine light through an acrylic prism, on top of which the user rests their finger. The prism refracts the light so that the two Camera Modules can take images from different angles. The Pi receives these images via a Multi Camera Adapter Module feeding into the CSI port. Collecting two images means the researchers’ spoof detection algorithm has more information to work with.

Comparison of live and spoof fingerprints

Real on the left, fake on the right

RaspiReader software

The Camera Adaptor uses the RPi.GPIO Python package. The RaspiReader performs image processing, and its spoof detection takes image colour and 3D friction ridge patterns into account. The detection algorithm extracts colour local binary patterns … please don’t ask me to explain! You can have a look at the researchers’ manuscript if you want to get stuck into the fine details of their project.

Build your own fingerprint reader

I’ve had my eyes glued to my inbox waiting for Josh to send me links to instructions and files for this build, and here they are (thanks, Josh)! Check out the video tutorial, which walks you through how to assemble the RaspiReader:

RaspiReader: Cost-Effective Open-Source Fingerprint Reader

Building a cost-effective, open-source, and spoof-resilient fingerprint reader for $160* in under an hour. Code: https://github.com/engelsjo/RaspiReader Links to parts: 1. PRISM – https://www.amazon.com/gp/product/B00WL3OBK4/ref=oh_aui_detailpage_o05_s00?ie=UTF8&psc=1 (Better fit) https://www.thorlabs.com/thorproduct.cfm?partnumber=PS611 2. RaspiCams – https://www.amazon.com/gp/product/B012V1HEP4/ref=oh_aui_detailpage_o00_s00?ie=UTF8&psc=1 3. Camera Multiplexer https://www.amazon.com/gp/product/B012UQWOOQ/ref=oh_aui_detailpage_o04_s01?ie=UTF8&psc=1 4. Raspberry Pi Kit: https://www.amazon.com/CanaKit-Raspberry-Clear-Power-Supply/dp/B01C6EQNNK/ref=sr_1_6?ie=UTF8&qid=1507058509&sr=8-6&keywords=raspberry+pi+3b Whitepaper: https://arxiv.org/abs/1708.07887 * Prices can vary based on Amazon’s pricing. P.s.

You can find a parts list with links to suppliers in the video description — the whole build costs around $160. All the STL files for the housing and the Python scripts you need to run on the Pi are available on Josh’s GitHub.

Enhance your home security

The RaspiReader is a great resource for researchers, and it would also be a terrific project to build at home! Is there a more impressive way to protect a treasured possession, or secure access to your computer, than with a DIY fingerprint scanner?

Check out this James-Bond-themed blog post for Raspberry Pi resources to help you build a high-security lair. If you want even more inspiration, watch this video about a laser-secured cookie jar which Estefannie made for us. And be sure to share your successful fingerprint scanner builds with us via social media!

The post RaspiReader: build your own fingerprint reader appeared first on Raspberry Pi.

MPAA Reports Pirate Sites, Hosts and Ad-Networks to US Government

Post Syndicated from Ernesto original https://torrentfreak.com/mpaa-reports-pirate-sites-hosts-and-ad-networks-to-us-government-171004/

Responding to a request from the Office of the US Trade Representative (USTR), the MPAA has submitted an updated list of “notorious markets” that it says promote the illegal distribution of movies and TV-shows.

These annual submissions help to guide the U.S. Government’s position towards foreign countries when it comes to copyright enforcement.

What stands out in the MPAA’s latest overview is that it no longer includes offline markets, only sites and services that are available on the Internet. This suggests that online copyright infringement is seen as a priority.

The MPAA’s report includes more than two dozen alleged pirate sites in various categories. While this is not an exhaustive list, the movie industry specifically highlights some of the worst offenders in various categories.

“Content thieves take advantage of a wide constellation of easy-to-use online technologies, such as direct download and streaming, to create infringing sites and applications, often with the look and feel of legitimate content distributors, luring unsuspecting consumers into piracy,” the MPAA writes.

According to the MPAA, torrent sites remain popular, serving millions of torrents to tens of millions of users at any given time.

The Pirate Bay has traditionally been one of the main targets. Based on data from Alexa and SimilarWeb, the MPAA says that TPB has about 62 million unique visitors per month. The other torrent sites mentioned are 1337x.to, Rarbg.to, Rutracker.org, and Torrentz2.eu.

MPAA calls out torrent sites

The second highlighted category covers various linking and streaming sites. This includes the likes of Fmovies.is, Gostream.is, Primewire.ag, Kinogo.club, MeWatchSeries.to, Movie4k.tv and Repelis.tv.

Direct download sites and video hosting services also get a mention. Nowvideo.sx, Openload.co, Rapidgator.net, Uploaded.net and the Russian social network VK.com. Many of these services refuse to properly process takedown notices, the MPAA claims.

The last category is new and centers around piracy apps. These sites offer mobile applications that allow users to stream pirated content, such as IpPlayBox.tv, MoreTV, 3DBoBoVR, TVBrowser, and KuaiKa, which are particularly popular in Asia.

Aside from listing specific sites, the MPAA also draws the US Government’s attention to the streaming box problem. The report specifically mentions that Kodi-powered boxes are regularly abused for infringing purposes.

“An emerging global threat is streaming piracy which is enabled by piracy devices preloaded with software to illicitly stream movies and television programming and a burgeoning ecosystem of infringing add-ons,” the MPAA notes.

“The most popular software is an open source media player software, Kodi. Although Kodi is not itself unlawful, and does not host or link to unlicensed content, it can be easily configured to direct consumers toward unlicensed films and television shows.”

Pirate streaming boxes

There are more than 750 websites offering infringing devices, the Hollywood group notes, adding that the rapid growth of this problem is startling. Interestingly, the report mentions TVAddons.ag as a “piracy add-on repository,” noting that it’s currently offline. Whether the new TVAddons is also seen a problematic is unclear.

The MPAA also continues its trend of calling out third-party intermediaries, including hosting providers. These companies refuse to take pirate sites offline following complaints, even when the MPAA views them as blatantly violating the law.

“Hosting companies provide the essential infrastructure required to operate a website,” the MPAA writes. “Given the central role of hosting providers in the online ecosystem, it is very concerning that many refuse to take action upon being notified…”

The Hollywood group specifically mentions Private Layer and Netbrella as notorious markets. CDN provider CloudFlare is also named. As a US-based company, the latter can’t be included in the list. However, the MPAA explains that it is often used as an anonymization tool by sites and services that are mentioned in the report.

Another group of intermediaries that play a role in fueling piracy (mentioned for the first time) are advertising networks. The MPAA specifically calls out the Canadian company WWWPromoter, which works with sites such as Primewire.ag, Projectfreetv.at and 123movies.to

“The companies connecting advertisers to infringing websites and inadvertently contribute to the prevalence and prosperity of infringing sites by providing funding to the operators of these sites through advertising revenue,” the MPAA writes.

The MPAA’s full report is available here (pdf). The USTR will use this input above to make up its own list of notorious markets. This will help to identify current threats and call on foreign governments to take appropriate action.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

Adafruit’s read-only Raspberry Pi

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/adafruits-read-only/

For passive projects such as point-of-sale displays, video loopers, and your upcoming Halloween builds, Adafruit have come up with a read-only solution for powering down your Raspberry Pi without endangering your SD card.

Adafruit read-only raspberry pi

Pulling the plug

At home, at a coding club, or at a Jam, you rarely need to pull the plug on your Raspberry Pi without going through the correct shutdown procedure. To ensure a long life for your SD card and its contents, you should always turn off you Pi by selecting the shutdown option from the menu. This way the Pi saves any temporary files to the card before relinquishing power.

Dramatic reconstruction

By pulling the plug while your OS is still running, you might corrupt these files, which could result in the Pi failing to boot up again. The only fix? Wipe the SD card clean and start over, waving goodbye to all files you didn’t back up.

Passive projects

But what if it’s not as easy as selecting shutdown, because your Raspberry Pi is embedded deep inside the belly of a project? Maybe you’ve hot-glued your Zero W into a pumpkin which is now screwed to the roof of your porch, or your store has a bank of Pi-powered monitors playing ads and the power is set to shut off every evening. Without the ability to shut down your Pi via the menu, you risk the SD card’s contents every time you power down your project.

Read-only

Just in time of the plethora of Halloween projects we’re looking forward to this month, the clever folk at Adafruit have designed a solution for this issue. They’ve shared a script which forces the Raspberry Pi to run in read-only mode, so that powering it down via a plug pull will not corrupt the SD card.

But how?

The script makes the Pi save temporary files to the RAM instead of the SD card. Of course, this means that no files or new software can be written to the card. However, if that’s not necessary for your Pi project, you might be happy to make the trade-off. Note that you can only use Adafruit’s script on Raspbian Lite.

Find more about the read-only Raspberry Pi solution, including the script and optional GPIO-halt utility, on the Adafruit Learn page. And be aware that making your Pi read-only is irreversible, so be sure to back up the contents of your SD card before you implement the script.

Halloween!

It’s October, and we’re now allowed to get excited about Halloween and all of the wonderful projects you plan on making for the big night.

Adafruit read-only raspberry pi

Adafruit’s animated snake eyes

We’ll be covering some of our favourite spooky build on social media throughout the month — make sure to share yours with us, either in the comments below or on Facebook, Twitter, Instagram, or G+.

The post Adafruit’s read-only Raspberry Pi appeared first on Raspberry Pi.

Join AWS Security on October 4 for a Night of Trivia at Grace Hopper Celebration 2017

Post Syndicated from Sara Duffer original https://aws.amazon.com/blogs/security/join-aws-security-for-a-night-of-trivia-at-grace-hopper-2017/

AWS Security Jam image

If you’re attending this year’s Grace Hopper Celebration in Orlando, AWS is inviting all attendees to join us for a free evening of learning and networking. This AWS Security Jam will feature an opportunity to learn more about the AWS Security team (and about AWS security), socialize with peers, and engage in a night of trivia with your fellow conference friends. We will provide light appetizers and drinks. RSVP today.

  • Day: Wednesday, October 4, 2017
  • Time: 5:30–8:00 P.M. Eastern Time
  • Location: Rosen Centre Hotel Executive Ballroom, 9840 International Drive, Orlando, FL 32819 (next to the Orange County Convention Center)

The first 150 attendees will win a door prize, and we will give additional prizes as part of a raffle at the end of the event. Follow us on Twitter @AWSSecurityInfo for more information and updates about all things AWS Security and Compliance.

– Sara

Algo-rhythmic PianoAI

Post Syndicated from Janina Ander original https://www.raspberrypi.org/blog/pianoai/

It’s no secret that we love music projects at Pi Towers. On the contrary, we often shout it from the rooftops like we’re in Moulin Rouge! But the PianoAI project by Zack left us slack-jawed: he built an AI on a Raspberry Pi that listens to his piano playing, and then produces improvised, real-time accompaniment.

Jamming with PIanoAI (clip #1) (Version 1.0)

Another example of a short teaching and then jamming with piano with a version I’m more happy with. I have to play for the Pi for a little while before the Pi has enough data to make its own music.

The PianoAI

Inspired by a story about jazz musician Dan Tepfer, Zack set out to create an AI able to imitate his piano-playing style in real time. He began programming the AI in Python, before starting over in the open-source programming language Go.

The Go language gopher mascot with headphones and a MIDI keyboard

The Go mascot is a gopher. Why not?

Zack has published an excellent write-up of how he built PianoAI. It’s a very readable account of the progress he made and the obstacles he had to overcome while writing PianoAI, and it includes more example videos. It’s hard to add anything to Zack’s own words, so I shan’t try.

Paper notes for PianoAI algorithm

Some of Zack’s notes for his AI

If you just want to try out PianoAI, head over to his GitHub. He provides a detailed guide that talks you through how to implement and use it.

Music to our ears

The Raspberry Pi community never fails to amaze us with their wonderful builds, not least when it comes to musical ones. Check out this cool-looking synth by Toby Hendricks, this geometric instrument by David Sharples, and this pyrite-disc-reading music player by Dmitry Morozov. Aren’t they all splendid? And the list goes on and on

Which instrument do you play? The recorder? The ocarina? The jaw harp? Could you create an AI like Zack’s for it? Let us know in the comments below, and share your builds with us via social media.

The post Algo-rhythmic PianoAI appeared first on Raspberry Pi.

Department of Homeland Security to Collect Social Media of Immigrants and Citizens

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2017/09/department_of_h_3.html

New rules give the DHS permission to collect “social media handles, aliases, associated identifiable information, and search results” as part of people’s immigration file. The Federal Register has the details, which seems to also include US citizens that communicate with immigrants.

This is part of the general trend to srcrutinize people coming into the US more, but it’s hard to get too worked up about the DHS accessing publicly available information. More disturbing is the trend of occasonally asking for social media passwords at the border.

US Court Orders Dozens of “Pirate” Site Domain Seizures

Post Syndicated from Ernesto original https://torrentfreak.com/us-court-orders-dozens-of-pirate-site-domain-seizures-170927/

ABS-CBN, the largest media and entertainment company in the Philippines, has delivered another strike to pirate sites in the United States.

Last week a federal court in Florida signed a default judgment against 43 websites that offered copyright-infringing streams of ABS-CBN owned movies, including Star Cinema titles.

The order was signed exactly one day after the complaint was filed, in what appears to be a streamlined process.

The media company accused the websites of trademark and copyright infringement by making free streams of its content available without permission. It then asked the court for assistance to shut these sites down as soon as possible.

“Defendants’ websites operating under the Subject Domain Names are classic examples of pirate operations, having no regard whatsoever for the rights of ABS-CBN and willfully infringing ABS-CBN’s intellectual property.

“As a result, ABS-CBN requires this Court’s intervention if any meaningful stop is to be put to Defendants’ piracy,” ABS-CBN wrote.

Instead of a lengthy legal process that can take years to complete, ABS-CBN went for an “ex-parte” request for domain seizures, which means that the websites in question are not notified or involved in the process before the order is issued.

After reviewing the proposed injunction, US District Judge Beth Bloom signed off on it. This means that all the associated registrars must hand over the domain names in question.

“The domain name registrars for the Subject Domain Names shall immediately assist in changing the registrar of record for the Subject Domain Names, to a holding account with a registrar of Plaintiffs’ choosing..,” the order (pdf) reads.

In the days that followed, several streaming-site domains were indeed taken over. Movieonline.io, 1movies.tv, 123movieshd.us, 4k-movie.us, icefilms.ws and others are now linking to a notice page with information about the lawsuit instead.

The notice

Gomovies.es, which is also included, has not been transferred yet, but the operator appears to be aware of the lawsuit as the site now redirects to Gomovies.vg. Other domains, such as Onlinefullmovie.me, Putlockerm.live and Newasiantv.io remain online as well.

While the targeted sites together are good for thousands of daily visitors, they’re certainly not the biggest fish.

That said, the most significant thing about the case is not that these domain names have been taken offline. What stands out is the ability of an ex-parte request from a copyright holder to easily take out dozens of sites in one swoop.

Given ABS-CBN’s legal track record, this is likely not the last effort of this kind. The question now is if others will follow suit.

The full list of targeted domain is as follows.

1 movieonline.io
2 1movies.tv
3 gomovies.es
4 123movieshd.us
5 4k-movie.us
6 desitvflix.net
7 globalpinoymovies.com
8 icefilms.ws
9 jhonagemini.com
10 lambinganph.info
11 mrkdrama.com
12 newasiantv.me
13 onlinefullmovie.me
14 pariwiki.net
15 pinoychannel.live
16 pinoychannel.mobi
17 pinoyfullmovies.net
18 pinoyhdtorrent.com
19 pinoylibangandito.pw
20 pinoymoviepedia.ch
21 pinoysharetv.com
22 pinoytambayanhd.com
23 pinoyteleseryerewind.info
24 philnewsnetwork.com
25 pinoytvrewind.info
26 pinoytzater.com
27 subenglike.com
28 tambayantv.org
29 teleseryi.com
30 thepinoy1tv.com
31 thepinoychannel.com
32 tvbwiki.com
33 tvnaa.com
34 urpinoytv.com
35 vikiteleserye.com
36 viralsocialnetwork.com
37 watchpinoymoviesonline.com
38 pinoysteleserye.xyz
39 pinoytambayan.world
40 lambingan.lol
41 123movies.film
42 putlockerm.live
43 yonip.zone
43 yonipzone.rocks

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

Dialekt-o-maten vending machine

Post Syndicated from Janina Ander original https://www.raspberrypi.org/blog/dialekt-o-maten-vending-machine/

At some point, many of you will have become exasperated with your AI personal assistant for not understanding you due to your accent – or worse, your fantastic regional dialect! A vending machine from Coca-Cola Sweden turns this issue inside out: the Dialekt-o-maten rewards users with a free soft drink for speaking in a Swedish regional dialect.

The world’s first vending machine where you pay with a dialect!

Thirsty fans along with journalists were invited to try Dialekt-o-maten at Stureplan in central Stockholm. Depending on how well they could pronounce the different phrases in assorted Swedish dialects – they were rewarded an ice cold Coke with that destination on the label.

The Dialekt-o-maten

The machine, which uses a Raspberry Pi, was set up in Stureplan Square in Stockholm. A person presses one of six buttons to choose the regional dialect they want to try out. They then hit ‘record’, and speak into the microphone. The recording is compared to a library of dialect samples, and, if it matches closely enough, voila! — the Dialekt-o-maten dispenses a soft drink for free.

Dialekt-o-maten on the highstreet in Stockholm

Code for the Dialekt-o-maten

The team of developers used the dejavu Python library, as well as custom-written code which responded to new recordings. Carl-Anders Svedberg, one of the developers, said:

Testing the voices and fine-tuning the right level of difficulty for the users was quite tricky. And we really should have had more voice samples. Filtering out noise from the surroundings, like cars and music, was also a small hurdle.

While they wrote the initial software on macOS, the team transferred it to a Raspberry Pi so they could install the hardware inside the Dialekt-o-maten.

Regional dialects

Even though Sweden has only ten million inhabitants, there are more than 100 Swedish dialects. In some areas of Sweden, the local language even still resembles Old Norse. The Dialekt-o-maten recorded how well people spoke the six dialects it used. Apparently, the hardest one to imitate is spoken in Vadstena, and the easiest is spoken in Smögen.

Dialekt-o-maten on Stockholm highstreet

Speech recognition with the Pi

Because of its audio input capabilities, the Raspberry Pi is very useful for building devices that use speech recognition software. One of our favourite projects in this vein is of course Allen Pan’s Real-Life Wizard Duel. We also think this pronunciation training machine by Japanese makers HomeMadeGarbage is really neat. Ideas from these projects and the Dialekt-o-maten could potentially be combined to make a fully fledged language-learning tool!

How about you? Have you used a Raspberry Pi to help you become multilingual? If so, do share your project with us in the comments or via social media.

The post Dialekt-o-maten vending machine appeared first on Raspberry Pi.

FRED-209 Nerf gun tank

Post Syndicated from Janina Ander original https://www.raspberrypi.org/blog/nerf-gun-tank-fred-209/

David Pride, known to many of you as an active member of our maker community, has done it again! His FRED-209 build combines a Nerf gun, 3D printing, a Raspberry Pi Zero, and robotics to make one neat remotely controlled Nerf tank.

FRED-209 – 3D printed Raspberry Pi Nerf Tank

Uploaded by David Pride on 2017-09-17.

A Nerf gun for FRED-209

David says he worked on FRED-209 over the summer in order to have some fun with Nerf guns, which weren’t around when he was a kid. He purchased an Elite Stryfe model at a car boot sale, and took it apart to see what made it tick. Then he set about figuring out how to power it with motors and a servo.

Nerf Elite Stryfe components for the FRED-209 Nerf tank of David Pride

To control the motors, David used a ZeroBorg add-on board for the Pi Zero, and he set up a PlayStation 3 controller to pilot his tank. These components were also part of a robot that David entered into the Pi Wars competition, so he had already written code for them.

3D printing for FRED-209

During prototyping for his Nerf tank, which David named after ED-209 from RoboCop, he used lots of eBay loot and several 3D-printed parts. He used the free OpenSCAD software package to design the parts he wanted to print. If you’re a novice at 3D printing, you might find the printing advice he shares in the write-up on his blog very useful.

3D-printed lid of FRED-209 nerf gun tank by David Pride

David found the 3D printing of the 24cm-long lid of FRED-209 tricky

On eBay, David found some cool-looking chunky wheels, but these turned out to be too heavy for the motors. In the end, he decided to use a Rover 5 chassis, which changed the look of FRED-209 from ‘monster truck’ to ‘tank’.

FRED-209 Nerf tank by David Pride

Next step: teach it to use stairs

The final result looks awesome, and David’s video demonstrates that it shoots very accurately as well. A make like this might be a great defensive project for our new apocalypse-themed Pioneers challenge!

Taking FRED-209 further

David will be uploading code and STL files for FRED-209 soon, so keep an eye on his blog or Twitter for updates. He’s also bringing the Nerf tank to the Cotswold Raspberry Jam this weekend. If you’re attending the event, make sure you catch him and try FRED-209 out yourself.

Never one to rest on his laurels, David is already working on taking his build to the next level. He wants to include a web interface controller and a camera, and is working on implementing OpenCV to give the Nerf tank the ability to autonomously detect targets.

Pi Wars 2018

I have a feeling we might get to see an advanced version of David’s project at next year’s Pi Wars!

The 2018 Pi Wars have just been announced. They will take place on 21-22 April at the Cambridge Computer Laboratory, and you have until 3 October to apply to enter the competition. What are you waiting for? Get making! And as always, do share your robot builds with us via social media.

The post FRED-209 Nerf gun tank appeared first on Raspberry Pi.

Laser Cookies: a YouTube collaboration

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/laser-cookies/

Lasers! Cookies! Raspberry Pi! We’re buzzing with excitement about sharing our latest YouTube video with you, which comes directly from the kitchen of maker Estefannie Explains It All!

Laser-guarded cookies feat. Estefannie Explains It All

Uploaded by Raspberry Pi on 2017-09-18.

Estefannie Explains It All + Raspberry Pi

When Estefannie visited Pi Towers earlier this year, we introduced her to the Raspberry Pi Digital Curriculum and the free resources on our website. We’d already chatted to her via email about the idea of creating a collab video for the Raspberry Pi channel. Once she’d met members of the Raspberry Pi Foundation team and listened to them wax lyrical about the work we do here, she was even more keen to collaborate with us.

Estefannie on Twitter

Ahhhh!!! I still can’t believe I got to hang out and make stuff at the @Raspberry_Pi towers!! Thank you thank you!!

Estefannie returned to the US filled with inspiration for a video for our channel, and we’re so pleased with how awesome her final result is. The video is a super addition to our Raspberry Pi YouTube channel, it shows what our resources can help you achieve, and it’s great fun. You might also have noticed that the project fits in perfectly with this season’s Pioneers challenge. A win all around!

So yeah, we’re really chuffed about this video, and we hope you all like it too!

Estefannie’s Laser Cookies guide

For those of you wanting to try your hand at building your own Cookie Jar Laser Surveillance Security System, Estefannie has provided a complete guide to talk you through it. Here she goes:

First off, you’ll need:

  • 10 lasers
  • 10 photoresistors
  • 10 capacitors
  • 1 Raspberry Pi Zero W
  • 1 buzzer
  • 1 Raspberry Pi Camera Module
  • 12 ft PVC pipes + 4 corners
  • 1 acrylic panel
  • 1 battery pack
  • 8 zip ties
  • tons of cookies

I used the Raspberry Pi Foundation’s Laser trip wire and the Tweeting Babbage resources to get one laser working and to set up the camera and Twitter API. This took me less than an hour, and it was easy, breezy, beautiful, Raspberry Pi.


I soldered ten lasers in parallel and connected ten photoresistors to their own GPIO pins. I didn’t wire them up in series because of sensitivity reasons and to make debugging easier.

Building the frame took a few tries: I actually started with a wood frame, then tried a clear case, and finally realized the best and cleaner solution would be pipes. All the wires go inside the pipes and come out in a small window on the top to wire up to the Zero W.



Using pipes also made the build cheaper, since they were about $3 for 12 ft. Wiring inside the pipes was tricky, and to finish the circuit, I soldered some of the wires after they were already in the pipes.

I tried glueing the lasers to the frame, but the lasers melted the glue and became decalibrated. Next I tried tape, and then I found picture mounting putty. The putty worked perfectly — it was easy to mold a putty base for the lasers and to calibrate and re-calibrate them if needed. Moreover, the lasers stayed in place no matter how hot they got.

Estefannie Explains It All Raspberry Pi Cookie Jar

Although the lasers were not very strong, I still strained my eyes after long hours of calibrating — hence the sunglasses! Working indoors with lasers, sunglasses, and code was weird. But now I can say I’ve done that…in my kitchen.

Using all the knowledge I have shared, this project should take a couple of hours. The code you need lives on my GitHub!

Estefannie Explains It All Raspberry Pi Cookie Jar

“The cookie recipe is my grandma’s, and I am not allowed to share it.”

Estefannie on YouTube

Estefannie made this video for us as a gift, and we’re so grateful for the time and effort she put into it! If you enjoyed it and would like to also show your gratitude, subscribe to her channel on YouTube and follow her on Instagram and Twitter. And if you make something similar, or build anything with our free resources, make sure to share it with us in the comments below or via our social media channels.

The post Laser Cookies: a YouTube collaboration appeared first on Raspberry Pi.

A Million ‘Pirate’ Boxes Sold in the UK During The Last Two Years

Post Syndicated from Andy original https://torrentfreak.com/a-million-pirate-boxes-sold-in-the-uk-during-the-last-two-years-170919/

With the devices hitting the headlines on an almost weekly basis, it probably comes as no surprise that ‘pirate’ set-top boxes are quickly becoming public enemy number one with video rightsholders.

Typically loaded with the legal Kodi software but augmented with third-party addons, these often Android-based pieces of hardware drag piracy out of the realm of the computer savvy and into the living rooms of millions.

One of the countries reportedly most affected by this boom is the UK. The consumption of these devices among the general public is said to have reached epidemic proportions, and anecdotal evidence suggests that terms like Kodi and Showbox are now household terms.

Today we have another report to digest, this time from the Federation Against Copyright Theft, or FACT as they’re often known. Titled ‘Cracking Down on Digital Piracy,’ the report provides a general overview of the piracy scene, tackling well-worn topics such as how release groups and site operators work, among others.

The report is produced by FACT after consultation with the Police Intellectual Property Crime Unit, Intellectual Property Office, Police Scotland, and anti-piracy outfit Entura International. It begins by noting that the vast majority of the British public aren’t involved in the consumption of infringing content.

“The most recent stats show that 75% of Brits who look at content online abide by the law and don’t download or stream it illegally – up from 70% in 2013. However, that still leaves 25% who do access material illegally,” the report reads.

The report quickly heads to the topic of ‘pirate’ set-top boxes which is unsurprising, not least due to FACT’s current focus as a business entity.

While it often positions itself alongside government bodies (which no doubt boosts its status with the general public), FACT is a private limited company serving The Premier League, another company desperate to stamp out the use of infringing devices.

Nevertheless, it’s difficult to argue with some of the figures cited in the report.

“At a conservative estimate, we believe a million set-top boxes with software added
to them to facilitate illegal downloads have been sold in the UK in the last couple
of years,” the Intellectual Property Office reveals.

Interestingly, given a growing tech-savvy public, FACT’s report notes that ready-configured boxes are increasingly coming into the country.

“Historically, individuals and organized gangs have added illegal apps and add-ons onto the boxes once they have been imported, to allow illegal access to premium channels. However more recently, more boxes are coming into the UK complete with illegal access to copyrighted content via apps and add-ons already installed,” FACT notes.

“Boxes are often stored in ‘fulfillment houses’ along with other illegal electrical items and sold on social media. The boxes are either sold as one-off purchases, or with a monthly subscription to access paid-for channels.”

While FACT press releases regularly blur the lines when people are prosecuted for supplying set-top boxes in general, it’s important to note that there are essentially two kinds of products on offer to the public.

The first relies on Kodi-type devices which provide on-going free access to infringing content. The second involves premium IPTV subscriptions which are a whole different level of criminality. Separating the two when reading news reports can be extremely difficult, but it’s a hugely important to recognize the difference when assessing the kinds of sentences set-top box suppliers are receiving in the UK.

Nevertheless, FACT correctly highlights that the supply of both kinds of product are on the increase, with various parties recognizing the commercial opportunities.

“A significant number of home-grown British criminals are now involved in this type of crime. Some of them import the boxes wholesale through entirely legal channels, and modify them with illegal software at home. Others work with sophisticated criminal networks across Europe to bring the boxes into the UK.

“They then sell these boxes online, for example through eBay or Facebook, sometimes managing to sell hundreds or thousands of boxes before being caught,” the company adds.

The report notes that in some cases the sale of infringing set-top boxes occurs through cottage industry, with suppliers often working on their own or with small groups of friends and family. Invetiably, perhaps, larger scale operations are reported to be part of networks with connections to other kinds of crime, such as dealing in drugs.

“In contrast to drugs, streaming devices provide a relatively steady and predictable revenue stream for these criminals – while still being lucrative, often generating hundreds of thousands of pounds a year, they are seen as a lower risk activity with less likelihood of leading to arrest or imprisonment,” FACT reports.

While there’s certainly the potential to earn large sums from ‘pirate’ boxes and premium IPTV services, operating on the “hundreds of thousands of pounds a year” scale in the UK would attract a lot of unwanted attention. That’s not saying that it isn’t already, however.

Noting that digital piracy has evolved hugely over the past three or four years, the report says that the cases investigated so far are just the “tip of the iceberg” and that many other cases are in the early stages and will only become known to the public in the months and years ahead.

Indeed, the Intellectual Property Office hints that some kind of large-scale enforcement action may be on the horizon.

“We have identified a significant criminal business model which we have discussed and shared with key law enforcement partners. I can’t go into detail on this, but as investigations take their course, you will see the scale,” an IPO spokesperson reveals.

While details are necessarily scarce, a source familiar with this area told TF that he would be very surprised if the targets aren’t the growing handful of commercial UK-based IPTV re-sellers who offer full subscription TV services for a few pounds per month.

“They’re brazen. Watch this space,” he said.

FACT’s full report, Cracking Down on Digital Piracy, can be downloaded here (pdf)

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

Pioneers: only you can save us

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/pioneers-third-challenge/

Pioneers, we just received this message through our network — have you seen it?

Can you see me? Only YOU can save us!

Uploaded by Raspberry Pi on 2017-09-14.

Only you can save us

We have no choice – we must help her! If things are as bad as she says they are, our only hope of survival is to work together.

We know you have the skills and imagination required to make something. We’ve seen that in previous Pioneers challenges. That’s why we’re coming directly to you with this: we know you won’t let her down.

What you need to do

We’ve watched back through the recording and pulled out as much information as we can:

  • To save us, you have ten weeks to create something using tech. This means you need to be done on 1 December, or it will be too late!
  • The build you will create needs to help her in the treacherous situation she’s in. What you decide to make is completely up to you.
  • Her call is for those of you aged between 11 and 16 who are based in the UK or Republic of Ireland. You need to work in groups of up to five, and you need to find someone aged 18 or over to act as a mentor and support your project.
  • Any tech will do. We work for the Raspberry Pi Foundation, but this doesn’t mean you need to use a Raspberry Pi. Use anything at all — from microcontrollers to repurposed devices such as laptops and cameras.

To keep in contact with you, it looks like she’s created a form for you to fill in and share your team name and details with her. In return she will trade some items with you — things that will help inspire you in your mission. We’ve managed to find the link to the form: you can fill it in here.

Only you can save us - Raspberry Pi Pioneers

In order to help her (and any others who might still be out there!) to recreate your project, you need to make sure you record your working process. Take photos and footage to document how you build your make, and put together a video to send to her when you’re done making.

If you manage to access social media, you could also share your progress as you go along! Make sure to use #MakeYourIdeas, so that other survivors can see your work.

We’ve assembled some more information on the Pioneers website to create a port of call for you. Check it out, and let us know if you have any questions. We will do whatever we can to help you protect the world.

Good luck, everybody! It’s up to you now.

Only you can save us.

The post Pioneers: only you can save us appeared first on Raspberry Pi.