
Predict Billboard Top 10 Hits Using RStudio, H2O and Amazon Athena

Post Syndicated from Gopal Wunnava original https://aws.amazon.com/blogs/big-data/predict-billboard-top-10-hits-using-rstudio-h2o-and-amazon-athena/

Success in the popular music industry is typically measured in terms of the number of Top 10 hits artists have to their credit. The music industry is a highly competitive multi-billion dollar business, and record labels incur various costs in exchange for a percentage of the profits from sales and concert tickets.

Predicting the success of an artist’s release in the popular music industry can be difficult. One release may be extremely popular, resulting in widespread play on TV, radio and social media, while another single may turn out quite unpopular, and therefore unprofitable. Record labels need to be selective in their decision making, and predictive analytics can help them with decision making around the type of songs and artists they need to promote.

In this walkthrough, you leverage H2O.ai, Amazon Athena, and RStudio to make predictions on whether a song might make it to the Top 10 Billboard charts. You explore the GLM, GBM, and deep learning modeling techniques using H2O’s rapid, distributed and easy-to-use open source parallel processing engine. RStudio is a popular IDE, licensed either commercially or under AGPLv3, for working with R. This is ideal if you don’t want to connect to a server via SSH and use code editors such as vi to do analytics. RStudio is available in a desktop version, or a server version that allows you to access R via a web browser. RStudio’s Notebooks feature is used to demonstrate the execution of code and output. In addition, this post showcases how you can leverage Athena for query and interactive analysis during the modeling phase. A working knowledge of statistics and machine learning would be helpful to interpret the analysis being performed in this post.

Walkthrough

Your goal is to predict whether a song will make it to the Top 10 Billboard charts. For this purpose, you will be using multiple modeling techniques―namely GLM, GBM and deep learning―and choose the model that is the best fit.

This solution involves the following steps:

  • Install and configure RStudio with Athena
  • Log in to RStudio
  • Install R packages
  • Connect to Athena
  • Create a dataset
  • Create models

Install and configure RStudio with Athena

Use the following AWS CloudFormation stack to install, configure, and connect RStudio on an Amazon EC2 instance with Athena.

Launching this stack creates all required resources and prerequisites:

  • Amazon EC2 instance with Amazon Linux (minimum size of t2.large is recommended)
  • Provisioning of the EC2 instance in an existing VPC and public subnet
  • Installation of Java 8
  • Assignment of an IAM role to the EC2 instance with the required permissions for accessing Athena and Amazon S3
  • Security group allowing access to the RStudio and SSH ports from the internet (I recommend restricting access to these ports)
  • S3 staging bucket required for Athena (referenced within RStudio as ATHENABUCKET)
  • RStudio username and password
  • Setup logs in Amazon CloudWatch Logs (if needed for additional troubleshooting)
  • Amazon EC2 Systems Manager agent, which makes it easy to manage and patch the instance

All AWS resources are created in the US-East-1 Region. To avoid cross-region data transfer fees, launch the CloudFormation stack in the same region. To check the availability of Athena in other regions, see Region Table.

Log in to RStudio

The instance security group has been automatically configured to allow incoming connections on the RStudio port 8787 from any source internet address. You can edit the security group to restrict source IP access. If you have trouble connecting, ensure that port 8787 isn’t blocked by subnet network ACLs or by your outgoing proxy/firewall.

  1. In the CloudFormation stack, choose Outputs, Value, and then open the RStudio URL. You might need to wait for a few minutes until the instance has been launched.
  2. Log in to RStudio with the username and password you provided during setup.

Install R packages

Next, install the required R packages from the RStudio console. You can download the R notebook file containing just the code.

#install pacman – a handy package manager for managing installs
if("pacman" %in% rownames(installed.packages()) == FALSE)
{install.packages("pacman")}  
library(pacman)
p_load(h2o,rJava,RJDBC,awsjavasdk)
h2o.init(nthreads = -1)
##  Connection successful!
## 
## R is connected to the H2O cluster: 
##     H2O cluster uptime:         2 hours 42 minutes 
##     H2O cluster version:        3.10.4.6 
##     H2O cluster version age:    4 months and 4 days !!! 
##     H2O cluster name:           H2O_started_from_R_rstudio_hjx881 
##     H2O cluster total nodes:    1 
##     H2O cluster total memory:   3.30 GB 
##     H2O cluster total cores:    4 
##     H2O cluster allowed cores:  4 
##     H2O cluster healthy:        TRUE 
##     H2O Connection ip:          localhost 
##     H2O Connection port:        54321 
##     H2O Connection proxy:       NA 
##     H2O Internal Security:      FALSE 
##     R Version:                  R version 3.3.3 (2017-03-06)
## Warning in h2o.clusterInfo(): 
## Your H2O cluster version is too old (4 months and 4 days)!
## Please download and install the latest version from http://h2o.ai/download/
#install aws sdk if not present (pre-requisite for using Athena with an IAM role)
if (!aws_sdk_present()) {
  install_aws_sdk()
}

load_sdk()
## NULL

Connect to Athena

Next, establish a connection to Athena from RStudio, using an IAM role associated with your EC2 instance. Use ATHENABUCKET to specify the S3 staging directory.

URL <- 'https://s3.amazonaws.com/athena-downloads/drivers/AthenaJDBC41-1.0.1.jar'
fil <- basename(URL)
#download the file into current working directory
if (!file.exists(fil)) download.file(URL, fil)
#verify that the file has been downloaded successfully
list.files()
## [1] "AthenaJDBC41-1.0.1.jar"
drv <- JDBC(driverClass="com.amazonaws.athena.jdbc.AthenaDriver", fil, identifier.quote="'")

con <- jdbcConnection <- dbConnect(drv, 'jdbc:awsathena://athena.us-east-1.amazonaws.com:443/',
                                   s3_staging_dir=Sys.getenv("ATHENABUCKET"),
                                   aws_credentials_provider_class="com.amazonaws.auth.DefaultAWSCredentialsProviderChain")

Verify the connection. The results returned depend on your specific Athena setup.

con
## <JDBCConnection>
dbListTables(con)
##  [1] "gdelt"               "wikistats"           "elb_logs_raw_native"
##  [4] "twitter"             "twitter2"            "usermovieratings"   
##  [7] "eventcodes"          "events"              "billboard"          
## [10] "billboardtop10"      "elb_logs"            "gdelthist"          
## [13] "gdeltmaster"         "twitter"             "twitter3"

Create a dataset

For this analysis, you use a sample dataset combining information from Billboard and Wikipedia with Echo Nest data in the Million Song Dataset. Upload this dataset into your own S3 bucket. The table below describes the fields used in this dataset.

Field: Description

year: Year that the song was released
songtitle: Title of the song
artistname: Name of the song artist
songid: Unique identifier for the song
artistid: Unique identifier for the song artist
timesignature: Variable estimating the time signature of the song
timesignature_confidence: Confidence in the estimate for the time signature
loudness: Continuous variable indicating the average amplitude of the audio in decibels
tempo: Variable indicating the estimated beats per minute of the song
tempo_confidence: Confidence in the estimate for tempo
key: Variable with twelve levels indicating the estimated key of the song (C, C#, ..., B)
key_confidence: Confidence in the estimate for key
energy: Variable that represents the overall acoustic energy of the song, using a mix of features such as loudness
pitch: Continuous variable that indicates the pitch of the song
timbre_0_min through timbre_11_min: Variables that indicate the minimum values over all segments for each of the twelve values in the timbre vector
timbre_0_max through timbre_11_max: Variables that indicate the maximum values over all segments for each of the twelve values in the timbre vector
top10: Indicator for whether or not the song made it to the Top 10 of the Billboard charts (1 if it was in the Top 10, and 0 if not)

Create an Athena table based on the dataset

In the Athena console, select the default database, sampledb, or create a new database.

Run the following create table statement.

create external table if not exists billboard
(
year int,
songtitle string,
artistname string,
songID string,
artistID string,
timesignature int,
timesignature_confidence double,
loudness double,
tempo double,
tempo_confidence double,
key int,
key_confidence double,
energy double,
pitch double,
timbre_0_min double,
timbre_0_max double,
timbre_1_min double,
timbre_1_max double,
timbre_2_min double,
timbre_2_max double,
timbre_3_min double,
timbre_3_max double,
timbre_4_min double,
timbre_4_max double,
timbre_5_min double,
timbre_5_max double,
timbre_6_min double,
timbre_6_max double,
timbre_7_min double,
timbre_7_max double,
timbre_8_min double,
timbre_8_max double,
timbre_9_min double,
timbre_9_max double,
timbre_10_min double,
timbre_10_max double,
timbre_11_min double,
timbre_11_max double,
Top10 int
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 's3://aws-bigdata-blog/artifacts/predict-billboard/data'
;

Inspect the table definition for the ‘billboard’ table that you have created. If you chose a database other than sampledb, replace that value with your choice.

dbGetQuery(con, "show create table sampledb.billboard")
##                                      createtab_stmt
## 1       CREATE EXTERNAL TABLE `sampledb.billboard`(
## 2                                       `year` int,
## 3                               `songtitle` string,
## 4                              `artistname` string,
## 5                                  `songid` string,
## 6                                `artistid` string,
## 7                              `timesignature` int,
## 8                `timesignature_confidence` double,
## 9                                `loudness` double,
## 10                                  `tempo` double,
## 11                       `tempo_confidence` double,
## 12                                       `key` int,
## 13                         `key_confidence` double,
## 14                                 `energy` double,
## 15                                  `pitch` double,
## 16                           `timbre_0_min` double,
## 17                           `timbre_0_max` double,
## 18                           `timbre_1_min` double,
## 19                           `timbre_1_max` double,
## 20                           `timbre_2_min` double,
## 21                           `timbre_2_max` double,
## 22                           `timbre_3_min` double,
## 23                           `timbre_3_max` double,
## 24                           `timbre_4_min` double,
## 25                           `timbre_4_max` double,
## 26                           `timbre_5_min` double,
## 27                           `timbre_5_max` double,
## 28                           `timbre_6_min` double,
## 29                           `timbre_6_max` double,
## 30                           `timbre_7_min` double,
## 31                           `timbre_7_max` double,
## 32                           `timbre_8_min` double,
## 33                           `timbre_8_max` double,
## 34                           `timbre_9_min` double,
## 35                           `timbre_9_max` double,
## 36                          `timbre_10_min` double,
## 37                          `timbre_10_max` double,
## 38                          `timbre_11_min` double,
## 39                          `timbre_11_max` double,
## 40                                     `top10` int)
## 41                             ROW FORMAT DELIMITED 
## 42                         FIELDS TERMINATED BY ',' 
## 43                            STORED AS INPUTFORMAT 
## 44       'org.apache.hadoop.mapred.TextInputFormat' 
## 45                                     OUTPUTFORMAT 
## 46  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
## 47                                        LOCATION
## 48    's3://aws-bigdata-blog/artifacts/predict-billboard/data'
## 49                                  TBLPROPERTIES (
## 50            'transient_lastDdlTime'='1505484133')

Run a sample query

Next, run a sample query to obtain a list of all songs from Janet Jackson that made it to the Billboard Top 10 charts.

dbGetQuery(con, " SELECT songtitle,artistname,top10   FROM sampledb.billboard WHERE lower(artistname) =     'janet jackson' AND top10 = 1")
##                       songtitle    artistname top10
## 1                       Runaway Janet Jackson     1
## 2               Because Of Love Janet Jackson     1
## 3                         Again Janet Jackson     1
## 4                            If Janet Jackson     1
## 5  Love Will Never Do (Without You) Janet Jackson 1
## 6                     Black Cat Janet Jackson     1
## 7               Come Back To Me Janet Jackson     1
## 8                       Alright Janet Jackson     1
## 9                      Escapade Janet Jackson     1
## 10                Rhythm Nation Janet Jackson     1

Determine how many songs in this dataset are specifically from the year 2010.

dbGetQuery(con, " SELECT count(*)   FROM sampledb.billboard WHERE year = 2010")
##   _col0
## 1   373

The sample dataset provides certain song properties of interest that can be analyzed to gauge their impact on a song’s overall popularity. Look at one such property, timesignature, and determine the value that is most frequent among songs in the database. The time signature is a measure of the number of beats per bar and the type of note that carries the beat.

Running the query directly may result in an error, as shown in the commented lines below. This error is a result of trying to retrieve a large result set over a JDBC connection, which can cause out-of-memory issues at the client level. To address this, reduce the fetch size and run again.

#t<-dbGetQuery(con, " SELECT timesignature FROM sampledb.billboard")
#Note:  Running the preceding query results in the following error: 
#Error in .jcall(rp, "I", "fetch", stride, block): java.sql.SQLException: The requested #fetchSize is more than the allowed value in Athena. Please reduce the fetchSize and try #again. Refer to the Athena documentation for valid fetchSize values.
# Use the dbSendQuery function, reduce the fetch size, and run again
r <- dbSendQuery(con, " SELECT timesignature     FROM sampledb.billboard")
dftimesignature<- fetch(r, n=-1, block=100)
dbClearResult(r)
## [1] TRUE
table(dftimesignature)
## dftimesignature
##    0    1    3    4    5    7 
##   10  143  503 6787  112   19
nrow(dftimesignature)
## [1] 7574

From the results, observe that 6787 songs have a timesignature of 4.

Next, determine the song with the highest tempo.

dbGetQuery(con, " SELECT songtitle,artistname,tempo   FROM sampledb.billboard WHERE tempo = (SELECT max(tempo) FROM sampledb.billboard) ")
##                   songtitle      artistname   tempo
## 1 Wanna Be Startin' Somethin' Michael Jackson 244.307

Create the training dataset

Your model needs to be trained such that it can learn and make accurate predictions. Split the data into training and test datasets, and create the training dataset first.  This dataset contains all observations from the year 2009 and earlier. You may face the same JDBC connection issue pointed out earlier, so this query uses a fetch size.

#BillboardTrain <- dbGetQuery(con, "SELECT * FROM sampledb.billboard WHERE year <= 2009")
#Running the preceding query results in the following error:-
#Error in .verify.JDBC.result(r, "Unable to retrieve JDBC result set for ", : Unable to retrieve #JDBC result set for SELECT * FROM sampledb.billboard WHERE year <= 2009 (Internal error)
#Follow the same approach as before to address this issue.

r <- dbSendQuery(con, "SELECT * FROM sampledb.billboard WHERE year <= 2009")
BillboardTrain <- fetch(r, n=-1, block=100)
dbClearResult(r)
## [1] TRUE
BillboardTrain[1:2,c(1:3,6:10)]
##   year           songtitle artistname timesignature
## 1 2009 The Awkward Goodbye    Athlete             3
## 2 2009        Rubik's Cube    Athlete             3
##   timesignature_confidence loudness   tempo tempo_confidence
## 1                    0.732   -6.320  89.614   0.652
## 2                    0.906   -9.541 117.742   0.542
nrow(BillboardTrain)
## [1] 7201

Create the test dataset

BillboardTest <- dbGetQuery(con, "SELECT * FROM sampledb.billboard where year = 2010")
BillboardTest[1:2,c(1:3,11:15)]
##   year              songtitle        artistname key
## 1 2010 This Is the House That Doubt Built A Day to Remember  11
## 2 2010        Sticks & Bricks A Day to Remember  10
##   key_confidence    energy pitch timbre_0_min
## 1          0.453 0.9666556 0.024        0.002
## 2          0.469 0.9847095 0.025        0.000
nrow(BillboardTest)
## [1] 373

Convert the training and test datasets into H2O dataframes

train.h2o <- as.h2o(BillboardTrain)
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |=================================================================| 100%
test.h2o <- as.h2o(BillboardTest)
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |=================================================================| 100%

Inspect the column names in your H2O dataframes.

colnames(train.h2o)
##  [1] "year"                     "songtitle"               
##  [3] "artistname"               "songid"                  
##  [5] "artistid"                 "timesignature"           
##  [7] "timesignature_confidence" "loudness"                
##  [9] "tempo"                    "tempo_confidence"        
## [11] "key"                      "key_confidence"          
## [13] "energy"                   "pitch"                   
## [15] "timbre_0_min"             "timbre_0_max"            
## [17] "timbre_1_min"             "timbre_1_max"            
## [19] "timbre_2_min"             "timbre_2_max"            
## [21] "timbre_3_min"             "timbre_3_max"            
## [23] "timbre_4_min"             "timbre_4_max"            
## [25] "timbre_5_min"             "timbre_5_max"            
## [27] "timbre_6_min"             "timbre_6_max"            
## [29] "timbre_7_min"             "timbre_7_max"            
## [31] "timbre_8_min"             "timbre_8_max"            
## [33] "timbre_9_min"             "timbre_9_max"            
## [35] "timbre_10_min"            "timbre_10_max"           
## [37] "timbre_11_min"            "timbre_11_max"           
## [39] "top10"

Create models

You need to designate the independent and dependent variables prior to applying your modeling algorithms. Because you’re trying to predict the ‘top10’ field, this would be your dependent variable and everything else would be independent.

Create your first model using GLM. Because GLM works best with numeric data, you create your model by dropping non-numeric variables. You only use the variables in the dataset that describe the numerical attributes of the song in the logistic regression model. You won’t use these variables:  “year”, “songtitle”, “artistname”, “songid”, or “artistid”.

y.dep <- 39
x.indep <- c(6:38)
x.indep
##  [1]  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28
## [24] 29 30 31 32 33 34 35 36 37 38

Create Model 1: All numeric variables

Create Model 1 with the training dataset, using GLM as the modeling algorithm and H2O’s built-in h2o.glm function.

modelh1 <- h2o.glm( y = y.dep, x = x.indep, training_frame = train.h2o, family = "binomial")
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |=====                                                            |   8%
  |                                                                       
  |=================================================================| 100%

Measure the performance of Model 1, using H2O’s built-in performance function.

h2o.performance(model=modelh1,newdata=test.h2o)
## H2OBinomialMetrics: glm
## 
## MSE:  0.09924684
## RMSE:  0.3150347
## LogLoss:  0.3220267
## Mean Per-Class Error:  0.2380168
## AUC:  0.8431394
## Gini:  0.6862787
## R^2:  0.254663
## Null Deviance:  326.0801
## Residual Deviance:  240.2319
## AIC:  308.2319
## 
## Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:
##          0   1    Error     Rate
## 0      255  59 0.187898  =59/314
## 1       17  42 0.288136   =17/59
## Totals 272 101 0.203753  =76/373
## 
## Maximum Metrics: Maximum metrics at their respective thresholds
##                         metric threshold    value idx
## 1                       max f1  0.192772 0.525000 100
## 2                       max f2  0.124912 0.650510 155
## 3                 max f0point5  0.416258 0.612903  23
## 4                 max accuracy  0.416258 0.879357  23
## 5                max precision  0.813396 1.000000   0
## 6                   max recall  0.037579 1.000000 282
## 7              max specificity  0.813396 1.000000   0
## 8             max absolute_mcc  0.416258 0.455251  23
## 9   max min_per_class_accuracy  0.161402 0.738854 125
## 10 max mean_per_class_accuracy  0.124912 0.765006 155
## 
## Gains/Lift Table: Extract with `h2o.gainsLift(<model>, <data>)` or ` 
h2o.auc(h2o.performance(modelh1,test.h2o)) 
## [1] 0.8431394

The AUC metric provides insight into how well the classifier is able to separate the two classes. In this case, the value of 0.8431394 indicates that the classification is good. (A value of 0.5 indicates a worthless test, while a value of 1.0 indicates a perfect test.)

Next, inspect the coefficients of the variables in the dataset.

dfmodelh1 <- as.data.frame(h2o.varimp(modelh1))
dfmodelh1
##                       names coefficients sign
## 1              timbre_0_max  1.290938663  NEG
## 2                  loudness  1.262941934  POS
## 3                     pitch  0.616995941  NEG
## 4              timbre_1_min  0.422323735  POS
## 5              timbre_6_min  0.349016024  NEG
## 6                    energy  0.348092062  NEG
## 7             timbre_11_min  0.307331997  NEG
## 8              timbre_3_max  0.302225619  NEG
## 9             timbre_11_max  0.243632060  POS
## 10             timbre_4_min  0.224233951  POS
## 11             timbre_4_max  0.204134342  POS
## 12             timbre_5_min  0.199149324  NEG
## 13             timbre_0_min  0.195147119  POS
## 14 timesignature_confidence  0.179973904  POS
## 15         tempo_confidence  0.144242598  POS
## 16            timbre_10_max  0.137644568  POS
## 17             timbre_7_min  0.126995955  NEG
## 18            timbre_10_min  0.123851179  POS
## 19             timbre_7_max  0.100031481  NEG
## 20             timbre_2_min  0.096127636  NEG
## 21           key_confidence  0.083115820  POS
## 22             timbre_6_max  0.073712419  POS
## 23            timesignature  0.067241917  POS
## 24             timbre_8_min  0.061301881  POS
## 25             timbre_8_max  0.060041698  POS
## 26                      key  0.056158445  POS
## 27             timbre_3_min  0.050825116  POS
## 28             timbre_9_max  0.033733561  POS
## 29             timbre_2_max  0.030939072  POS
## 30             timbre_9_min  0.020708113  POS
## 31             timbre_1_max  0.014228818  NEG
## 32                    tempo  0.008199861  POS
## 33             timbre_5_max  0.004837870  POS
## 34                                    NA <NA>

Typically, songs with heavier instrumentation tend to be louder (have higher values in the variable “loudness”) and more energetic (have higher values in the variable “energy”). This knowledge is helpful for interpreting the modeling results.

You can make the following observations from the results:

  • The coefficient estimates for the confidence values associated with the time signature, key, and tempo variables are positive. This suggests that higher confidence leads to a higher predicted probability of a Top 10 hit.
  • The coefficient estimate for loudness is positive, meaning that mainstream listeners prefer louder songs with heavier instrumentation.
  • The coefficient estimate for energy is negative, meaning that mainstream listeners prefer songs that are less energetic, which are those songs with light instrumentation.

These coefficients lead to contradictory conclusions for Model 1. This could be due to multicollinearity issues. Inspect the correlation between the variables “loudness” and “energy” in the training set.

cor(train.h2o$loudness,train.h2o$energy)
## [1] 0.7399067

This number indicates that these two variables are highly correlated, and Model 1 does indeed suffer from multicollinearity. Typically, you associate a value between -1.0 and -0.5, or between 0.5 and 1.0, with strong correlation, and a value between -0.1 and 0.1 with weak correlation. To avoid this correlation issue, omit one of these two variables and re-create the models.
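
Before rebuilding the models, it can also be worth confirming that no other predictor pairs are as strongly correlated. The short sketch below is one way to do this with base R; it assumes the train.h2o frame created earlier, uses columns 6 through 38 (the numeric song attributes used in Model 1), and the object names numvars, cormat, and high are simply illustrative.

#convert the numeric predictors back to an R data frame and compute all pairwise correlations
numvars <- as.data.frame(train.h2o[, 6:38])
cormat <- cor(numvars)
#report variable pairs with absolute correlation above 0.7 (upper triangle only, to avoid duplicates)
high <- which(abs(cormat) > 0.7 & upper.tri(cormat), arr.ind = TRUE)
data.frame(var1 = rownames(cormat)[high[, 1]],
           var2 = colnames(cormat)[high[, 2]],
           correlation = cormat[high])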

You build two variations of the original model:

  • Model 2, in which you keep “energy” and omit “loudness”
  • Model 3, in which you keep “loudness” and omit “energy”

You compare these two models and choose the model with a better fit for this use case.

Create Model 2: Keep energy and omit loudness

colnames(train.h2o)
##  [1] "year"                     "songtitle"               
##  [3] "artistname"               "songid"                  
##  [5] "artistid"                 "timesignature"           
##  [7] "timesignature_confidence" "loudness"                
##  [9] "tempo"                    "tempo_confidence"        
## [11] "key"                      "key_confidence"          
## [13] "energy"                   "pitch"                   
## [15] "timbre_0_min"             "timbre_0_max"            
## [17] "timbre_1_min"             "timbre_1_max"            
## [19] "timbre_2_min"             "timbre_2_max"            
## [21] "timbre_3_min"             "timbre_3_max"            
## [23] "timbre_4_min"             "timbre_4_max"            
## [25] "timbre_5_min"             "timbre_5_max"            
## [27] "timbre_6_min"             "timbre_6_max"            
## [29] "timbre_7_min"             "timbre_7_max"            
## [31] "timbre_8_min"             "timbre_8_max"            
## [33] "timbre_9_min"             "timbre_9_max"            
## [35] "timbre_10_min"            "timbre_10_max"           
## [37] "timbre_11_min"            "timbre_11_max"           
## [39] "top10"
y.dep <- 39
x.indep <- c(6:7,9:38)
x.indep
##  [1]  6  7  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29
## [24] 30 31 32 33 34 35 36 37 38
modelh2 <- h2o.glm( y = y.dep, x = x.indep, training_frame = train.h2o, family = "binomial")
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |=======                                                          |  10%
  |                                                                       
  |=================================================================| 100%

Measure the performance of Model 2.

h2o.performance(model=modelh2,newdata=test.h2o)
## H2OBinomialMetrics: glm
## 
## MSE:  0.09922606
## RMSE:  0.3150017
## LogLoss:  0.3228213
## Mean Per-Class Error:  0.2490554
## AUC:  0.8431933
## Gini:  0.6863867
## R^2:  0.2548191
## Null Deviance:  326.0801
## Residual Deviance:  240.8247
## AIC:  306.8247
## 
## Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:
##          0  1    Error     Rate
## 0      280 34 0.108280  =34/314
## 1       23 36 0.389831   =23/59
## Totals 303 70 0.152815  =57/373
## 
## Maximum Metrics: Maximum metrics at their respective thresholds
##                         metric threshold    value idx
## 1                       max f1  0.254391 0.558140  69
## 2                       max f2  0.113031 0.647208 157
## 3                 max f0point5  0.413999 0.596026  22
## 4                 max accuracy  0.446250 0.876676  18
## 5                max precision  0.811739 1.000000   0
## 6                   max recall  0.037682 1.000000 283
## 7              max specificity  0.811739 1.000000   0
## 8             max absolute_mcc  0.254391 0.469060  69
## 9   max min_per_class_accuracy  0.141051 0.716561 131
## 10 max mean_per_class_accuracy  0.113031 0.761821 157
## 
## Gains/Lift Table: Extract with `h2o.gainsLift(<model>, <data>)` or `h2o.gainsLift(<model>, valid=<T/F>, xval=<T/F>)`
dfmodelh2 <- as.data.frame(h2o.varimp(modelh2))
dfmodelh2
##                       names coefficients sign
## 1                     pitch  0.700331511  NEG
## 2              timbre_1_min  0.510270513  POS
## 3              timbre_0_max  0.402059546  NEG
## 4              timbre_6_min  0.333316236  NEG
## 5             timbre_11_min  0.331647383  NEG
## 6              timbre_3_max  0.252425901  NEG
## 7             timbre_11_max  0.227500308  POS
## 8              timbre_4_max  0.210663865  POS
## 9              timbre_0_min  0.208516163  POS
## 10             timbre_5_min  0.202748055  NEG
## 11             timbre_4_min  0.197246582  POS
## 12            timbre_10_max  0.172729619  POS
## 13         tempo_confidence  0.167523934  POS
## 14 timesignature_confidence  0.167398830  POS
## 15             timbre_7_min  0.142450727  NEG
## 16             timbre_8_max  0.093377516  POS
## 17            timbre_10_min  0.090333426  POS
## 18            timesignature  0.085851625  POS
## 19             timbre_7_max  0.083948442  NEG
## 20           key_confidence  0.079657073  POS
## 21             timbre_6_max  0.076426046  POS
## 22             timbre_2_min  0.071957831  NEG
## 23             timbre_9_max  0.071393189  POS
## 24             timbre_8_min  0.070225578  POS
## 25                      key  0.061394702  POS
## 26             timbre_3_min  0.048384697  POS
## 27             timbre_1_max  0.044721121  NEG
## 28                   energy  0.039698433  POS
## 29             timbre_5_max  0.039469064  POS
## 30             timbre_2_max  0.018461133  POS
## 31                    tempo  0.013279926  POS
## 32             timbre_9_min  0.005282143  NEG
## 33                                    NA <NA>

h2o.auc(h2o.performance(modelh2,test.h2o)) 
## [1] 0.8431933

You can make the following observations:

  • The AUC metric is 0.8431933.
  • Inspecting the coefficient of the variable energy, Model 2 suggests that songs with high energy levels tend to be more popular. This is in line with expectations.
  • Because H2O orders the variables by significance, you can see that energy is not a significant variable in this model.

You can conclude that Model 2 is not ideal for this use case, as energy is not significant.

Create Model 3: Keep loudness and omit energy

colnames(train.h2o)
##  [1] "year"                     "songtitle"               
##  [3] "artistname"               "songid"                  
##  [5] "artistid"                 "timesignature"           
##  [7] "timesignature_confidence" "loudness"                
##  [9] "tempo"                    "tempo_confidence"        
## [11] "key"                      "key_confidence"          
## [13] "energy"                   "pitch"                   
## [15] "timbre_0_min"             "timbre_0_max"            
## [17] "timbre_1_min"             "timbre_1_max"            
## [19] "timbre_2_min"             "timbre_2_max"            
## [21] "timbre_3_min"             "timbre_3_max"            
## [23] "timbre_4_min"             "timbre_4_max"            
## [25] "timbre_5_min"             "timbre_5_max"            
## [27] "timbre_6_min"             "timbre_6_max"            
## [29] "timbre_7_min"             "timbre_7_max"            
## [31] "timbre_8_min"             "timbre_8_max"            
## [33] "timbre_9_min"             "timbre_9_max"            
## [35] "timbre_10_min"            "timbre_10_max"           
## [37] "timbre_11_min"            "timbre_11_max"           
## [39] "top10"
y.dep <- 39
x.indep <- c(6:12,14:38)
x.indep
##  [1]  6  7  8  9 10 11 12 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29
## [24] 30 31 32 33 34 35 36 37 38
modelh3 <- h2o.glm( y = y.dep, x = x.indep, training_frame = train.h2o, family = "binomial")
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |========                                                         |  12%
  |                                                                       
  |=================================================================| 100%
perfh3<-h2o.performance(model=modelh3,newdata=test.h2o)
perfh3
## H2OBinomialMetrics: glm
## 
## MSE:  0.0978859
## RMSE:  0.3128672
## LogLoss:  0.3178367
## Mean Per-Class Error:  0.264925
## AUC:  0.8492389
## Gini:  0.6984778
## R^2:  0.2648836
## Null Deviance:  326.0801
## Residual Deviance:  237.1062
## AIC:  303.1062
## 
## Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:
##          0  1    Error     Rate
## 0      286 28 0.089172  =28/314
## 1       26 33 0.440678   =26/59
## Totals 312 61 0.144772  =54/373
## 
## Maximum Metrics: Maximum metrics at their respective thresholds
##                         metric threshold    value idx
## 1                       max f1  0.273799 0.550000  60
## 2                       max f2  0.125503 0.663265 155
## 3                 max f0point5  0.435479 0.628931  24
## 4                 max accuracy  0.435479 0.882038  24
## 5                max precision  0.821606 1.000000   0
## 6                   max recall  0.038328 1.000000 280
## 7              max specificity  0.821606 1.000000   0
## 8             max absolute_mcc  0.435479 0.471426  24
## 9   max min_per_class_accuracy  0.173693 0.745763 120
## 10 max mean_per_class_accuracy  0.125503 0.775073 155
## 
## Gains/Lift Table: Extract with `h2o.gainsLift(<model>, <data>)` or `h2o.gainsLift(<model>, valid=<T/F>, xval=<T/F>)`
dfmodelh3 <- as.data.frame(h2o.varimp(modelh3))
dfmodelh3
##                       names coefficients sign
## 1              timbre_0_max 1.216621e+00  NEG
## 2                  loudness 9.780973e-01  POS
## 3                     pitch 7.249788e-01  NEG
## 4              timbre_1_min 3.891197e-01  POS
## 5              timbre_6_min 3.689193e-01  NEG
## 6             timbre_11_min 3.086673e-01  NEG
## 7              timbre_3_max 3.025593e-01  NEG
## 8             timbre_11_max 2.459081e-01  POS
## 9              timbre_4_min 2.379749e-01  POS
## 10             timbre_4_max 2.157627e-01  POS
## 11             timbre_0_min 1.859531e-01  POS
## 12             timbre_5_min 1.846128e-01  NEG
## 13 timesignature_confidence 1.729658e-01  POS
## 14             timbre_7_min 1.431871e-01  NEG
## 15            timbre_10_max 1.366703e-01  POS
## 16            timbre_10_min 1.215954e-01  POS
## 17         tempo_confidence 1.183698e-01  POS
## 18             timbre_2_min 1.019149e-01  NEG
## 19           key_confidence 9.109701e-02  POS
## 20             timbre_7_max 8.987908e-02  NEG
## 21             timbre_6_max 6.935132e-02  POS
## 22             timbre_8_max 6.878241e-02  POS
## 23            timesignature 6.120105e-02  POS
## 24                      key 5.814805e-02  POS
## 25             timbre_8_min 5.759228e-02  POS
## 26             timbre_1_max 2.930285e-02  NEG
## 27             timbre_9_max 2.843755e-02  POS
## 28             timbre_3_min 2.380245e-02  POS
## 29             timbre_2_max 1.917035e-02  POS
## 30             timbre_5_max 1.715813e-02  POS
## 31                    tempo 1.364418e-02  NEG
## 32             timbre_9_min 8.463143e-05  NEG
## 33                                    NA <NA>
h2o.sensitivity(perfh3,0.5)
## Warning in h2o.find_row_by_threshold(object, t): Could not find exact
## threshold: 0.5 for this set of metrics; using closest threshold found:
## 0.501855569251422. Run `h2o.predict` and apply your desired threshold on a
## probability column.
## [[1]]
## [1] 0.2033898
h2o.auc(perfh3)
## [1] 0.8492389

You can make the following observations:

  • The AUC metric is 0.8492389.
  • From the confusion matrix, the model correctly predicts that 33 songs will be Top 10 hits (true positives). However, it also produces 28 false positives (songs the model predicted would be Top 10 hits but that did not make the Top 10) and misses 26 actual Top 10 hits (false negatives).
  • Loudness has a positive coefficient estimate, meaning that this model predicts that songs with heavier instrumentation tend to be more popular. This is the same conclusion from Model 2.
  • Loudness is significant in this model.

Overall, Model 3 predicts a higher number of top 10 hits with an accuracy rate that is acceptable. To choose the best fit for production runs, record labels should consider the following factors:

  • Desired model accuracy at a given threshold
  • Number of correct predictions for top10 hits
  • Tolerable number of false positives or false negatives

Next, make predictions using Model 3 on the test dataset.

predict.regh <- h2o.predict(modelh3, test.h2o)
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |=================================================================| 100%
print(predict.regh)
##   predict        p0          p1
## 1       0 0.9654739 0.034526052
## 2       0 0.9654748 0.034525236
## 3       0 0.9635547 0.036445318
## 4       0 0.9343579 0.065642149
## 5       0 0.9978334 0.002166601
## 6       0 0.9779949 0.022005078
## 
## [373 rows x 3 columns]
predict.regh$predict
##   predict
## 1       0
## 2       0
## 3       0
## 4       0
## 5       0
## 6       0
## 
## [373 rows x 1 column]
dpr<-as.data.frame(predict.regh)
#Rename the predicted column 
colnames(dpr)[colnames(dpr) == 'predict'] <- 'predict_top10'
table(dpr$predict_top10)
## 
##   0   1 
## 312  61

The first set of output results specifies the probabilities associated with each predicted observation.  For example, observation 1 is 96.54739% likely to not be a Top 10 hit, and 3.4526052% likely to be a Top 10 hit (predict=1 indicates Top 10 hit and predict=0 indicates not a Top 10 hit).  The second set of results list the actual predictions made.  From the third set of results, this model predicts that 61 songs will be top 10 hits.
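
The h2o.sensitivity warning shown earlier suggests applying your own cutoff to a probability column rather than relying on the default threshold that h2o.predict uses. As a minimal sketch based on the dpr data frame created above, you can classify songs with a cutoff of your own choosing; the 0.5 value and the predict_top10_t50 column name below are illustrative.

#apply a custom cutoff to the p1 (probability of Top 10) column
threshold <- 0.5
dpr$predict_top10_t50 <- ifelse(dpr$p1 >= threshold, 1, 0)
table(dpr$predict_top10_t50)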

Compute the baseline accuracy, by assuming that the baseline predicts the most frequent outcome, which is that most songs are not Top 10 hits.

table(BillboardTest$top10)
## 
##   0   1 
## 314  59

Now observe that the baseline model would get 314 observations correct, and 59 wrong, for an accuracy of 314/(314+59) = 0.8418231.
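
You can also compute both accuracy figures directly in R. This is a small sketch that relies on the counts above and on the dpr predictions generated earlier; the baseline_accuracy and model3_accuracy names are illustrative.

#baseline: always predict the most frequent class (not a Top 10 hit)
baseline_accuracy <- max(table(BillboardTest$top10)) / nrow(BillboardTest)
#Model 3: compare its predicted labels against the actual outcomes
model3_accuracy <- mean(as.character(dpr$predict_top10) == as.character(BillboardTest$top10))
c(baseline = baseline_accuracy, model3 = model3_accuracy)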

It seems that Model 3, with an accuracy of 0.8552, provides you with a small improvement over the baseline model. But is this model useful for record labels?

View the two models from an investment perspective:

  • A production company is interested in investing in songs that are more likely to make it to the Top 10. The company’s objective is to minimize the risk of financial losses attributed to investing in songs that end up unpopular.
  • How many songs does Model 3 correctly predict as a Top 10 hit in 2010? Looking at the confusion matrix, you see that it predicts 33 Top 10 hits correctly at an optimal threshold, which is more than half of the 59 songs that actually reached the Top 10 that year.
  • It will be more useful to the record label if you can provide the production company with a list of songs that are highly likely to end up in the Top 10.
  • The baseline model is not useful, as it simply does not label any song as a hit.

Considering the three models built so far, you can conclude that Model 3 proves to be the best investment choice for the record label.

GBM model

H2O provides you with the ability to explore other learning models, such as GBM and deep learning. Explore building a model using the GBM technique, using the built-in h2o.gbm function.

Before you do this, you need to convert the target variable to a factor for multinomial classification techniques.

train.h2o$top10=as.factor(train.h2o$top10)
gbm.modelh <- h2o.gbm(y=y.dep, x=x.indep, training_frame = train.h2o, ntrees = 500, max_depth = 4, learn_rate = 0.01, seed = 1122,distribution="multinomial")
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |===                                                              |   5%
  |                                                                       
  |=====                                                            |   7%
  |                                                                       
  |======                                                           |   9%
  |                                                                       
  |=======                                                          |  10%
  |                                                                       
  |======================                                           |  33%
  |                                                                       
  |=====================================                            |  56%
  |                                                                       
  |====================================================             |  79%
  |                                                                       
  |================================================================ |  98%
  |                                                                       
  |=================================================================| 100%
perf.gbmh<-h2o.performance(gbm.modelh,test.h2o)
perf.gbmh
## H2OBinomialMetrics: gbm
## 
## MSE:  0.09860778
## RMSE:  0.3140188
## LogLoss:  0.3206876
## Mean Per-Class Error:  0.2120263
## AUC:  0.8630573
## Gini:  0.7261146
## 
## Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:
##          0  1    Error     Rate
## 0      266 48 0.152866  =48/314
## 1       16 43 0.271186   =16/59
## Totals 282 91 0.171582  =64/373
## 
## Maximum Metrics: Maximum metrics at their respective thresholds
##                       metric threshold    value idx
## 1                     max f1  0.189757 0.573333  90
## 2                     max f2  0.130895 0.693717 145
## 3               max f0point5  0.327346 0.598802  26
## 4               max accuracy  0.442757 0.876676  14
## 5              max precision  0.802184 1.000000   0
## 6                 max recall  0.049990 1.000000 284
## 7            max specificity  0.802184 1.000000   0
## 8           max absolute_mcc  0.169135 0.496486 104
## 9 max min_per_class_accuracy  0.169135 0.796610 104
## 10 max mean_per_class_accuracy  0.169135 0.805948 104
## 
## Gains/Lift Table: Extract with `h2o.gainsLift(<model>, <data>)` or `
h2o.sensitivity(perf.gbmh,0.5)
## Warning in h2o.find_row_by_threshold(object, t): Could not find exact
## threshold: 0.5 for this set of metrics; using closest threshold found:
## 0.501205344484314. Run `h2o.predict` and apply your desired threshold on a
## probability column.
## [[1]]
## [1] 0.1355932
h2o.auc(perf.gbmh)
## [1] 0.8630573

This model correctly predicts 43 top 10 hits, which is 10 more than the number predicted by Model 3. Moreover, the AUC metric is higher than the one obtained from Model 3.

As seen above, H2O’s API provides the ability to obtain key statistical measures required to analyze the models easily, using several built-in functions. The record label can experiment with different parameters to arrive at the model that predicts the maximum number of Top 10 hits at the desired level of accuracy and threshold.
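
One way to structure that experimentation is a small grid search. The sketch below uses h2o.grid and h2o.getModel from the same h2o package; the hyperparameter values are examples rather than tuned recommendations, and the gbm.grid and test.auc names are illustrative.

#define a small hyperparameter grid for GBM, reusing the variables defined earlier
gbm.grid <- h2o.grid("gbm", y = y.dep, x = x.indep,
                     training_frame = train.h2o,
                     hyper_params = list(max_depth = c(3, 4, 5),
                                         learn_rate = c(0.01, 0.05)),
                     ntrees = 500, seed = 1122)
#score every candidate on the held-out 2010 data and rank by AUC
test.auc <- sapply(unlist(gbm.grid@model_ids), function(id)
  h2o.auc(h2o.performance(h2o.getModel(id), newdata = test.h2o)))
sort(test.auc, decreasing = TRUE)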

H2O also allows you to experiment with deep learning models. Deep learning models have the ability to learn features implicitly, but can be more expensive computationally.

Now, create a deep learning model with the h2o.deeplearning function, using the same training and test datasets created before. The time taken to run this model depends on the type of EC2 instance chosen for this purpose.  For models that require more computation, consider using accelerated computing instances such as the P2 instance type.

system.time(
  dlearning.modelh <- h2o.deeplearning(y = y.dep,
                                      x = x.indep,
                                      training_frame = train.h2o,
                                      epoch = 250,
                                      hidden = c(250,250),
                                      activation = "Rectifier",
                                      seed = 1122,
                                      distribution="multinomial"
  )
)
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |===                                                              |   4%
  |                                                                       
  |=====                                                            |   8%
  |                                                                       
  |========                                                         |  12%
  |                                                                       
  |==========                                                       |  16%
  |                                                                       
  |=============                                                    |  20%
  |                                                                       
  |================                                                 |  24%
  |                                                                       
  |==================                                               |  28%
  |                                                                       
  |=====================                                            |  32%
  |                                                                       
  |=======================                                          |  36%
  |                                                                       
  |==========================                                       |  40%
  |                                                                       
  |=============================                                    |  44%
  |                                                                       
  |===============================                                  |  48%
  |                                                                       
  |==================================                               |  52%
  |                                                                       
  |====================================                             |  56%
  |                                                                       
  |=======================================                          |  60%
  |                                                                       
  |==========================================                       |  64%
  |                                                                       
  |============================================                     |  68%
  |                                                                       
  |===============================================                  |  72%
  |                                                                       
  |=================================================                |  76%
  |                                                                       
  |====================================================             |  80%
  |                                                                       
  |=======================================================          |  84%
  |                                                                       
  |=========================================================        |  88%
  |                                                                       
  |============================================================     |  92%
  |                                                                       
  |==============================================================   |  96%
  |                                                                       
  |=================================================================| 100%
##    user  system elapsed 
##   1.216   0.020 166.508
perf.dl<-h2o.performance(model=dlearning.modelh,newdata=test.h2o)
perf.dl
## H2OBinomialMetrics: deeplearning
## 
## MSE:  0.1678359
## RMSE:  0.4096778
## LogLoss:  1.86509
## Mean Per-Class Error:  0.3433013
## AUC:  0.7568822
## Gini:  0.5137644
## 
## Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:
##          0  1    Error     Rate
## 0      290 24 0.076433  =24/314
## 1       36 23 0.610169   =36/59
## Totals 326 47 0.160858  =60/373
## 
## Maximum Metrics: Maximum metrics at their respective thresholds
##                       metric threshold    value idx
## 1                     max f1  0.826267 0.433962  46
## 2                     max f2  0.000000 0.588235 239
## 3               max f0point5  0.999929 0.511811  16
## 4               max accuracy  0.999999 0.865952  10
## 5              max precision  1.000000 1.000000   0
## 6                 max recall  0.000000 1.000000 326
## 7            max specificity  1.000000 1.000000   0
## 8           max absolute_mcc  0.999929 0.363219  16
## 9 max min_per_class_accuracy  0.000004 0.662420 145
## 10 max mean_per_class_accuracy  0.000000 0.685334 224
## 
## Gains/Lift Table: Extract with `h2o.gainsLift(<model>, <data>)` or `h2o.gainsLift(<model>, valid=<T/F>, xval=<T/F>)`
h2o.sensitivity(perf.dl,0.5)
## Warning in h2o.find_row_by_threshold(object, t): Could not find exact
## threshold: 0.5 for this set of metrics; using closest threshold found:
## 0.496293348880151. Run `h2o.predict` and apply your desired threshold on a
## probability column.
## [[1]]
## [1] 0.3898305
h2o.auc(perf.dl)
## [1] 0.7568822

The AUC metric for this model is 0.7568822, which is lower than what you got from the earlier models. I recommend further experimentation using different hyperparameters, such as the learning rate, the number of epochs, or the number of hidden layers.
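
As one illustration, the sketch below varies a few of those knobs using the same h2o.deeplearning function (a deeper but narrower network, dropout, and a fixed learning rate). The specific values are assumptions for experimentation rather than recommendations, and dlearning.modelh2 is just an illustrative name.

#try a deeper, narrower network with dropout and a fixed (non-adaptive) learning rate
dlearning.modelh2 <- h2o.deeplearning(y = y.dep,
                                      x = x.indep,
                                      training_frame = train.h2o,
                                      epochs = 100,
                                      hidden = c(100,100,100),
                                      activation = "RectifierWithDropout",
                                      hidden_dropout_ratios = c(0.2,0.2,0.2),
                                      adaptive_rate = FALSE,
                                      rate = 0.005,
                                      seed = 1122)
h2o.auc(h2o.performance(dlearning.modelh2, newdata = test.h2o))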

H2O’s built-in functions provide many key statistical measures that can help measure model performance. Here are some of these key terms.

Metric: Description

Sensitivity: Measures the proportion of positives that have been correctly identified. It is also called the true positive rate, or recall.

Specificity: Measures the proportion of negatives that have been correctly identified. It is also called the true negative rate.

Threshold: Cutoff point that maximizes specificity and sensitivity. While the model may not provide the highest prediction at this point, it would not be biased towards positives or negatives.

Precision: The fraction of retrieved results that are relevant, for example, how many of the positively classified observations are truly positive.

AUC: Provides insight into how well the classifier is able to separate the two classes. The implicit goal is to deal with situations where the sample distribution is highly skewed, with a tendency to overfit to a single class.

0.9 – 1.0 = excellent (A)
0.8 – 0.9 = good (B)
0.7 – 0.8 = fair (C)
0.6 – 0.7 = poor (D)
0.5 – 0.6 = fail (F)
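
As a quick worked example of these definitions, you can recompute them by hand from Model 3’s confusion matrix on the test set (286 true negatives, 28 false positives, 26 false negatives, and 33 true positives at the F1-optimal threshold):

tn <- 286; fp <- 28; fn <- 26; tp <- 33
sensitivity <- tp / (tp + fn)                    # 33/59, the true positive rate (recall)
specificity <- tn / (tn + fp)                    # 286/314, the true negative rate
precision   <- tp / (tp + fp)                    # 33/61, fraction of predicted hits that are real hits
accuracy    <- (tp + tn) / (tp + tn + fp + fn)   # 319/373, roughly 0.855
c(sensitivity = sensitivity, specificity = specificity,
  precision = precision, accuracy = accuracy)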

Here’s a summary of the metrics generated from H2O’s built-in functions for the three models that produced useful results.

Metric               Model 3                 GBM Model               Deep Learning Model
Accuracy (max)       0.882038 (t=0.435479)   0.876676 (t=0.442757)   0.865952 (t=0.999999)
Precision (max)      1.0 (t=0.821606)        1.0 (t=0.802184)        1.0 (t=1.0)
Recall (max)         1.0                     1.0                     1.0 (t=0)
Specificity (max)    1.0                     1.0                     1.0 (t=1)
Sensitivity (t=0.5)  0.2033898               0.1355932               0.3898305
AUC                  0.8492389               0.8630573               0.756882

Note: ‘t’ denotes threshold.

Your options at this point could be narrowed down to Model 3 and the GBM model, based on the AUC and accuracy metrics observed earlier.  If the slightly lower accuracy of the GBM model is deemed acceptable, the record label can choose to go to production with the GBM model, as it can predict a higher number of Top 10 hits.  The AUC metric for the GBM model is also higher than that of Model 3.

Record labels can experiment with different learning techniques and parameters before arriving at a model that proves to be the best fit for their business. Because deep learning models can be computationally expensive, record labels can choose more powerful EC2 instances on AWS to run their experiments faster.

Conclusion

In this post, I showed how the popular music industry can use analytics to predict the type of songs that make the Top 10 Billboard charts. By running H2O’s scalable machine learning platform on AWS, data scientists can easily experiment with multiple modeling techniques and interactively query the data using Amazon Athena, without having to manage the underlying infrastructure. This helps record labels make critical decisions on the type of artists and songs to promote in a timely fashion, thereby increasing sales and revenue.

If you have questions or suggestions, please comment below.


Additional Reading

Learn how to build and explore a simple GEOINT application using SparkR.


About the Authors

Gopal Wunnava is a Partner Solution Architect with the AWS GSI Team. He works with partners and customers on big data engagements, and is passionate about building analytical solutions that drive business capabilities and decision making. In his spare time, he loves all things sports and movies, and is fond of old classics like Asterix and Obelix comics and Hitchcock movies.


Bob Strahan, a Senior Consultant with AWS Professional Services, contributed to this post.


timeShift(GrafanaBuzz, 1w) Issue 16

Post Syndicated from Blogs on Grafana Labs Blog original https://grafana.com/blog/2017/10/06/timeshiftgrafanabuzz-1w-issue-16/

Welcome to another issue of TimeShift. In addition to the roundup of articles and plugin updates, we had a big announcement this week – Early Bird tickets to GrafanaCon EU are now available! We’re also accepting CFPs through the end of October, so if you have a topic in mind, don’t wait until the last minute; please send it our way. Speakers who are selected will receive a comped ticket to the conference.


Early Bird Tickets Now Available

We’ve released a limited number of Early Bird tickets before General Admission tickets are available. Take advantage of this discount before they’re sold out!

Get Your Early Bird Ticket Now

Interested in speaking at GrafanaCon? We’re looking for technical and non-technical talks of all sizes. Submit a CFP Now.


From the Blogosphere

Get insights into your Azure Cosmos DB: partition heatmaps, OMS, and More: Microsoft recently announced the ability to access a subset of Azure Cosmos DB metrics via Azure Monitor API. Grafana Labs built an Azure Monitor Plugin for Grafana 4.5 to visualize the data.

How to monitor Docker for Mac/Windows: Brian was tired of guessing about the performance of his development machines and test environment. Here, he shows how to monitor Docker with Prometheus to get a better understanding of a dev environment in his quest to monitor all the things.

Prometheus and Grafana to Monitor 10,000 servers: This article covers enokido’s process of choosing a monitoring platform. He identifies three possible solutions, outlines the pros and cons of each, and discusses why he chose Prometheus.

GitLab Monitoring: It’s fascinating to see Grafana dashboards with production data from companies around the world. For instance, we’ve previously highlighted the huge number of dashboards Wikimedia publicly shares. This week, we found that GitLab also has public dashboards to explore.

Monitoring a Docker Swarm Cluster with cAdvisor, InfluxDB and Grafana | The Laboratory: It’s important to know the state of your applications in a scalable environment such as Docker Swarm. This video covers an overview of Docker, VMs vs. containers, orchestration, and how to monitor Docker Swarm.

Introducing Telemetry: Actionable Time Series Data from Counters: Learn how to use counters from multiple, disparate sources, devices, operating systems, and applications to generate actionable time series data.

ofp_sniffer Branch 1.2 (docker/influxdb/grafana) Upcoming Features: This video demo shows off some of the upcoming features for OFP_Sniffer, an OpenFlow sniffer to help network troubleshooting in production networks.


Grafana Plugins

Plugin authors add new features and bugfixes all the time, so it’s important to always keep your plugins up to date. To update plugins from on-prem Grafana, use the grafana-cli tool; if you are using Hosted Grafana, you can update with one click! If you have questions or need help, hit up our community site, where the Grafana team and members of the community are happy to help.

UPDATED PLUGIN

PNP for Nagios Data Source – The latest release for the PNP data source has some fixes and adds a mathematical factor option.

Update

UPDATED PLUGIN

Google Calendar Data Source – This week, there was a small bug fix for the Google Calendar annotations data source.

Update

UPDATED PLUGIN

BT Plugins – Our friends at BT have been busy. All of the BT plugins in our catalog received an update this week. The plugins are the Status Dot Panel, the Peak Report Panel, the Trend Box Panel, and the Alarm Box Panel.

Changes include:

  • Custom dashboard links now work in Internet Explorer.
  • The Peak Report panel no longer supports click-to-sort.
  • The Status Dot panel tooltips now look like Grafana tooltips.


This week’s MVC (Most Valuable Contributor)

Each week we highlight some of the important contributions from our amazing open source community. This week, we’d like to recognize a contributor who did a lot of work to improve Prometheus support.

pdoan017
Thanks to Alin Sinpalean for his Prometheus PR that aligns the step and interval parameters. Alin got a lot of feedback from the Prometheus community and spent a lot of time and energy explaining, debating and iterating before the PR was ready.
Thank you!


Grafana Labs is Hiring!

We are passionate about open source software and thrive on tackling complex challenges to build the future. We ship code from every corner of the globe and love working with the community. If this sounds exciting, you’re in luck – WE’RE HIRING!

Check out our Open Positions


Tweet of the Week

We scour Twitter each week to find an interesting/beautiful dashboard and show it off! #monitoringLove

Wow – Excited to be a part of exploring data to find out how Mexico City is evolving.

We Need Your Help!

Do you have a graph that you love because the data is beautiful or because the graph provides interesting information? Please get in touch. Tweet or send us an email with a screenshot, and we’ll tell you about this fun experiment.

Tell Me More


What do you think?

That’s a wrap! How are we doing? Submit a comment on this article below, or post something at our community forum. Help us make these weekly roundups better!

Follow us on Twitter, like us on Facebook, and join the Grafana Labs community.

Cloud Storage Doesn’t have to be Convoluted, Complex, or Confusing

Post Syndicated from Ahin Thomas original https://www.backblaze.com/blog/cloud-storage-pricing-comparison/

business man frustrated over cloud storage pricing

So why do many vendors make it so hard to get information about how much you’re storing and how much you’re being charged?

Cloud storage is fast becoming the central repository for mission-critical information, irreplaceable memories, and in some cases entire corporate and personal histories. Given this responsibility, we believe cloud storage vendors have an obligation to be as transparent as possible in how they interact with their customers.

In that light we decided to challenge four cloud storage vendors and ask two simple questions:

  1. Can a customer understand how much data is stored?
  2. Can a customer understand the bill?

The detailed results are below, but if you wish to skip the details and the screen captures (TL;DR), we’ve summarized the results in the table below.

Summary of Cloud Storage Pricing Test

Our challenge was to upload 1 terabyte of data, store it for one month, and then download it.

Vendor | Visibility to Data Stored | Easy to Understand Bill | Cost
Backblaze B2 | Accurate, intuitive display of storage information. | Available on demand, and the site clearly defines what has and will be charged for. | $25
Microsoft Azure | Storage is being measured in KiB, but is billed by the GB. With a calculator, it is unclear how much storage we are using. | Available, but difficult to find. The nearly 30 day lag in billing creates business and accounting challenges. | $72
Amazon S3 | Incomplete. From the file browsing user interface, there is no reasonable way to understand how much data is being stored. | Available on demand. While there are some line items that seem unnecessary for our test, the bill is generally straight-forward to understand. | $71
Google Cloud Service | Incomplete. From the file browsing user interface, there is no reasonable way to understand how much data is being stored. | Available, but provides descriptions in units that are not on the pricing table nor commonly used. | $100

Cloud Storage Test Details

For our tests, we chose Backblaze B2, Microsoft’s Azure, Amazon’s S3, and Google Cloud Storage. Our idea was simple: Upload 1 TB of data to the comparable service for each vendor, store it for 1 month, download that 1 TB, then document and share the results.

Let’s start with the most obvious observation: the cost charged by each vendor for the test:

Vendor | Cost
Backblaze B2 | $25
Microsoft Azure | $72
Amazon S3 | $71
Google Cloud Service | $100

Later in this post, we’ll see if we can determine the different cost components (storage, downloading, transactions, etc.) for each vendor, but our first step is to see if we can determine how much data we stored. In some cases, the answer is not as obvious as it would seem.

Test 1: Can a Customer Understand How Much Data Is Stored?

At the core, a provider of a service ought to be able to tell a customer how much of the service he or she is using. In this case, one might assume that providers of Cloud Storage would be able to tell customers how much data is being stored at any given moment. It turns out, it’s not that simple.

Backblaze B2
Logging into a Backblaze B2 account, one is presented with a summary screen that displays all “buckets.” Each bucket displays key summary information, including data currently stored.

B2 Cloud Storage Buckets screenshot

Clicking into a given bucket, one can browse individual files. Each file displays its size, and multiple files can be selected to create a size summary.

B2 file tree screenshot

Summary: Accurate, intuitive display of storage information.

Microsoft Azure

Moving on to Microsoft’s Azure, things get a little more “exciting.” There was no area that we could find where one can determine the total amount of data, in GB, stored with Azure.

There’s an area entitled “usage,” but that wasn’t helpful.

Microsoft Azure cloud storage screenshot

We then moved on to “Overview,” but had a couple of challenges. The first issue was that we were presented with KiB (kibibytes) as the unit of measure. One GB (the unit of measure used in Azure’s pricing table) equates to roughly 976,563 KiB. It struck us as odd that storage would be summarized in a unit of measure different from the billing unit of measure.
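
To put the unit mismatch in perspective, the conversion is simple arithmetic. Here is a small Python sketch (the rounded 976,563 KiB figure above comes from exactly this calculation):

def kib_to_gb(kib):
    # KiB is a binary unit (1 KiB = 1,024 bytes); the GB on the pricing table is decimal (1 GB = 10**9 bytes)
    return kib * 1024 / 1e9

print(kib_to_gb(976563))  # ~1.0 GB, i.e. roughly 976,563 KiB per GB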

Microsoft Azure usage dashboard screenshot

Summary: Storage is being measured in KiB, but is billed by the GB. Even with a calculator, it is unclear how much storage we are using.

Amazon S3

Next we checked on the data we were storing in S3. We again ran into problems.

In the bucket overview, we were able to identify our buckets. However, we could not tell how much data was being stored.

Amazon S3 cloud storage buckets screenshot

Drilling into a bucket, the detail view does tell us file size. However, there was no method for summarizing the data stored within that bucket or for multiple files.

Amazon S3 cloud storage buckets usage screenshot

Summary: Incomplete. From the file browsing user interface, there is no reasonable way to understand how much data is being stored.

Google Cloud Storage (“GCS”)

GCS proved to have its own quirks, as well.

One can easily find the “bucket” summary; however, it does not provide information on the data stored.

Google Cloud Storage Bucket screenshot

Clicking into the bucket, one can see files and the size of an individual file. However, there is no way to see the total amount of data stored.

Google Cloud Storage bucket files screenshot

Summary: Incomplete. From the file browsing user interface, there is no reasonable way to understand how much data is being stored.

Test 1 Conclusions

We knew how much storage we were uploading and, in many cases, the user will have some sense of the amount of data they are uploading. However, it strikes us as odd that many vendors won’t tell you how much data you have stored. Even stranger are the vendors that provide reporting in a unit of measure that is different from the units in their pricing table.

Test 2: Can a Customer Understand The Bill?

The cloud storage industry has done itself no favors with its tiered pricing that requires a calculator to figure out what’s going on. Setting that aside for a moment, one would presume that bills would be created in clear, auditable ways.

Backblaze

Inside of the Backblaze user interface, one finds a navigation link entitled “Billing.” Clicking on that, the user is presented with line items for previous bills, payments, and an estimate for the upcoming charges.

Backblaze B2 billing screenshot

One can expand any given row to see the line item transactions composing each bill.

Backblaze B2 billing details screenshot

Summary: Available on demand, and the site clearly defines what has and will be charged for.

Azure

Trying to understand the Azure billing proved to be a bit tricky.

On August 6th, we logged into the billing console and were presented with this screen.

Microsoft Azure billing screenshot

As you can see, on Aug 6th, billing for the period of May-June was not available for download. For the period ending June 26th, we were charged nearly a month later, on July 24th. Clicking into that row item does display line item information.

Microsoft Azure cloud storage billing details screenshot

Summary: Available, but difficult to find. The nearly 30 day lag in billing creates business and accounting challenges.

Amazon S3

Amazon presents a clean billing summary and enables users to “drill down” into line items.

Going to the billing area of AWS, one can survey various monthly bills and is presented with a clean summary of billing charges.

AWS billing screenshot

Expanding into the billing detail, Amazon articulates each line item charge. Within each line item, charges are broken out into sub-line items for the different tiers of pricing.

AWS billing details screenshot

Summary: Available on demand. While there are some line items that seem unnecessary for our test, the bill is generally straight-forward to understand.

Google Cloud Storage (“GCS”)

This was an area where the GCS User Interface, which was otherwise relatively intuitive, became confusing.

Going to the Billing Overview page did not offer much in the way of an overview on charges.

Google Cloud Storage billing screenshot

Moving down to the “Transactions” section, however, did provide line item detail on all the charges incurred. But similar to Azure introducing the concept of KiB, Google introduces the equally confusing gibibyte (GiB). While all of Google’s pricing tables are listed in terms of GB, the line items reference GiB. 1 GiB is 1.07374 GB.
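
The GiB-to-GB arithmetic is just as mechanical. A small Python sketch (not anything from the GCS console):

def gib_to_gb(gib):
    # GiB is a binary unit (1 GiB = 1024**3 bytes); the GB on the pricing table is decimal (1 GB = 10**9 bytes)
    return gib * (1024 ** 3) / 1e9

print(gib_to_gb(1))       # 1.073741824, the figure quoted above
print(gib_to_gb(931.32))  # ~1000 GB, so a 1 TB test upload shows up as roughly 931 GiB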

Google Cloud Storage billing details screenshot

Summary: Available, but provides descriptions in units that are not on the pricing table nor commonly used.

Test 2 Conclusions

Clearly, some vendors do a better job than others in making their pricing available and understandable. From a transparency standpoint, it’s difficult to justify why a vendor would have their pricing table in units of X, but then put units of Y in the user interface.

Transparency: The Backblaze Way

Transparency isn’t easy. At Backblaze, we believe in investing time and energy into presenting the most intuitive user interfaces that we can create. We take pride in our heritage in the consumer backup space — servicing consumers has taught us how to make things understandable and usable. We do our best to apply those lessons to everything we do.

This philosophy reflects our desire to make our products usable, but it’s also part of a larger ethos of being transparent with our customers. We are being trusted with precious data. We want to repay that trust with, among other things, transparency.

It’s that spirit that was behind the decision to publish our hard drive performance stats, to open source the infrastructure that is behind us having the lowest cost of storage in the industry, and also to open source our erasure coding (the math that drives a significant portion of our redundancy for your data).

Why? We believe it’s not just about good user interface, it’s about the relationship we want to build with our customers.

The post Cloud Storage Doesn’t have to be Convoluted, Complex, or Confusing appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

AWS Hot Startups – August 2017

Post Syndicated from Tina Barr original https://aws.amazon.com/blogs/aws/aws-hot-startups-august-2017/

There’s no doubt about it – Artificial Intelligence is changing the world and how it operates. Across industries, organizations from startups to Fortune 500s are embracing AI to develop new products, services, and opportunities that are more efficient and accessible for their consumers. From driverless cars to better preventative healthcare to smart home devices, AI is driving innovation at a fast rate and will continue to play a more important role in our everyday lives.

This month we’d like to highlight startups using AI solutions to help companies grow. We are pleased to feature:

  • SignalBox – a simple and accessible deep learning platform to help businesses get started with AI.
  • Valossa – an AI video recognition platform for the media and entertainment industry.
  • Kaliber – innovative applications for businesses using facial recognition, deep learning, and big data.

SignalBox (UK)

In 2016, SignalBox founder Alain Richardt was hearing the same comments being made by developers, data scientists, and business leaders. They wanted to get into deep learning but didn’t know where to start. Alain saw an opportunity to commodify and apply deep learning by providing a platform that does the heavy lifting with an easy-to-use web interface, blueprints for common tasks, and just a single-click to productize the models. With SignalBox, companies can start building deep learning models with no coding at all – they just select a data set, choose a network architecture, and go. SignalBox also offers step-by-step tutorials, tips and tricks from industry experts, and consulting services for customers that want an end-to-end AI solution.

SignalBox offers a variety of solutions that are being used across many industries for energy modeling, fraud detection, customer segmentation, insurance risk modeling, inventory prediction, real estate prediction, and more. Existing data science teams are using SignalBox to accelerate their innovation cycle. One innovative UK startup, Energi Mine, recently worked with SignalBox to develop deep networks that predict anomalous energy consumption patterns and do time series predictions on energy usage for businesses with hundreds of sites.

SignalBox uses a variety of AWS services including Amazon EC2, Amazon VPC, Amazon Elastic Block Store, and Amazon S3. The ability to rapidly provision EC2 GPU instances has been a critical factor in their success – both in terms of keeping their operational expenses low, as well as speed to market. The Amazon API Gateway has allowed for operational automation, giving SignalBox the ability to control its infrastructure.

To learn more about SignalBox, visit here.

Valossa (Finland)

As students at the University of Oulu in Finland, the Valossa founders spent years doing research in the computer science and AI labs. During that time, the team witnessed how the world was moving beyond text, with video playing a greater role in day-to-day communication. This spawned an idea to use technology to automatically understand what an audience is viewing and share that information with a global network of content producers. Since 2015, Valossa has been building next generation AI applications to benefit the media and entertainment industry and is moving beyond the capabilities of traditional visual recognition systems.

Valossa’s AI is capable of analyzing any video stream. The AI studies a vast array of data within videos and converts that information into descriptive tags, categories, and overviews automatically. Basically, it sees, hears, and understands videos like a human does. The Valossa AI can detect people, visual and auditory concepts, key speech elements, and labels explicit content to make moderating and filtering content simpler. Valossa’s solutions are designed to provide value for the content production workflow, from media asset management to end-user applications for content discovery. AI-annotated content allows online viewers to jump directly to their favorite scenes or search specific topics and actors within a video.

Valossa leverages AWS to deliver the industry’s first complete AI video recognition platform. Using Amazon EC2 GPU instances, Valossa can easily scale their computation capacity based on customer activity. High-volume video processing with GPU instances provides the necessary speed for time-sensitive workflows. The geo-located Availability Zones in EC2 allow Valossa to bring resources close to their customers to minimize network delays. Valossa also uses Amazon S3 for video ingestion and to provide end-user video analytics, which makes managing and accessing media data easy and highly scalable.

To see how Valossa works, check out www.WhatIsMyMovie.com or enable the Alexa Skill, Valossa Movie Finder. To try the Valossa AI, sign up for free at www.valossa.com.

Kaliber (San Francisco, CA)

Serial entrepreneurs Ray Rahman and Risto Haukioja founded Kaliber in 2016. The pair had previously worked in startups building smart cities and online privacy tools, and teamed up to bring AI to the workplace and change the hospitality industry. Our world is designed to appeal to our senses – stores and warehouses have clearly marked aisles, products are colorfully packaged, and we use these designs to differentiate one thing from another. We tell each other apart by our faces, and previously that was something only humans could measure or act upon. Kaliber is using facial recognition, deep learning, and big data to create solutions for business use. Markets and companies that aren’t typically associated with cutting-edge technology will be able to use their existing camera infrastructure in a whole new way, making them more efficient and better able to serve their customers.

Computer video processing is rapidly expanding, and Kaliber believes that video recognition will extend to far more than security cameras and robots. Using the clients’ network of in-house cameras, Kaliber’s platform extracts key data points and maps them to actionable insights using their machine learning (ML) algorithm. Dashboards connect users to the client’s BI tools via the Kaliber enterprise APIs, and managers can view these analytics to improve their real-world processes, taking immediate corrective action with real-time alerts. Kaliber’s Real Metrics are aimed at combining the power of image recognition with ML to ultimately provide a more meaningful experience for all.

Kaliber uses many AWS services, including Amazon Rekognition, Amazon Kinesis, AWS Lambda, Amazon EC2 GPU instances, and Amazon S3. These services have been instrumental in helping Kaliber meet the needs of enterprise customers in record time.

Learn more about Kaliber here.

Thanks for reading and we’ll see you next month!

-Tina

 

[$] Power-efficient workqueues

Post Syndicated from corbet original https://lwn.net/Articles/731052/rss

Power-efficient workqueues were first introduced in the
3.11 kernel release; since then, fifty or so
subsystems and drivers have been updated to use them. These workqueues
can be especially useful on handheld devices (like tablets and
smartphones), where power is at a premium.
ARM platforms with power-efficient workqueues enabled on Ubuntu and
Android have shown significant improvements in energy consumption (up to
15% for some use cases).

Power Management and Energy-awareness Microconference Accepted into LPC

Post Syndicated from ris original https://lwn.net/Articles/727560/rss

The Power Management and Energy-awareness microconference has been
accepted for this year’s Linux Plumber’s Conference, which runs September
13-15 in Los Angeles, CA. “The agenda this year will focus on a
range of topics including CPUfreq
core improvements and schedutil governor extensions, how to best use
scheduler signals to balance energy consumption and performance and
user space interfaces to control capacity and utilization estimates.
We’ll also discuss selective throttling in thermally constrained
systems, runtime PM for ACPI, CPU cluster idling and the possibility to
implement resume from hibernation in a bootloader.”

Bicrophonic Research Institute and the Sonic Bike

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/sonic-bike/

The Bicrophonic Sonic Bike, created by British sound artist Kaffe Matthews, utilises a Raspberry Pi and GPS signals to map location data and plays music and sound in response to the places you take it on your cycling adventures.

What is Bicrophonics?

Bicrophonics is about the mobility of sound, experienced and shared within a moving space, free of headphones and free of the internet. Music made by the journey you take, played with the space that you move through. The Bicrophonic Research Institute (BRI) http://sonicbikes.net

Cycling and music

I’m sure I wasn’t the only teen to go for bike rides with a group of friends and a radio. Spurred on by our favourite movie, the mid-nineties classic Now and Then, we’d hook up a pair of cheap portable speakers to our handlebars, crank up the volume, and sing our hearts out as we cycled aimlessly down country lanes in the cool light evenings of the British summer.

While Sonic Bikes don’t belt out the same classics that my precariously attached speakers provided, they do give you the same sense of connection to your travelling companions via sound. Linked to GPS locations on the same preset map of zones, each bike can produce the same music, creating a cloud of sound as you cycle.

Sonic Bikes

The Sonic Bike uses five physical components: a Raspberry Pi, power source, USB GPS receiver, rechargeable speakers, and subwoofer. Within the Raspberry Pi, the build utilises mapping software to divide a map into zones and connect each zone with a specific music track.

Sonic Bikes Raspberry Pi

Custom software enables the Raspberry Pi to locate itself among the zones using the USB GPS receiver. Then it plays back the appropriate track until it registers a new zone.
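
As a rough illustration of that zone-to-track idea, here is a minimal Python sketch. It is not the BRI code published on GitHub: the zone coordinates, track filenames, mpg123 player, and get_position() stub are all placeholder assumptions standing in for the real GPS reader and audio pipeline.

import subprocess
import time

# Hypothetical zones: a bounding box (min_lat, max_lat, min_lon, max_lon) mapped to an audio file
ZONES = {
    (51.480, 51.490, -3.190, -3.170): "zone_a.mp3",
    (51.490, 51.500, -3.190, -3.170): "zone_b.mp3",
}

def get_position():
    # Placeholder: return (lat, lon) from your GPS receiver (gpsd, NMEA over serial, etc.)
    return 51.485, -3.180

def zone_for(lat, lon):
    for (lat0, lat1, lon0, lon1), track in ZONES.items():
        if lat0 <= lat <= lat1 and lon0 <= lon <= lon1:
            return track
    return None

current, player = None, None
while True:
    track = zone_for(*get_position())
    if track != current:                     # switch playback only when entering a new zone
        if player:
            player.terminate()
        player = subprocess.Popen(["mpg123", track]) if track else None
        current = track
    time.sleep(1)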

Bicrophonic Research Institute

The Bicrophonic Research Institute is a collective of artists and coders with the shared goal of creating sound directed by people and places via Sonic Bikes. In their own words:

Bicrophonics is about the mobility of sound, experienced and shared within a moving space, free of headphones and free of the internet. Music made by the journey you take, played with the space that you move through.

Their technology has potential beyond the aims of the BRI. The Sonic Bike software could be useful for navigation, logging data and playing beats to indicate when to alter speed or direction. You could even use it to create a guided cycle tour, including automatically reproduced information about specific places on the route.

For the creators of Sonic Bike, the project is ever-evolving, and “continues to be researched and developed to expand the compositional potentials and unique listening experiences it creates.”

Sensory Bike

A good example of this evolution is the Sensory Bike. This offshoot of the Sonic Bike idea plays sounds guided by the cyclist’s own movements – it acts like a two-wheeled musical instrument!

lean to go up, slow to go loud,

a work for Sensory Bikes, the Berlin wall and audience to ride it. ‘ lean to go up, slow to go loud ‘ explores freedom and celebrates escape. Celebrating human energy to find solutions, hot air balloons take off, train lines sing, people cheer and nature continues to grow.

Sensors on the wheels, handlebars, and brakes, together with a Sense HAT at the rear, register the unique way in which the rider navigates their location. The bike produces output based on these variables. Its creators at BRI say:

The Sensory Bike becomes a performative instrument – with riders choosing to go slow, go fast, to hop, zigzag, or circle, creating their own unique sound piece that speeds, reverses, and changes pitch while they dance on their bicycle.

Build your own Sonic Bike

As for many wonderful Raspberry Pi-based builds, the project’s code is available on GitHub, enabling makers to recreate it. All the BRI team ask is that you contact them so they can learn more of your plans and help in any way possible. They even provide code to create your own Sonic Kayak using GPS zones, temperature sensors, and an underwater microphone!

Sonic Kayaks explained

Sonic Kayaks are musical instruments for expanding our senses and scientific instruments for gathering marine micro-climate data. Made by foAm_Kernow with the Bicrophonic Research Institute (BRI), two were first launched at the British Science Festival in Swansea Bay September 6th 2016 and used by the public for 2 days.

The post Bicrophonic Research Institute and the Sonic Bike appeared first on Raspberry Pi.

AWS GovCloud (US) Heads East – New Region in the Works for 2018

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/aws-govcloud-us-heads-east-new-region-in-the-works-for-2018/

AWS GovCloud (US) gives AWS customers a place to host sensitive data and regulated workloads in the AWS Cloud. The first AWS GovCloud (US) Region was launched in 2011 and is located on the west coast of the US.

I’m happy to announce that we are working on a second Region that we expect to open in 2018. The upcoming AWS GovCloud (US-East) Region will provide customers with added redundancy, data durability, and resiliency, and will also provide additional options for disaster recovery.

Like the existing region, which we now call AWS GovCloud (US-West), the new region will be isolated and meet top US government compliance requirements including International Traffic in Arms Regulations (ITAR), NIST standards, Federal Risk and Authorization Management Program (FedRAMP) Moderate and High, Department of Defense Impact Levels 2-4, DFARs, IRS1075, and Criminal Justice Information Services (CJIS) requirements. Visit the GovCloud (US) page to learn more about the compliance regimes that we support.

Government agencies and the IT contractors that serve them were early adopters of AWS GovCloud (US), as were companies in regulated industries. These organizations are able to enjoy the flexibility and cost-effectiveness of public cloud while benefiting from the isolation and data protection offered by a region designed and built to meet their regulatory needs and to help them meet their compliance requirements. Here’s a small sample from our customer base:

Federal (US) Government: Department of Veterans Affairs, General Services Administration 18F (Digital Services Delivery), NASA JPL, Defense Digital Service, United States Air Force, United States Department of Justice.

Regulated Industries: CSRA, Talen Energy, Cobham Electronics.

SaaS and Solution Providers: FIGmd, Blackboard, Splunk, GitHub, Motorola.

Federal, state, and local agencies that want to move their existing applications to the AWS Cloud can take advantage of the AWS Cloud Adoption Framework (CAF) offered by AWS Professional Services.

Jeff;

 

 

AWS Hot Startups – May 2017

Post Syndicated from Tina Barr original https://aws.amazon.com/blogs/aws/aws-hot-startups-may-2017/

April showers bring May startups! This month we have three hot startups for you to check out. Keep reading to find out what they’re up to, and how they’re using AWS to do it.

Today’s post features the following startups:

  • Lobster – an AI-powered platform connecting creative social media users to professionals.
  • Visii – helping consumers find the perfect product using visual search.
  • Tiqets – a curated marketplace for culture and entertainment.

Lobster (London, England)

Every day, social media users generate billions of authentic images and videos to rival typical stock photography. Powered by Artificial Intelligence, Lobster enables brands, agencies, and the press to license visual content directly from social media users so they can find that piece of content that perfectly fits their brand or story. Lobster does the work of sorting through major social networks (Instagram, Flickr, Facebook, Vk, YouTube, and Vimeo) and cloud storage providers (Dropbox, Google Photos, and Verizon) to find media, saving brands and agencies time and energy. Using filters like gender, color, age, and geolocation can help customers find the unique content they’re looking for, while Lobster’s AI and visual recognition finds images instantly. Lobster also runs photo challenges to help customers discover the perfect image to fit their needs.

Lobster is an excellent platform for creative people to get their work discovered while also protecting their content. Users are treated as copyright holders and earn 75% of the final price of every sale. The platform is easy to use: new users simply sign in with an existing social media or cloud account and can start showcasing their artistic talent right away. Lobster allows users to connect to any number of photo storage sources so they’re able to choose which items to share and which to keep private. Once users have selected their favorite photos and videos to share, they can sit back and watch as their work is picked to become the signature for a new campaign or featured on a cool website – and start earning money for their work.

Lobster is using a variety of AWS services to keep everything running smoothly. The company uses Amazon S3 to store photography that was previously ordered by customers. When a customer purchases content, the respective piece of content must be available at any given moment, independent from the original source. Lobster is also using Amazon EC2 for its application servers and Elastic Load Balancing to monitor the state of each server.

To learn more about Lobster, check them out here!

Visii (London, England)

In today’s vast web, a growing number of products are being sold online and searching for something specific can be difficult. Visii was created to cater to businesses and help them extract value from an asset they already have – their images. Their SaaS platform allows clients to leverage an intelligent visual search on their websites and apps to help consumers find the perfect product for them. With Visii, consumers can choose an image and immediately discover more based on their tastes and preferences. Whether it’s clothing, artwork, or home decor, Visii will make recommendations to get consumers to search visually and subsequently help businesses increase their conversion rates.

There are multiple ways for businesses to integrate Visii on their website or app. Many of Visii’s clients choose to build against their API, but Visii also works closely with many clients to figure out the most effective way to do this for each unique case. This has led Visii to help build innovative user interfaces and figure out the best integration points to get consumers to search visually. Businesses can also integrate Visii on their website with a widget – they just need to provide a list of links to their products and Visii does the rest.

Visii runs their entire infrastructure on AWS. Their APIs and pipeline all sit in auto-scaling groups, with ELBs in front of them, sending things across into Amazon Simple Queue Service and Amazon Aurora. Recently, Visii moved from Amazon RDS to Aurora and noted that the process was incredibly quick and easy. Because they make heavy use of machine learning, it is crucial that their pipeline only runs when required and that they maximize the efficiency of their uptime.

To see how companies are using Visii, check out Style Picker and Saatchi Art.

Tiqets (Amsterdam, Netherlands)

Tiqets is making the ticket-buying experience faster and easier for travelers around the world.  Founded in 2013, Tiqets is one of the leading curated marketplaces for admission tickets to museums, zoos, and attractions. Their mission is to help travelers get the most out of their trips by helping them find and experience a city’s culture and entertainment. Tiqets partners directly with vendors to adapt to a customer’s specific needs, and is now active in over 30 cities in the US, Europe, and the Middle East.

With Tiqets, travelers can book tickets either ahead of time or at their destination for a wide range of attractions. The Tiqets app provides real-time availability and delivers tickets straight to customers’ phones via email, direct download, or in the app. Customers save time skipping long lines (a perk of the app!), save trees (they don’t need to physically print tickets), and most importantly, they can make the most out of their leisure time. For each attraction featured on Tiqets, there is a lot of helpful information including best modes of transportation, hours, commonly asked questions, and reviews from other customers.

The Tiqets platform consists of the consumer-facing website, the internal and external-facing APIs, and the partner self-service portals. For the app hosting and infrastructure, Tiqets uses AWS services such as Elastic Load Balancing, Amazon EC2, Amazon RDS, Amazon CloudFront, Amazon Route 53, and Amazon ElastiCache. Through the infrastructure orchestration of their AWS configuration, they can easily set up separate development or test environments while staying close to the production environment as well.

Tiqets is hiring! Be sure to check out their jobs page if you are interested in joining the Tiqets team.

Thanks for reading and don’t forget to check out April’s Hot Startups if you missed it.

-Tina Barr

 

 

Open source energy monitoring using Raspberry Pi

Post Syndicated from Helen Lynn original https://www.raspberrypi.org/blog/open-source-energy-monitoring-raspberry-pi/

OpenEnergyMonitor, who make open-source tools for energy monitoring, have been using Raspberry Pi since we launched in 2012. Like Raspberry Pi, they manufacture their hardware in Wales and send it to people all over the world. We invited co-founder Glyn Hudson to tell us why they do what they do, and how Raspberry Pi helps.

Hi, I’m Glyn from OpenEnergyMonitor. The OpenEnergyMonitor project was founded out of a desire for open-source tools to help people understand and relate to their use of energy, their energy systems, and the challenge of sustainable energy.

Photo: an emonPi energy monitoring unit in an aluminium case with an aerial and an LCD display, a mobile phone showing daily energy use as a histogram, and a bunch of daffodils in a glass bottle

The next 20 years will see a revolution in our energy systems, as we switch away from fossil fuels towards a zero-carbon energy supply.

By using energy monitoring, modelling, and assessment tools, we can take an informed approach to determine the best energy-saving measures to apply. We can then check to ensure solutions achieve their expected performance over time.

We started the OpenEnergyMonitor project in 2009, and the first versions of our energy monitoring system used an Arduino with Ethernet Shield, and later a Nanode RF with an embedded Ethernet controller. These early versions were limited by a very basic TCP/IP stack; running any sort of web application locally was totally out of the question!

I can remember my excitement at getting hold of the very first version of the Raspberry Pi in early 2012. Within a few hours of tearing open the padded envelope, we had Emoncms (our open-source web logging, graphing, and visualisation application) up and running locally on the Raspberry Pi. The Pi quickly became our web-connected base station of choice (emonBase). The following year, 2013, we launched the RFM12Pi receiver board (now updated to RFM69Pi). This allowed the Raspberry Pi to receive data via low-power RF at 433 MHz from our emonTx energy monitoring unit, and later from our emonTH remote temperature and humidity monitoring node.

Diagram: communication between OpenEnergyMonitor monitoring units, base station and web interface

In 2015 we went all-in with Raspberry Pi when we launched the emonPi, an all-in-one Raspberry Pi energy monitoring unit, via Kickstarter. Thanks to the hard work of the Raspberry Pi Foundation, the emonPi has enjoyed several upgrades: extra processing power from the Raspberry Pi 2, then even more power and integrated wireless LAN thanks to the Raspberry Pi 3. With all this extra processing power, we have been able to build an open software stack including Emoncms, MQTT, Node-RED, and openHAB, allowing the emonPi to function as a powerful home automation hub.
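
To give a flavour of how readings flow into that stack, here is a minimal Python sketch that posts a power value to a local Emoncms instance over its HTTP input API. The host name, node name, and API key are placeholders, and parameter names can vary between Emoncms versions, so treat this as a sketch rather than emonPi firmware.

import json
import requests

EMONCMS_URL = "http://emonpi.local/emoncms/input/post"   # placeholder host
APIKEY = "your-read-write-apikey"                        # placeholder key

def post_reading(node, data):
    # Send one set of readings, e.g. {"power1": 230} watts, to the Emoncms input API
    response = requests.get(EMONCMS_URL, params={
        "node": node,
        "fulljson": json.dumps(data),
        "apikey": APIKEY,
    })
    response.raise_for_status()

post_reading("emonpi", {"power1": 230, "power2": 115})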

Screenshot: Emoncms Apps interface to emonPi home automation hub, with histogram of daily electricity use

Emoncms Apps interface to emonPi home automation hub

Inspired by the Raspberry Pi Foundation, we manufacture and assemble our hardware in Wales, UK, and ship worldwide via our online store.

All of our work is fully open source. We believe this is a better way of doing things: we can learn from and build upon each other’s work, creating better solutions to the challenges we face. Using Raspberry Pi has allowed us to draw on the expertise and work of many other projects. With lots of help from our fantastic community, we have built an online learning resource section of our website to help others get started: it covers things like basic AC power theory, Arduino, and the bigger picture of sustainable energy.

To learn more about OpenEnergyMonitor systems, take a look at our Getting Started User Guide. We hope you’ll join our community.

The post Open source energy monitoring using Raspberry Pi appeared first on Raspberry Pi.

Analyze your GitHub Project With Elasticsearch And Grafana

Post Syndicated from Blogs on Grafana Labs Blog original https://grafana.com/blog/2017/05/10/analyze-your-github-project-with-elasticsearch-and-grafana/

The Dream

I have, for a long time, wished there was a way to easily export GitHub issues and comments to
Elasticsearch. The standard GitHub graphs for commits and traffic are great but I have
really been missing graphs and analytics on issues and comments.

If we had issues & comments in Elasticsearch, with a well-defined index mapping, we could do some
interesting analytics. For example:

  • Look at project history in terms of issues created
  • Look at project history in terms of comments (can be a measure of community engagement)
  • See how different labels trend over time.
  • Look at distributions (histograms) on the number of issues or comments created per user. Are there a few very active users that represent 70% or 90% of all issues & comments?
  • How long do PRs stay open?
  • How long until issues get their first response?

Why Elasticsearch?

Grafana is most often used with time series databases like Graphite, but for this sort of use case,
it’s about much more than measurements. Part of the power of Grafana is bringing together data from
many different places, and leveraging the strengths of its diverse set of data sources.

Elasticsearch isn’t technically a time series database, but it’s been one of our fastest-growing data sources
because it really shines for use cases like this. Plus, Grafana’s support for Elasticsearch is getting
better and better.

Elasticsearch is not only a document search DB. Its real power is in the kinds of aggregations you can do. It’s not ideal
for the high volume & high-resolution time series workloads that most time series databases can handle, but for
data with high cardinality (like documents with usernames, issue numbers, etc) it can really shine. It also allows
you to do ad-hoc filtering in a way that time series would not allow, as it would require a unique time series
for every possible filter condition and value.
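
For example, a single aggregation query can produce an "issues created per month, per repo" series that would otherwise need a dedicated time series per repo. Here is a hedged Python sketch against a local Elasticsearch of that era; field names follow the index mapping shown later in this post:

import requests

query = {
    "size": 0,
    "aggs": {
        "per_repo": {
            "terms": {"field": "repo"},
            "aggs": {
                "issues_over_time": {
                    "date_histogram": {"field": "created_at", "interval": "month"}
                }
            }
        }
    }
}

# Assumes an Elasticsearch 5.x-style index named "github" with an "issue" type, as used below
resp = requests.post("http://localhost:9200/github/issue/_search", json=query)
for bucket in resp.json()["aggregations"]["per_repo"]["buckets"]:
    print(bucket["key"], [b["doc_count"] for b in bucket["issues_over_time"]["buckets"]])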

The GitHub API Crawler

So a few weekends ago I had some left over programming energy and spent a few hours hacking together
this node.js app that uses the GitHub API to crawl all issues and comments which it
then saves as separate documents in Elasticsearch.
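
(The collector itself is a node.js app; purely for illustration, here is a hedged Python sketch of the same idea: page through the GitHub issues API and index each item into Elasticsearch. The token, index layout, and field selection are assumptions, not the github-to-es implementation.)

import requests

GITHUB = "https://api.github.com/repos/grafana/grafana/issues"
ES = "http://localhost:9200/github/issue"          # Elasticsearch 5.x-style index/type endpoint
HEADERS = {"Authorization": "token YOUR_TOKEN"}    # placeholder; unauthenticated calls hit rate limits quickly

page = 1
while True:
    resp = requests.get(GITHUB, headers=HEADERS,
                        params={"state": "all", "per_page": 100, "page": page})
    issues = resp.json()
    if not issues:
        break
    for issue in issues:
        doc = {
            "title": issue["title"],
            "state": issue["state"],
            "repo": "grafana/grafana",
            "number": issue["number"],
            "user_login": issue["user"]["login"],
            "created_at": issue["created_at"],
            "is_pull_request": "pull_request" in issue,
        }
        requests.post(ES, json=doc)                # one Elasticsearch document per issue
    page += 1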

It stores them in Elasticsearch with this index mapping:

"mappings": {
  "issue": {
    "properties": {
      "title":            { "type": "text"  },
        "state":            { "type": "keyword"  },
        "repo":             { "type": "keyword"  },
        "labels":           { "type": "keyword"  },
        "number":           { "type": "keyword"  },
        "comments":         { "type": "long"  },
        "assignee":         { "type": "keyword"  },
        "user_login":       { "type": "keyword"  },
        "milestone":        { "type": "keyword"  },
        "created_at":       { "type": "date"  },
        "closed_at":        { "type": "date"  },
        "updated_at":       { "type": "date"  },
        "is_pull_request":  { "type": "boolean"  },
    }
  },
    "comment": {
      "properties": {
        "issue":           { "type": "keyword"  },
        "repo":            { "type": "keyword"  },
        "user_login":      { "type": "keyword"  },
        "created_at":      { "type": "date"     },
      }
    }
}

There are some more numeric fields being saved for reactions that do not need to be defined
in the index mapping.

The Dashboards

With the data finally collected, I built two dashboards; one focused on issues and another one
focused on comments. Both dashboards are templated and allow you to specify which repository
to look at and the granularity (group by time) of the data. You can also add any ad-hoc filter. For example,
only look at issues created by a specific user, or only look at issues with no comments.

Check out the dashboard on our play site. I configured the
github-to-es collector to fetch issues and comments for the main Kubernetes repo, the
main Grafana repo and the Microsoft VS Code editor repository.

The second dashboard shows comment analytics:

Useful How?

I am not exactly sure how useful this data and these dashboards are yet. It was mostly a fun hobby project to see some trends and stats for issue and comment volume. But it could also be useful data that helps you track things like issue label stats, which could be used to improve issue categorization and to visualize changes in labeling trends. For example, the graphs could answer questions like: how did a concerted effort to improve docs change the trend of issues labeled question?

Try it and help me improve it

Check out the GitHub repo grafana/github-to-es; it has a basic README with instructions for how to get started.

Once you have the import working, you need to add an Elasticsearch data source in Grafana. For the index name you specify github, and for the Timestamp field you specify created_at. Then you can import the two dashboards I published on Grafana.com:

There are some limits on how many issues and comments can be imported in the initial full import, due to the paging limit in the GitHub API. The GitHub API returns a maximum of 100 issues or comments per “page” and allows a maximum of 400 pages. This means the full import can only handle 40,000 issues and 40,000 comments.

More data & more cool graphs

There are probably many more interesting queries you can build and the collector could also be improved to fetch and store more fields.

For example:

  • Collect stars & fork stats (needs to be recorded as snapshot docs as there is no API to get historical data for this)
  • Calculate time between issue created and first comment during issue fetching to have that as a field on the issue docs
  • PR details; currently the issue API does not include merge status (only a flag indicating whether it’s a PR)
  • Commit docs

There are probably a lot more cool things you can collect & query.

Until next time, keep on graphing!
Torkel Ödegaard
Grafana Creator & Project Lead

Let’s Encrypt 2016 In Review

Post Syndicated from Let's Encrypt - Free SSL/TLS Certificates original https://letsencrypt.org/2017/01/06/le-2016-in-review.html

<p>Our first full year as a live CA was an exciting one. I’m incredibly proud of what our team and community accomplished during 2016. I’d like to share some thoughts about how we’ve changed, what we’ve accomplished, and what we’ve learned.</p>

<p>At the start of 2016, Let’s Encrypt certificates had been available to the public for less than a month and we were supporting approximately 240,000 active (unexpired) certificates. That seemed like a lot at the time! Now we’re frequently issuing that many new certificates in a single day while supporting more than 20,000,000 active certificates in total. We’ve issued more than a million certificates in a single day a few times recently. We’re currently serving an average of 6,700 OCSP responses per second. We’ve done a lot of optimization work, we’ve had to add some hardware, and there have been some long nights for our staff, but we’ve been able to keep up and we’re ready for another year of <a href="https://letsencrypt.org/stats/">strong growth</a>.</p>

<p><center><p><img src="/images/Jan-6-2017-Cert-Stats.png" alt="Let's Encrypt certificate issuance statistics." style="width: 650px; margin-bottom: 17px;"/></p></center></p>

<p>We added a number of <a href="https://letsencrypt.org/upcoming-features/">new features</a> during the past year, including support for the ACME DNS challenge, ECDSA signing, IPv6, and Internationalized Domain Names.</p>

<p>When 2016 started, our root certificate had not been accepted into any major root programs. Today we’ve been accepted into the Mozilla, Apple, and Google root programs. We’re close to announcing acceptance into another major root program. These are major steps towards being able to operate as an independent CA. You can read more about why <a href="https://letsencrypt.org/2016/08/05/le-root-to-be-trusted-by-mozilla.html">here</a>.</p>

<p>The ACME protocol for issuing and managing certificates is at the heart of how Let’s Encrypt works. Having a well-defined and heavily audited specification developed in public on a standards track has been a major contributor to our growth and the growth of our client ecosystem. Great progress was made in 2016 towards standardizing ACME in the <a href="https://datatracker.ietf.org/wg/acme/charter/">IETF ACME working group</a>. We’re hoping for a final document around the end of Q2 2017, and we’ll announce plans for implementation of the updated protocol around that time as well.</p>

<p>Supporting the kind of growth we saw in 2016 meant adding staff, and during the past year Internet Security Research Group (ISRG), the non-profit entity behind Let’s Encrypt, went from four full-time employees to nine. We’re still a pretty small crew given that we’re now one of the largest CAs in the world (if not the largest), but it works because of our intense focus on automation, the fact that we’ve been able to hire great people, and because of the incredible support we receive from the Let’s Encrypt community.</p>

<p>Let’s Encrypt exists in order to help create a 100% encrypted Web. Our own metrics can be interesting, but they’re only really meaningful in terms of the impact they have on progress towards a more secure and privacy-respecting Web. The metric we use to track progress towards that goal is the percentage of page loads using HTTPS, as seen by browsers. According to Firefox Telemetry, the Web has gone from approximately 39% of page loads using HTTPS each day to just about 49% during the past year. We’re incredibly close to a Web that is more encrypted than not. We’re proud to have been a big part of that, but we can’t take credit for all of it. Many people and organizations around the globe have come to realize that we need to invest in a more secure and privacy-respecting Web, and have taken steps to secure their own sites as well as their customers’. Thank you to everyone that has advocated for HTTPS this year, or helped to make it easier for people to make the switch.</p>

<p>We learned some lessons this year. When we had service interruptions they were usually related to managing the rapidly growing database backing our CA. Also, while most of our code had proper tests, some small pieces didn’t and that led to <a href="https://community.letsencrypt.org/c/incidents">incidents</a> that shouldn’t have happened. That said, I’m proud of the way we handle incidents promptly, including quick and transparent public disclosure.</p>

<p>We also learned a lot about our client ecosystem. At the beginning of 2016, ISRG / Let’s Encrypt provided client software called letsencrypt. We’ve always known that we would never be able produce software that would work for every Web server/stack, but we felt that we needed to offer a client that would work well for a large number of people and that could act as a reference client. By March of 2016, earlier than we had foreseen, it had become clear that our community was up to the task of creating a wide range of quality clients, and that our energy would be better spent fostering that community than producing our own client. That’s when we made the decision to <a href="https://letsencrypt.org/2016/03/09/le-client-new-home.html">hand off development of our client</a> to the Electronic Frontier Foundation (EFF). EFF renamed the client to <a href="https://certbot.eff.org/">Certbot</a> and has been doing an excellent job maintaining and improving it as one of <a href="https://letsencrypt.org/docs/client-options/">many client options</a>.</p>

<p>As exciting as 2016 was for Let’s Encrypt and encryption on the Web, 2017 seems set to be an even more incredible year. Much of the infrastructure and many of the plans necessary for a 100% encrypted Web came into being or solidified in 2016. More and more hosting providers and CDNs are supporting HTTPS with one click or by default, often without additional fees. It has never been easier for people and organizations running their own sites to find the tools, services, and information they need to move to HTTPS. Browsers are planning to update their user interfaces to better reflect the risks associated with non-secure connections.</p>

<p>We’d like to thank our community, including our <a href="https://letsencrypt.org/sponsors/">sponsors</a>, for making everything we did this past year possible. Please consider <a href="https://letsencrypt.org/getinvolved/">getting involved</a> or making a <a href="https://letsencrypt.org/donate/">donation</a>, and if your company or organization would like to sponsor Let’s Encrypt please email us at <a href="mailto:[email protected]">[email protected]</a>.</p>

Bluetooth LED bulbs

Post Syndicated from Matthew Garrett original https://mjg59.dreamwidth.org/43722.html

The best known smart bulb setups (such as the Philips Hue and the Belkin Wemo) are based on Zigbee, a low-energy, low-bandwidth protocol that operates on various unlicensed radio bands. The problem with Zigbee is that basically no home routers or mobile devices have a Zigbee radio, so to communicate with them you need an additional device (usually called a hub or bridge) that can speak Zigbee and also hook up to your existing home network. Requests are sent to the hub (either directly if you’re on the same network, or via some external control server if you’re on a different network) and it sends appropriate Zigbee commands to the bulbs.

But requiring an additional device adds some expense. People have attempted to solve this in a couple of ways. The first is building direct network connectivity into the bulbs, in the form of adding an 802.11 controller. Go through some sort of setup process[1], the bulb joins your network and you can communicate with it happily. Unfortunately adding wifi costs more than adding Zigbee, both in terms of money and power – wifi bulbs consume noticeably more power when “off” than Zigbee ones.

There’s a middle ground. There’s a large number of bulbs available from Amazon advertising themselves as Bluetooth, which is true but slightly misleading. They’re actually implementing Bluetooth Low Energy, which is part of the Bluetooth 4.0 spec. Implementing this requires both OS and hardware support, so older systems are unable to communicate. Android 4.3 devices tend to have all the necessary features, and modern desktop Linux is also fine as long as you have a Bluetooth 4.0 controller.

Bluetooth is intended as a low power communications protocol. Bluetooth Low Energy (or BLE) is even lower than that, running in a similar power range to Zigbee. Most semi-modern phones can speak it, so it seems like a pretty good choice. Obviously you lose the ability to access the device remotely, but given the track record on this sort of thing that’s arguably a benefit. There’s a couple of other downsides – the range is worse than Zigbee (but probably still acceptable for any reasonably sized house or apartment), and only one device can be connected to a given BLE server at any one time. That means that if you have the control app open while you’re near a bulb, nobody else can control that bulb until you disconnect.

The quality of the bulbs varies a great deal. Some of them are pure RGB bulbs and incapable of producing a convincing white at a reasonable intensity[2]. Some have additional white LEDs but don’t support running them at the same time as the colour LEDs, so you have the choice between colour or a fixed (and usually more intense) white. Some allow running the white LEDs at the same time as the RGB ones, which means you can vary the colour temperature of the “white” output.

But while the quality of the bulbs varies, the quality of the apps doesn’t really. They’re typically all dreadful, competing on features like changing bulb colour in time to music rather than on providing a pleasant user experience. And the whole “Only one person can control the lights at a time” thing doesn’t really work so well if you actually live with anyone else. I was dissatisfied.

I’d met Mike Ryan at Kiwicon a couple of years back after watching him demonstrate hacking a BLE skateboard. He offered a couple of good hints for reverse engineering these devices, the first being that Android already does almost everything you need. Hidden in the developer settings is an option marked “Enable Bluetooth HCI snoop log”. Turn that on and all Bluetooth traffic (including BLE) is dumped into /sdcard/btsnoop_hci.log. Then start the app, make some changes, retrieve the file, and check it out using Wireshark. Easy.

Conveniently, BLE is very straightforward when it comes to network protocol. The only thing you have is GATT, the Generic Attribute Protocol. Using this you can read and write multiple characteristics. Each packet is limited to a maximum of 20 bytes. Most implementations use a single characteristic for light control, so it’s then just a matter of staring at the dumped packets until something jumps out at you. A pretty typical implementation is something like:

0x56,r,g,b,0x00,0xf0,0x00,0xaa

where r, g and b are each just a single byte representing the corresponding red, green or blue intensity. 0x56 presumably indicates a “Set the light to these values” command, 0xaa indicates end of command and 0xf0 indicates that it’s a request to set the colour LEDs. Sending 0x0f instead results in the previous byte (0x00 in this example) being interpreted as the intensity of the white LEDs. Unfortunately the bulb I tested that speaks this protocol didn’t allow you to drive the white LEDs at the same time as anything else – setting the selection byte to 0xff didn’t result in both sets of intensities being interpreted at once. Boo.
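
To make that framing concrete, here’s a minimal Python sketch that builds this kind of payload. It assumes the 0x56…0xaa command layout described above, which is vendor-specific, so treat it as an illustration rather than a universal protocol.

def rgb_packet(r, g, b):
    # "Set the colour LEDs" command in the layout described above:
    # 0x56, r, g, b, 0x00, 0xf0, 0x00, 0xaa
    for value in (r, g, b):
        if not 0 <= value <= 255:
            raise ValueError("intensities must be 0-255")
    return bytes([0x56, r, g, b, 0x00, 0xf0, 0x00, 0xaa])

def white_packet(w):
    # Same command, but with the selection byte set to 0x0f so the
    # preceding byte is read as the white LED intensity instead.
    if not 0 <= w <= 255:
        raise ValueError("intensity must be 0-255")
    return bytes([0x56, 0x00, 0x00, 0x00, w, 0x0f, 0x00, 0xaa])

print(rgb_packet(255, 0, 0).hex())  # 56ff000000f000aa - full red
print(white_packet(128).hex())      # 56000000800f00aa - half-intensity white

Both payloads are 8 bytes, comfortably under the 20 byte limit on GATT writes mentioned earlier.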

You can test this out fairly easily using the gatttool app. Run hcitool lescan to look for the device (remember that it won’t show up if anything else is connected to it at the time), then do gatttool -b deviceid -I to get an interactive shell. Type connect to initiate a connection, and once connected send commands by doing char-write-cmd handle value using the handle obtained from your hci dump.
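
If you’d rather script this than type into gatttool, the same write can be done from Python. The sketch below uses the third-party bluepy library; the device address and characteristic handle are placeholders standing in for whatever your own hcitool scan and HCI dump turn up, and the payload assumes the 0x56-style protocol above.

# pip install bluepy; needs Linux and a Bluetooth 4.0 controller
from bluepy import btle

BULB_ADDRESS = "aa:bb:cc:dd:ee:ff"  # placeholder: address from hcitool lescan
HANDLE = 0x0021                     # placeholder: handle from your HCI dump

bulb = btle.Peripheral(BULB_ADDRESS)  # remember: only one BLE connection at a time
try:
    # Full red via the colour-LED command described above
    payload = bytes([0x56, 0xff, 0x00, 0x00, 0x00, 0xf0, 0x00, 0xaa])
    bulb.writeCharacteristic(HANDLE, payload)
finally:
    bulb.disconnect()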

I did this successfully for various bulbs, but annoyingly hit a problem with one from Tikteck. The leading byte of each packet was clearly a counter, but the rest of the packet appeared to be garbage. For reasons best known to themselves, they’ve implemented application-level encryption on top of BLE. This was a shame, because they were easily the best of the bulbs I’d used – the white LEDs work in conjunction with the colour ones once you’re sufficiently close to white, giving you good intensity and letting you modify the colour temperature. That gave me incentive, but figuring out the protocol took quite some time. Earlier this week, I finally cracked it. I’ve put a Python implementation on Github. The idea is to tie it into Ulfire running on a central machine with a Bluetooth controller, making it possible for me to control the lights from multiple different apps simultaneously and also integrating with my Echo.

I’d write something about the encryption, but I honestly don’t know. Large parts of this make no sense to me whatsoever. I haven’t even had any gin in the past two weeks. If anybody can explain how anything that’s being done there makes any sense at all[3] that would be appreciated.

[1] typically via the bulb pretending to be an access point, but also these days through a terrifying hack involving spewing UDP multicast packets of varying lengths in order to broadcast the password to associated but unauthenticated devices and good god the future is terrifying

[2] For a given power input, blue LEDs produce more light than other colours. To get white with RGB LEDs you either need to have more red and green LEDs than blue ones (which costs more), or you need to reduce the intensity of the blue ones (which means your headline intensity is lower). Neither is appealing, so most of these bulbs will just give you a blue “white” if you ask for full red, green and blue

[3] Especially the bit where we calculate something from the username and password and then encrypt that using some random numbers as the key, then send 50% of the random numbers and 50% of the encrypted output to the device, because I can’t even


systemd.conf 2015 Summary

Post Syndicated from Lennart Poettering original http://0pointer.net/blog/systemdconf-2015-summary.html

systemd.conf 2015 is Over Now!

Last week our first systemd.conf conference
took place at betahaus, in Berlin, Germany. With almost 100 attendees,
a dense schedule of 23 high-quality talks stuffed into a single track
on just two days, a productive hackfest and numerous consumed
Club-Mates, I believe it was quite a success!

If you couldn’t attend the conference, you may watch all talks on our
YouTube Channel. The slides are available online, too.

Many photos from the conference are available on the Google Events
Page. Enjoy!

I’d specifically like to thank Daniel Mack, Chris Kühl and Nils Magnus
for running the conference, and making sure that it worked out as
smoothly as it did! Thank you very much, you did a fantastic job!

I’d also specifically like to thank the CCC Video Operation Center
folks for the excellent video coverage of the conference. Not only did
they implement a live-stream for the entire talks part of the
conference, but they also cut and uploaded videos of all talks to our
YouTube Channel within the same day (in fact, within a few hours after
the talks
finished). That’s quite an impressive feat!

The folks from LinuxTag e.V. put a lot of time and energy in the
organization. It was great to see how well this all worked out!
Excellent work!

(BTW, LinuxTag e.V. and the CCC Video Operation Center folks are
willing to help with the organization of Free Software community
events in Germany (and Europe?). Hence, if you need an entity that can
do the financial work and other stuff for your Free Software project’s
conference, consider pinging LinuxTag; they might be willing to
help. Similarly, if you are organizing such an event and are thinking
about providing video coverage, consider pinging the CCC VOC
folks! Both of them get our best recommendations!)

I’d also like to thank our conference sponsors!
Specifically, we’d like to thank our Gold Sponsors Red Hat and
CoreOS for their support. We’d also like to thank our Silver
Sponsor Codethink, and our Bronze Sponsors Pengutronix,
Pantheon, Collabora, Endocode, the Linux Foundation,
Samsung and Travelping, as well as our Cooperation Partners
LinuxTag and kinvolk.io, and our Media Partner Golem.de.

Last but not least I’d really like to thank our speakers and attendees
for presenting and participating in the conference. Of course, we put
the conference together specifically for you, and we really hope
you had as much fun at it as we did!

Thank you all for attending, supporting, and organizing systemd.conf
2015! We are looking forward to seeing you and working with you again
at systemd.conf 2016!

Thanks!

The Biggest Myths

Post Syndicated from Lennart Poettering original http://0pointer.net/blog/projects/the-biggest-myths.html

Since we first proposed systemd
for inclusion in the distributions it has been frequently discussed in
many forums, mailing lists and conferences. In these discussions one
can often hear certain myths about systemd that are repeated over and
over again, but certainly don’t gain any truth by constant
repetition. Let’s take the time to debunk a few of them:

  1. Myth: systemd is monolithic.

    If you build systemd with all configuration options enabled you
    will build 69 individual binaries. These binaries all serve different
    tasks, and are neatly separated for a number of reasons. For example,
    we designed systemd with security in mind, hence most daemons run at
    minimal privileges (using kernel capabilities, for example) and are
    responsible for very specific tasks only, to minimize their security
    surface and impact. Also, systemd parallelizes the boot more than any
    prior solution. This parallelization happens by running more processes
    in parallel. Thus it is essential that systemd is nicely split up into
    many binaries and thus processes. In fact, many of these
    binaries[1] are separated out so nicely that they are very
    useful outside of systemd, too.

    A package involving 69 individual binaries can hardly be called
    monolithic. What is different from prior solutions however,
    is that we ship more components in a single tarball, and maintain them
    upstream in a single repository with a unified release cycle.

  2. Myth: systemd is about speed.

    Yes, systemd is fast (a
    pretty complete userspace boot-up in ~900ms, anyone?), but that’s
    primarily just a side-effect of doing things right. In fact, we
    never really sat down and optimized the last tiny bit of performance
    out of systemd. Instead, we actually frequently knowingly picked the
    slightly slower code paths in order to keep the code more
    readable. This doesn’t mean being fast was irrelevant for us, but
    reducing systemd to its speed is certainly quite a misconception,
    since that is certainly not anywhere near the top of our list of
    goals.

  3. Myth: systemd’s fast boot-up is irrelevant for
    servers.

    That is just completely not true. Many administrators actually are
    keen on reduced downtimes during maintenance windows. In High
    Availability setups it’s kinda nice if the failed machine comes back
    up really fast. In cloud setups with a large number of VMs or
    containers the price of slow boots multiplies with the number of
    instances. Spending minutes of CPU and IO on really slow boots of
    hundreds of VMs or containers reduces your system’s density
    drastically, heck, it even costs you more energy. Slow boots can be
    quite financially expensive. Then, fast booting of containers allows
    you to implement a logic such as socket activated containers,
    allowing you to drastically increase the
    density of your cloud system.

    Of course, in many server setups boot-up is indeed irrelevant, but
    systemd is supposed to cover the whole range. And yes, I am aware
    that often it is the server firmware that costs the most time at
    boot-up, and the OS is anyway fast compared to that, but well, systemd
    is still supposed to cover the whole range (see above…), and no,
    not all servers have such bad firmware, and certainly not VMs and
    containers, which are servers of a kind, too.[2]

  4. Myth: systemd is incompatible with shell scripts.

    This is entirely bogus. We just don’t use them for the boot
    process, because we believe they aren’t the best tool for that
    specific purpose, but that doesn’t mean systemd is incompatible with
    them. You can easily run shell scripts as systemd services, heck, you
    can run scripts written in any language as systemd services,
    systemd doesn’t care the slightest bit what’s inside your
    executable. Moreover, we heavily use shell scripts for our own
    purposes, for installing, building, testing systemd. And you can stick
    your scripts in the early boot process, use them for normal services,
    you can run them at latest shutdown, there are practically no
    limits.

  5. Myth: systemd is difficult.

    This also is entirely non-sense. A systemd platform is actually much
    simpler than traditional Linuxes because it unifies
    system objects and their dependencies as systemd units. The
    configuration file language is very simple, and we got rid of
    redundant configuration files. We provide uniform tools for much
    of the configuration of the system. The system is much less of a
    conglomerate than traditional Linuxes are. We also have pretty
    comprehensive documentation (all linked from the homepage) about
    pretty much every detail of systemd, and
    this not only covers admin/user-facing interfaces, but also developer
    APIs.

    systemd certainly comes with a learning curve. Everything
    does. However, we like to believe that it is actually simpler to
    understand systemd than a Shell-based boot for most people. Surprised
    we say that? Well, as it turns out, Shell is not a pretty language to
    learn; its syntax is arcane and complex. systemd unit files are
    substantially easier to understand, they do not expose a programming
    language, but are simple and declarative by nature. That all said, if
    you are experienced in shell, then yes, adopting systemd will take a
    bit of learning.

    To make learning easy we tried hard to provide the maximum
    compatibility to previous solutions. But not only that, on many
    distributions you’ll find that some of the traditional tools will now
    even tell you — while executing what you are asking for — how you
    could do it with the newer tools instead, in a possibly nicer way.

    Anyway, the take-away is that systemd is probably as
    simple as such a system can be, and that we try hard to make it easy
    to learn. But yes, if you know sysvinit then adopting systemd will
    require a bit of learning, but quite frankly if you mastered sysvinit,
    then systemd should be easy for you.

  6. Myth: systemd is not modular.

    Not true at all. At compile time you have a number of
    configure switches to select what you want to build, and what
    not. And we document how you can select in even more detail what
    you need, going beyond our configure switches.

    This modularity is not totally unlike the one of the Linux kernel,
    where you can select many features individually at compile time. If the
    kernel is modular enough for you then systemd should be pretty close,
    too.

  7. Myth: systemd is only for desktops.

    That is certainly not true. With systemd we try to cover pretty
    much the same range as Linux itself does. While we care for desktop
    uses, we also care pretty much the same way for server uses, and
    embedded uses as well. You can bet that Red Hat wouldn’t make it a
    core piece of RHEL7 if it wasn’t the best option for managing services
    on servers.

    People from numerous companies work on systemd. Car manufacturers
    build it into cars, Red Hat uses it for a server operating system, and
    GNOME uses many of its interfaces for improving the desktop. You find
    it in toys, in space telescopes, and in wind turbines.

    The features I most recently worked on are probably relevant
    primarily on servers, such as container support, resource
    management or the security features. We cover desktop systems
    pretty well already, and there are a number of companies doing
    systemd development for embedded; some
    even offer consulting services in it.

  8. Myth: systemd was created as result of the NIH syndrome.

    This is not true. Before we began working on systemd we were
    pushing for Canonical’s Upstart to be widely adopted (and Fedora/RHEL
    used it too for a while). However, we eventually came to the
    conclusion that its design was inherently flawed at its core (at least
    in our eyes: most fundamentally, it leaves dependency management to
    the admin/developer, instead of solving this hard problem in code),
    and if something’s wrong in the core you better replace it, rather
    than fix it. This was hardly the only reason though; other things
    came into play, such as the licensing/contribution agreement mess
    around it. NIH wasn’t one of the reasons, though…[3]

  9. Myth: systemd is a freedesktop.org project.

    Well, systemd is certainly hosted at fdo, but freedesktop.org is
    little else but a repository for code and documentation. Pretty much
    any coder can request a repository there and dump his stuff there (as
    long as it’s somewhat relevant for the infrastructure of free
    systems). There’s no cabal involved, no “standardization” scheme, no
    project vetting, nothing. It’s just a nice, free, reliable place to
    have your repository. In that regard it’s a bit like SourceForge,
    github, kernel.org, just not commercial and without over-the-top
    requirements, and hence a good place to keep our stuff.

    So yes, we host our stuff at fdo, but the implied assumption of
    this myth, that there is a group of people who meet and then agree
    on how future free systems should look, is entirely bogus.

  10. Myth: systemd is not UNIX.

    There’s certainly some truth in that. systemd’s sources do not
    contain a single line of code originating from original UNIX. However,
    we derive inspiration from UNIX, and thus there’s a ton of UNIX in
    systemd. For example, the UNIX idea of “everything is a file” finds
    reflection in that in systemd all services are exposed at runtime in a
    kernel file system, the cgroupfs. Then, one of the original
    features of UNIX was multi-seat support, based on built-in terminal
    support. Text terminals are hardly the state of the art in how you
    interface with your computer these days, however. With systemd we
    brought native multi-seat
    support back, but this time with full support for today’s hardware,
    covering graphics, mice, audio, webcams and more, and all that fully
    automatic, hotplug-capable and without configuration. In fact the
    design of systemd as a suite of integrated tools that each have their
    individual purposes but when used together are more than just the sum
    of the parts, that’s pretty much at the core of UNIX philosophy. Then,
    the way our project is handled (i.e. maintaining much of the core OS
    in a single git repository) is much closer to the BSD model (which is
    a true UNIX, unlike Linux) of doing things (where most of the core OS
    is kept in a single CVS/SVN repository) than things on Linux ever
    were.

    Ultimately, UNIX is something different for everybody. For us
    systemd maintainers it is something we derive inspiration from. For
    others it is a religion, and much like the other world religions there
    are different readings and understandings of it. Some define UNIX
    based on specific pieces of code heritage, others see it just as a set
    of ideas, others as a set of commands or APIs, and even others as a
    definition of behaviours. Of course, it is impossible to ever make all
    these people happy.

    Ultimately the question whether something is UNIX or not matters
    very little. Being technically excellent is hardly exclusive to
    UNIX. For us, UNIX is a major influence (heck, the biggest one), but
    we also have other influences. Hence in some areas systemd will be
    very UNIXy, and in others a little bit less.

  11. Myth: systemd is complex.

    There’s certainly some truth in that. Modern computers are complex
    beasts, and the OS running on them will hence have to be complex
    too. However, systemd is certainly not more complex than prior
    implementations of the same components. Rather, it’s simpler, and
    has less redundancy (see above). Moreover, building a simple OS based
    on systemd will involve far fewer packages than a traditional Linux
    did. Fewer packages make it easier to build your system, get rid of
    interdependencies and of much of the different behaviour of every
    component involved.

  12. Myth: systemd is bloated.

    Well, bloated certainly has many different definitions. But in
    most definitions systemd is probably the opposite of bloat. Since
    systemd components share a common code base, they tend to share much
    more code for common code paths. Here’s an example: in a traditional
    Linux setup, sysvinit, start-stop-daemon, inetd, cron, dbus, all
    implemented a scheme to execute processes with various configuration
    options in a certain, hopefully clean environment. On systemd the code
    paths for all of this, for the configuration parsing, as well as the
    actual execution, are shared. This means less code, fewer places for
    mistakes, less memory and cache pressure, and is thus a very good
    thing. And as a side-effect you actually get a ton more functionality
    for it…

    As mentioned above, systemd is also pretty modular. You can choose
    at build time which components you need, and which you don’t
    need. People can hence specifically choose the level of “bloat” they
    want.

    When you build systemd, it only requires three dependencies: glibc,
    libcap and dbus. That’s it. It can make use of more dependencies, but
    these are entirely optional.

    So, yeah, whichever way you look at it, it’s really not
    bloated.

  13. Myth: systemd being Linux-only is not nice to the BSDs.

    Completely wrong. The BSD folks are pretty much uninterested in
    systemd. If systemd were portable, this would change nothing; they
    still wouldn’t adopt it. And the same is true for the other Unixes in
    the world. Solaris has SMF, BSD has their own “rc” system, and they
    always maintained it separately from Linux. The init system is very
    close to the core of the entire OS. And these other operating systems
    hence define themselves among other things by their core
    userspace. The assumption that they’d adopt our core userspace if we
    just made it portable, is completely without any foundation.

  14. Myth: systemd being Linux-only makes it impossible for Debian to adopt it as default.

    Debian supports non-Linux kernels in their distribution. systemd
    won’t run on those. Is that a problem, though, and should that hinder
    them from adopting systemd as default? Not really. The folks who ported
    Debian to these other kernels were willing to invest time in a massive
    porting effort, they set up test and build systems, and patched and
    built numerous packages for their goal. The maintenance of both a
    systemd unit file and a classic init script for the packaged services
    is a negligible amount of work compared to that, especially since
    those scripts more often than not exist already.

  15. Myth: systemd could be ported to other kernels if its maintainers just wanted to.

    That is simply not true. Porting systemd to other kernels is not
    feasible. We just use too many Linux-specific interfaces. For a few
    one might find replacements on other kernels, some features one might
    want to turn off, but for most this is not really possible. Here’s a
    small, very incomplete list: cgroups, fanotify, umount2(),
    /proc/self/mountinfo
    (including notification), /dev/swaps (same),
    udev, netlink,
    the structure of /sys, /proc/$PID/comm,
    /proc/$PID/cmdline, /proc/$PID/loginuid, /proc/$PID/stat,
    /proc/$PID/session, /proc/$PID/exe, /proc/$PID/fd, tmpfs, devtmpfs,
    capabilities, namespaces of all kinds, various prctl()s, numerous
    ioctls,
    the mount() system call and its semantics, selinux, audit,
    inotify, statfs, O_DIRECTORY, O_NOATIME, /proc/$PID/root, waitid(),
    SCM_CREDENTIALS, SCM_RIGHTS, mkostemp(), /dev/input, ...

    And no, if you look at this list and pick out the few where you can
    think of obvious counterparts on other kernels, then think again, and
    look at the others you didn’t pick, and the complexity of replacing
    them.

  16. Myth: systemd is not portable for no reason.

    Non-sense! We use the Linux-specific functionality because we need
    it to implement what we want. Linux has so many features that
    UNIX/POSIX didn’t have, and we want to empower the user with
    them. These features are incredibly useful, but only if they are
    actually exposed in a friendly way to the user, and that’s what we do
    with systemd.

  17. Myth: systemd uses binary configuration files.

    No idea who came up with this crazy myth, but it’s absolutely not
    true. systemd is configured pretty much exclusively via simple text
    files. A few settings you can also alter with the kernel command line
    and via environment variables. There’s nothing binary in its
    configuration (not even XML). Just plain, simple, easy-to-read text
    files.

  18. Myth: systemd is a feature creep.

    Well, systemd certainly covers more ground than it used to. It’s
    not just an init system anymore, but the basic userspace building
    block to build an OS from. However, we carefully make sure to keep most of
    the features optional. You can turn a lot off at compile time, and
    even more at runtime. Thus you can choose freely how much feature
    creeping you want.

  19. Myth: systemd forces you to do something.

    systemd is not the mafia. It’s Free Software, you can do with it
    whatever you want, and that includes not using it. That’s pretty much
    the opposite of “forcing”.

  20. Myth: systemd makes it impossible to run syslog.

    Not true, we carefully made sure when we introduced
    the journal
    that all data is also passed on to any syslog daemon
    running. In fact, if anything changed, it is only that syslog now gets
    more complete data than it did before, since we now cover early
    boot stuff as well as STDOUT/STDERR of any system service.

  21. Myth: systemd is incompatible.

    We try very hard to provide the best possible compatibility with
    sysvinit. In fact, the vast majority of init scripts should work just
    fine on systemd, unmodified. However, there actually are indeed a few
    incompatibilities, but we try to document
    these
    and explain what to do about them. Ultimately every system
    that is not actually sysvinit itself will have a certain amount of
    incompatibilities with it since it will not share the exact same code
    paths.

    It is our goal to ensure that differences between the various
    distributions are kept at a minimum. That means unit files usually
    work just fine on a different distribution than the one you wrote them on, which
    is a big improvement over classic init scripts which are very hard to
    write in a way that they run on multiple Linux distributions, due to
    numerous incompatibilities between them.

  22. Myth: systemd is not scriptable, because of its D-Bus use.

    Not true. Pretty much every single D-Bus interface systemd provides
    is also available in a command line tool, for example in systemctl,
    loginctl, timedatectl, hostnamectl, localectl and suchlike. You can
    easily call these tools from shell scripts; they
    open up pretty much the entire API from the command line with
    easy-to-use commands.

    That said, D-Bus actually has bindings for almost any scripting
    language this world knows. Even from the shell you can invoke
    arbitrary D-Bus methods with dbus-send
    or gdbus. If
    anything, this improves scriptability due to the good support of D-Bus
    in the various scripting languages. (For a concrete illustration, see
    the short Python sketch near the end of this post, just before the
    footnotes.)

  23. Myth: systemd requires you to use some arcane configuration
    tools instead of allowing you to edit your configuration files
    directly.

    Not true at all. We offer some configuration tools, and using them
    gets you a bit of additional functionality (for example, command line
    completion for all settings!), but there’s no need at all to use
    them. You can always edit the files in question directly if you wish,
    and that’s fully supported. Of course sometimes you need to explicitly
    reload configuration of some daemon after editing the configuration,
    but that’s pretty much true for most UNIX services.

  24. Myth: systemd is unstable and buggy.

    Certainly not according to our data. We have been monitoring the
    Fedora bug tracker (and some others) closely for a long long time. The
    number of bugs is very low for such a central component of the OS,
    especially if you discount the numerous RFE bugs we track for the
    project. We are pretty good at keeping systemd out of the list of
    blocker bugs of the distribution. We have a relatively fast
    development cycle with mostly incremental changes to keep quality and
    stability high.

  25. Myth: systemd is not debuggable.

    False. Some people try to imply that the shell was a good
    debugger. Well, it isn’t really. In systemd we provide you with actual
    debugging features instead. For example: interactive debugging,
    verbose tracing, the ability to mask any component during boot, and
    more. Also, we provide documentation for it.

    It’s certainly well debuggable; we needed that for our own
    development work, after all. But we’ll grant you one thing: it uses
    different debugging tools, ones we believe are more appropriate for
    the purpose.

  26. Myth: systemd makes changes for the changes’ sake.

    Very much untrue. We pretty much exclusively have technical
    reasons for the changes we make, and we explain them in the various
    pieces of documentation, wiki pages, blog articles, mailing list
    announcements. We try hard to avoid making incompatible changes, and
    if we do we try to document the why and how in detail. And if you
    wonder about something, just ask us!

  27. Myth: systemd is a Red-Hat-only project, is private property
    of some smart-ass developers, who use it to push their views to the
    world.

    Not true. Currently, there are 16 hackers with commit powers to the
    systemd git tree. Of these 16 only six are employed by Red Hat. The 10
    others are folks from ArchLinux, from Debian, from Intel, even from
    Canonical, Mandriva, Pantheon and a number of community folks with
    full commit rights. And they frequently commit big stuff, major
    changes. Then, there are 374 individuals with patches in our tree, and
    they too came from a number of different companies and backgrounds,
    and many of those have way more than one patch in the tree. The
    discussions about where we want to take systemd are done in the open,
    on our IRC channel (#systemd on freenode, you are always
    welcome), on our mailing list, and on public hackfests (such
    as our next one in Brno, to which you are invited). We regularly attend
    various conferences, to collect feedback, to explain what we are doing
    and why, like few others do. We maintain blogs, engage in social
    networks (we actually have some pretty interesting content on
    Google+, and our Google+ Community is pretty alive, too), and try
    really hard to explain the why and the how of how we do things, and
    to listen to feedback and figure out where the current issues are
    (for example, from that feedback we compiled this list of often
    heard myths about systemd…).

    What most systemd contributors probably share is a rough idea of how a
    good OS should look, and the desire to make it happen. However,
    by the very nature of the project being Open Source, and rooted in the
    community, systemd is just what people want it to be, and if it’s not
    what they want then they can drive the direction with patches and
    code, and if that’s not feasible, then there are numerous other
    options to use, too; systemd is never exclusive.

    One goal of systemd is to unify the dispersed Linux landscape a
    bit. We try to get rid of many of the more pointless differences of
    the various distributions in various areas of the core OS. As part of
    that we sometimes adopt schemes that were previously used by only one
    of the distributions and push it to a level where it’s the default of
    systemd, trying to gently push everybody towards the same set of basic
    configuration. This is never exclusive though, distributions can
    continue to deviate from that if they wish; however, if they end up
    using the well-supported default their work becomes much easier and
    they might gain a feature or two. Now, as it turns out, more
    frequently than not we actually adopted schemes that were Debianisms,
    rather than Fedoraisms/Redhatisms, as the scheme best supported by
    systemd. For example, systems running systemd now generally store
    their hostname in /etc/hostname, something that used to be
    specific to Debian and now is used across distributions.

    One thing we’ll grant you though, we sometimes can be
    smart-asses. We try to be prepared whenever we open our mouth, in
    order to be able to back-up with facts what we claim. That might make
    us appear as smart-asses.

    But in general, yes, some of the more influential contributors of
    systemd work for Red Hat, but they are in the minority, and systemd is
    a healthy, open community with different interests, different
    backgrounds, just unified by a few rough ideas where the trip should
    go, a community where code and its design counts, and certainly not
    company affiliation.

  28. Myth: systemd doesn’t support /usr split from the root directory.

    Non-sense. Since its beginnings systemd has supported the
    --with-rootprefix= option to its configure script
    which allows you to tell systemd to neatly split up the stuff needed
    for early boot and the stuff needed for later on. All this logic is
    fully present and we keep it up-to-date right there in systemd’s build
    system.

    Of course, we still don’t think that actually booting with /usr
    unavailable is a good idea, but we
    support this just fine in our build system. This won’t fix the
    inherent problems of the scheme that you’ll encounter all across the
    board, but you can’t blame that on systemd, because in systemd we
    support this just fine.

  29. Myth: systemd doesn’t allow you to replace its components.

    Not true, you can turn off and replace pretty much any part of
    systemd, with very few exceptions. And those exceptions (such as
    journald) generally allow you to run an alternative side by side to
    it, while cooperating nicely with it.

  30. Myth: systemd’s use of D-Bus instead of sockets makes it intransparent.

    This claim is already contradictory in itself: D-Bus uses sockets
    as transport, too. Hence whenever D-Bus is used to send something
    around, a socket is used for that too. D-Bus is mostly a standardized
    serialization of messages to send over these sockets. If anything this
    makes it more transparent, since this serialization is well
    documented and understood, and there are numerous tracing tools and
    language bindings for it. This is very much unlike the usual
    homegrown protocols the various classic UNIX daemons use to
    communicate locally.

Hmm, did I write I just wanted to debunk a “few” myths? Maybe these
were more than just a few… Anyway, I hope I managed to clear up a
couple of misconceptions. Thanks for your time.
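
(A small addendum to myth 22: as one concrete illustration of the D-Bus
bindings mentioned there, here is a minimal Python sketch that talks to
systemd’s manager object. It assumes the dbus-python module is
installed and that you are on a systemd system; it only reads state, so
it is harmless to try.)

# Query systemd over D-Bus from Python and print any failed units.
import dbus

bus = dbus.SystemBus()
systemd1 = bus.get_object('org.freedesktop.systemd1', '/org/freedesktop/systemd1')
manager = dbus.Interface(systemd1, 'org.freedesktop.systemd1.Manager')

for unit in manager.ListUnits():
    # Each entry starts with: name, description, load state, active state, ...
    name, description, load_state, active_state = unit[0], unit[1], unit[2], unit[3]
    if active_state == 'failed':
        print(name, '-', description)

The same information is of course available from the command line via
systemctl --failed; the point is merely that the D-Bus interface is just
as easy to reach from a scripting language.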

Footnotes

[1] For example, systemd-detect-virt, systemd-tmpfiles, systemd-udevd are.

[2] Also, we are trying to do our little part on maybe
making this better. By exposing boot-time performance of the firmware
more prominently in systemd’s boot output we hope to shame the
firmware writers to clean up their stuff.

[3] And anyways, guess which project includes a library “libnih” — Upstart or systemd?[4]

[4] Hint: it’s not systemd!

The Most Awesome, Least-Advertised Fedora 17 Feature

Post Syndicated from Lennart Poettering original http://0pointer.net/blog/projects/multi-seat.html

There’s one feature in the upcoming Fedora 17 release that is
immensely useful but very little known, since its feature page
‘ckremoval’ does not explicitly refer to it in its name: true
automatic multi-seat support for Linux.

A multi-seat computer is a system that offers not only one local
seat for a user, but multiple, at the same time. A seat refers to a
combination of a screen, a set of input devices (such as mice and
keyboards), and maybe an audio card or webcam, as an individual local
workplace for a user. A multi-seat computer can drive an entire
classroom of seats with only a fraction of the cost in hardware, energy,
administration and space: you only have one PC, which usually has more
than enough CPU power to drive 10 or more workplaces. (In fact, even a
netbook is fast enough to drive a couple of seats!) Automatic
multi-seat refers to an entirely automatically managed seat setup:
whenever a new seat is plugged in, a new login screen immediately
appears (without any manual configuration), and when the seat is
unplugged all user sessions on it are removed without delay.

In Fedora 17 we added this functionality to the low-level user and
device tracking of systemd, replacing the previous ConsoleKit logic
that lacked support for automatic multi-seat. With all the ground work
done in systemd, udev and the other components of our plumbing layer,
the last remaining bits were surprisingly easy to add.

Currently, the automatic multi-seat logic works best with the USB
multi-seat hardware from Plugable you can buy cheaply on Amazon
(US). These devices require exactly zero configuration with the
new scheme implemented in Fedora 17: just plug them in at any time,
login screens pop up on them, and you have your additional
seats. Alternatively you can also assemble your seat manually with a
few easy loginctl attach commands, from any kind of hardware you
might have lying
around. To get a full seat you need multiple graphics cards, keyboards
and mice: one set for each seat. (Later on we’ll probably have a graphical
setup utility for additional seats, but that’s not a pressing issue we
believe, as the plug-n-play multi-seat support with the Plugable
devices is so awesomely nice.)

Plugable provided us, free of charge, with hardware for testing
multi-seat. They are also involved with the upstream development of
the USB DisplayLink driver for Linux. Due to their positive
involvement with Linux we can only recommend buying their
hardware. They are good guys, and support Free Software the way all
hardware vendors should! (And besides that, their hardware is also
nicely put together. For example, in contrast to most similar vendors
they actually assign proper vendor/product IDs to their USB hardware
so that we can easily recognize their hardware when plugged in to set
up automatic seats.)

Currently, all this magic is only implemented in the GNOME stack
with the biggest component getting updated being the GNOME Display
Manager. On the Plugable USB hardware you get a full GNOME Shell
session with all the usual graphical gimmicks, the same way as on any
other hardware. (Yes, GNOME 3 works perfectly fine on simpler graphics
cards such as these USB devices!) If you are hacking on a different
desktop environment, or on a different display manager, please have a
look at the multi-seat documentation we put together, and
particularly at our short piece about writing display managers which
are multi-seat capable.

If you work on a major desktop environment or display manager and
would like to implement multi-seat support for it, but lack the
aforementioned Plugable hardware, we might be able to provide you with
the hardware for free. Please contact us directly, and we might be
able to send you a device. Note that we don’t have unlimited devices
available, hence we’ll probably not be able to pass hardware to
everybody who asks, and we will pass the hardware preferably to people
who work on well-known software or otherwise have contributed good
code to the community already. Anyway, if in doubt, ping us, and
explain to us why you should get the hardware, and we’ll consider you!
(Oh, and this not only applies to display managers, if you hack on some other
software where multi-seat awareness would be truly useful, then don’t
hesitate and ping us!)

Phoronix has this story about this new multi-seat support, which is
quite interesting and full of pictures. Please have a look.

Plugable started a Pledge drive to lower the price of the Plugable
USB multi-seat terminals further. It’s full of pictures (and a video
showing all this in action!), and uses the code we now make available
in Fedora 17 as a base. Please consider pledging a few
bucks.

Recently David Zeuthen added multi-seat support to udisks as
well. With this in place, a user logged in on a specific seat can only
see the USB storage plugged into his individual seat, but does not see
any USB storage plugged into any other local seat. With that, we closed
the last missing bit of
multi-seat support in our desktop stack.

With this code in Fedora 17 we cover the big use cases of
multi-seat already: internet cafes, classrooms and similar
installations can provide PC workplaces cheaply and easily without any
manual configuration. Later on we want to build on this and make this
useful for different uses too: for example, the ability to get a login
screen as easily as plugging in a USB connector makes this useful not
only for saving money in setups for many people, but also in embedded
environments (consider monitoring/debugging screens made available via
this hotplug logic) or servers (get trivially quick local access to
your otherwise head-less server). To be truly useful in these areas we
need one more thing though: the ability to run a simple getty
(i.e. text login) on the seat, without necessarily involving a
graphical UI.

The well-known X successor Wayland already comes out of the box with multi-seat
support based on this logic.

Oh, and BTW, as Ubuntu appears to be “focussing” on “clarity” “in the
cloud” now ;-), and chose Upstart instead of systemd, this feature
won’t be available in Ubuntu any time soon. That’s (one detail of) the
price Ubuntu has to pay for choosing to maintain its own (largely
legacy, such as ConsoleKit) plumbing stack.

Multi-seat has a long history on Unix. Since the earliest days Unix
systems could be accessed by multiple local terminals at the same
time. Since then local terminal support (and hence multi-seat)
gradually moved out of view in computing. Very few machines these
days have more than one seat; the concept of terminals survived almost
exclusively in the context of PTYs (i.e. fully virtualized API
objects, disconnected from any real hardware seat) and VCs (i.e. a
single virtualized local seat), but almost not in any other way (well,
server setups still use serial terminals for emergency remote access,
but they almost never have more than one serial terminal). All that we
do in systemd is based on the ideas originally brought forward in
Unix; with systemd we now try to bring back a number of the good ideas
of Unix that since the old times were lost on the roadside. For
example, in true Unix style we already started to expose the concept
of a service in the file system (in
/sys/fs/cgroup/systemd/system/), something where on Linux the
(often misunderstood) “everything is a file” mantra previously
fell short. With automatic multi-seat support we bring back support
for terminals, but updated with all the features of today’s desktops:
plug and play, zero configuration, full graphics, and not limited to
input devices and screens, but extending to all kinds of devices, such
as audio, webcams or USB memory sticks.

Anyway, this is all for now; I’d like to thank everybody who was
involved with making multi-seat work so nicely and natively on the
Linux platform. You know who you are! Thanks a ton!

Interlocking Quadrilaterals

Post Syndicated from Lennart Poettering original http://0pointer.net/blog/iqlamp-stencil.html

As promised, here’s
a stencil drawing of the Mexican-style IQ Lamp: .ps, .svg, .pdf. (1:1, DIN A4/ISO 216 paper size)

Fake IQ Light from Mexico - Stencil

30 of these are needed to assemble one Mexican-style lamp, as depicted below.
The material to cut these patterns from needs to be a thin (less than .5 mm
thick) plastic (or maybe cardboard) which needs to be flexible – but not too
flexible, and not glossy. It might be advisable to use energy-saving light
bulbs for this lamp. The bulbs are entirely hidden inside the lamp, so
cooler-running bulbs help to avoid overheating the plastic. Assembling
instructions, Video, Instructable. Please note that assembling the
Mexican-style IQ light needs quite a bit of manual force because all pieces
are bent a little, in contrast to the original Danish design which appears to
be assembled without any force (at least the video clip suggests that). For
mounting a cable/lamp socket you might need to cut a small hole in one of the
plastic sheets, to put the cable through.

Once again the photo:

Fake IQ Light from Mexico

Have fun!