Predict Billboard Top 10 Hits Using RStudio, H2O and Amazon Athena

Post Syndicated from Gopal Wunnava original https://aws.amazon.com/blogs/big-data/predict-billboard-top-10-hits-using-rstudio-h2o-and-amazon-athena/

Success in the popular music industry is typically measured in terms of the number of Top 10 hits artists have to their credit. The music industry is a highly competitive multi-billion dollar business, and record labels incur various costs in exchange for a percentage of the profits from sales and concert tickets.

Predicting the success of an artist’s release in the popular music industry can be difficult. One release may be extremely popular, resulting in widespread play on TV, radio, and social media, while another may turn out quite unpopular, and therefore unprofitable. Record labels need to be selective, and predictive analytics can help them decide which songs and artists to promote.

In this walkthrough, you leverage H2O.ai, Amazon Athena, and RStudio to make predictions on whether a song might make it to the Top 10 Billboard charts. You explore the GLM, GBM, and deep learning modeling techniques using H2O’s rapid, distributed and easy-to-use open source parallel processing engine. RStudio is a popular IDE, licensed either commercially or under AGPLv3, for working with R. This is ideal if you don’t want to connect to a server via SSH and use code editors such as vi to do analytics. RStudio is available in a desktop version, or a server version that allows you to access R via a web browser. RStudio’s Notebooks feature is used to demonstrate the execution of code and output. In addition, this post showcases how you can leverage Athena for query and interactive analysis during the modeling phase. A working knowledge of statistics and machine learning would be helpful to interpret the analysis being performed in this post.

Walkthrough

Your goal is to predict whether a song will make it to the Top 10 Billboard charts. To do this, you use multiple modeling techniques (GLM, GBM, and deep learning) and choose the model that fits best.

This solution involves the following steps:

  • Install and configure RStudio with Athena
  • Log in to RStudio
  • Install R packages
  • Connect to Athena
  • Create a dataset
  • Create models

Install and configure RStudio with Athena

Use the following AWS CloudFormation stack to install, configure, and connect RStudio on an Amazon EC2 instance with Athena.

Launching this stack creates all required resources and prerequisites:

  • Amazon EC2 instance with Amazon Linux (minimum size of t2.large is recommended)
  • Provisioning of the EC2 instance in an existing VPC and public subnet
  • Installation of Java 8
  • Assignment of an IAM role to the EC2 instance with the required permissions for accessing Athena and Amazon S3
  • Security group allowing access to the RStudio and SSH ports from the internet (I recommend restricting access to these ports)
  • S3 staging bucket required for Athena (referenced within RStudio as ATHENABUCKET)
  • RStudio username and password
  • Setup logs in Amazon CloudWatch Logs (if needed for additional troubleshooting)
  • Amazon EC2 Systems Manager agent, which makes it easy to manage and patch

All AWS resources are created in the US-East-1 Region. To avoid cross-region data transfer fees, launch the CloudFormation stack in the same region. To check the availability of Athena in other regions, see Region Table.

Log in to RStudio

The instance security group has been automatically configured to allow incoming connections on the RStudio port 8787 from any source internet address. You can edit the security group to restrict source IP access. If you have trouble connecting, ensure that port 8787 isn’t blocked by subnet network ACLs or by your outgoing proxy/firewall.

  1. In the CloudFormation stack, choose Outputs, Value, and then open the RStudio URL. You might need to wait for a few minutes until the instance has been launched.
  2. Log in to RStudio with the username and password you provided during setup.

Install R packages

Next, install the required R packages from the RStudio console. You can download the R notebook file containing just the code.

#install pacman – a handy package manager for managing installs
if("pacman" %in% rownames(installed.packages()) == FALSE)
{install.packages("pacman")}  
library(pacman)
p_load(h2o,rJava,RJDBC,awsjavasdk)
h2o.init(nthreads = -1)
##  Connection successful!
## 
## R is connected to the H2O cluster: 
##     H2O cluster uptime:         2 hours 42 minutes 
##     H2O cluster version:        3.10.4.6 
##     H2O cluster version age:    4 months and 4 days !!! 
##     H2O cluster name:           H2O_started_from_R_rstudio_hjx881 
##     H2O cluster total nodes:    1 
##     H2O cluster total memory:   3.30 GB 
##     H2O cluster total cores:    4 
##     H2O cluster allowed cores:  4 
##     H2O cluster healthy:        TRUE 
##     H2O Connection ip:          localhost 
##     H2O Connection port:        54321 
##     H2O Connection proxy:       NA 
##     H2O Internal Security:      FALSE 
##     R Version:                  R version 3.3.3 (2017-03-06)
## Warning in h2o.clusterInfo(): 
## Your H2O cluster version is too old (4 months and 4 days)!
## Please download and install the latest version from http://h2o.ai/download/
#install aws sdk if not present (pre-requisite for using Athena with an IAM role)
if (!aws_sdk_present()) {
  install_aws_sdk()
}

load_sdk()
## NULL

Connect to Athena

Next, establish a connection to Athena from RStudio, using an IAM role associated with your EC2 instance. Use ATHENABUCKET to specify the S3 staging directory.

URL <- 'https://s3.amazonaws.com/athena-downloads/drivers/AthenaJDBC41-1.0.1.jar'
fil <- basename(URL)
#download the file into current working directory
if (!file.exists(fil)) download.file(URL, fil)
#verify that the file has been downloaded successfully
list.files()
## [1] "AthenaJDBC41-1.0.1.jar"
drv <- JDBC(driverClass="com.amazonaws.athena.jdbc.AthenaDriver", fil, identifier.quote="'")

con <- jdbcConnection <- dbConnect(drv, 'jdbc:awsathena://athena.us-east-1.amazonaws.com:443/',
                                   s3_staging_dir=Sys.getenv("ATHENABUCKET"),
                                   aws_credentials_provider_class="com.amazonaws.auth.DefaultAWSCredentialsProviderChain")
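
The connection call above depends on two inputs: the region-specific endpoint and the ATHENABUCKET environment variable. The following is a minimal sketch, in base R only, of how you might validate both before calling dbConnect. The helper names athena_jdbc_url and athena_staging_dir are hypothetical (they are not part of RJDBC or the AWS SDK), and the URL pattern is simply the one used in the call above with the region substituted.

```r
# Hypothetical helper: builds the Athena JDBC endpoint for a region,
# following the URL pattern used in the dbConnect() call above.
athena_jdbc_url <- function(region = "us-east-1") {
  stopifnot(is.character(region), length(region) == 1, nzchar(region))
  sprintf("jdbc:awsathena://athena.%s.amazonaws.com:443/", region)
}

# Hypothetical helper: returns the S3 staging directory, or stops with a
# clear message if the environment variable is missing or empty.
athena_staging_dir <- function(var = "ATHENABUCKET") {
  dir <- Sys.getenv(var, unset = "")
  if (!nzchar(dir)) {
    stop("Environment variable ", var, " is not set")
  }
  dir
}

# Print the endpoint for the default region
print(athena_jdbc_url())
```

Failing early on an empty staging directory tends to be easier to diagnose than a JDBC error raised later by the driver.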

Verify the connection. The results returned depend on your specific Athena setup.

con
## <JDBCConnection>
dbListTables(con)
##  [1] "gdelt"               "wikistats"           "elb_logs_raw_native"
##  [4] "twitter"             "twitter2"            "usermovieratings"   
##  [7] "eventcodes"          "events"              "billboard"          
## [10] "billboardtop10"      "elb_logs"            "gdelthist"          
## [13] "gdeltmaster"         "twitter"             "twitter3"

Create a dataset

For this analysis, you use a sample dataset combining information from Billboard and Wikipedia with Echo Nest data in the Million Songs Dataset. Upload this dataset into your own S3 bucket. The table below provides a description of the fields used in this dataset.

Field                             Description
year                              Year that song was released
songtitle                         Title of the song
artistname                        Name of the song artist
songid                            Unique identifier for the song
artistid                          Unique identifier for the song artist
timesignature                     Variable estimating the time signature of the song
timesignature_confidence          Confidence in the estimate for the timesignature
loudness                          Continuous variable indicating the average amplitude of the audio in decibels
tempo                             Variable indicating the estimated beats per minute of the song
tempo_confidence                  Confidence in the estimate for tempo
key                               Variable with twelve levels indicating the estimated key of the song (C, C#, ..., B)
key_confidence                    Confidence in the estimate for key
energy                            Variable that represents the overall acoustic energy of the song, using a mix of features such as loudness
pitch                             Continuous variable that indicates the pitch of the song
timbre_0_min through timbre_11_min   Variables that indicate the minimum values over all segments for each of the twelve values in the timbre vector
timbre_0_max through timbre_11_max   Variables that indicate the maximum values over all segments for each of the twelve values in the timbre vector
top10                             Indicator for whether or not the song made it to the Top 10 of the Billboard charts (1 if it was in the top 10, and 0 if not)

Create an Athena table based on the dataset

In the Athena console, select the default database, sampledb, or create a new database.

Run the following create table statement.

create external table if not exists billboard
(
year int,
songtitle string,
artistname string,
songID string,
artistID string,
timesignature int,
timesignature_confidence double,
loudness double,
tempo double,
tempo_confidence double,
key int,
key_confidence double,
energy double,
pitch double,
timbre_0_min double,
timbre_0_max double,
timbre_1_min double,
timbre_1_max double,
timbre_2_min double,
timbre_2_max double,
timbre_3_min double,
timbre_3_max double,
timbre_4_min double,
timbre_4_max double,
timbre_5_min double,
timbre_5_max double,
timbre_6_min double,
timbre_6_max double,
timbre_7_min double,
timbre_7_max double,
timbre_8_min double,
timbre_8_max double,
timbre_9_min double,
timbre_9_max double,
timbre_10_min double,
timbre_10_max double,
timbre_11_min double,
timbre_11_max double,
Top10 int
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 's3://aws-bigdata-blog/artifacts/predict-billboard/data'
;

Inspect the table definition for the ‘billboard’ table that you have created. If you chose a database other than sampledb, replace that value with your choice.

dbGetQuery(con, "show create table sampledb.billboard")
##                                      createtab_stmt
## 1       CREATE EXTERNAL TABLE `sampledb.billboard`(
## 2                                       `year` int,
## 3                               `songtitle` string,
## 4                              `artistname` string,
## 5                                  `songid` string,
## 6                                `artistid` string,
## 7                              `timesignature` int,
## 8                `timesignature_confidence` double,
## 9                                `loudness` double,
## 10                                  `tempo` double,
## 11                       `tempo_confidence` double,
## 12                                       `key` int,
## 13                         `key_confidence` double,
## 14                                 `energy` double,
## 15                                  `pitch` double,
## 16                           `timbre_0_min` double,
## 17                           `timbre_0_max` double,
## 18                           `timbre_1_min` double,
## 19                           `timbre_1_max` double,
## 20                           `timbre_2_min` double,
## 21                           `timbre_2_max` double,
## 22                           `timbre_3_min` double,
## 23                           `timbre_3_max` double,
## 24                           `timbre_4_min` double,
## 25                           `timbre_4_max` double,
## 26                           `timbre_5_min` double,
## 27                           `timbre_5_max` double,
## 28                           `timbre_6_min` double,
## 29                           `timbre_6_max` double,
## 30                           `timbre_7_min` double,
## 31                           `timbre_7_max` double,
## 32                           `timbre_8_min` double,
## 33                           `timbre_8_max` double,
## 34                           `timbre_9_min` double,
## 35                           `timbre_9_max` double,
## 36                          `timbre_10_min` double,
## 37                          `timbre_10_max` double,
## 38                          `timbre_11_min` double,
## 39                          `timbre_11_max` double,
## 40                                     `top10` int)
## 41                             ROW FORMAT DELIMITED 
## 42                         FIELDS TERMINATED BY ',' 
## 43                            STORED AS INPUTFORMAT 
## 44       'org.apache.hadoop.mapred.TextInputFormat' 
## 45                                     OUTPUTFORMAT 
## 46  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
## 47                                        LOCATION
## 48    's3://aws-bigdata-blog/artifacts/predict-billboard/data'
## 49                                  TBLPROPERTIES (
## 50            'transient_lastDdlTime'='1505484133')

Run a sample query

Next, run a sample query to obtain a list of all songs from Janet Jackson that made it to the Billboard Top 10 charts.

dbGetQuery(con, "SELECT songtitle, artistname, top10 FROM sampledb.billboard WHERE lower(artistname) = 'janet jackson' AND top10 = 1")
##                           songtitle    artistname top10
## 1                           Runaway Janet Jackson     1
## 2                   Because Of Love Janet Jackson     1
## 3                             Again Janet Jackson     1
## 4                                If Janet Jackson     1
## 5  Love Will Never Do (Without You) Janet Jackson     1
## 6                         Black Cat Janet Jackson     1
## 7                   Come Back To Me Janet Jackson     1
## 8                           Alright Janet Jackson     1
## 9                          Escapade Janet Jackson     1
## 10                    Rhythm Nation Janet Jackson     1

Determine how many songs in this dataset are specifically from the year 2010.

dbGetQuery(con, " SELECT count(*)   FROM sampledb.billboard WHERE year = 2010")
##   _col0
## 1   373

The sample dataset provides certain song properties of interest that can be analyzed to gauge their impact on the song’s overall popularity. Look at one such property, timesignature, and determine the value that is the most frequent among songs in the dataset. Timesignature is a measure of the number of beats and the type of note involved.

Running the query directly may result in an error, as shown in the commented lines below. This error is a result of trying to retrieve a large result set over a JDBC connection, which can cause out-of-memory issues at the client level. To address this, reduce the fetch size and run again.

#t<-dbGetQuery(con, " SELECT timesignature FROM sampledb.billboard")
#Note:  Running the preceding query results in the following error: 
#Error in .jcall(rp, "I", "fetch", stride, block): java.sql.SQLException: The requested
#fetchSize is more than the allowed value in Athena. Please reduce the fetchSize and try
#again. Refer to the Athena documentation for valid fetchSize values.
# Use the dbSendQuery function, reduce the fetch size, and run again
r <- dbSendQuery(con, " SELECT timesignature     FROM sampledb.billboard")
dftimesignature<- fetch(r, n=-1, block=100)
dbClearResult(r)
## [1] TRUE
table(dftimesignature)
## dftimesignature
##    0    1    3    4    5    7 
##   10  143  503 6787  112   19
nrow(dftimesignature)
## [1] 7574

From the results, observe that 6787 songs have a timesignature of 4.
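
To put that count in proportion, here is a small base R sketch that recomputes the most frequent value and its share from the counts printed above. The counts are copied from the table() output. The agg_sql string is an assumption about how you could let Athena do the counting instead of fetching every row, and it is not executed here.

```r
# Counts copied from the table() output above (timesignature -> number of songs)
counts <- c("0" = 10, "1" = 143, "3" = 503, "4" = 6787, "5" = 112, "7" = 19)

# Most frequent time signature and its share of all songs
most_common <- names(counts)[which.max(counts)]
share <- counts[[most_common]] / sum(counts)
cat(sprintf("timesignature %s accounts for %.1f%% of %d songs\n",
            most_common, 100 * share, sum(counts)))

# Assumption: pushing the aggregation to Athena returns only a handful of rows,
# which sidesteps the fetch size problem for this particular question.
agg_sql <- "SELECT timesignature, count(*) AS n FROM sampledb.billboard GROUP BY timesignature ORDER BY n DESC"
# dbGetQuery(con, agg_sql)   # requires the live connection `con`
```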

Next, determine the song with the highest tempo.

dbGetQuery(con, " SELECT songtitle,artistname,tempo   FROM sampledb.billboard WHERE tempo = (SELECT max(tempo) FROM sampledb.billboard) ")
##                   songtitle      artistname   tempo
## 1 Wanna Be Startin' Somethin' Michael Jackson 244.307

Create the training dataset

Your model must be trained on existing data before it can make accurate predictions. Split the data into training and test datasets, and create the training dataset first. This dataset contains all observations from the year 2009 and earlier. You may face the same JDBC connection issue pointed out earlier, so this query uses a reduced fetch size.

#BillboardTrain <- dbGetQuery(con, "SELECT * FROM sampledb.billboard WHERE year <= 2009")
#Running the preceding query results in the following error:
#Error in .verify.JDBC.result(r, "Unable to retrieve JDBC result set for ", : Unable to retrieve
#JDBC result set for SELECT * FROM sampledb.billboard WHERE year <= 2009 (Internal error)
#Follow the same approach as before to address this issue.

r <- dbSendQuery(con, "SELECT * FROM sampledb.billboard WHERE year <= 2009")
BillboardTrain <- fetch(r, n=-1, block=100)
dbClearResult(r)
## [1] TRUE
BillboardTrain[1:2,c(1:3,6:10)]
##   year           songtitle artistname timesignature
## 1 2009 The Awkward Goodbye    Athlete             3
## 2 2009        Rubik's Cube    Athlete             3
##   timesignature_confidence loudness   tempo tempo_confidence
## 1                    0.732   -6.320  89.614   0.652
## 2                    0.906   -9.541 117.742   0.542
nrow(BillboardTrain)
## [1] 7201
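
Because the same send, fetch, and clear sequence appears more than once in this post, you could wrap it in a small function. The sketch below is one possible way to do that. The name fetch_in_blocks is hypothetical, and the wrapper assumes the DBI and RJDBC packages loaded earlier, with the block argument passed through to RJDBC's fetch method exactly as in the calls above.

```r
# Hypothetical convenience wrapper (not part of RJDBC or DBI): runs a query
# and fetches the full result in small blocks, mirroring the
# dbSendQuery()/fetch()/dbClearResult() pattern used in this post.
fetch_in_blocks <- function(con, sql, block = 100) {
  res <- DBI::dbSendQuery(con, sql)
  # Release the result set even if fetch() fails part-way through
  on.exit(DBI::dbClearResult(res), add = TRUE)
  # `block` is forwarded to the driver's fetch method
  DBI::fetch(res, n = -1, block = block)
}

# Usage (requires the live Athena connection `con` created earlier):
# BillboardTrain <- fetch_in_blocks(con, "SELECT * FROM sampledb.billboard WHERE year <= 2009")
```

Using on.exit means the result set is cleared even when the fetch raises an error, which the inline version does not guarantee.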

Create the test dataset

BillboardTest <- dbGetQuery(con, "SELECT * FROM sampledb.billboard where year = 2010")
BillboardTest[1:2,c(1:3,11:15)]
##   year              songtitle        artistname key
## 1 2010 This Is the House That Doubt Built A Day to Remember  11
## 2 2010        Sticks & Bricks A Day to Remember  10
##   key_confidence    energy pitch timbre_0_min
## 1          0.453 0.9666556 0.024        0.002
## 2          0.469 0.9847095 0.025        0.000
nrow(BillboardTest)
## [1] 373
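
The split is purely time-based: everything up to 2009 is used for training, and 2010 is held out for testing. The following base R sketch shows the same rule applied to a toy data frame. The function name split_by_year and the toy values are illustrative assumptions, and the final check uses only the row counts reported in this post.

```r
# Hypothetical helper: time-based split of a data frame with a `year` column
split_by_year <- function(df, cutoff = 2009, test_year = cutoff + 1) {
  list(
    train = df[df$year <= cutoff, , drop = FALSE],
    test  = df[df$year == test_year, , drop = FALSE]
  )
}

# Toy data standing in for the Billboard table (values are made up)
toy <- data.frame(year  = c(2008, 2009, 2009, 2010, 2010),
                  top10 = c(0, 1, 0, 0, 1))
parts <- split_by_year(toy)
print(c(train = nrow(parts$train), test = nrow(parts$test)))

# Sanity check on the counts reported in this post:
# 7201 training rows plus 373 test rows should equal the 7574 rows in the table
stopifnot(7201 + 373 == 7574)
```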

Convert the training and test datasets into H2O dataframes

train.h2o <- as.h2o(BillboardTrain)
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |=================================================================| 100%
test.h2o <- as.h2o(BillboardTest)
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |=================================================================| 100%

Inspect the column names in your H2O dataframes.

colnames(train.h2o)
##  [1] "year"                     "songtitle"               
##  [3] "artistname"               "songid"                  
##  [5] "artistid"                 "timesignature"           
##  [7] "timesignature_confidence" "loudness"                
##  [9] "tempo"                    "tempo_confidence"        
## [11] "key"                      "key_confidence"          
## [13] "energy"                   "pitch"                   
## [15] "timbre_0_min"             "timbre_0_max"            
## [17] "timbre_1_min"             "timbre_1_max"            
## [19] "timbre_2_min"             "timbre_2_max"            
## [21] "timbre_3_min"             "timbre_3_max"            
## [23] "timbre_4_min"             "timbre_4_max"            
## [25] "timbre_5_min"             "timbre_5_max"            
## [27] "timbre_6_min"             "timbre_6_max"            
## [29] "timbre_7_min"             "timbre_7_max"            
## [31] "timbre_8_min"             "timbre_8_max"            
## [33] "timbre_9_min"             "timbre_9_max"            
## [35] "timbre_10_min"            "timbre_10_max"           
## [37] "timbre_11_min"            "timbre_11_max"           
## [39] "top10"

Create models

You need to designate the independent and dependent variables prior to applying your modeling algorithms. Because you’re trying to predict the ‘top10’ field, this would be your dependent variable and everything else would be independent.

Create your first model using GLM. Because GLM works best with numeric data, you create your model by dropping non-numeric variables. You only use the variables in the dataset that describe the numerical attributes of the song in the logistic regression model. You won’t use these variables:  “year”, “songtitle”, “artistname”, “songid”, or “artistid”.

y.dep <- 39
x.indep <- c(6:38)
x.indep
##  [1]  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28
## [24] 29 30 31 32 33 34 35 36 37 38
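
Hard-coded column positions are easy to get wrong when a column is dropped. As an alternative, here is a hedged sketch that derives the same indexes from column names. The helper predictor_idx is hypothetical, and the cols vector is rebuilt here from the colnames() output above so that the example runs without an H2O cluster. In practice you would pass colnames(train.h2o).

```r
# Column names as printed by colnames(train.h2o) above
cols <- c("year", "songtitle", "artistname", "songid", "artistid",
          "timesignature", "timesignature_confidence", "loudness",
          "tempo", "tempo_confidence", "key", "key_confidence",
          "energy", "pitch",
          paste0("timbre_", rep(0:11, each = 2), c("_min", "_max")),
          "top10")

# Hypothetical helper: positions of all predictors, excluding identifier
# columns, the response, and any extra columns named in `drop`
predictor_idx <- function(cols, response = "top10", drop = character(0)) {
  ids <- c("year", "songtitle", "artistname", "songid", "artistid")
  which(!cols %in% c(ids, response, drop))
}

y.dep   <- which(cols == "top10")   # position of the dependent variable
x.indep <- predictor_idx(cols)      # all numeric song attributes
print(y.dep)
print(x.indep)
```

The drop argument can name a single column to exclude, for example drop = "loudness" or drop = "energy", without recounting positions by hand.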

Create Model 1: All numeric variables

Create Model 1 with the training dataset, using GLM as the modeling algorithm and H2O’s built-in h2o.glm function.

modelh1 <- h2o.glm( y = y.dep, x = x.indep, training_frame = train.h2o, family = "binomial")
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |=====                                                            |   8%
  |                                                                       
  |=================================================================| 100%

Measure the performance of Model 1, using H2O’s built-in performance function.

h2o.performance(model=modelh1,newdata=test.h2o)
## H2OBinomialMetrics: glm
## 
## MSE:  0.09924684
## RMSE:  0.3150347
## LogLoss:  0.3220267
## Mean Per-Class Error:  0.2380168
## AUC:  0.8431394
## Gini:  0.6862787
## R^2:  0.254663
## Null Deviance:  326.0801
## Residual Deviance:  240.2319
## AIC:  308.2319
## 
## Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:
##          0   1    Error     Rate
## 0      255  59 0.187898  =59/314
## 1       17  42 0.288136   =17/59
## Totals 272 101 0.203753  =76/373
## 
## Maximum Metrics: Maximum metrics at their respective thresholds
##                         metric threshold    value idx
## 1                       max f1  0.192772 0.525000 100
## 2                       max f2  0.124912 0.650510 155
## 3                 max f0point5  0.416258 0.612903  23
## 4                 max accuracy  0.416258 0.879357  23
## 5                max precision  0.813396 1.000000   0
## 6                   max recall  0.037579 1.000000 282
## 7              max specificity  0.813396 1.000000   0
## 8             max absolute_mcc  0.416258 0.455251  23
## 9   max min_per_class_accuracy  0.161402 0.738854 125
## 10 max mean_per_class_accuracy  0.124912 0.765006 155
## 
## Gains/Lift Table: Extract with `h2o.gainsLift(<model>, <data>)` or `h2o.gainsLift(<model>, valid=<T/F>, xval=<T/F>)`
h2o.auc(h2o.performance(modelh1,test.h2o)) 
## [1] 0.8431394

The AUC metric provides insight into how well the classifier is able to separate the two classes. In this case, the value of 0.8431394 indicates that the classification is good. (A value of 0.5 indicates a worthless test, while a value of 1.0 indicates a perfect test.)
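
To make the AUC less abstract, the sketch below computes it by hand on a tiny made-up example, using the rank-based (Mann-Whitney) formulation: the AUC is the probability that a randomly chosen positive is scored higher than a randomly chosen negative. This illustrates the metric only. It is not how H2O computes it internally, and the function name auc_rank and the toy scores are assumptions.

```r
# Hypothetical helper: AUC from 0/1 labels and predicted scores,
# using the rank-sum (Mann-Whitney U) identity
auc_rank <- function(labels, scores) {
  stopifnot(length(labels) == length(scores), all(labels %in% c(0, 1)))
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  r <- rank(scores)  # ties receive average ranks
  (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Toy example: one of the four positive/negative pairs is ordered incorrectly
toy_labels <- c(0, 0, 1, 1)
toy_scores <- c(0.10, 0.40, 0.35, 0.80)
print(auc_rank(toy_labels, toy_scores))  # 3 of 4 pairs ordered correctly

# The Gini value in the H2O output is a rescaling of the AUC: Gini = 2 * AUC - 1
print(2 * 0.8431394 - 1)
```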

Next, inspect the coefficients of the variables in the model. For a GLM, h2o.varimp lists each variable’s coefficient magnitude together with its sign.

dfmodelh1 <- as.data.frame(h2o.varimp(modelh1))
dfmodelh1
##                       names coefficients sign
## 1              timbre_0_max  1.290938663  NEG
## 2                  loudness  1.262941934  POS
## 3                     pitch  0.616995941  NEG
## 4              timbre_1_min  0.422323735  POS
## 5              timbre_6_min  0.349016024  NEG
## 6                    energy  0.348092062  NEG
## 7             timbre_11_min  0.307331997  NEG
## 8              timbre_3_max  0.302225619  NEG
## 9             timbre_11_max  0.243632060  POS
## 10             timbre_4_min  0.224233951  POS
## 11             timbre_4_max  0.204134342  POS
## 12             timbre_5_min  0.199149324  NEG
## 13             timbre_0_min  0.195147119  POS
## 14 timesignature_confidence  0.179973904  POS
## 15         tempo_confidence  0.144242598  POS
## 16            timbre_10_max  0.137644568  POS
## 17             timbre_7_min  0.126995955  NEG
## 18            timbre_10_min  0.123851179  POS
## 19             timbre_7_max  0.100031481  NEG
## 20             timbre_2_min  0.096127636  NEG
## 21           key_confidence  0.083115820  POS
## 22             timbre_6_max  0.073712419  POS
## 23            timesignature  0.067241917  POS
## 24             timbre_8_min  0.061301881  POS
## 25             timbre_8_max  0.060041698  POS
## 26                      key  0.056158445  POS
## 27             timbre_3_min  0.050825116  POS
## 28             timbre_9_max  0.033733561  POS
## 29             timbre_2_max  0.030939072  POS
## 30             timbre_9_min  0.020708113  POS
## 31             timbre_1_max  0.014228818  NEG
## 32                    tempo  0.008199861  POS
## 33             timbre_5_max  0.004837870  POS
## 34                                    NA <NA>

Typically, songs with heavier instrumentation tend to be louder (have higher values in the variable “loudness”) and more energetic (have higher values in the variable “energy”). This knowledge is helpful for interpreting the modeling results.

You can make the following observations from the results:

  • The coefficient estimates for the confidence values associated with the time signature, key, and tempo variables are positive. This suggests that higher confidence leads to a higher predicted probability of a Top 10 hit.
  • The coefficient estimate for loudness is positive, which would suggest that mainstream listeners prefer louder songs with heavier instrumentation.
  • The coefficient estimate for energy is negative, which would suggest that mainstream listeners prefer songs that are less energetic, that is, songs with lighter instrumentation.

These coefficients lead to contradictory conclusions for Model 1. This could be due to multicollinearity issues. Inspect the correlation between the variables “loudness” and “energy” in the training set.

cor(train.h2o$loudness,train.h2o$energy)
## [1] 0.7399067

This number indicates that these two variables are highly correlated, and Model 1 does indeed suffer from multicollinearity. Typically, a value between -1.0 and -0.5 or between 0.5 and 1.0 indicates strong correlation, and a value between -0.1 and 0.1 indicates weak correlation. To avoid this correlation issue, omit one of these two variables and re-create the models.
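
The same check can be applied to every pair of predictors at once. The following base R sketch flags all pairs whose absolute correlation meets a threshold. The helper high_cor_pairs is hypothetical, it works on a plain data frame rather than an H2O frame, and the toy columns are constructed so that only one pair is strongly correlated.

```r
# Hypothetical helper: list variable pairs with |correlation| >= threshold
high_cor_pairs <- function(df, threshold = 0.5) {
  cm <- stats::cor(df)
  # Look only above the diagonal so each pair is reported once
  idx <- which(abs(cm) >= threshold & upper.tri(cm), arr.ind = TRUE)
  data.frame(var1 = rownames(cm)[idx[, "row"]],
             var2 = colnames(cm)[idx[, "col"]],
             r    = cm[idx],
             stringsAsFactors = FALSE)
}

# Toy data: x2 is built from x1, while x3 is only weakly related to either
x1 <- 1:10
x3 <- rep(c(1, -1), 5)
x2 <- 2 * x1 + x3
toy <- data.frame(x1 = x1, x2 = x2, x3 = x3)

print(high_cor_pairs(toy))

# Possible usage on the numeric columns of the training data frame:
# high_cor_pairs(BillboardTrain[, 6:38])
```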

You build two variations of the original model:

  • Model 2, in which you keep “energy” and omit “loudness”
  • Model 3, in which you keep “loudness” and omit “energy”

You compare these two models and choose the model with a better fit for this use case.

Create Model 2: Keep energy and omit loudness

colnames(train.h2o)
##  [1] "year"                     "songtitle"               
##  [3] "artistname"               "songid"                  
##  [5] "artistid"                 "timesignature"           
##  [7] "timesignature_confidence" "loudness"                
##  [9] "tempo"                    "tempo_confidence"        
## [11] "key"                      "key_confidence"          
## [13] "energy"                   "pitch"                   
## [15] "timbre_0_min"             "timbre_0_max"            
## [17] "timbre_1_min"             "timbre_1_max"            
## [19] "timbre_2_min"             "timbre_2_max"            
## [21] "timbre_3_min"             "timbre_3_max"            
## [23] "timbre_4_min"             "timbre_4_max"            
## [25] "timbre_5_min"             "timbre_5_max"            
## [27] "timbre_6_min"             "timbre_6_max"            
## [29] "timbre_7_min"             "timbre_7_max"            
## [31] "timbre_8_min"             "timbre_8_max"            
## [33] "timbre_9_min"             "timbre_9_max"            
## [35] "timbre_10_min"            "timbre_10_max"           
## [37] "timbre_11_min"            "timbre_11_max"           
## [39] "top10"
y.dep <- 39
x.indep <- c(6:7,9:38)
x.indep
##  [1]  6  7  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29
## [24] 30 31 32 33 34 35 36 37 38
modelh2 <- h2o.glm( y = y.dep, x = x.indep, training_frame = train.h2o, family = "binomial")
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |=======                                                          |  10%
  |                                                                       
  |=================================================================| 100%

Measure the performance of Model 2.

h2o.performance(model=modelh2,newdata=test.h2o)
## H2OBinomialMetrics: glm
## 
## MSE:  0.09922606
## RMSE:  0.3150017
## LogLoss:  0.3228213
## Mean Per-Class Error:  0.2490554
## AUC:  0.8431933
## Gini:  0.6863867
## R^2:  0.2548191
## Null Deviance:  326.0801
## Residual Deviance:  240.8247
## AIC:  306.8247
## 
## Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:
##          0  1    Error     Rate
## 0      280 34 0.108280  =34/314
## 1       23 36 0.389831   =23/59
## Totals 303 70 0.152815  =57/373
## 
## Maximum Metrics: Maximum metrics at their respective thresholds
##                         metric threshold    value idx
## 1                       max f1  0.254391 0.558140  69
## 2                       max f2  0.113031 0.647208 157
## 3                 max f0point5  0.413999 0.596026  22
## 4                 max accuracy  0.446250 0.876676  18
## 5                max precision  0.811739 1.000000   0
## 6                   max recall  0.037682 1.000000 283
## 7              max specificity  0.811739 1.000000   0
## 8             max absolute_mcc  0.254391 0.469060  69
## 9   max min_per_class_accuracy  0.141051 0.716561 131
## 10 max mean_per_class_accuracy  0.113031 0.761821 157
## 
## Gains/Lift Table: Extract with `h2o.gainsLift(<model>, <data>)` or `h2o.gainsLift(<model>, valid=<T/F>, xval=<T/F>)`
dfmodelh2 <- as.data.frame(h2o.varimp(modelh2))
dfmodelh2
##                       names coefficients sign
## 1                     pitch  0.700331511  NEG
## 2              timbre_1_min  0.510270513  POS
## 3              timbre_0_max  0.402059546  NEG
## 4              timbre_6_min  0.333316236  NEG
## 5             timbre_11_min  0.331647383  NEG
## 6              timbre_3_max  0.252425901  NEG
## 7             timbre_11_max  0.227500308  POS
## 8              timbre_4_max  0.210663865  POS
## 9              timbre_0_min  0.208516163  POS
## 10             timbre_5_min  0.202748055  NEG
## 11             timbre_4_min  0.197246582  POS
## 12            timbre_10_max  0.172729619  POS
## 13         tempo_confidence  0.167523934  POS
## 14 timesignature_confidence  0.167398830  POS
## 15             timbre_7_min  0.142450727  NEG
## 16             timbre_8_max  0.093377516  POS
## 17            timbre_10_min  0.090333426  POS
## 18            timesignature  0.085851625  POS
## 19             timbre_7_max  0.083948442  NEG
## 20           key_confidence  0.079657073  POS
## 21             timbre_6_max  0.076426046  POS
## 22             timbre_2_min  0.071957831  NEG
## 23             timbre_9_max  0.071393189  POS
## 24             timbre_8_min  0.070225578  POS
## 25                      key  0.061394702  POS
## 26             timbre_3_min  0.048384697  POS
## 27             timbre_1_max  0.044721121  NEG
## 28                   energy  0.039698433  POS
## 29             timbre_5_max  0.039469064  POS
## 30             timbre_2_max  0.018461133  POS
## 31                    tempo  0.013279926  POS
## 32             timbre_9_min  0.005282143  NEG
## 33                                    NA <NA>

h2o.auc(h2o.performance(modelh2,test.h2o)) 
## [1] 0.8431933

You can make the following observations:

  • The AUC metric is 0.8431933.
  • Inspecting the coefficient of the variable energy, Model 2 suggests that songs with high energy levels tend to be more popular. This is as expected.
  • H2O orders the variables by the magnitude of their coefficients. Energy ranks 28th of 32, so it is not a significant predictor in this model.

You can conclude that Model 2 is not ideal for this use case, as energy is not significant.
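
The AUC values of Model 1 and Model 2 are nearly identical, so it can help to compare the confusion matrices as well. The sketch below recomputes accuracy, precision, recall, and F1 from the counts printed in the two H2O outputs above. The helper cm_metrics is hypothetical. Both matrices are taken at each model's own F1-optimal threshold, so the comparison is indicative rather than exact.

```r
# Hypothetical helper: standard metrics from confusion-matrix counts
cm_metrics <- function(tn, fp, fn, tp) {
  precision <- tp / (tp + fp)
  recall    <- tp / (tp + fn)
  c(accuracy  = (tp + tn) / (tn + fp + fn + tp),
    precision = precision,
    recall    = recall,
    f1        = 2 * precision * recall / (precision + recall))
}

# Counts copied from the confusion matrices printed above
m1 <- cm_metrics(tn = 255, fp = 59, fn = 17, tp = 42)  # Model 1
m2 <- cm_metrics(tn = 280, fp = 34, fn = 23, tp = 36)  # Model 2

print(round(rbind(model1 = m1, model2 = m2), 4))
```

The F1 values computed this way agree with the "max f1" rows in the respective outputs (0.525 for Model 1 and 0.558140 for Model 2).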

Create Model 3: Keep loudness but omit energy

colnames(train.h2o)
##  [1] "year"                     "songtitle"               
##  [3] "artistname"               "songid"                  
##  [5] "artistid"                 "timesignature"           
##  [7] "timesignature_confidence" "loudness"                
##  [9] "tempo"                    "tempo_confidence"        
## [11] "key"                      "key_confidence"          
## [13] "energy"                   "pitch"                   
## [15] "timbre_0_min"             "timbre_0_max"            
## [17] "timbre_1_min"             "timbre_1_max"            
## [19] "timbre_2_min"             "timbre_2_max"            
## [21] "timbre_3_min"             "timbre_3_max"            
## [23] "timbre_4_min"             "timbre_4_max"            
## [25] "timbre_5_min"             "timbre_5_max"            
## [27] "timbre_6_min"             "timbre_6_max"            
## [29] "timbre_7_min"             "timbre_7_max"            
## [31] "timbre_8_min"             "timbre_8_max"            
## [33] "timbre_9_min"             "timbre_9_max"            
## [35] "timbre_10_min"            "timbre_10_max"           
## [37] "timbre_11_min"            "timbre_11_max"           
## [39] "top10"
y.dep <- 39
x.indep <- c(6:12,14:38)
x.indep
##  [1]  6  7  8  9 10 11 12 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29
## [24] 30 31 32 33 34 35 36 37 38
modelh3 <- h2o.glm( y = y.dep, x = x.indep, training_frame = train.h2o, family = "binomial")
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |========                                                         |  12%
  |                                                                       
  |=================================================================| 100%
perfh3<-h2o.performance(model=modelh3,newdata=test.h2o)
perfh3
## H2OBinomialMetrics: glm
## 
## MSE:  0.0978859
## RMSE:  0.3128672
## LogLoss:  0.3178367
## Mean Per-Class Error:  0.264925
## AUC:  0.8492389
## Gini:  0.6984778
## R^2:  0.2648836
## Null Deviance:  326.0801
## Residual Deviance:  237.1062
## AIC:  303.1062
## 
## Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:
##          0  1    Error     Rate
## 0      286 28 0.089172  =28/314
## 1       26 33 0.440678   =26/59
## Totals 312 61 0.144772  =54/373
## 
## Maximum Metrics: Maximum metrics at their respective thresholds
##                         metric threshold    value idx
## 1                       max f1  0.273799 0.550000  60
## 2                       max f2  0.125503 0.663265 155
## 3                 max f0point5  0.435479 0.628931  24
## 4                 max accuracy  0.435479 0.882038  24
## 5                max precision  0.821606 1.000000   0
## 6                   max recall  0.038328 1.000000 280
## 7              max specificity  0.821606 1.000000   0
## 8             max absolute_mcc  0.435479 0.471426  24
## 9   max min_per_class_accuracy  0.173693 0.745763 120
## 10 max mean_per_class_accuracy  0.125503 0.775073 155
## 
## Gains/Lift Table: Extract with `h2o.gainsLift(<model>, <data>)` or `h2o.gainsLift(<model>, valid=<T/F>, xval=<T/F>)`
dfmodelh3 <- as.data.frame(h2o.varimp(modelh3))
dfmodelh3
##                       names coefficients sign
## 1              timbre_0_max 1.216621e+00  NEG
## 2                  loudness 9.780973e-01  POS
## 3                     pitch 7.249788e-01  NEG
## 4              timbre_1_min 3.891197e-01  POS
## 5              timbre_6_min 3.689193e-01  NEG
## 6             timbre_11_min 3.086673e-01  NEG
## 7              timbre_3_max 3.025593e-01  NEG
## 8             timbre_11_max 2.459081e-01  POS
## 9              timbre_4_min 2.379749e-01  POS
## 10             timbre_4_max 2.157627e-01  POS
## 11             timbre_0_min 1.859531e-01  POS
## 12             timbre_5_min 1.846128e-01  NEG
## 13 timesignature_confidence 1.729658e-01  POS
## 14             timbre_7_min 1.431871e-01  NEG
## 15            timbre_10_max 1.366703e-01  POS
## 16            timbre_10_min 1.215954e-01  POS
## 17         tempo_confidence 1.183698e-01  POS
## 18             timbre_2_min 1.019149e-01  NEG
## 19           key_confidence 9.109701e-02  POS
## 20             timbre_7_max 8.987908e-02  NEG
## 21             timbre_6_max 6.935132e-02  POS
## 22             timbre_8_max 6.878241e-02  POS
## 23            timesignature 6.120105e-02  POS
## 24                      key 5.814805e-02  POS
## 25             timbre_8_min 5.759228e-02  POS
## 26             timbre_1_max 2.930285e-02  NEG
## 27             timbre_9_max 2.843755e-02  POS
## 28             timbre_3_min 2.380245e-02  POS
## 29             timbre_2_max 1.917035e-02  POS
## 30             timbre_5_max 1.715813e-02  POS
## 31                    tempo 1.364418e-02  NEG
## 32             timbre_9_min 8.463143e-05  NEG
## 33                                    NA <NA>
h2o.sensitivity(perfh3,0.5)
## Warning in h2o.find_row_by_threshold(object, t): Could not find exact
## threshold: 0.5 for this set of metrics; using closest threshold found:
## 0.501855569251422. Run `h2o.predict` and apply your desired threshold on a
## probability column.
## [[1]]
## [1] 0.2033898
h2o.auc(perfh3)
## [1] 0.8492389

You can make the following observations:

  • The AUC metric is 0.8492389.
  • From the confusion matrix, the model correctly predicts that 33 songs will be Top 10 hits (true positives). However, it has 28 false positives (songs that the model predicted would be Top 10 hits, but ended up not being Top 10 hits) and 26 false negatives (actual Top 10 hits that the model missed).
  • Loudness has a positive coefficient estimate, meaning that this model predicts that songs with heavier instrumentation tend to be more popular. This is the same conclusion from Model 2.
  • Loudness is significant in this model.

Overall, Model 3 predicts a higher number of top 10 hits with an accuracy rate that is acceptable. To choose the best fit for production runs, record labels should consider the following factors:

  • Desired model accuracy at a given threshold
  • Number of correct predictions for top10 hits
  • Tolerable number of false positives or false negatives

Next, make predictions using Model 3 on the test dataset.

predict.regh <- h2o.predict(modelh3, test.h2o)
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |=================================================================| 100%
print(predict.regh)
##   predict        p0          p1
## 1       0 0.9654739 0.034526052
## 2       0 0.9654748 0.034525236
## 3       0 0.9635547 0.036445318
## 4       0 0.9343579 0.065642149
## 5       0 0.9978334 0.002166601
## 6       0 0.9779949 0.022005078
## 
## [373 rows x 3 columns]
predict.regh$predict
##   predict
## 1       0
## 2       0
## 3       0
## 4       0
## 5       0
## 6       0
## 
## [373 rows x 1 column]
dpr<-as.data.frame(predict.regh)
#Rename the predicted column 
colnames(dpr)[colnames(dpr) == 'predict'] <- 'predict_top10'
table(dpr$predict_top10)
## 
##   0   1 
## 312  61

The first set of output results specifies the probabilities associated with each predicted observation.  For example, observation 1 is 96.54739% likely to not be a Top 10 hit, and 3.4526052% likely to be a Top 10 hit (predict=1 indicates a Top 10 hit and predict=0 indicates not a Top 10 hit).  The second set of results lists the actual predictions made.  From the third set of results, this model predicts that 61 songs will be Top 10 hits.
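The h2o.sensitivity warning shown earlier suggests running h2o.predict and applying your own threshold to the probability column. Here is a minimal base R sketch of that idea, using only the six rows printed above. The names dpr_sample and classify_at are hypothetical and introduced for illustration; 0.273799 is the max F1 threshold reported for Model 3, and 0.05 is an arbitrary example cutoff.

```r
# First six p1 values (probability of a Top 10 hit) copied from the printed output above.
dpr_sample <- data.frame(
  p1 = c(0.034526052, 0.034525236, 0.036445318,
         0.065642149, 0.002166601, 0.022005078)
)

# Hypothetical helper: label a song 1 (Top 10 hit) when p1 meets the cutoff.
classify_at <- function(p1, threshold) {
  as.integer(p1 >= threshold)
}

# At the max F1 threshold reported for Model 3, none of these six songs is labeled a hit.
dpr_sample$at_f1 <- classify_at(dpr_sample$p1, 0.273799)

# A much lower, purely illustrative cutoff flags only the fourth song.
dpr_sample$at_low <- classify_at(dpr_sample$p1, 0.05)

print(dpr_sample)
```

Lowering the cutoff trades more predicted hits for more false positives, which is why the threshold is a business decision as much as a statistical one.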

Compute the baseline accuracy, by assuming that the baseline predicts the most frequent outcome, which is that most songs are not Top 10 hits.

table(BillboardTest$top10)
## 
##   0   1 
## 314  59

Now observe that the baseline model would get 314 observations correct, and 59 wrong, for an accuracy of 314/(314+59) = 0.8418231.
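The same arithmetic can be sketched in base R. The class counts come from the table above, and the Model 3 cell counts (286 true negatives, 33 true positives) come from its confusion matrix at the F1-optimal threshold; the variable names are assumptions made for this illustration.

```r
# Class counts copied from table(BillboardTest$top10) above.
test_counts <- c("0" = 314, "1" = 59)

# The baseline always predicts the most frequent class, so its accuracy
# is the share of that class in the test set.
baseline_accuracy <- max(test_counts) / sum(test_counts)

# Model 3 at its F1-optimal threshold: 286 true negatives and 33 true positives
# (copied from the Model 3 confusion matrix).
model3_accuracy <- (286 + 33) / sum(test_counts)

# Show both accuracies side by side, rounded to four decimals.
round(c(baseline = baseline_accuracy, model3 = model3_accuracy), 4)
```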

It seems that Model 3, with an accuracy of 0.8552, provides you with a small improvement over the baseline model. But is this model useful for record labels?

View the two models from an investment perspective:

  • A production company is interested in investing in songs that are more likely to make it to the Top 10. The company’s objective is to minimize the risk of financial losses attributed to investing in songs that end up unpopular.
  • How many songs does Model 3 correctly predict as a Top 10 hit in 2010? Looking at the confusion matrix, you see that it predicts 33 Top 10 hits correctly at an optimal threshold, which is more than half of the 59 actual Top 10 hits in the test set.
  • It will be more useful to the record label if you can provide the production company with a list of songs that are highly likely to end up in the Top 10.
  • The baseline model is not useful, as it simply does not label any song as a hit.

Considering the three models built so far, you can conclude that Model 3 proves to be the best investment choice for the record label.

GBM model

H2O provides you with the ability to explore other learning models, such as GBM and deep learning. Explore building a model using the GBM technique, using the built-in h2o.gbm function.

Before you do this, you need to convert the target variable to a factor for multinomial classification techniques.

train.h2o$top10=as.factor(train.h2o$top10)
gbm.modelh <- h2o.gbm(y=y.dep, x=x.indep, training_frame = train.h2o, ntrees = 500, max_depth = 4, learn_rate = 0.01, seed = 1122,distribution="multinomial")
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |===                                                              |   5%
  |                                                                       
  |=====                                                            |   7%
  |                                                                       
  |======                                                           |   9%
  |                                                                       
  |=======                                                          |  10%
  |                                                                       
  |======================                                           |  33%
  |                                                                       
  |=====================================                            |  56%
  |                                                                       
  |====================================================             |  79%
  |                                                                       
  |================================================================ |  98%
  |                                                                       
  |=================================================================| 100%
perf.gbmh<-h2o.performance(gbm.modelh,test.h2o)
perf.gbmh
## H2OBinomialMetrics: gbm
## 
## MSE:  0.09860778
## RMSE:  0.3140188
## LogLoss:  0.3206876
## Mean Per-Class Error:  0.2120263
## AUC:  0.8630573
## Gini:  0.7261146
## 
## Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:
##          0  1    Error     Rate
## 0      266 48 0.152866  =48/314
## 1       16 43 0.271186   =16/59
## Totals 282 91 0.171582  =64/373
## 
## Maximum Metrics: Maximum metrics at their respective thresholds
##                       metric threshold    value idx
## 1                     max f1  0.189757 0.573333  90
## 2                     max f2  0.130895 0.693717 145
## 3               max f0point5  0.327346 0.598802  26
## 4               max accuracy  0.442757 0.876676  14
## 5              max precision  0.802184 1.000000   0
## 6                 max recall  0.049990 1.000000 284
## 7            max specificity  0.802184 1.000000   0
## 8           max absolute_mcc  0.169135 0.496486 104
## 9 max min_per_class_accuracy  0.169135 0.796610 104
## 10 max mean_per_class_accuracy  0.169135 0.805948 104
## 
## Gains/Lift Table: Extract with `h2o.gainsLift(<model>, <data>)` or `
h2o.sensitivity(perf.gbmh,0.5)
## Warning in h2o.find_row_by_threshold(object, t): Could not find exact
## threshold: 0.5 for this set of metrics; using closest threshold found:
## 0.501205344484314. Run `h2o.predict` and apply your desired threshold on a
## probability column.
## [[1]]
## [1] 0.1355932
h2o.auc(perf.gbmh)
## [1] 0.8630573

This model correctly predicts 43 top 10 hits, which is 10 more than the number predicted by Model 3. Moreover, the AUC metric is higher than the one obtained from Model 3.
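To make the trade-off between the two models concrete, the following base R sketch derives recall, precision, and accuracy from the two confusion matrices printed above (each at its own F1-optimal threshold). The data frame cm and its column names are assumptions made for this illustration.

```r
# Confusion-matrix cells copied from the Model 3 and GBM outputs above.
# tn/fp come from the actual-0 row, fn/tp from the actual-1 row.
cm <- data.frame(
  model = c("Model 3", "GBM"),
  tn = c(286, 266),
  fp = c(28, 48),
  fn = c(26, 16),
  tp = c(33, 43),
  stringsAsFactors = FALSE
)

cm$recall    <- cm$tp / (cm$tp + cm$fn)   # share of actual hits that were found
cm$precision <- cm$tp / (cm$tp + cm$fp)   # share of predicted hits that were real
cm$accuracy  <- (cm$tp + cm$tn) / (cm$tn + cm$fp + cm$fn + cm$tp)

print(cm, digits = 4)
```

Read this way, the GBM model finds 10 more Top 10 hits than Model 3, but it also raises 20 more false positives, so its precision and accuracy at these thresholds are lower.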

As seen above, H2O’s API provides the ability to obtain key statistical measures required to analyze the models easily, using several built-in functions. The record label can experiment with different parameters to arrive at the model that predicts the maximum number of Top 10 hits at the desired level of accuracy and threshold.

H2O also allows you to experiment with deep learning models. Deep learning models have the ability to learn features implicitly, but can be more expensive computationally.

Now, create a deep learning model with the h2o.deeplearning function, using the same training and test datasets created before. The time taken to run this model depends on the type of EC2 instance chosen for this purpose.  For models that require more computation, consider using accelerated computing instances such as the P2 instance type.

system.time(
  dlearning.modelh <- h2o.deeplearning(y = y.dep,
                                      x = x.indep,
                                      training_frame = train.h2o,
                                      epochs = 250,
                                      hidden = c(250,250),
                                      activation = "Rectifier",
                                      seed = 1122,
                                      distribution="multinomial"
  )
)
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |===                                                              |   4%
  |                                                                       
  |=====                                                            |   8%
  |                                                                       
  |========                                                         |  12%
  |                                                                       
  |==========                                                       |  16%
  |                                                                       
  |=============                                                    |  20%
  |                                                                       
  |================                                                 |  24%
  |                                                                       
  |==================                                               |  28%
  |                                                                       
  |=====================                                            |  32%
  |                                                                       
  |=======================                                          |  36%
  |                                                                       
  |==========================                                       |  40%
  |                                                                       
  |=============================                                    |  44%
  |                                                                       
  |===============================                                  |  48%
  |                                                                       
  |==================================                               |  52%
  |                                                                       
  |====================================                             |  56%
  |                                                                       
  |=======================================                          |  60%
  |                                                                       
  |==========================================                       |  64%
  |                                                                       
  |============================================                     |  68%
  |                                                                       
  |===============================================                  |  72%
  |                                                                       
  |=================================================                |  76%
  |                                                                       
  |====================================================             |  80%
  |                                                                       
  |=======================================================          |  84%
  |                                                                       
  |=========================================================        |  88%
  |                                                                       
  |============================================================     |  92%
  |                                                                       
  |==============================================================   |  96%
  |                                                                       
  |=================================================================| 100%
##    user  system elapsed 
##   1.216   0.020 166.508
perf.dl<-h2o.performance(model=dlearning.modelh,newdata=test.h2o)
perf.dl
## H2OBinomialMetrics: deeplearning
## 
## MSE:  0.1678359
## RMSE:  0.4096778
## LogLoss:  1.86509
## Mean Per-Class Error:  0.3433013
## AUC:  0.7568822
## Gini:  0.5137644
## 
## Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:
##          0  1    Error     Rate
## 0      290 24 0.076433  =24/314
## 1       36 23 0.610169   =36/59
## Totals 326 47 0.160858  =60/373
## 
## Maximum Metrics: Maximum metrics at their respective thresholds
##                       metric threshold    value idx
## 1                     max f1  0.826267 0.433962  46
## 2                     max f2  0.000000 0.588235 239
## 3               max f0point5  0.999929 0.511811  16
## 4               max accuracy  0.999999 0.865952  10
## 5              max precision  1.000000 1.000000   0
## 6                 max recall  0.000000 1.000000 326
## 7            max specificity  1.000000 1.000000   0
## 8           max absolute_mcc  0.999929 0.363219  16
## 9 max min_per_class_accuracy  0.000004 0.662420 145
## 10 max mean_per_class_accuracy  0.000000 0.685334 224
## 
## Gains/Lift Table: Extract with `h2o.gainsLift(<model>, <data>)` or `h2o.gainsLift(<model>, valid=<T/F>, xval=<T/F>)`
h2o.sensitivity(perf.dl,0.5)
## Warning in h2o.find_row_by_threshold(object, t): Could not find exact
## threshold: 0.5 for this set of metrics; using closest threshold found:
## 0.496293348880151. Run `h2o.predict` and apply your desired threshold on a
## probability column.
## [[1]]
## [1] 0.3898305
h2o.auc(perf.dl)
## [1] 0.7568822

The AUC metric for this model is 0.7568822, which is less than what you got from the earlier models. I recommend further experimentation using different hyperparameters, such as the learning rate, the number of epochs, or the number of hidden layers.
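One way to organize that experimentation is a grid search. The sketch below is not the approach used in this post; it is a hedged illustration that builds a list of candidate values in base R and wraps H2O's h2o.grid function in a hypothetical helper, run_dl_grid. The candidate values and the grid name dl_grid_sketch are assumptions, and the helper only works with the h2o package loaded, a running H2O cluster, and the train.h2o, x.indep, and y.dep objects created earlier.

```r
# Candidate values to try; these specific numbers are illustrative, not tuned.
hyper_params <- list(
  epochs = c(50, 250),
  hidden = list(c(100, 100), c(250, 250), c(250, 250, 250)),
  activation = c("Rectifier", "Tanh")
)

# Number of model configurations a full (Cartesian) search would train.
n_models <- prod(lengths(hyper_params))
n_models

# Hypothetical wrapper around h2o.grid(); nothing is trained until it is called.
run_dl_grid <- function(train, x, y, params) {
  h2o::h2o.grid(
    algorithm = "deeplearning",
    grid_id = "dl_grid_sketch",   # hypothetical grid name
    x = x,
    y = y,
    training_frame = train,
    seed = 1122,
    hyper_params = params
  )
}

# Example call (not run here):
# dl_grid <- run_dl_grid(train.h2o, x.indep, y.dep, hyper_params)
# h2o::h2o.getGrid("dl_grid_sketch", sort_by = "auc", decreasing = TRUE)
```

Because this sketch passes no validation frame, the grid would be ranked on training metrics; each candidate should still be scored against test.h2o, as was done for the models above, before drawing conclusions.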

H2O’s built-in functions provide many key statistical measures that can help measure model performance. Here are some of these key terms.

| Metric | Description |
|---|---|
| Sensitivity | Measures the proportion of positives that have been correctly identified. It is also called the true positive rate, or recall. |
| Specificity | Measures the proportion of negatives that have been correctly identified. It is also called the true negative rate. |
| Threshold | Cutoff point that maximizes specificity and sensitivity. While the model may not provide the highest prediction at this point, it would not be biased towards positives or negatives. |
| Precision | The fraction of positive predictions that are actually positive. In information retrieval terms, it is the fraction of the documents retrieved that are relevant to the information needed. |
| AUC | Provides insight into how well the classifier is able to separate the two classes. The implicit goal is to deal with situations where the sample distribution is highly skewed, with a tendency to overfit to a single class. Grading scale: 0.9 – 1.0 = excellent (A); 0.8 – 0.9 = good (B); 0.7 – 0.8 = fair (C); 0.6 – 0.7 = poor (D); 0.5 – 0.6 = fail (F). |
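As a small worked example of the AUC grading scale, the base R sketch below maps the three AUC values reported in this post onto the bands. It assumes that the lowest band is meant to cover 0.5 to 0.6 and that each band includes its lower bound; the variable names are hypothetical.

```r
# AUC values reported above for the three models.
auc <- c("Model 3" = 0.8492389, "GBM" = 0.8630573, "Deep learning" = 0.7568822)

# Map each AUC onto the grading bands. Intervals are closed on the left,
# so 0.8 counts as "good (B)", and 1.0 is included in the top band.
grade <- cut(
  auc,
  breaks = c(0.5, 0.6, 0.7, 0.8, 0.9, 1.0),
  labels = c("fail (F)", "poor (D)", "fair (C)", "good (B)", "excellent (A)"),
  right = FALSE,
  include.lowest = TRUE
)

auc_grades <- data.frame(model = names(auc), auc = auc, grade = grade,
                         row.names = NULL)
print(auc_grades)
```

On this scale, Model 3 and the GBM model both fall in the good (B) band, while the deep learning model falls in the fair (C) band.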

Here’s a summary of the metrics generated from H2O’s built-in functions for the three models that produced useful results.

| Metric | Model 3 | GBM Model | Deep Learning Model |
|---|---|---|---|
| Accuracy (max) | 0.882038 (t=0.435479) | 0.876676 (t=0.442757) | 0.865952 (t=0.999999) |
| Precision (max) | 1.0 (t=0.821606) | 1.0 (t=0.802184) | 1.0 (t=1.0) |
| Recall (max) | 1.0 | 1.0 | 1.0 (t=0) |
| Specificity (max) | 1.0 | 1.0 | 1.0 (t=1) |
| Sensitivity (t=0.5) | 0.2033898 | 0.1355932 | 0.3898305 |
| AUC | 0.8492389 | 0.8630573 | 0.7568822 |

Note: ‘t’ denotes threshold.

Your options at this point could be narrowed down to Model 3 and the GBM model, based on the AUC and accuracy metrics observed earlier.  If the slightly lower accuracy of the GBM model is deemed acceptable, the record label can choose to go to production with the GBM model, as it can predict a higher number of Top 10 hits.  The AUC metric for the GBM model is also higher than that of Model 3.
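The narrowing-down step can be sketched in base R as follows. The figures are copied from the outputs above (maximum accuracy, AUC, and true positives at each model's F1-optimal threshold); the 0.8 AUC cutoff and the variable names are assumptions made for this illustration, not a rule from H2O.

```r
# Summary figures copied from the model outputs above.
summary_metrics <- data.frame(
  model = c("Model 3", "GBM", "Deep learning"),
  max_accuracy = c(0.882038, 0.876676, 0.865952),
  auc = c(0.8492389, 0.8630573, 0.7568822),
  hits_found = c(33, 43, 23),   # true positives at each F1-optimal threshold
  stringsAsFactors = FALSE
)

# Keep models whose AUC is at least 0.8 (an assumed cutoff for this sketch).
shortlist <- summary_metrics[summary_metrics$auc >= 0.8, ]

# Order the shortlist by the number of Top 10 hits found, most first.
shortlist <- shortlist[order(-shortlist$hits_found), ]
print(shortlist)
```

With these inputs the deep learning model drops out, and the GBM model is listed ahead of Model 3 because it finds more Top 10 hits, at a slightly lower maximum accuracy.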

Record labels can experiment with different learning techniques and parameters before arriving at a model that proves to be the best fit for their business. Because deep learning models can be computationally expensive, record labels can choose more powerful EC2 instances on AWS to run their experiments faster.

Conclusion

In this post, I showed how the popular music industry can use analytics to predict the type of songs that make the Top 10 Billboard charts. By running H2O’s scalable machine learning platform on AWS, data scientists can easily experiment with multiple modeling techniques and interactively query the data using Amazon Athena, without having to manage the underlying infrastructure. This helps record labels make critical decisions on the type of artists and songs to promote in a timely fashion, thereby increasing sales and revenue.

If you have questions or suggestions, please comment below.


Additional Reading

Learn how to build and explore a simple GEOINT application using SparkR.


About the Authors

Gopal Wunnava is a Partner Solution Architect with the AWS GSI Team. He works with partners and customers on big data engagements, and is passionate about building analytical solutions that drive business capabilities and decision making. In his spare time, he loves all things sports and movies related and is fond of old classics like Asterix, Obelix comics and Hitchcock movies.

 

 

Bob Strahan, a Senior Consultant with AWS Professional Services, contributed to this post.

 

 

Roku Shows FBI Warning to Pirate Channel Users

Post Syndicated from Ernesto original https://torrentfreak.com/roku-shows-fbi-warning-to-pirate-channel-users-171009/

In recent years it has become much easier to stream movies and TV-shows over the Internet.

Legal services such as Netflix and HBO are flourishing, but at the same time millions of people are streaming from unauthorized sources, often paired with perfectly legal streaming platforms and devices.

Hollywood insiders have dubbed this trend “Piracy 3.0” and are actively working with stakeholders to address the threat. One of the companies rightsholders are working with is Roku, known for its easy-to-use media players.

Earlier this year a Mexican court ordered retailers to take the Roku media player off the shelves. This legal battle is still ongoing, but it was a clear signal to the company, which now has its own anti-piracy team.

Several third-party “private” channels have been removed from the player in recent weeks as they violate Roku’s terms and conditions. These include the hugely popular streaming channel XTV, which offered access to infringing content.

After its removal, XTV briefly returned as XTV 2, but that didn’t last for long. The infringing channel was soon removed again, this time showing the FBI’s anti-piracy seal followed by a rather ominous message.

“FBI Anti-Piracy Warning: Unauthorized copying is punishable under federal law,” it reads. “Roku has removed this unauthorized service due to repeated claims of copyright infringement.”

FBI Warning (via Cordcuttersnews)

The unusual warning was picked up by Cordcuttersnews and states that Roku itself removed the channel.

To some it may seem that the FBI is cracking down on Roku channels, but this is not the case. The anti-piracy seal and associated warning are often used in cases where the organization is not actively involved, to add extra weight. The FBI supports this, as long as certain standards are met.

A Roku spokesperson confirmed to TorrentFreak that they’re using it of their own accord here.

“We want to send a clear message to Roku customers and to publishers that any publication of pirated content on our platform is a violation of law and our platform rules,” the company says.

“We have recently expanded the messaging that we display to customers that install non-certified channels to alert them to the associated risks, and we display the FBI’s publicly available warning when we remove channels for copyright violations.”

The strong language shows that Roku is taking its efforts to crack down on infringing channels very seriously. A few weeks ago the company started to warn users that pirate channels may be removed without prior notice.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

Now Available – Amazon Linux AMI 2017.09

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/now-available-amazon-linux-ami-2017-09/

I’m happy to announce that the latest version of the Amazon Linux AMI (2017.09) is now available in all AWS Regions for all current-generation EC2 instances. The AMI contains a supported and maintained Linux image that is designed to provide a stable, secure, high performance environment for applications running on EC2.

Easy Upgrade
You can upgrade your existing instances by running two commands and then rebooting:

$ sudo yum clean all
$ sudo yum update

Lots of Goodies
The AMI contains many new features, many of which were added in response to requests from our customers. Here’s a summary:

Kernel 4.9.51 – Based on the 4.9 stable kernel series, this kernel includes the ENA 1.3.0 driver along with support for TCP Bottleneck Bandwidth and RTT (BBR). Read my post, Elastic Network Adapter – High-Performance Network Interface for Amazon EC2 to learn more about ENA. Read the Release Notes to learn how to enable BBR.

Amazon SSM Agent – The Amazon SSM Agent is now installed by default. This means that you can now use EC2 Run Command to configure and run scripts on your instances with no further setup. To learn more, read Executing Commands Using Systems Manager Run Command or Manage Instances at Scale Without SSH Access Using EC2 Run Command.

Python 3.6 – The newest version of Python is now included and can be managed via virtualenv and alternatives. You can install Python 3.6 like this:

$ sudo yum install python36 python36-virtualenv python36-pip

Ruby 2.4 – The latest version of Ruby in the 2.4 series is now available. Install it like this:

$ sudo yum install ruby24

OpenSSL – The AMI now uses OpenSSL 1.0.2k.

HTTP/2 – The HTTP/2 protocol is now supported by the AMI’s httpd24, nginx, and curl packages.

Relational Databases – Postgres 9.6 and MySQL 5.7 are now available, and can be installed like this:

$ sudo yum install postgresql96
$ sudo yum install mysql57

OpenMPI – The OpenMPI package has been upgraded from 1.6.4 to 2.1.1. OpenMPI compatibility packages are available and can be used to build and run older OpenMPI applications.

And More – Other updated packages include Squid 3.5, Nginx 1.12, Tomcat 8.5, and GCC 6.4.

Launch it Today
You can use this AMI to launch EC2 instances in all AWS Regions today. It is available for EBS-backed and Instance Store-backed instances and supports HVM and PV modes.

Jeff;

Evergreen 3.0.0 released

Post Syndicated from ris original https://lwn.net/Articles/735379/rss

The Evergreen community has announced the
release
of Evergreen 3.0.0, software for libraries. This release
includes community support of the web staff client for production use,
serials and offline circulation modules for the web staff client,
improvements to the display of headings in the public catalog browse list,
and more.

‘China Should Crack Down on Pirate Streaming Box Distributors’

Post Syndicated from Ernesto original https://torrentfreak.com/china-should-crack-down-on-pirate-streaming-box-distributors-171001/

The International Intellectual Property Alliance (IIPA) has informed the U.S. Government that China must step up its game to better protect the interests of copyright holders.

The US Trade Representative is reviewing whether China has done enough to comply with its WTO obligations, but IIPA members including RIAA and MPAA believe there is still work to be done.

One of the areas to which the Chinese Government should pay more attention is enforcement. Although a lot of progress has been made in recent years, especially in combating music piracy, new threats have emerged.

One of the areas highlighted by IIPA is the streaming box ecosystem, aptly dubbed as “piracy 3.0” by the Motion Picture Association. This appeals to a new breed of pirates who rely on set-top boxes which are filled with pirate add-ons.

Industry groups often refer to these boxes as Illicit Streaming Devices (ISDs) and they see China as a major hub through which these are shipped around the world.

“ISDs are media boxes, set-top boxes or other devices that allow users, through the use of piracy apps, to stream, download, or otherwise access unauthorized content from the Internet,” IIPA writes.

“These devices have emerged as a significant means through which pirated motion picture and television content is accessed on televisions in homes in China as well as elsewhere in Asia and increasingly around the world. China is a hub for the manufacture of these devices.”

Although the hardware and media players are perfectly legal, things get problematic when they’re loaded with pirate add-ons and promoted as tools to facilitate copyright infringement.

IIPA states that the Chinese Government should do more to stop these devices from being sold. Cracking down on the main distribution points would be a good start, they say.

“However it is done, the Chinese government must increase enforcement efforts, including cracking down on piracy apps and on device retailers and/or distributors who preload the devices with apps that facilitate infringement.

“Moreover, because China is the main source of this problem spreading across Asia, the Chinese government should take immediate actions against key distribution points for devices that are being used illegally,” IIPA adds.

In addition to pirate boxes, the industry groups also want China to beef up its enforcement against online journal piracy, pirate apps, unauthorized camcording, and unlicensed streaming platforms.

IIPA intends to explain the above and several other shortcomings in detail during a hearing in Washington, DC, next Wednesday. The group has submitted an overview of its testimony to the Trade Representative, which is available here (pdf).

Creating a Cost-Efficient Amazon ECS Cluster for Scheduled Tasks

Post Syndicated from Nathan Taber original https://aws.amazon.com/blogs/compute/creating-a-cost-efficient-amazon-ecs-cluster-for-scheduled-tasks/

Madhuri Peri
Sr. DevOps Consultant

When you use Amazon Relational Database Service (Amazon RDS), depending on the logging levels on the RDS instances and the volume of transactions, you could generate a lot of log data. To ensure that everything is running smoothly, many customers search for log error patterns using different log aggregation and visualization systems, such as Amazon Elasticsearch Service, Splunk, or another tool of their choice. A module needs to periodically retrieve the RDS logs using the SDK, and then send them to Amazon S3. From there, you can stream them to your log aggregation tool.

One option is writing an AWS Lambda function to retrieve the log files. However, the time that this function needs to execute depends on the volume of log files retrieved and transferred, so Lambda could time out in many cases. Another approach is launching an Amazon EC2 instance that runs this job periodically. However, this would require you to run an EC2 instance continuously, which is not an optimal use of time or money.

Using the new Amazon CloudWatch integration with Amazon EC2 Container Service, you can trigger this job to run in a container on an existing Amazon ECS cluster. Additionally, this would allow you to reduce costs by running containers on a fleet of Spot Instances.

In this post, I will show you how to use the new scheduled tasks (cron) feature in Amazon ECS and launch tasks using CloudWatch events, while leveraging Spot Fleet to maximize availability and cost optimization for containerized workloads.

Architecture

The following diagram shows how the various components described schedule a task that retrieves log files from Amazon RDS database instances, and deposits the logs into an S3 bucket.

The Amazon ECS cluster's container instances use Spot Fleet, which suits a workload that can run whenever capacity is available. This reduces cluster costs.

The task definition defines which Docker image to retrieve from the Amazon EC2 Container Registry (Amazon ECR) repository and run on the Amazon ECS cluster.

The container image contains Python code that makes AWS API calls using boto3. It iterates over the RDS database instances, retrieves the logs, and deposits them in the S3 bucket. Many customers choose to have these logs delivered to a centralized log store. CloudWatch Events defines the schedule for when the container task has to be launched.
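
The container logic described above can be sketched roughly as follows. This is a minimal illustration and not the code shipped in the container image: the function names `download_log` and `ship_rds_logs` and the S3 key layout are assumptions made for this example. It relies on the boto3 calls `describe_db_instances`, `describe_db_log_files`, `download_db_log_file_portion`, and `put_object`; the clients are passed in as arguments so the functions can be exercised without AWS credentials.

```python
def download_log(rds, instance_id, log_name):
    """Fetch one RDS log file, following the Marker until no data is pending."""
    chunks = []
    marker = "0"  # "0" asks RDS to start from the beginning of the file
    while True:
        page = rds.download_db_log_file_portion(
            DBInstanceIdentifier=instance_id,
            LogFileName=log_name,
            Marker=marker,
        )
        chunks.append(page.get("LogFileData") or "")
        if not page.get("AdditionalDataPending"):
            return "".join(chunks)
        marker = page["Marker"]


def ship_rds_logs(rds, s3, bucket):
    """Copy every log file of every RDS instance into the given S3 bucket.

    Returns the list of S3 keys written. Pagination of the instance and
    log-file listings is ignored to keep the sketch short.
    """
    keys = []
    for db in rds.describe_db_instances()["DBInstances"]:
        instance_id = db["DBInstanceIdentifier"]
        listing = rds.describe_db_log_files(DBInstanceIdentifier=instance_id)
        for log in listing["DescribeDBLogFiles"]:
            name = log["LogFileName"]
            body = download_log(rds, instance_id, name)
            key = f"{instance_id}/{name}"  # hypothetical key layout
            s3.put_object(Bucket=bucket, Key=key, Body=body.encode("utf-8"))
            keys.append(key)
    return keys
```

With real clients, the call would look something like `ship_rds_logs(boto3.client("rds"), boto3.client("s3"), bucket_name)`, where `bucket_name` comes from the environment variable set in the task definition.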

Walkthrough

To provide the basic framework, we have built an AWS CloudFormation template that creates the following resources:

  • Amazon ECR repository for storing the Docker image to be used in the task definition
  • S3 bucket that holds the transferred logs
  • Task definition, with image name and S3 bucket as environment variables provided via input parameter
  • CloudWatch Events rule
  • Amazon ECS cluster
  • Amazon ECS container instances using Spot Fleet
  • IAM roles required for the container instance profiles

Before you begin

Ensure that Git, Docker, and the AWS CLI are installed on your computer.

In your AWS account, instantiate one Amazon Aurora instance using the console. For more information, see Creating an Amazon Aurora DB Cluster.

Implementation Steps

  1. Clone the code from GitHub that performs RDS API calls to retrieve the log files.
    git clone https://github.com/awslabs/aws-ecs-scheduled-tasks.git
  2. Build and tag the image.
    cd aws-ecs-scheduled-tasks/container-code/src && ls

    Dockerfile		rdslogsshipper.py	requirements.txt

    docker build -t rdslogsshipper .

    Sending build context to Docker daemon 9.728 kB
    Step 1 : FROM python:3
     ---> 41397f4f2887
    Step 2 : WORKDIR /usr/src/app
     ---> Using cache
     ---> 59299c020e7e
    Step 3 : COPY requirements.txt ./
     ---> 8c017e931c3b
    Removing intermediate container df09e1bed9f2
    Step 4 : COPY rdslogsshipper.py /usr/src/app
     ---> 099a49ca4325
    Removing intermediate container 1b1da24a6699
    Step 5 : RUN pip install --no-cache-dir -r requirements.txt
     ---> Running in 3ed98b30901d
    Collecting boto3 (from -r requirements.txt (line 1))
      Downloading boto3-1.4.6-py2.py3-none-any.whl (128kB)
    Collecting botocore (from -r requirements.txt (line 2))
      Downloading botocore-1.6.7-py2.py3-none-any.whl (3.6MB)
    Collecting s3transfer<0.2.0,>=0.1.10 (from boto3->-r requirements.txt (line 1))
      Downloading s3transfer-0.1.10-py2.py3-none-any.whl (54kB)
    Collecting jmespath<1.0.0,>=0.7.1 (from boto3->-r requirements.txt (line 1))
      Downloading jmespath-0.9.3-py2.py3-none-any.whl
    Collecting python-dateutil<3.0.0,>=2.1 (from botocore->-r requirements.txt (line 2))
      Downloading python_dateutil-2.6.1-py2.py3-none-any.whl (194kB)
    Collecting docutils>=0.10 (from botocore->-r requirements.txt (line 2))
      Downloading docutils-0.14-py3-none-any.whl (543kB)
    Collecting six>=1.5 (from python-dateutil<3.0.0,>=2.1->botocore->-r requirements.txt (line 2))
      Downloading six-1.10.0-py2.py3-none-any.whl
    Installing collected packages: six, python-dateutil, docutils, jmespath, botocore, s3transfer, boto3
    Successfully installed boto3-1.4.6 botocore-1.6.7 docutils-0.14 jmespath-0.9.3 python-dateutil-2.6.1 s3transfer-0.1.10 six-1.10.0
     ---> f892d3cb7383
    Removing intermediate container 3ed98b30901d
    Step 6 : COPY . .
     ---> ea7550c04fea
    Removing intermediate container b558b3ebd406
    Successfully built ea7550c04fea
  3. Run the CloudFormation stack and get the names for the Amazon ECR repo and S3 bucket. In the stack, choose Outputs.
  4. Open the ECS console and choose Repositories. The rdslogs repo has been created. Choose View Push Commands and follow the instructions to connect to the repository and push the image for the code that you built in Step 2. The screenshot shows the final result:
  5. Associate the CloudWatch scheduled task with the created Amazon ECS Task Definition, using a new CloudWatch event rule that is scheduled to run at intervals. The following rule is scheduled to run every 15 minutes:
    aws --profile default --region us-west-2 events put-rule --name demo-ecs-task-rule  --schedule-expression "rate(15 minutes)"

    {
        "RuleArn": "arn:aws:events:us-west-2:12345678901:rule/demo-ecs-task-rule"
    }
  6. CloudWatch requires IAM permissions to place a task on the Amazon ECS cluster when the CloudWatch event rule is executed, in addition to an IAM role that can be assumed by CloudWatch Events. This is done in three steps:
    1. Create the IAM role to be assumed by CloudWatch.
      aws --profile default --region us-west-2 iam create-role --role-name Test-Role --assume-role-policy-document file://event-role.json

      {
          "Role": {
              "AssumeRolePolicyDocument": {
                  "Version": "2012-10-17", 
                  "Statement": [
                      {
                          "Action": "sts:AssumeRole", 
                          "Effect": "Allow", 
                          "Principal": {
                              "Service": "events.amazonaws.com"
                          }
                      }
                  ]
              }, 
              "RoleId": "AROAIRYYLDCVZCUACT7FS", 
              "CreateDate": "2017-07-14T22:44:52.627Z", 
              "RoleName": "Test-Role", 
              "Path": "/", 
              "Arn": "arn:aws:iam::12345678901:role/Test-Role"
          }
      }

      The following is an example of the event-role.json file used earlier:

      {
          "Version": "2012-10-17",
          "Statement": [
              {
                  "Effect": "Allow",
                  "Principal": {
                    "Service": "events.amazonaws.com"
                  },
                  "Action": "sts:AssumeRole"
              }
          ]
      }
    2. Create the IAM policy defining the ECS cluster and task definition. You need to get these values from the CloudFormation outputs and resources.
      aws --profile default --region us-west-2 iam create-policy --policy-name test-policy --policy-document file://event-policy.json

      {
          "Policy": {
              "PolicyName": "test-policy", 
              "CreateDate": "2017-07-14T22:51:20.293Z", 
              "AttachmentCount": 0, 
              "IsAttachable": true, 
              "PolicyId": "ANPAI7XDIQOLTBUMDWGJW", 
              "DefaultVersionId": "v1", 
              "Path": "/", 
              "Arn": "arn:aws:iam::123455678901:policy/test-policy", 
              "UpdateDate": "2017-07-14T22:51:20.293Z"
          }
      }

      The following is an example of the event-policy.json file used earlier:

      {
          "Version": "2012-10-17",
          "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "ecs:RunTask"
                ],
                "Resource": [
                    "arn:aws:ecs:*::task-definition/"
                ],
                "Condition": {
                    "ArnLike": {
                        "ecs:cluster": "arn:aws:ecs:*::cluster/"
                    }
                }
            }
          ]
      }
    3. Attach the IAM policy to the role.
      aws --profile default --region us-west-2 iam attach-role-policy --role-name Test-Role --policy-arn arn:aws:iam::1234567890:policy/test-policy
  7. Add the ECS cluster as a target of the CloudWatch Events rule created earlier, so that the rule places the task on the cluster. The following command shows an example. Replace the AWS account ID and region with your settings.
    aws events put-targets --rule demo-ecs-task-rule --targets "Id"="1","Arn"="arn:aws:ecs:us-west-2:12345678901:cluster/test-cwe-blog-ecsCluster-15HJFWCH4SP67","EcsParameters"={"TaskDefinitionArn"="arn:aws:ecs:us-west-2:12345678901:task-definition/test-cwe-blog-taskdef:8"},"RoleArn"="arn:aws:iam::12345678901:role/Test-Role"

    {
        "FailedEntries": [], 
        "FailedEntryCount": 0
    }
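
For readers who prefer to script steps 5 to 7 rather than use the CLI, the same requests can be sketched in Python. This is an illustrative outline only: the helper names `build_run_task_policy`, `build_ecs_target`, and `schedule_task` are made up for this example, and the ARNs are parameters that you would take from the CloudFormation outputs. The `events` argument is assumed to be a boto3 CloudWatch Events client, whose `put_rule` and `put_targets` calls mirror the CLI commands above.

```python
def build_run_task_policy(task_definition_arn, cluster_arn):
    """IAM policy document with the same shape as event-policy.json:
    allow ecs:RunTask for one task definition on one cluster."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["ecs:RunTask"],
                "Resource": [task_definition_arn],
                "Condition": {"ArnLike": {"ecs:cluster": cluster_arn}},
            }
        ],
    }


def build_ecs_target(cluster_arn, task_definition_arn, role_arn, target_id="1"):
    """Target entry for put_targets, mirroring the CLI call in step 7."""
    return {
        "Id": target_id,
        "Arn": cluster_arn,
        "RoleArn": role_arn,
        "EcsParameters": {"TaskDefinitionArn": task_definition_arn},
    }


def schedule_task(events, rule_name, minutes, target):
    """Create (or update) a rate-based rule and attach the ECS target to it."""
    # Rate expressions use the singular unit for a value of 1.
    unit = "minute" if minutes == 1 else "minutes"
    events.put_rule(Name=rule_name, ScheduleExpression=f"rate({minutes} {unit})")
    return events.put_targets(Rule=rule_name, Targets=[target])
```

Keeping the request bodies in small builder functions makes them easy to inspect before anything is sent to AWS.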

That’s it. The log-shipping task now runs on the defined schedule.

To test this, open the Amazon ECS console, select the Amazon ECS cluster that you created, and then choose Tasks, Run New Task. Select the task definition created by the CloudFormation template, and the cluster should be selected automatically. As this runs, the S3 bucket should be populated with the RDS logs for the instance.

Conclusion

In this post, you’ve seen that the choices for workloads that need to run at a scheduled time include Lambda with CloudWatch Events or EC2 with cron. However, sometimes the job could run beyond Lambda execution time limits, or might not be cost-effective to run on a dedicated EC2 instance.

In such cases, you can schedule the tasks on an ECS cluster using CloudWatch rules. In addition, you can use a Spot Fleet cluster with Amazon ECS for cost-conscious workloads that do not have hard requirements on execution time or instance availability in the Spot Fleet. For more information, see Powering your Amazon ECS Cluster with Amazon EC2 Spot Instances and Scheduled Events.

If you have questions or suggestions, please comment below.

Announcing Intel Clear Containers 3.0

Post Syndicated from ris original https://lwn.net/Articles/734648/rss

The Clear Containers team at Intel has announced the release of Clear Containers 3.0. “Completely rewritten and refactored, Clear Containers 3.0 uses Go language instead of C and introduces many new components and features. The 3.0 release of Clear Containers brings better integration into the container ecosystem and an ability to leverage code used for namespace based containers.”

Delivering Graphics Apps with Amazon AppStream 2.0

Post Syndicated from Deepak Suryanarayanan original https://aws.amazon.com/blogs/compute/delivering-graphics-apps-with-amazon-appstream-2-0/

Sahil Bahri, Sr. Product Manager, Amazon AppStream 2.0

Do you need to provide a workstation-class experience for users who run graphics apps? With Amazon AppStream 2.0, you can stream graphics apps from AWS to a web browser running on any supported device. AppStream 2.0 offers a choice of GPU instance types. The range includes the newly launched Graphics Design instance, which allows you to offer a fast, fluid user experience at a fraction of the cost of using a graphics workstation, without upfront investments or long-term commitments.

In this post, I discuss the Graphics Design instance type in detail, and how you can use it to deliver a graphics application such as Siemens NX―a popular CAD/CAM application that we have been testing on AppStream 2.0 with engineers from Siemens PLM.

Graphics Instance Types on AppStream 2.0

First, a quick recap of the GPU instance types available with AppStream 2.0. In July 2017, we launched graphics support for AppStream 2.0 with two new instance types that Jeff Barr discussed on the AWS Blog:

  • Graphics Desktop
  • Graphics Pro

Many customers in industries such as engineering, media, entertainment, and oil and gas are using these instances to deliver high-performance graphics applications to their users. These instance types are based on dedicated NVIDIA GPUs and can run the most demanding graphics applications, including those that rely on CUDA graphics API libraries.

Last week, we added a new lower-cost instance type: Graphics Design. This instance type is a great fit for engineers, 3D modelers, and designers who use graphics applications that rely on the hardware acceleration of DirectX, OpenGL, or OpenCL APIs, such as Siemens NX, Autodesk AutoCAD, or Adobe Photoshop. The Graphics Design instance is based on AMD’s FirePro S7150x2 Server GPUs and equipped with AMD Multiuser GPU technology. The instance type uses virtualized GPUs to achieve lower costs, and is available in four instance sizes to scale and match the requirements of your applications.

Instance | vCPUs | Instance RAM (GiB) | GPU Memory (GiB)
stream.graphics-design.large | 2 | 7.5 | 1
stream.graphics-design.xlarge | 4 | 15.3 | 2
stream.graphics-design.2xlarge | 8 | 30.5 | 4
stream.graphics-design.4xlarge | 16 | 61 | 8
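
To illustrate how the four sizes can be matched to an application's requirements, here is a small sketch that picks the smallest Graphics Design size meeting a given minimum. The figures are taken from the table above; the helper name `smallest_fit` and the idea of choosing purely on vCPUs, RAM, and GPU memory are assumptions for illustration, not an AppStream 2.0 feature.

```python
# (instance name, vCPUs, RAM in GiB, GPU memory in GiB), smallest first
GRAPHICS_DESIGN_SIZES = [
    ("stream.graphics-design.large", 2, 7.5, 1),
    ("stream.graphics-design.xlarge", 4, 15.3, 2),
    ("stream.graphics-design.2xlarge", 8, 30.5, 4),
    ("stream.graphics-design.4xlarge", 16, 61, 8),
]


def smallest_fit(vcpus, ram_gib, gpu_gib):
    """Return the smallest size that meets all three minimums, or None."""
    for name, size_vcpus, size_ram, size_gpu in GRAPHICS_DESIGN_SIZES:
        if size_vcpus >= vcpus and size_ram >= ram_gib and size_gpu >= gpu_gib:
            return name
    return None
```

For example, an application that needs 4 vCPUs, 16 GiB of RAM, and 2 GiB of GPU memory does not fit the xlarge size (15.3 GiB of RAM), so the sketch returns the 2xlarge size.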

The following table compares all three graphics instance types on AppStream 2.0, along with example applications you could use with each.

Feature | Graphics Design | Graphics Desktop | Graphics Pro
Number of instance sizes | 4 | 1 | 3
GPU memory range | 1–8 GiB | 4 GiB | 8–32 GiB
vCPU range | 2–16 | 8 | 16–32
Memory range | 7.5–61 GiB | 15 GiB | 122–488 GiB
GPU | AMD FirePro S7150x2 | NVIDIA GRID K520 | NVIDIA Tesla M60
Price range (N. Virginia AWS Region) | $0.25 – $2.00/hour | $0.50/hour | $2.05 – $8.20/hour
Example applications | Adobe Premiere Pro, Autodesk Revit, Siemens NX | AVEVA E3D, SOLIDWORKS | Autodesk Maya, Landmark DecisionSpace, Schlumberger Petrel

Example graphics instance set up with Siemens NX

In this section, I walk through setting up Siemens NX with Graphics Design instances on AppStream 2.0. After setup is complete, users can access NX from within their browser and also access their design files from a file share. You can also use these steps to set up and test your own graphics applications on AppStream 2.0. Here’s the workflow:

  1. Create a file share to load and save design files.
  2. Create an AppStream 2.0 image with Siemens NX installed.
  3. Create an AppStream 2.0 fleet and stack.
  4. Invite users to access Siemens NX through a browser.
  5. Validate the setup.

To learn more about AppStream 2.0 concepts and setup, see the previous post Scaling Your Desktop Application Streams with Amazon AppStream 2.0. For a deeper review of all the setup and maintenance steps, see the Amazon AppStream 2.0 Developer Guide.

Step 1: Create a file share to load and save design files

To launch and configure the file server

  1. Open the EC2 console and choose Launch Instance.
  2. Scroll to the Microsoft Windows Server 2016 Base Image and choose Select.
  3. Choose an instance type and size for your file server (I chose the general purpose m4.large instance). Choose Next: Configure Instance Details.
  4. Select a VPC and subnet. You launch AppStream 2.0 resources in the same VPC. Choose Next: Add Storage.
  5. If necessary, adjust the size of your EBS volume. Choose Review and Launch, Launch.
  6. On the Instances page, give your file server a name, such as My File Server.
  7. Ensure that the security group associated with the file server instance allows for incoming traffic from the security group that you select for your AppStream 2.0 fleets or image builders. You can use the default security group and select the same group while creating the image builder and fleet in later steps.

Log in to the file server using a remote access client such as Microsoft Remote Desktop. For more information about connecting to an EC2 Windows instance, see Connect to Your Windows Instance.

To enable file sharing

  1. Create a new folder (such as C:\My Graphics Files) and upload the files that you want to make available to your users.
  2. From the Windows control panel, enable network discovery.
  3. Choose Server Manager, File and Storage Services, Volumes.
  4. Scroll to Shares and choose Start the Add Roles and Features Wizard. Go through the wizard to install the File Server and Share role.
  5. From the left navigation menu, choose Shares.
  6. Choose Start the New Share Wizard to set up your folder as a file share.
  7. Open the context (right-click) menu on the share and choose Properties, Permissions, Customize Permissions.
  8. Choose Permissions, Add. Add Read and Execute permissions for everyone on the network.

Step 2: Create an AppStream 2.0 image with Siemens NX installed

To connect to the image builder and install applications

  1. Open the AppStream 2.0 management console and choose Images, Image Builder, Launch Image Builder.
  2. Create a graphics design image builder in the same VPC as your file server.
  3. From the Image builder tab, select your image builder and choose Connect. This opens a new browser tab and displays a desktop to log in to.
  4. Log in to your image builder as ImageBuilderAdmin.
  5. Launch the Image Assistant.
  6. Download and install Siemens NX and other applications on the image builder. I added Blender and Firefox, but you could replace these with your own applications.
  7. To verify the user experience, you can test the application performance on the instance.

Before you finish creating the image, you must mount the file share by enabling a few Microsoft Windows services.

To mount the file share

  1. Open services.msc and check the following services:
    • DNS Client
    • Function Discovery Resource Publication
    • SSDP Discovery
    • UPnP Device Host
  2. If any of the preceding services have Startup Type set to Manual, open the context (right-click) menu on the service and choose Start. Otherwise, open the context (right-click) menu on the service and choose Properties. For Startup Type, choose Manual, Apply. To start the service, choose Start.
  3. From the Windows control panel, enable network discovery.
  4. Create a batch script that mounts a file share from the storage server set up earlier. The file share is mounted automatically when a user connects to the AppStream 2.0 environment.

    Logon Script Location: C:\Users\Public\logon.bat

    Script Contents:

    REM Keep retrying until the share is mounted as drive H:
    :loop
    net use H: \\path\to\network\share
    REM Pause for roughly 30 seconds before checking the mount
    PING localhost -n 30 >NUL
    IF NOT EXIST H:\ GOTO loop
  5. Open gpedit.msc and choose User Configuration, Windows Settings, Scripts. Set logon.bat as the user logon script.
  6. Next, create a batch script that makes the mounted drive visible to the user.

    Startup Script Location: C:\Users\Public\startup.bat

    Script Contents:

    REG DELETE "HKEY_LOCAL_MACHINE\Software\Microsoft\Windows\CurrentVersion\Policies\Explorer" /v "NoDrives" /f
  7. Open Task Scheduler and choose Create Task.
  8. Choose General, provide a task name, and then choose Change User or Group.
  9. For Enter the object name to select, enter SYSTEM and choose Check Names, OK.
  10. Choose Triggers, New. For Begin the task, choose At startup. Under Advanced Settings, change Delay task for to 5 minutes. Choose OK.
  11. Choose Actions, New. Under Settings, for Program/script, enter C:\Users\Public\startup.bat. Choose OK.
  12. Choose Conditions. Under Power, clear the Start the task only if the computer is on AC power check box. Choose OK.
  13. To view your scheduled task, choose Task Scheduler Library. Close Task Scheduler when you are done.
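
The retry pattern used by logon.bat above (attempt the mount, wait, check, repeat) can be sketched in Python as follows. This is a variant for illustration rather than a translation of the script: unlike the batch loop, it gives up after a fixed number of attempts, and the names `wait_for_mount`, `try_mount`, and `is_mounted` are hypothetical. The mount and check actions are passed in as callables so the loop itself is independent of Windows.

```python
import time


def wait_for_mount(try_mount, is_mounted, attempts=10, delay_seconds=30, sleep=time.sleep):
    """Call try_mount until is_mounted reports success or attempts run out.

    Returns True once the share is visible and False if every attempt failed.
    """
    for _ in range(attempts):
        try_mount()
        if is_mounted():
            return True
        sleep(delay_seconds)  # give the network time to settle before retrying
    return False
```

On Windows, `try_mount` could wrap `subprocess.run(["net", "use", "H:", share_path])` and `is_mounted` could be `lambda: os.path.exists("H:\\")`, where `share_path` is the UNC path of your file share.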

Step 3: Create an AppStream 2.0 fleet and stack

To create a fleet and stack

  1. In the AppStream 2.0 management console, choose Fleets, Create Fleet.
  2. Give the fleet a name, such as Graphics-Demo-Fleet, and configure it to use the newly created image and the same VPC as your file server.
  3. Choose Stacks, Create Stack. Give the stack a name, such as Graphics-Demo-Stack.
  4. After the stack is created, select it and choose Actions, Associate Fleet. Associate the stack with the fleet you created in step 1.
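
The console steps above can also be scripted. The following sketch assumes a boto3 AppStream client is passed in as `appstream` and uses its `create_fleet`, `create_stack`, and `associate_fleet` calls; the function name `create_fleet_and_stack`, the default instance size, and the single desired instance are illustrative choices rather than recommendations from this post.

```python
def create_fleet_and_stack(appstream, fleet_name, stack_name, image_name,
                           subnet_ids, security_group_ids,
                           instance_type="stream.graphics-design.large",
                           desired_instances=1):
    """Create a fleet from an existing image, create a stack, and link them."""
    appstream.create_fleet(
        Name=fleet_name,
        ImageName=image_name,
        InstanceType=instance_type,
        ComputeCapacity={"DesiredInstances": desired_instances},
        # Use the same VPC (subnets and security group) as the file server.
        VpcConfig={"SubnetIds": subnet_ids, "SecurityGroupIds": security_group_ids},
    )
    appstream.create_stack(Name=stack_name)
    appstream.associate_fleet(FleetName=fleet_name, StackName=stack_name)
```

A call might look like `create_fleet_and_stack(boto3.client("appstream"), "Graphics-Demo-Fleet", "Graphics-Demo-Stack", image_name, subnet_ids, security_group_ids)`, with the image name and network IDs taken from your own account.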

Step 4: Invite users to access Siemens NX through a browser

To invite users

  1. Choose User Pools, Create User to create users.
  2. Enter a name and email address for each user.
  3. Select the users just created, and choose Actions, Assign Stack to provide access to the stack that you created earlier. You can also provide access using SAML 2.0 and connect to your Active Directory if necessary. For more information, see the Enabling Identity Federation with AD FS 3.0 and Amazon AppStream 2.0 post.

Your user receives an email invitation to set up an account and use a web portal to access the applications that you have included in your stack.
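
If you have many users to invite, the same flow can be sketched with the AppStream API. The example below assumes a boto3 AppStream client and its `create_user` and `batch_associate_user_stack` calls for user pool users; the function name `invite_users` and the `(email, first_name, last_name)` tuple format are assumptions made for this illustration.

```python
def invite_users(appstream, stack_name, users):
    """Create user pool users and assign them to a stack.

    users is an iterable of (email, first_name, last_name) tuples.
    """
    associations = []
    for email, first_name, last_name in users:
        appstream.create_user(
            UserName=email,
            AuthenticationType="USERPOOL",
            FirstName=first_name,
            LastName=last_name,
        )
        associations.append({
            "StackName": stack_name,
            "UserName": email,
            "AuthenticationType": "USERPOOL",
            "SendEmailNotification": True,
        })
    return appstream.batch_associate_user_stack(UserStackAssociations=associations)
```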

Step 5: Validate the setup

Time for a test drive with Siemens NX on AppStream 2.0!

  1. Open the link for the AppStream 2.0 web portal shared through the email invitation. The web portal opens in your default browser. You must sign in with the temporary password and set a new password. After that, you are taken to your app catalog.
  2. Launch Siemens NX and interact with it using the demo files available in the shared storage folder – My Graphics Files. 

After I launched NX, I captured the screenshot below. The Siemens PLM team also recorded a video with NX running on AppStream 2.0.

Summary

In this post, I discussed the GPU instances available for delivering rich graphics applications to users in a web browser. While I demonstrated a simple setup, you can scale this out to launch a production environment with users signing in using Active Directory credentials, accessing persistent storage with Amazon S3, and using other commonly requested features reviewed in the Amazon AppStream 2.0 Launch Recap – Domain Join, Simple Network Setup, and Lots More post.

To learn more about AppStream 2.0 and capabilities added this year, see Amazon AppStream 2.0 Resources.

The 4.13 kernel is out

Post Syndicated from corbet original https://lwn.net/Articles/732793/rss

Linus has released the 4.13 kernel, right on schedule. Headline features in this release include kernel hardening via structure layout randomization, native TLS protocol support, better huge-page swapping, improved handling of writeback errors, better asynchronous I/O support, better power management via next-interrupt prediction, the elimination of the DocBook toolchain for formatted documentation, and more. There is one other change that is called out explicitly in the announcement: “The change in question is simply changing the default cifs behavior: instead of defaulting to SMB 1.0 (which you really should not use: just google for ‘stop using SMB1’ or similar), the default cifs mount now defaults to a rather more modern SMB 3.0.”

Police Confiscate 245 ‘Pirate’ Media Players

Post Syndicated from Ernesto original https://torrentfreak.com/police-confiscate-245-pirate-media-players-170829/

More and more people are starting to use “fully-loaded” set-top boxes to stream video content directly to their TVs.

Although the media players themselves can be used for perfectly legal means, third-party add-ons turn them into pirate machines, providing access to movies, TV-shows and IPTV channels.

Over the past several years, there has been little enforcement effort on this front. However, this changed earlier this year, when the European Court of Justice ruled that selling devices pre-configured to obtain copyright-infringing content is illegal.

The hardware can still be sold and media player software such as Kodi is legal too, but vendors who ship boxes with pirate add-ons could get a letter or visit from rightsholders. Dutch anti-piracy outfit BREIN is particularly active on this front and has convinced hundreds of sellers to clean up shop.

One of these vendors, located in The Hague, recently promised that it would stop offering these boxes. However, BREIN discovered that while the pirate media players disappeared from the online store, they were still sold in the bricks-and-mortar store.

The anti-piracy group obviously wasn’t happy with this and reported the shop owner to the local police, who went in and confiscated 245 “pirate” media players a few days ago.

“We summoned this merchant to stop but, despite his promise to do so, he continued. We have therefore reported it to the police. These players cause great damage because people no longer pay for the movies and series they watch,” BREIN director Tim Kuik says.

It is now up to the authorities to determine if any further action is needed. BREIN expects that the prosecutor’s office will try to settle the case with a fine, but if the vendor refuses to pay it may also lead to a prosecution. At the same time, BREIN also has the option to file a civil case.

Although BREIN’s actions usually don’t result in criminal prosecutions, the anti-piracy group continues to pressure people who are involved in selling and developing these platforms. Ultimately, they hope that this will deter others from getting involved.

Earlier this year the Motion Picture Association described pirate media players as a major threat, dubbing them “Piracy 3.0.” While this threat is far from over, it has definitely become riskier for people to get involved in developing and selling these boxes.


Raspbian Stretch has arrived for Raspberry Pi

Post Syndicated from Simon Long original https://www.raspberrypi.org/blog/raspbian-stretch/

It’s now just under two years since we released the Jessie version of Raspbian. Those of you who know that Debian run their releases on a two-year cycle will therefore have been wondering when we might be releasing the next version, codenamed Stretch. Well, wonder no longer – Raspbian Stretch is available for download today!


Debian releases are named after characters from Disney Pixar’s Toy Story trilogy. In case, like me, you were wondering: Stretch is a purple octopus from Toy Story 3. Hi, Stretch!

The differences between Jessie and Stretch are mostly under-the-hood optimisations, and you really shouldn’t notice any differences in day-to-day use of the desktop and applications. (If you’re really interested, the technical details are in the Debian release notes here.)

However, we’ve made a few small changes to our image that are worth mentioning.

New versions of applications

Version 3.0.1 of Sonic Pi is included – this includes a lot of new functionality in terms of input/output. See the Sonic Pi release notes for more details of exactly what has changed.


The Chromium web browser has been updated to version 60, the most recent stable release. This offers improved memory usage and more efficient code, so you may notice it running slightly faster than before. The visual appearance has also been changed very slightly.


Bluetooth audio

In Jessie, we used PulseAudio to provide support for audio over Bluetooth, but integrating this with the ALSA architecture used for other audio sources was clumsy. For Stretch, we are using the bluez-alsa package to make Bluetooth audio work with ALSA itself. PulseAudio is therefore no longer installed by default, and the volume plugin on the taskbar will no longer start and stop PulseAudio. From a user point of view, everything should still work exactly as before – the only change is that if you still wish to use PulseAudio for some other reason, you will need to install it yourself.

Better handling of other usernames

The default user account in Raspbian has always been called ‘pi’, and a lot of the desktop applications assume that this is the current user. This has been changed for Stretch, so now applications like Raspberry Pi Configuration no longer assume this to be the case. This means, for example, that the option to automatically log in as the ‘pi’ user will now automatically log in with the name of the current user instead.

One other change is how sudo is handled. By default, the ‘pi’ user is set up with passwordless sudo access. We are no longer assuming this to be the case, so now desktop applications which require sudo access will prompt for the password rather than simply failing to work if a user without passwordless sudo uses them.

Scratch 2 SenseHAT extension

In the last Jessie release, we added the offline version of Scratch 2. While Scratch 2 itself hasn’t changed for this release, we have added a new extension to allow the SenseHAT to be used with Scratch 2. Look under ‘More Blocks’ and choose ‘Add an Extension’ to load the extension.

This works with either a physical SenseHAT or with the SenseHAT emulator. If a SenseHAT is connected, the extension will control that in preference to the emulator.


Fix for Broadpwn exploit

A couple of months ago, a vulnerability was discovered in the firmware of the BCM43xx wireless chipset which is used on Pi 3 and Pi Zero W; this potentially allows an attacker to take over the chip and execute code on it. The Stretch release includes a patch that addresses this vulnerability.

There is also the usual set of minor bug fixes and UI improvements – I’ll leave you to spot those!

How to get Raspbian Stretch

As this is a major version upgrade, we recommend using a clean image; these are available from the Downloads page on our site as usual.

Upgrading an existing Jessie image is possible, but is not guaranteed to work in every circumstance. If you wish to try upgrading a Jessie image to Stretch, we strongly recommend taking a backup first – we can accept no responsibility for loss of data from a failed update.

To upgrade, first modify the files /etc/apt/sources.list and /etc/apt/sources.list.d/raspi.list. In both files, change every occurrence of the word ‘jessie’ to ‘stretch’. (Both files will require sudo to edit.)
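
If you would rather script the edit than make it by hand, the substitution can be sketched as below. This is only an illustration of the text change described above: the function name `retarget_sources` is made up for this example, it operates on a string rather than on the files themselves, and writing the result back to the two files under /etc/apt still requires sudo.

```python
import re


def retarget_sources(text, old="jessie", new="stretch"):
    """Replace every whole-word occurrence of the old release name."""
    # \b keeps words that merely contain the name untouched, while
    # suites such as "jessie-updates" are still rewritten.
    return re.sub(rf"\b{re.escape(old)}\b", new, text)
```

For instance, a line such as `deb http://example.org/raspbian/ jessie main` (a placeholder URL, not the real mirror) becomes `deb http://example.org/raspbian/ stretch main`.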

Then open a terminal window and execute

sudo apt-get update
sudo apt-get -y dist-upgrade

Answer ‘yes’ to any prompts. There may also be a point at which the install pauses while a page of information is shown on the screen – hold the ‘space’ key to scroll through all of this and then hit ‘q’ to continue.

Finally, if you are not using PulseAudio for anything other than Bluetooth audio, remove it from the image by entering

sudo apt-get -y purge "pulseaudio*"

The post Raspbian Stretch has arrived for Raspberry Pi appeared first on Raspberry Pi.

Supreme Administrative Court (VAS), three-member panel: revocation of BBT's licence was unlawful

Post Syndicated from nellyo original https://nellyo.wordpress.com/2017/08/16/cem_bbt-2/

As is already known, in September 2016 the Council for Electronic Media (CEM) revoked the television broadcasting licences of two commercial companies: TV Seven and Balkan Bulgarian Television (BBT).

On 7 August 2017, a five-member panel of the Supreme Administrative Court (VAS) upheld the revocation of TV Seven’s licences for two programmes. That decision is final.

On 14 August 2017, a three-member panel of the Supreme Administrative Court, in Decision 10470, also ruled on the CEM decision concerning the licence of BBT EAD, a commercial media service provider holding Individual Licence No. ЛРР-01-3-016-01 for the provision of an audiovisual service named News 7.

On the legal ground adopted by the CEM (false declarations), the court writes the following:

In the present case it is more than obvious that the matter at issue concerns not a refusal to issue a licence but the termination of one already issued. Termination and revocation of a licence, as separate regulatory powers of the CEM, are governed by Articles 121 and 122 of the Radio and Television Act (ZRT); in that sense both scenarios are clearly and specifically regulated, and they should not be derived by way of interpretation. Neither of the two provisions lists the opening of insolvency proceedings as a ground for revoking or terminating an already issued licence for the provision of an audiovisual service.

 
Quite logically:

Circumstances that arise subsequently in the licensee’s legal sphere cannot be equated with a false declaration made at the time of applying for the licence. A declaration is a document of an official character that certifies facts and circumstances as of a past or the present moment. Article 111(1)(6) of the Radio and Television Act (ZRT) expressly requires applicants to declare that certain circumstances “are not present”, not that they will not occur. A declaration that specific circumstances exist does not have the character of a promise for the future.

The court

SETS ASIDE Decision No. РД-05-143 of 13 September 2016 of the Council for Electronic Media, which revoked and terminated Individual Licence No. ЛЛР-01-3-016-01 for the provision of an audiovisual service named News 7, issued to Balkan Bulgarian Television EAD.

THE DECISION may be appealed before a five-member panel of the Supreme Administrative Court within 14 days of the day on which the parties to the case are notified that it has been drawn up.

Some media outlets have inaccurately assumed that the decision on TV Seven, which is indeed final, also applies to BBT.

Filed under: BG Law Making, BG Media, BG Regulator, Media Law

Roku Gets Tough on Pirate Channels, Warns Users

Post Syndicated from Ernesto original https://torrentfreak.com/roku-gets-tough-on-pirate-channels-warns-users-170815/

In recent years it has become much easier to stream movies and TV-shows over the Internet.

Legal services such as Netflix and HBO are flourishing, but there’s also a darker side to this streaming epidemic. Millions of people are streaming from unauthorized sources, often paired with perfectly legal streaming platforms and devices.

Hollywood insiders have dubbed this trend “Piracy 3.0” and are actively working with stakeholders to address the threat. One of the companies rightsholders are working with is Roku, known for its easy-to-use media players.

Earlier this year Roku was harshly confronted with this new piracy crackdown when a Mexican court ordered local retailers to take its media player off the shelves. While this legal battle isn’t over yet, it was clear to Roku that misuse of its platform wasn’t without consequences.

While Roku never permitted any infringing content, it appears that the company has recently made some adjustments to better deal with the problem, or at least clarify its stance.

Pirate content generally doesn’t show up in the official Roku Channel Store but is directly loaded onto the device through third-party “private” channels. A few weeks ago, Roku renamed these “private” channels to “non-certified” channels, while making it very clear that copyright infringement is not allowed.

A “WARNING!” message that pops up during the installation of these third-party channels stresses that Roku has no control over the content. In addition, the company notes that these channels may be removed if they link to copyright-infringing content.

Roku Warning

“By continuing, you acknowledge you are accessing a non-certified channel that may include content that is offensive or inappropriate for some audiences,” Roku’s warning reads.

“Moreover, if Roku determines that this channel violates copyright, contains illegal content, or otherwise violates Roku’s terms and conditions, then ROKU MAY REMOVE THIS CHANNEL WITHOUT PRIOR NOTICE.”

TorrentFreak reached out to Roku to find out how they plan to enforce this policy, but we have yet to hear back. According to Cord Cutters News, several piracy channels have already been removed recently, with other developers opting to leave the platform.

Roku’s General Counsel Steve Kay previously informed us that the company is taking the piracy problem seriously. Together with various stakeholders, they are working hard to address the problem.

“We actively work to prevent third-parties from using our platform to distribute copyright infringing content. Moreover, we have been actively working with other industry stakeholders on a wide range of anti-piracy initiatives,” Kay said.

Roku is not the only platform dealing with the piracy epidemic; the popular media player software Kodi is in the same boat. Kodi has also taken an active anti-piracy stance, but they’re not banning any add-ons. They believe it would be pointless due to the open source nature of their software.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

timeShift(GrafanaBuzz, 1w) Issue 5

Post Syndicated from Blogs on Grafana Labs Blog original https://grafana.com/blog/2017/07/21/timeshiftgrafanabuzz-1w-issue-5/

We cover a lot of ground in this week’s timeShift. From diving into building your own plugin, finding the right dashboard, configuration options in the alerting feature, to monitoring your local weather, there’s something for everyone. Are you writing an article about Grafana, or have you come across an article you found interesting? Please get in touch, we’ll add it to our roundup.


From the Blogosphere

  • Going open-source in monitoring, part III: 10 most useful Grafana dashboards to monitor Kubernetes and services: We have hundreds of pre-made dashboards ready for you to install into your on-prem or hosted Grafana, but not every one will fit your specific monitoring needs. In part three of the series, Sergey discusses his experiences with finding useful dashboards and shows off ten of the best dashboards you can install for monitoring Kubernetes clusters and the services deployed on them.

  • Using AWS Lambda and API gateway for server-less Grafana adapters: Sometimes you’ll want to visualize metrics from a data source that may not yet be supported in Grafana natively. With the plugin functionality introduced in Grafana 3.0, anyone can create their own data sources. Using the SimpleJson data source, Jonas describes how he used AWS Lambda and AWS API gateway to write data source adapters for Grafana.

  • How to Use Grafana to Monitor JMeter Non-GUI Results – Part 2: A few issues ago we listed an article for using Grafana to monitor JMeter Non-GUI results, which required a number of non-trivial steps to complete. This article shows off an easier way to accomplish this that doesn’t require any additional configuration of InfluxDB.

  • Programming your Personal Weather Chart: It’s always great to see Grafana used outside of the typical dev-ops use case. This article runs you through the steps to create your own weather chart and show off your local weather stats in Grafana. BONUS: Rob shows off a magic mirror he created, which can display this data.

  • vSphere Performance data – Part 6 – The Dashboard(s): This 6-part series goes into a ton of detail and walks you through the various methods of retrieving vSphere performance data, storing the data in a TSDB, and creating dashboards for the metrics. Part 6 deals specifically with Grafana, but I highly recommend reading all of the articles, as it chronicles the journey of metrics exploration, storage, and visualization from someone who had no prior experience with time series data.

  • Alerting in Grafana: Alerting in Grafana is a fairly new feature and one that we’re continuing to iterate on. We’re soon adding additional data source support, new notification channels, clustering, silencing rules, and more. This article steps you through all the configuration options to get you to your first alert.


Plugins and Dashboards

It can seem like work slows during July and August, but we’re still seeing a lot of activity in the community. This week we have a new graph panel to show off that gives you some unique looking dashboards, and an update to the Zabbix data source, which adds some really great features. You can install both of the plugins now on your on-prem Grafana via our cli, or with one-click on GrafanaCloud.

NEW PLUGIN

Bubble Chart Panel This super-cool looking panel groups your tag values into clusters of circles. The size of the circle represents the aggregated value of the time series data. There are also multiple color schemes to make those bubbles POP (pun intended)! Currently it works against OpenTSDB and Bosun, so give it a try!

Install Now

UPDATED PLUGIN

Zabbix Alex has been hard at work, making improvements on the Zabbix App for Grafana. This update adds annotations, template variables, alerting and more. Thanks Alex! If you’d like to try out the app, head over to http://play.grafana-zabbix.org/dashboard/db/zabbix-db-mysql?orgId=2

Install 3.5.1 Now


This week’s MVC (Most Valuable Contributor)

Open source software can’t thrive without the contributions from the community. Each week we’ll recognize a Grafana contributor and thank them for all of their PRs, bug reports and feedback.

mk-dhia (Dhia)
Thank you so much for your improvements to the Elasticsearch data source!


Tweet of the Week

We scour Twitter each week to find an interesting/beautiful dashboard and show it off! #monitoringLove

This week’s tweet comes from @geek_dave

Great looking dashboard Dave! And thank you for adding new features and keeping it updated. It’s creators like you who make the dashboard repository so awesome!


Upcoming Events

We love when people talk about Grafana at meetups and conferences.

Monday, July 24, 2017 – 7:30pm | Google Campus Warsaw


Ząbkowska 27/31, Warsaw, Poland

Iot & HOME AUTOMATION #3 openHAB, InfluxDB, Grafana:
If you are interested in topics of the internet of things and home automation, this might be a good occasion to meet people similar to you. If you are into it, we will also show you how we can all work together on our common projects.

RSVP


Tell us how we’re Doing.

We’d love your feedback on what kind of content you like, length, format, etc – so please keep the comments coming! You can submit a comment on this article below, or post something at our community forum. Help us make this better.

Follow us on Twitter, like us on Facebook, and join the Grafana Labs community.

Burner laptops for DEF CON

Post Syndicated from Robert Graham original http://blog.erratasec.com/2017/07/burner-laptops-for-def-con.html

Hacker summer camp (Defcon, Blackhat, BSidesLV) is upon us, so I thought I’d write up some quick notes about bringing a “burner” laptop. Chrome is your best choice in terms of security, but I need Windows/Linux tools, so I got a Windows laptop.

I chose the Asus e200ha for $199 from Amazon with free (and fast) shipping. There are similar notebooks with roughly the same hardware and price from other manufacturers (HP, Dell, etc.), so I’m not sure how this compares against those other ones. However, it fits my needs as a “burner” laptop, namely:

  • cheap
  • lasts 10 hours easily on battery
  • weighs 2.2 pounds (1 kilogram)
  • 11.6 inch and thin

Some other specs are:

  • 4 gigs of RAM
  • 32 gigs of eMMC flash memory
  • quad core 1.44 GHz Intel Atom CPU
  • Windows 10
  • free Microsoft Office 365 for one year
  • good, large keyboard
  • good, large touchpad
  • USB 3.0
  • microSD
  • WiFi ac
  • no fans, completely silent

There are compromises, of course.

  • The Atom CPU is slow, though it’s only noticeable when churning through heavy webpages. Adblocking addons or Brave are a necessity. Most things are usably fast, such as using Microsoft Word.
  • Crappy sound and video, though VLC does a fine job playing movies with headphones on the airplane. Using in bright sunlight will be difficult.
  • micro-HDMI, keep in mind if intending to do presos from it, you’ll need an HDMI adapter
  • It has limited storage, 32gigs in theory, about half that usable.
  • Does special Windows 10 compressed install that you can’t actually upgrade without a completely new install. It doesn’t have the latest Windows 10 Creators update. I lost a gig thinking I could compress system files.

Copying files across the 802.11ac WiFi to the disk was quite fast, several hundred megabits-per-second. The eMMC isn’t as fast as an SSD, but it’s a lot faster than typical SD card speeds.

The first thing I did once I got the notebook was to install the free VeraCrypt full disk encryption. The CPU has AES acceleration, so it’s fast. There is a problem with the keyboard driver during boot that makes it really hard to enter long passwords — you have to carefully type one key at a time to prevent extra keystrokes from being entered.

You can’t really install Linux on this computer, but you can use virtual machines. I installed VirtualBox and downloaded the Kali VM. I had some problems attaching USB devices to the VM. First of all, VirtualBox requires a separate downloaded extension to get USB working. Second, it conflicts with USBpcap that I installed for Wireshark.

It comes with one year of free Office 365. Obviously, Microsoft is hoping to hook the user into a longer term commitment, but in practice next year at this time I’d get another burner $200 laptop rather than spend $99 on extending the Office 365 license.

Let’s talk about the CPU. It’s Intel’s “Atom” processor, not their mainstream (Core i3 etc.) processor. Even though it has roughly the same GHz as the processor in an 11-inch MacBook Air and twice the cores, it’s noticeably and painfully slower. This is especially noticeable on ad-heavy web pages, while other things seem to work just fine. It has hardware acceleration for most video formats, though I had trouble getting Netflix to work.

The tradeoff for a slow CPU is phenomenal battery life. It seems to last forever on battery. It’s really pretty cool.

Conclusion

A Chromebook is likely more secure, but for my needs, this $200 laptop is perfect.

New – API & CloudFormation Support for Amazon CloudWatch Dashboards

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/new-api-cloudformation-support-for-amazon-cloudwatch-dashboards/

We launched CloudWatch Dashboards a couple of years ago. In the post that I wrote for the launch, I showed you how to interactively create a dashboard that displayed chosen CloudWatch metrics in graphical form. After the launch, we added additional features including a full screen mode, a dark theme, control over the range of the Y axis, simplified renaming, persistent storage, and new visualization options.

New API & CLI
While console support is wonderful for interactive use, many customers have asked us to support programmatic creation and manipulation of dashboards and the widgets within. They would like to dynamically build and maintain dashboards, adding and removing widgets as the corresponding AWS resources are created and destroyed. Other customers are interested in setting up and maintaining a consistent set of dashboards across two or more AWS accounts.

I am happy to announce that API, CLI, and AWS CloudFormation support for CloudWatch Dashboards is available now and that you can start using it today!

There are four new API functions (and equivalent CLI commands):

ListDashboards / aws cloudwatch list-dashboards – Fetch a list of all dashboards within an account, or a subset that share a common prefix.

GetDashboard / aws cloudwatch get-dashboard – Fetch details for a single dashboard.

PutDashboard / aws cloudwatch put-dashboard – Create a new dashboard or update an existing one.

DeleteDashboards / aws cloudwatch delete-dashboards – Delete one or more dashboards.

Dashboard Concepts
I want to show you how to use these functions and commands. Before I dive in, I should review a couple of important dashboard concepts and attributes.

Global – Dashboards are part of an AWS account, and are not associated with a specific AWS Region. Each account can have up to 500 dashboards.

Named – Each dashboard has a name that is unique within the AWS account. Names can be up to 255 characters long.

Grid Model – Each dashboard is composed of a grid of cells. The grid is 24 cells across and as tall as necessary. Each widget on the dashboard is positioned at a particular set of grid coordinates, and has a size that spans an integral number of grid cells.

Widgets (Visualizations) – Each widget can display text or a set of CloudWatch metrics. Text is specified using Markdown; metrics can be displayed as single values, line charts, or stacked area charts. Each dashboard can have up to 100 widgets. Widgets that display metrics can also be associated with a CloudWatch Alarm.

Dashboards have a JSON representation that you can now see and edit from within the console. Simply click on the Action menu and choose View/edit source:

Here’s the source for my dashboard:

You can use this JSON as a starting point for your own applications. As you can see, there’s an entry in the widgets array for each widget on the dashboard; each entry describes one widget, starting with its type, position, and size.
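
To make the shape of that JSON concrete, here is a minimal sketch that builds a two-widget body and checks it against the limits described earlier (a grid 24 cells across, at most 100 widgets). The `check_body` helper is a local check written for this illustration, not an AWS API, and the instance ID is a placeholder; the service performs its own validation when a dashboard is stored.

```python
import json

GRID_WIDTH = 24      # cells across, per the grid model described above
MAX_WIDGETS = 100    # widgets per dashboard, per the limits described above


def check_body(body):
    """Return a list of problems found in a dashboard body dict (empty if none)."""
    problems = []
    widgets = body.get("widgets", [])
    if len(widgets) > MAX_WIDGETS:
        problems.append("too many widgets: %d" % len(widgets))
    for n, w in enumerate(widgets):
        # Positions are optional, so only check the ones that are given.
        if "x" in w and "width" in w and w["x"] + w["width"] > GRID_WIDTH:
            problems.append("widget %d extends past column %d" % (n, GRID_WIDTH))
    return problems


body = {
    "widgets": [
        {"type": "text", "x": 0, "y": 0, "width": 6, "height": 3,
         "properties": {"markdown": "# Example"}},
        {"type": "metric", "x": 6, "y": 0, "width": 6, "height": 3,
         "properties": {"view": "timeSeries", "stacked": False,
                        # Placeholder instance ID, for illustration only.
                        "metrics": [["AWS/EC2", "NetworkIn",
                                     "InstanceId", "i-0123456789abcdef0"]],
                        "region": "us-east-1"}},
    ]
}

print(check_body(body))            # an empty list means no problems were found
print(json.dumps(body, indent=2))  # the JSON text a dashboard would carry
```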

Creating a Dashboard Using the API
Let’s say I want to create a dashboard that has a widget for each of my EC2 instances in a particular region. I’ll use Python and the AWS SDK for Python, and start as follows (excuse the amateur nature of my code):

import boto3
import json

cw  = boto3.client("cloudwatch")
ec2 = boto3.client("ec2")

x, y          = [0, 0]
width, height = [3, 3]
max_width     = 12
widgets       = []

Then I simply iterate over the instances, creating a widget dictionary for each one, and appending it to the widgets array:

instances = ec2.describe_instances()
for r in instances['Reservations']:
    for i in r['Instances']:

        widget = {'type'      : 'metric',
                  'x'         : x,
                  'y'         : y,
                  'height'    : height,
                  'width'     : width,
                  'properties': {'view'    : 'timeSeries',
                                 'stacked' : False,
                                 'metrics' : [['AWS/EC2', 'NetworkIn', 'InstanceId', i['InstanceId']],
                                              ['.',       'NetworkOut', '.',         '.']
                                             ],
                                 'period'  : 300,
                                 'stat'    : 'Average',
                                 'region'  : 'us-east-1',
                                 'title'   : i['InstanceId']
                                }
                 }

        widgets.append(widget)

I update the position (x and y) within the loop, and form a grid (if I don’t specify positions, the widgets will be laid out left to right, top to bottom):

        x += width
        if (x + width > max_width):
            x = 0
            y += height
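
Pulled out of the loop, the same wrapping rule can be written as a small generator, which makes it easy to check in isolation. `grid_positions` is a name made up for this sketch; the defaults mirror the values used above.

```python
def grid_positions(count, width=3, height=3, max_width=12):
    """Yield (x, y) for `count` widgets laid out left to right, top to bottom."""
    x, y = 0, 0
    for _ in range(count):
        yield x, y
        x += width
        if x + width > max_width:  # the next widget would not fit on this row
            x = 0
            y += height


print(list(grid_positions(6)))
# [(0, 0), (3, 0), (6, 0), (9, 0), (0, 3), (3, 3)]
```

With these defaults, four widgets fit on each row before the layout wraps to the next one.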

After I have processed all of the instances, I create a JSON version of the widget array:

body   = {'widgets' : widgets}
body_j = json.dumps(body)

And I create or update my dashboard:

cw.put_dashboard(DashboardName = "EC2_Networking",
                 DashboardBody = body_j)

I run the code, and get the following dashboard:

The CloudWatch team recommends that dashboards created programmatically include a text widget indicating that the dashboard was generated automatically, along with a link to the source code or CloudFormation template that did the work. This will discourage users from making manual, out-of-band changes to the dashboards.
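
One way to follow that recommendation is sketched below. The function name, the wording of the notice, and the repository URL are all made up for this example; only the widget shape (a `text` widget with a `markdown` property) comes from the dashboard format itself.

```python
def generated_notice_widget(source_url, width=12, height=2):
    """Build a text widget saying the dashboard is generated, with a source link."""
    markdown = (
        "**This dashboard is generated automatically.** "
        "Manual changes may be overwritten. "
        "Source: [{0}]({0})".format(source_url)
    )
    return {
        "type": "text",
        "x": 0,
        "y": 0,
        "width": width,
        "height": height,
        "properties": {"markdown": markdown},
    }


# Hypothetical repository URL, used only for illustration.
notice = generated_notice_widget("https://example.com/dashboards/ec2_networking.py")
```

To place the notice at the top, it would go first in the widgets list, and the metric widgets would start at a `y` equal to the notice's height instead of 0.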

As I mentioned earlier, each metric widget can also be associated with a CloudWatch Alarm. You can create the alarms programmatically or by using a CloudFormation template such as the Sample CPU Utilization Alarm. If you decide to do this, the alarm threshold will be displayed in the widget. To learn more about this, read Tara Walker’s recent post, Amazon CloudWatch Launches Alarms on Dashboards.

Going one step further, I could use CloudWatch Events and a Lambda function to track the creation and deletion of certain resources and update a dashboard in concert with the changes. To learn how to do this, read Keeping CloudWatch Dashboards up to Date Using AWS Lambda.

Accessing a Dashboard Using the CLI
I can also access and manipulate my dashboards from the command line. For example, I can generate a simple list:

$ aws cloudwatch list-dashboards --output table
----------------------------------------------
|               ListDashboards               |
+--------------------------------------------+
||             DashboardEntries             ||
|+-----------------+----------------+-------+|
||  DashboardName  | LastModified   | Size  ||
|+-----------------+----------------+-------+|
||  Disk-Metrics   |  1496405221.0  |  316  ||
||  EC2_Networking |  1498090434.0  |  2830 ||
||  Main-Metrics   |  1498085173.0  |  234  ||
|+-----------------+----------------+-------+|

And I can get rid of the Disk-Metrics dashboard:

$ aws cloudwatch delete-dashboards --dashboard-names Disk-Metrics

I can also retrieve the JSON that defines a dashboard:
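
The same retrieval can be done from Python. The sketch below relies on the `get_dashboard` call in the AWS SDK for Python, which returns the body as a JSON string under `DashboardBody`. `load_dashboard` and `widget_titles` are helper names chosen for this example, and the client is passed in so the functions can be exercised without AWS credentials.

```python
import json


def load_dashboard(cw, name):
    """Fetch a dashboard by name and return its body parsed into a dict.

    `cw` is a CloudWatch client, for example boto3.client("cloudwatch").
    """
    response = cw.get_dashboard(DashboardName=name)
    return json.loads(response["DashboardBody"])


def widget_titles(body):
    """List the titles of the widgets in a parsed body that have one."""
    return [w["properties"]["title"]
            for w in body.get("widgets", [])
            if "title" in w.get("properties", {})]
```

With credentials configured, `widget_titles(load_dashboard(boto3.client("cloudwatch"), "EC2_Networking"))` would list the instance IDs that the earlier code used as widget titles.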

Creating a Dashboard Using CloudFormation
Dashboards can also be specified in CloudFormation templates. Here’s a simple template in YAML (the DashboardBody is still specified in JSON):

Resources:
  MyDashboard:
    Type: "AWS::CloudWatch::Dashboard"
    Properties:
      DashboardName: SampleDashboard
      DashboardBody: '{"widgets":[{"type":"text","x":0,"y":0,"width":6,"height":6,"properties":{"markdown":"Hi there from CloudFormation"}}]}'

I place the template in a file and then create a stack using the console or the CLI:

$ aws cloudformation create-stack --stack-name MyDashboard --template-body file://dash.yaml
{
    "StackId": "arn:aws:cloudformation:us-east-1:xxxxxxxxxxxx:stack/MyDashboard/a2a3fb20-5708-11e7-8ffd-500c21311262"
}

Here’s the dashboard:

Available Now
This feature is available now and you can start using it today. You can create 3 dashboards with up to 50 metrics per dashboard at no charge; additional dashboards are priced at $3 per month, as listed on the CloudWatch Pricing page. You can make up to 1 million calls to the new API functions each month at no charge; beyond that you pay $.01 for every 1,000 calls.
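
As a rough worked example of those numbers, the sketch below estimates a monthly bill. It assumes that calls beyond the free allowance are charged pro rata per 1,000, and it ignores the 50-metrics-per-dashboard condition on the free dashboards; both are simplifications made for this example, and the prices are the ones quoted above, which may since have changed.

```python
FREE_DASHBOARDS = 3
PRICE_PER_DASHBOARD = 3.00     # USD per month, as quoted above
FREE_API_CALLS = 1000000
PRICE_PER_1000_CALLS = 0.01    # USD, as quoted above


def estimate_monthly_cost(dashboards, api_calls):
    """Estimate the monthly charge in USD from the prices quoted above."""
    extra_dashboards = max(0, dashboards - FREE_DASHBOARDS)
    extra_calls = max(0, api_calls - FREE_API_CALLS)
    return (extra_dashboards * PRICE_PER_DASHBOARD
            + extra_calls / 1000.0 * PRICE_PER_1000_CALLS)


# 5 dashboards and 1.5 million calls: 2 x $3 for dashboards plus 500 x $0.01 for calls
print(round(estimate_monthly_cost(5, 1500000), 2))
```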

Jeff;

Monitoring HiveMQ with InfluxDB and Grafana

Post Syndicated from The HiveMQ Team original http://www.hivemq.com/blog/monitoring-hivemq-influxdb-grafana

You need to monitor your system

System monitoring is an essential part of any production software deployment. Some people believe it to be as critical as security and that it should be given the same attention. Historically, the obstacles to effective monitoring have been a lack of cohesive tools and the wrong mindset. These can lead to a false sense of security, which it is important not to fall victim to. At the end of this blog post we will provide you with a standardized dashboard, including metrics we believe to be useful for live monitoring of MQTT brokers. This in no way means that these are all the metrics you need to monitor, or that we could possibly know what’s crucial to your use case and deployment.

In order to provide you with the opportunity of implementing cohesive monitoring tools, the HiveMQ core distribution comes with the JVM Metrics Plugin and the JMX Plugin. The JVM Plugin will add crucial JVM metrics to the already existing available HiveMQ metrics and the JMX Plugin will enable JMX monitoring for any JMX monitoring tool like JConsole.

Real-time monitoring with tools like JConsole is certainly better than nothing, but it has its own disadvantages. HiveMQ is often deployed in a container environment, so direct access to the HiveMQ process might not be possible. Beyond that, a time series monitoring solution has the added benefit of serving as a great debugging tool when trying to find the root cause of a system crash or similar.

The AWS Cloudwatch Plugin, Graphite Plugin and InfluxDB Plugin are free of charge and ready to use plugins provided by HiveMQ to enable time series monitoring.

Our recommendation

We routinely get asked about recommendations for monitoring tools. At the end of the day this is down to preference and ultimately your decision. In the past we have had good experiences with the combination of Telegraf, InfluxDB and a Grafana dashboard.

Telegraf can be used for gathering system metrics and writing them to the InfluxDB. HiveMQ is able to write its own metrics to the InfluxDB as well and a Grafana dashboard is a good solution for visualizing these gathered metrics.

Example Dashboard

Please note that there are countless other viable monitoring options available.

Installation and configuration

The first step to achieving our desired monitoring setup is installing and starting InfluxDB. InfluxDB works out of the box without adding additional configuration.
When InfluxDB is installed and running, use the command line tool to create a database called ‘hivemq’.

$ influx
Connected to http://localhost:8086 version 1.3.0
InfluxDB shell version: v1.2.3
> CREATE DATABASE hivemq

Attention: InfluxDB does not provide authentication by default, which could open your metrics up to a third party when running the InfluxDB on an external server. Make sure you cover this potential security issue.

InfluxDB data will grow rapidly. This can and will lead to the use of large amounts of disk space after running your InfluxDB for some time. To deal with this challenge, InfluxDB offers the possibility of creating so-called retention policies. In our opinion it is sufficient to retain your InfluxDB data for two weeks. The syntax for creating this retention policy looks like this:

$ influx
Connected to http://localhost:8086 version 1.3.0
InfluxDB shell version: v1.2.3
> CREATE RETENTION POLICY "two_weeks_only" ON "hivemq" DURATION 2w REPLICATION 1

Which, if any, retention policy is best for your individual use case has to be decided by you.
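
The same two statements can also be sent without the interactive shell, through the `/query` endpoint of the InfluxDB 1.x HTTP API. The sketch below is one way to do that from Python using only the standard library; `build_query_request` and `run_all` are helper names chosen for this example, and it assumes the local, unauthenticated instance used in this walkthrough.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

INFLUX_URL = "http://localhost:8086"  # same local instance as in the shell session

STATEMENTS = [
    "CREATE DATABASE hivemq",
    'CREATE RETENTION POLICY "two_weeks_only" ON "hivemq" '
    "DURATION 2w REPLICATION 1",
]


def build_query_request(statement, base_url=INFLUX_URL):
    """Build a POST to /query carrying one InfluxQL statement in the q parameter."""
    data = urlencode({"q": statement}).encode("ascii")
    return Request(base_url + "/query", data=data, method="POST")


def run_all(statements=STATEMENTS):
    """Send each statement in order; urlopen raises on HTTP errors."""
    for statement in statements:
        with urlopen(build_query_request(statement)) as response:
            response.read()
```

Nothing is sent until `run_all()` is called. If authentication has been enabled on the server, credentials would have to be added to the request as well.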

The second step is downloading the InfluxDB HiveMQ Plugin. For this demonstration all the services will be running locally, so we can use the influxdb.properties file that is included in the HiveMQ Plugin without any adjustments. Bear in mind that you need to change the IP address, when running an external InfluxDB.

When running HiveMQ in a cluster it is important you use the exact same influxdb.properties on each node with the exception of this property:

tags:host=hivemq1

This property should be set individually for each HiveMQ node in the cluster for better transparency.
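
For a larger cluster, this one line can be stamped out per node from a shared template. A minimal sketch, assuming the template contains a line beginning with `tags:host=` exactly as shown above; `with_host_tag` is a hypothetical helper, and every other line of the file is passed through untouched.

```python
def with_host_tag(properties_text, host):
    """Return the properties text with the tags:host= line set to `host`."""
    lines = properties_text.splitlines()
    replaced = False
    for n, line in enumerate(lines):
        if line.strip().startswith("tags:host="):
            lines[n] = "tags:host=" + host
            replaced = True
    if not replaced:
        raise ValueError("no tags:host= line found in template")
    return "\n".join(lines) + "\n"


# A one-line stand-in for the real influxdb.properties template.
template = "tags:host=hivemq1\n"
for node in ("hivemq1", "hivemq2", "hivemq3"):
    print(node, "->", with_host_tag(template, node).strip())
```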

This plugin will now gather all the available HiveMQ metrics (given the JMX Plugin is also running) and write them to the configured InfluxDB.

The third step is installing Telegraf on each HiveMQ cluster node.

Now a telegraf.conf needs to be configured, telling Telegraf which metrics it should gather and eventually write to InfluxDB. The default telegraf.conf is bloated and full of comments and options that are not needed for HiveMQ monitoring. The config we propose looks like this:

[tags]
node = "example-node"

[agent]
interval = "5s"

# OUTPUTS
[outputs]
[outputs.influxdb]
url = "http://localhost:8086"
database = "hivemq" # required.
precision = "s"

# PLUGINS
[cpu]
percpu = true
totalcpu = true

[system]

[disk]

[mem]

[diskio]

[net]

[kernel]

[processes]

This configuration provides metrics for

  • CPU: CPU usage broken down by space (user, system, and so on)
  • System: Load, Uptime
  • Disk: Disk Usage, Free Space, INodes used
  • DiskIO: IO Time, Operations
  • Memory: RAM Used, Buffered, Cached
  • Kernel: Linux specific information like context switching
  • Processes: Systemwide process information

Note that some modules like Kernel may not be available on non-Linux systems.

Make sure to change the url, when not using a local InfluxDB.

This configuration will gather per-CPU and total CPU usage every five seconds. See this page for other possible configurations of the system input.

At this point the terminal window in which you are running influxd should be showing something like this:

[httpd] ::1 - - [20/Jun/2017:13:36:46 +0200] "POST /write?db=hivemq HTTP/1.1" 204 0 "-" "-" bdad5fd9-55ac-11e7-8550-000000000000 9743

This shows a successful write of the Telegraf metrics to InfluxDB.
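
When the log is busy, checking for such lines by eye gets tedious. The sketch below is a small Python filter for them; the regular expression is written against the single sample line above and is an assumption about the log format rather than a documented contract, and `is_successful_write` is a name chosen for this example. InfluxDB answers a successful write with HTTP status 204.

```python
import re

# Matches the access-log shape shown above: ... "METHOD PATH PROTO" STATUS ...
HTTPD_LINE = re.compile(
    r'^\[httpd\] \S+ \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) '
)


def is_successful_write(line, database="hivemq"):
    """True if the line is a POST to /write for `database` answered with 204."""
    match = HTTPD_LINE.match(line)
    if not match:
        return False
    return (match.group("method") == "POST"
            and match.group("path").startswith("/write?")
            and "db=" + database in match.group("path")
            and match.group("status") == "204")


sample = ('[httpd] ::1 - - [20/Jun/2017:13:36:46 +0200] '
          '"POST /write?db=hivemq HTTP/1.1" 204 0 "-" "-" '
          'bdad5fd9-55ac-11e7-8550-000000000000 9743')
print(is_successful_write(sample))
```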

The next step is installing and starting Grafana.

Grafana works out of the box and can be reached via localhost:3000.

The next step is configuring our InfluxDB as Grafana’s data source.

Step 1: Add Data Source

Step 2: Configure InfluxDB

Now we need a dashboard. As this question comes up quite often, we decided to provide a dashboard template that displays some useful metrics for most MQTT deployments and should give you a good starting point for building your own dashboard tailored to your use case. You can download the template here.
The JSON file inside the zip can be imported into Grafana.

Step 3: Import Dashboard

That’s it. We now have a working dashboard displaying metrics whose monitoring has proven vital in many MQTT deployments.

Disclaimer: This is one possibility, and a good starting point we would like to give you for monitoring your MQTT use case. Naturally, the requirements for your individual case may vary. We suggest reading Grafana’s getting started guide to find what works best for you and your deployment.