Tag Archives: open source

Backing Up Linux to Backblaze B2 with Duplicity and Restic

Post Syndicated from Roderick Bauer original https://www.backblaze.com/blog/backing-linux-backblaze-b2-duplicity-restic/

Linux users have a variety of options for handling data backup. The choices range from free and open-source programs to paid commercial tools, and include applications that are purely command-line based (CLI) and others that have a graphical interface (GUI), or both.

If you take a look at our Backblaze B2 Cloud Storage Integrations page, you will see a number of offerings that enable you to back up your Linux desktops and servers to Backblaze B2. These include CloudBerry, Duplicity, Duplicacy, 45 Drives, GoodSync, HashBackup, QNAP, Restic, and Rclone, plus other choices for NAS and hybrid uses.

In this post, we’ll discuss two popular command line and open-source programs: one older, Duplicity, and a new player, Restic.

Old School vs. New School

We’re highlighting Duplicity and Restic today because they exemplify two different philosophical approaches to data backup: “Old School” (Duplicity) vs “New School” (Restic).

Old School (Duplicity)

In the old school model, data is written sequentially to the storage medium. Once a section of data is recorded, new data is written starting where that section of data ends. It’s not possible to go back and change the data that’s already been written.

This old-school model has long been associated with the use of magnetic tape, a prime example of which is the LTO (Linear Tape-Open) standard. In this “write once” model, files are always appended to the end of the tape. If a file is modified and overwritten or removed from the volume, the associated tape blocks used are not freed up: they are simply marked as unavailable, and the used volume capacity is not recovered. Data is deleted and capacity recovered only if the whole tape is reformatted. As a Linux/Unix user, you undoubtedly are familiar with the TAR archive format, which is an acronym for Tape ARchive. TAR has been around since 1979 and was originally developed to write data to sequential I/O devices with no file system of their own.

It is from the use of tape that we get the full backup/incremental backup approach to backups. A backup sequence beings with a full backup of data. Each incremental backup contains what’s been changed since the last full backup until the next full backup is made and the process starts over, filling more and more tape or whatever medium is being used.

This is the model used by Duplicity: full and incremental backups. Duplicity backs up files by producing encrypted, digitally signed, versioned, TAR-format volumes and uploading them to a remote location, including Backblaze B2 Cloud Storage. Released under the terms of the GNU General Public License (GPL), Duplicity is free software.

With Duplicity, the first archive is a complete (full) backup, and subsequent (incremental) backups only add differences from the latest full or incremental backup. Chains consisting of a full backup and a series of incremental backups can be recovered to the point in time that any of the incremental steps were taken. If any of the incremental backups are missing, then reconstructing a complete and current backup is much more difficult and sometimes impossible.

Duplicity is available under many Unix-like operating systems (such as Linux, BSD, and Mac OS X) and ships with many popular Linux distributions including Ubuntu, Debian, and Fedora. It also can be used with Windows under Cygwin.

We recently published a KB article on How to configure Backblaze B2 with Duplicity on Linux that demonstrates how to set up Duplicity with B2 and back up and restore a directory from Linux.

New School (Restic)

With the arrival of non-sequential storage medium, such as disk drives, and new ideas such as deduplication, comes the new school approach, which is used by Restic. Data can be written and changed anywhere on the storage medium. This efficiency comes largely through the use of deduplication. Deduplication is a process that eliminates redundant copies of data and reduces storage overhead. Data deduplication techniques ensure that only one unique instance of data is retained on storage media, greatly increasing storage efficiency and flexibility.

Restic is a recently available multi-platform command line backup software program that is designed to be fast, efficient, and secure. Restic supports a variety of backends for storing backups, including a local server, SFTP server, HTTP Rest server, and a number of cloud storage providers, including Backblaze B2.

Files are uploaded to a B2 bucket as deduplicated, encrypted chunks. Each time a backup runs, only changed data is backed up. On each backup run, a snapshot is created enabling restores to a specific date or time.

Restic assumes that the storage location for repository is shared, so it always encrypts the backed up data. This is in addition to any encryption and security from the storage provider.

Restic is open source and free software and licensed under the BSD 2-Clause License and actively developed on GitHub.

There’s a lot more you can do with Restic, including adding tags, mounting a repository locally, and scripting. To learn more, you can review the documentation at https://restic.readthedocs.io.

Coincidentally with this blog post, we published a KB article, How to configure Backblaze B2 with Restic on Linux, in which we show how to set up Restic for use with B2 and how to back up and restore a home directory from Linux to B2.

Which is Right for You?

While Duplicity is a popular, widely-available, and useful program, many users of cloud storage solutions such as B2 are moving to new-school solutions like Restic that take better advantage of the non-sequential access capabilities and speed of modern storage media used by cloud storage providers.

Tell us how you’re backing up Linux

Please let us know in the comments what you’re using for Linux backups, and if you have experience using Duplicity, Restic, or other backup software with Backblaze B2.

The post Backing Up Linux to Backblaze B2 with Duplicity and Restic appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

N O D E’s Handheld Linux Terminal

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/n-o-d-es-handheld-linux-terminal/

Fit an entire Raspberry Pi-based laptop into your pocket with N O D E’s latest Handheld Linux Terminal build.

The Handheld Linux Terminal Version 3 (Portable Pi 3)

Hey everyone. Today I want to show you the new version 3 of the Handheld Linux Terminal. It’s taken a long time, but I’m finally finished. This one takes all the things I’ve learned so far, and improves on many of the features from the previous iterations.

N O D E

With interests in modding tech, exploring the boundaries of the digital world, and open source, YouTuber N O D E has become one to watch within the digital maker world. He maintains a channel focused on “the transformative power of technology.”

“Understanding that electronics isn’t voodoo is really powerful”, he explains in his Patreon video. “And learning how to build your own stuff opens up so many possibilities.”

NODE Youtube channel logo - Handheld Linux Terminal v3

The topics of his videos range from stripped-down devices, upgraded tech, and security upgrades, to the philosophy behind technology. He also provides weekly roundups of, and discussions about, new releases.

Essentially, if you like technology, you’ll like N O D E.

Handheld Linux Terminal v3

Subscribers to N O D E’s YouTube channel, of whom there are currently over 44000, will have seen him documenting variations of this handheld build throughout the last year. By stripping down a Raspberry Pi 3, and incorporating a Zero W, he’s been able to create interesting projects while always putting functionality first.

Handheld Linux Terminal v3

With the third version of his terminal, N O D E has taken experiences gained from previous builds to create something of which he’s obviously extremely proud. And so he should be. The v3 handheld is impressively small considering he managed to incorporate a fully functional keyboard with mouse, a 3.5″ screen, and a fan within the 3D-printed body.

Handheld Linux Terminal v3

“The software side of things is where it really shines though, and the Pi 3 is more than capable of performing most non-intensive tasks,” N O D E goes on to explain. He demonstrates various applications running on Raspbian, plus other operating systems he has pre-loaded onto additional SD cards:

“I have also installed Exagear Desktop, which allows it to run x86 apps too, and this works great. I have x86 apps such as Sublime Text and Spotify running without any problems, and it’s technically possible to use Wine to also run Windows apps on the device.”

We think this is an incredibly neat build, and we can’t wait to see where N O D E takes it next!

The post N O D E’s Handheld Linux Terminal appeared first on Raspberry Pi.

Anti-Piracy Group Joins Internet Organization That Controls Top-Level Domain

Post Syndicated from Andy original https://torrentfreak.com/anti-piracy-group-joins-internet-organization-that-controls-top-level-domain-171019/

All around the world, content creators and rightsholders continue to protest against the unauthorized online distribution of copyrighted content.

While pirating end-users obviously share some of the burden, the main emphasis has traditionally been placed on the shuttering of illicit sites, whether torrent, streaming, or hosting based.

Over time, however, sites have become more prevalent and increasingly resilient, leaving the music, movie and publishing industries to play a frustrating game of whac-a-mole. With this in mind, their focus has increasingly shifted towards Internet gatekeepers, including ISPs and bodies with influence over domain availability.

While most of these efforts take place via cooperation or legal action, there’s regularly conflict when Hollywood, for example, wants a particular domain rendered inaccessible or the music industry wants pirates kicked off the Internet.

As a result, there’s nearly always a disconnect, with copyright holders on one side and Internet technology companies worried about mission creep on the other. In Denmark, however, those lines have just been blurred in the most intriguing way possible after an infamous anti-piracy outfit joined an organization with significant control over the Internet in the country.

RettighedsAlliancen (or Rights Alliance as it’s more commonly known) is an anti-piracy group which counts some of the most powerful local and international movie companies among its members. It also operates on behalf of IFPI and by extension, most of the world’s major recording labels.

The group has been involved in dozens of legal processes over the years against file-sharers and file-sharing sites, most recently fighting for and winning ISP blockades against most major pirate portals including The Pirate Bay, RARBG, Torrentz, and many more.

In a somewhat surprising new announcement, the group has revealed it’s become the latest member of DIFO, the Danish Internet Forum (DIFO) which “works for a secure and accessible Internet” under the top-level .DK domain. Indeed, DIFO has overall responsibility for Danish internet infrastructure.

“For DIFO it is important to have a strong link to the Danish internet community. Therefore, we are very pleased that the Alliance wishes to be part of the association,” DIFO said in a statement.

Rights Alliance will be DIFO’s third new member this year but uniquely it will get the opportunity to represent the interests of more than 100,000 Danish and international rightholders from inside an influential Internet-focused organization.

Looking at DIFO’s membership, Rights Alliance certainly stands out as unusual. The majority of the members are made up of IT-based organizations, such as the Internet Industry Association, The Association of Open Source Suppliers and DKRegistrar, the industry association for Danish domain registrars.

A meeting around a table with these players and their often conflicting interests is likely to be an experience for all involved. However, all parties seem more than happy with the new partnership.

“We want to help create a more secure internet for companies that invest in doing business online, and for users to be safe, so combating digital crime is a key and shared goal,” says Rights Alliance chief, Maria Fredenslund. “I am therefore looking forward to the future cooperation with DIFO.”

Only time will tell how this partnership will play out but if common ground can be found, it’s certainly possible that the anti-piracy scene in Denmark could step up a couple of gears in the future.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

Predict Billboard Top 10 Hits Using RStudio, H2O and Amazon Athena

Post Syndicated from Gopal Wunnava original https://aws.amazon.com/blogs/big-data/predict-billboard-top-10-hits-using-rstudio-h2o-and-amazon-athena/

Success in the popular music industry is typically measured in terms of the number of Top 10 hits artists have to their credit. The music industry is a highly competitive multi-billion dollar business, and record labels incur various costs in exchange for a percentage of the profits from sales and concert tickets.

Predicting the success of an artist’s release in the popular music industry can be difficult. One release may be extremely popular, resulting in widespread play on TV, radio and social media, while another single may turn out quite unpopular, and therefore unprofitable. Record labels need to be selective in their decision making, and predictive analytics can help them with decision making around the type of songs and artists they need to promote.

In this walkthrough, you leverage H2O.ai, Amazon Athena, and RStudio to make predictions on whether a song might make it to the Top 10 Billboard charts. You explore the GLM, GBM, and deep learning modeling techniques using H2O’s rapid, distributed and easy-to-use open source parallel processing engine. RStudio is a popular IDE, licensed either commercially or under AGPLv3, for working with R. This is ideal if you don’t want to connect to a server via SSH and use code editors such as vi to do analytics. RStudio is available in a desktop version, or a server version that allows you to access R via a web browser. RStudio’s Notebooks feature is used to demonstrate the execution of code and output. In addition, this post showcases how you can leverage Athena for query and interactive analysis during the modeling phase. A working knowledge of statistics and machine learning would be helpful to interpret the analysis being performed in this post.

Walkthrough

Your goal is to predict whether a song will make it to the Top 10 Billboard charts. For this purpose, you will be using multiple modeling techniques―namely GLM, GBM and deep learning―and choose the model that is the best fit.

This solution involves the following steps:

  • Install and configure RStudio with Athena
  • Log in to RStudio
  • Install R packages
  • Connect to Athena
  • Create a dataset
  • Create models

Install and configure RStudio with Athena

Use the following AWS CloudFormation stack to install, configure, and connect RStudio on an Amazon EC2 instance with Athena.

Launching this stack creates all required resources and prerequisites:

  • Amazon EC2 instance with Amazon Linux (minimum size of t2.large is recommended)
  • Provisioning of the EC2 instance in an existing VPC and public subnet
  • Installation of Java 8
  • Assignment of an IAM role to the EC2 instance with the required permissions for accessing Athena and Amazon S3
  • Security group allowing access to the RStudio and SSH ports from the internet (I recommend restricting access to these ports)
  • S3 staging bucket required for Athena (referenced within RStudio as ATHENABUCKET)
  • RStudio username and password
  • Setup logs in Amazon CloudWatch Logs (if needed for additional troubleshooting)
  • Amazon EC2 Systems Manager agent, which makes it easy to manage and patch

All AWS resources are created in the US-East-1 Region. To avoid cross-region data transfer fees, launch the CloudFormation stack in the same region. To check the availability of Athena in other regions, see Region Table.

Log in to RStudio

The instance security group has been automatically configured to allow incoming connections on the RStudio port 8787 from any source internet address. You can edit the security group to restrict source IP access. If you have trouble connecting, ensure that port 8787 isn’t blocked by subnet network ACLS or by your outgoing proxy/firewall.

  1. In the CloudFormation stack, choose Outputs, Value, and then open the RStudio URL. You might need to wait for a few minutes until the instance has been launched.
  2. Log in to RStudio with the and password you provided during setup.

Install R packages

Next, install the required R packages from the RStudio console. You can download the R notebook file containing just the code.

#install pacman – a handy package manager for managing installs
if("pacman" %in% rownames(installed.packages()) == FALSE)
{install.packages("pacman")}  
library(pacman)
p_load(h2o,rJava,RJDBC,awsjavasdk)
h2o.init(nthreads = -1)
##  Connection successful!
## 
## R is connected to the H2O cluster: 
##     H2O cluster uptime:         2 hours 42 minutes 
##     H2O cluster version:        3.10.4.6 
##     H2O cluster version age:    4 months and 4 days !!! 
##     H2O cluster name:           H2O_started_from_R_rstudio_hjx881 
##     H2O cluster total nodes:    1 
##     H2O cluster total memory:   3.30 GB 
##     H2O cluster total cores:    4 
##     H2O cluster allowed cores:  4 
##     H2O cluster healthy:        TRUE 
##     H2O Connection ip:          localhost 
##     H2O Connection port:        54321 
##     H2O Connection proxy:       NA 
##     H2O Internal Security:      FALSE 
##     R Version:                  R version 3.3.3 (2017-03-06)
## Warning in h2o.clusterInfo(): 
## Your H2O cluster version is too old (4 months and 4 days)!
## Please download and install the latest version from http://h2o.ai/download/
#install aws sdk if not present (pre-requisite for using Athena with an IAM role)
if (!aws_sdk_present()) {
  install_aws_sdk()
}

load_sdk()
## NULL

Connect to Athena

Next, establish a connection to Athena from RStudio, using an IAM role associated with your EC2 instance. Use ATHENABUCKET to specify the S3 staging directory.

URL <- 'https://s3.amazonaws.com/athena-downloads/drivers/AthenaJDBC41-1.0.1.jar'
fil <- basename(URL)
#download the file into current working directory
if (!file.exists(fil)) download.file(URL, fil)
#verify that the file has been downloaded successfully
list.files()
## [1] "AthenaJDBC41-1.0.1.jar"
drv <- JDBC(driverClass="com.amazonaws.athena.jdbc.AthenaDriver", fil, identifier.quote="'")

con <- jdbcConnection <- dbConnect(drv, 'jdbc:awsathena://athena.us-east-1.amazonaws.com:443/',
                                   s3_staging_dir=Sys.getenv("ATHENABUCKET"),
                                   aws_credentials_provider_class="com.amazonaws.auth.DefaultAWSCredentialsProviderChain")

Verify the connection. The results returned depend on your specific Athena setup.

con
## <JDBCConnection>
dbListTables(con)
##  [1] "gdelt"               "wikistats"           "elb_logs_raw_native"
##  [4] "twitter"             "twitter2"            "usermovieratings"   
##  [7] "eventcodes"          "events"              "billboard"          
## [10] "billboardtop10"      "elb_logs"            "gdelthist"          
## [13] "gdeltmaster"         "twitter"             "twitter3"

Create a dataset

For this analysis, you use a sample dataset combining information from Billboard and Wikipedia with Echo Nest data in the Million Songs Dataset. Upload this dataset into your own S3 bucket. The table below provides a description of the fields used in this dataset.

Field Description
year Year that song was released
songtitle Title of the song
artistname Name of the song artist
songid Unique identifier for the song
artistid Unique identifier for the song artist
timesignature Variable estimating the time signature of the song
timesignature_confidence Confidence in the estimate for the timesignature
loudness Continuous variable indicating the average amplitude of the audio in decibels
tempo Variable indicating the estimated beats per minute of the song
tempo_confidence Confidence in the estimate for tempo
key Variable with twelve levels indicating the estimated key of the song (C, C#, B)
key_confidence Confidence in the estimate for key
energy Variable that represents the overall acoustic energy of the song, using a mix of features such as loudness
pitch Continuous variable that indicates the pitch of the song
timbre_0_min thru timbre_11_min Variables that indicate the minimum values over all segments for each of the twelve values in the timbre vector
timbre_0_max thru timbre_11_max Variables that indicate the maximum values over all segments for each of the twelve values in the timbre vector
top10 Indicator for whether or not the song made it to the Top 10 of the Billboard charts (1 if it was in the top 10, and 0 if not)

Create an Athena table based on the dataset

In the Athena console, select the default database, sampled, or create a new database.

Run the following create table statement.

create external table if not exists billboard
(
year int,
songtitle string,
artistname string,
songID string,
artistID string,
timesignature int,
timesignature_confidence double,
loudness double,
tempo double,
tempo_confidence double,
key int,
key_confidence double,
energy double,
pitch double,
timbre_0_min double,
timbre_0_max double,
timbre_1_min double,
timbre_1_max double,
timbre_2_min double,
timbre_2_max double,
timbre_3_min double,
timbre_3_max double,
timbre_4_min double,
timbre_4_max double,
timbre_5_min double,
timbre_5_max double,
timbre_6_min double,
timbre_6_max double,
timbre_7_min double,
timbre_7_max double,
timbre_8_min double,
timbre_8_max double,
timbre_9_min double,
timbre_9_max double,
timbre_10_min double,
timbre_10_max double,
timbre_11_min double,
timbre_11_max double,
Top10 int
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 's3://aws-bigdata-blog/artifacts/predict-billboard/data'
;

Inspect the table definition for the ‘billboard’ table that you have created. If you chose a database other than sampledb, replace that value with your choice.

dbGetQuery(con, "show create table sampledb.billboard")
##                                      createtab_stmt
## 1       CREATE EXTERNAL TABLE `sampledb.billboard`(
## 2                                       `year` int,
## 3                               `songtitle` string,
## 4                              `artistname` string,
## 5                                  `songid` string,
## 6                                `artistid` string,
## 7                              `timesignature` int,
## 8                `timesignature_confidence` double,
## 9                                `loudness` double,
## 10                                  `tempo` double,
## 11                       `tempo_confidence` double,
## 12                                       `key` int,
## 13                         `key_confidence` double,
## 14                                 `energy` double,
## 15                                  `pitch` double,
## 16                           `timbre_0_min` double,
## 17                           `timbre_0_max` double,
## 18                           `timbre_1_min` double,
## 19                           `timbre_1_max` double,
## 20                           `timbre_2_min` double,
## 21                           `timbre_2_max` double,
## 22                           `timbre_3_min` double,
## 23                           `timbre_3_max` double,
## 24                           `timbre_4_min` double,
## 25                           `timbre_4_max` double,
## 26                           `timbre_5_min` double,
## 27                           `timbre_5_max` double,
## 28                           `timbre_6_min` double,
## 29                           `timbre_6_max` double,
## 30                           `timbre_7_min` double,
## 31                           `timbre_7_max` double,
## 32                           `timbre_8_min` double,
## 33                           `timbre_8_max` double,
## 34                           `timbre_9_min` double,
## 35                           `timbre_9_max` double,
## 36                          `timbre_10_min` double,
## 37                          `timbre_10_max` double,
## 38                          `timbre_11_min` double,
## 39                          `timbre_11_max` double,
## 40                                     `top10` int)
## 41                             ROW FORMAT DELIMITED 
## 42                         FIELDS TERMINATED BY ',' 
## 43                            STORED AS INPUTFORMAT 
## 44       'org.apache.hadoop.mapred.TextInputFormat' 
## 45                                     OUTPUTFORMAT 
## 46  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
## 47                                        LOCATION
## 48    's3://aws-bigdata-blog/artifacts/predict-billboard/data'
## 49                                  TBLPROPERTIES (
## 50            'transient_lastDdlTime'='1505484133')

Run a sample query

Next, run a sample query to obtain a list of all songs from Janet Jackson that made it to the Billboard Top 10 charts.

dbGetQuery(con, " SELECT songtitle,artistname,top10   FROM sampledb.billboard WHERE lower(artistname) =     'janet jackson' AND top10 = 1")
##                       songtitle    artistname top10
## 1                       Runaway Janet Jackson     1
## 2               Because Of Love Janet Jackson     1
## 3                         Again Janet Jackson     1
## 4                            If Janet Jackson     1
## 5  Love Will Never Do (Without You) Janet Jackson 1
## 6                     Black Cat Janet Jackson     1
## 7               Come Back To Me Janet Jackson     1
## 8                       Alright Janet Jackson     1
## 9                      Escapade Janet Jackson     1
## 10                Rhythm Nation Janet Jackson     1

Determine how many songs in this dataset are specifically from the year 2010.

dbGetQuery(con, " SELECT count(*)   FROM sampledb.billboard WHERE year = 2010")
##   _col0
## 1   373

The sample dataset provides certain song properties of interest that can be analyzed to gauge the impact to the song’s overall popularity. Look at one such property, timesignature, and determine the value that is the most frequent among songs in the database. Timesignature is a measure of the number of beats and the type of note involved.

Running the query directly may result in an error, as shown in the commented lines below. This error is a result of trying to retrieve a large result set over a JDBC connection, which can cause out-of-memory issues at the client level. To address this, reduce the fetch size and run again.

#t<-dbGetQuery(con, " SELECT timesignature FROM sampledb.billboard")
#Note:  Running the preceding query results in the following error: 
#Error in .jcall(rp, "I", "fetch", stride, block): java.sql.SQLException: The requested #fetchSize is more than the allowed value in Athena. Please reduce the fetchSize and try #again. Refer to the Athena documentation for valid fetchSize values.
# Use the dbSendQuery function, reduce the fetch size, and run again
r <- dbSendQuery(con, " SELECT timesignature     FROM sampledb.billboard")
dftimesignature<- fetch(r, n=-1, block=100)
dbClearResult(r)
## [1] TRUE
table(dftimesignature)
## dftimesignature
##    0    1    3    4    5    7 
##   10  143  503 6787  112   19
nrow(dftimesignature)
## [1] 7574

From the results, observe that 6787 songs have a timesignature of 4.

Next, determine the song with the highest tempo.

dbGetQuery(con, " SELECT songtitle,artistname,tempo   FROM sampledb.billboard WHERE tempo = (SELECT max(tempo) FROM sampledb.billboard) ")
##                   songtitle      artistname   tempo
## 1 Wanna Be Startin' Somethin' Michael Jackson 244.307

Create the training dataset

Your model needs to be trained such that it can learn and make accurate predictions. Split the data into training and test datasets, and create the training dataset first.  This dataset contains all observations from the year 2009 and earlier. You may face the same JDBC connection issue pointed out earlier, so this query uses a fetch size.

#BillboardTrain <- dbGetQuery(con, "SELECT * FROM sampledb.billboard WHERE year <= 2009")
#Running the preceding query results in the following error:-
#Error in .verify.JDBC.result(r, "Unable to retrieve JDBC result set for ", : Unable to retrieve #JDBC result set for SELECT * FROM sampledb.billboard WHERE year <= 2009 (Internal error)
#Follow the same approach as before to address this issue.

r <- dbSendQuery(con, "SELECT * FROM sampledb.billboard WHERE year <= 2009")
BillboardTrain <- fetch(r, n=-1, block=100)
dbClearResult(r)
## [1] TRUE
BillboardTrain[1:2,c(1:3,6:10)]
##   year           songtitle artistname timesignature
## 1 2009 The Awkward Goodbye    Athlete             3
## 2 2009        Rubik's Cube    Athlete             3
##   timesignature_confidence loudness   tempo tempo_confidence
## 1                    0.732   -6.320  89.614   0.652
## 2                    0.906   -9.541 117.742   0.542
nrow(BillboardTrain)
## [1] 7201

Create the test dataset

BillboardTest <- dbGetQuery(con, "SELECT * FROM sampledb.billboard where year = 2010")
BillboardTest[1:2,c(1:3,11:15)]
##   year              songtitle        artistname key
## 1 2010 This Is the House That Doubt Built A Day to Remember  11
## 2 2010        Sticks & Bricks A Day to Remember  10
##   key_confidence    energy pitch timbre_0_min
## 1          0.453 0.9666556 0.024        0.002
## 2          0.469 0.9847095 0.025        0.000
nrow(BillboardTest)
## [1] 373

Convert the training and test datasets into H2O dataframes

train.h2o <- as.h2o(BillboardTrain)
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |=================================================================| 100%
test.h2o <- as.h2o(BillboardTest)
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |=================================================================| 100%

Inspect the column names in your H2O dataframes.

colnames(train.h2o)
##  [1] "year"                     "songtitle"               
##  [3] "artistname"               "songid"                  
##  [5] "artistid"                 "timesignature"           
##  [7] "timesignature_confidence" "loudness"                
##  [9] "tempo"                    "tempo_confidence"        
## [11] "key"                      "key_confidence"          
## [13] "energy"                   "pitch"                   
## [15] "timbre_0_min"             "timbre_0_max"            
## [17] "timbre_1_min"             "timbre_1_max"            
## [19] "timbre_2_min"             "timbre_2_max"            
## [21] "timbre_3_min"             "timbre_3_max"            
## [23] "timbre_4_min"             "timbre_4_max"            
## [25] "timbre_5_min"             "timbre_5_max"            
## [27] "timbre_6_min"             "timbre_6_max"            
## [29] "timbre_7_min"             "timbre_7_max"            
## [31] "timbre_8_min"             "timbre_8_max"            
## [33] "timbre_9_min"             "timbre_9_max"            
## [35] "timbre_10_min"            "timbre_10_max"           
## [37] "timbre_11_min"            "timbre_11_max"           
## [39] "top10"

Create models

You need to designate the independent and dependent variables prior to applying your modeling algorithms. Because you’re trying to predict the ‘top10’ field, this would be your dependent variable and everything else would be independent.

Create your first model using GLM. Because GLM works best with numeric data, you create your model by dropping non-numeric variables. You only use the variables in the dataset that describe the numerical attributes of the song in the logistic regression model. You won’t use these variables:  “year”, “songtitle”, “artistname”, “songid”, or “artistid”.

y.dep <- 39
x.indep <- c(6:38)
x.indep
##  [1]  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28
## [24] 29 30 31 32 33 34 35 36 37 38

Create Model 1: All numeric variables

Create Model 1 with the training dataset, using GLM as the modeling algorithm and H2O’s built-in h2o.glm function.

modelh1 <- h2o.glm( y = y.dep, x = x.indep, training_frame = train.h2o, family = "binomial")
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |=====                                                            |   8%
  |                                                                       
  |=================================================================| 100%

Measure the performance of Model 1, using H2O’s built-in performance function.

h2o.performance(model=modelh1,newdata=test.h2o)
## H2OBinomialMetrics: glm
## 
## MSE:  0.09924684
## RMSE:  0.3150347
## LogLoss:  0.3220267
## Mean Per-Class Error:  0.2380168
## AUC:  0.8431394
## Gini:  0.6862787
## R^2:  0.254663
## Null Deviance:  326.0801
## Residual Deviance:  240.2319
## AIC:  308.2319
## 
## Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:
##          0   1    Error     Rate
## 0      255  59 0.187898  =59/314
## 1       17  42 0.288136   =17/59
## Totals 272 101 0.203753  =76/373
## 
## Maximum Metrics: Maximum metrics at their respective thresholds
##                         metric threshold    value idx
## 1                       max f1  0.192772 0.525000 100
## 2                       max f2  0.124912 0.650510 155
## 3                 max f0point5  0.416258 0.612903  23
## 4                 max accuracy  0.416258 0.879357  23
## 5                max precision  0.813396 1.000000   0
## 6                   max recall  0.037579 1.000000 282
## 7              max specificity  0.813396 1.000000   0
## 8             max absolute_mcc  0.416258 0.455251  23
## 9   max min_per_class_accuracy  0.161402 0.738854 125
## 10 max mean_per_class_accuracy  0.124912 0.765006 155
## 
## Gains/Lift Table: Extract with `h2o.gainsLift(<model>, <data>)` or ` 
h2o.auc(h2o.performance(modelh1,test.h2o)) 
## [1] 0.8431394

The AUC metric provides insight into how well the classifier is able to separate the two classes. In this case, the value of 0.8431394 indicates that the classification is good. (A value of 0.5 indicates a worthless test, while a value of 1.0 indicates a perfect test.)

Next, inspect the coefficients of the variables in the dataset.

dfmodelh1 <- as.data.frame(h2o.varimp(modelh1))
dfmodelh1
##                       names coefficients sign
## 1              timbre_0_max  1.290938663  NEG
## 2                  loudness  1.262941934  POS
## 3                     pitch  0.616995941  NEG
## 4              timbre_1_min  0.422323735  POS
## 5              timbre_6_min  0.349016024  NEG
## 6                    energy  0.348092062  NEG
## 7             timbre_11_min  0.307331997  NEG
## 8              timbre_3_max  0.302225619  NEG
## 9             timbre_11_max  0.243632060  POS
## 10             timbre_4_min  0.224233951  POS
## 11             timbre_4_max  0.204134342  POS
## 12             timbre_5_min  0.199149324  NEG
## 13             timbre_0_min  0.195147119  POS
## 14 timesignature_confidence  0.179973904  POS
## 15         tempo_confidence  0.144242598  POS
## 16            timbre_10_max  0.137644568  POS
## 17             timbre_7_min  0.126995955  NEG
## 18            timbre_10_min  0.123851179  POS
## 19             timbre_7_max  0.100031481  NEG
## 20             timbre_2_min  0.096127636  NEG
## 21           key_confidence  0.083115820  POS
## 22             timbre_6_max  0.073712419  POS
## 23            timesignature  0.067241917  POS
## 24             timbre_8_min  0.061301881  POS
## 25             timbre_8_max  0.060041698  POS
## 26                      key  0.056158445  POS
## 27             timbre_3_min  0.050825116  POS
## 28             timbre_9_max  0.033733561  POS
## 29             timbre_2_max  0.030939072  POS
## 30             timbre_9_min  0.020708113  POS
## 31             timbre_1_max  0.014228818  NEG
## 32                    tempo  0.008199861  POS
## 33             timbre_5_max  0.004837870  POS
## 34                                    NA <NA>

Typically, songs with heavier instrumentation tend to be louder (have higher values in the variable “loudness”) and more energetic (have higher values in the variable “energy”). This knowledge is helpful for interpreting the modeling results.

You can make the following observations from the results:

  • The coefficient estimates for the confidence values associated with the time signature, key, and tempo variables are positive. This suggests that higher confidence leads to a higher predicted probability of a Top 10 hit.
  • The coefficient estimate for loudness is positive, meaning that mainstream listeners prefer louder songs with heavier instrumentation.
  • The coefficient estimate for energy is negative, meaning that mainstream listeners prefer songs that are less energetic, which are those songs with light instrumentation.

These coefficients lead to contradictory conclusions for Model 1. This could be due to multicollinearity issues. Inspect the correlation between the variables “loudness” and “energy” in the training set.

cor(train.h2o$loudness,train.h2o$energy)
## [1] 0.7399067

This number indicates that these two variables are highly correlated, and Model 1 does indeed suffer from multicollinearity. Typically, you associate a value of -1.0 to -0.5 or 1.0 to 0.5 to indicate strong correlation, and a value of 0.1 to 0.1 to indicate weak correlation. To avoid this correlation issue, omit one of these two variables and re-create the models.

You build two variations of the original model:

  • Model 2, in which you keep “energy” and omit “loudness”
  • Model 3, in which you keep “loudness” and omit “energy”

You compare these two models and choose the model with a better fit for this use case.

Create Model 2: Keep energy and omit loudness

colnames(train.h2o)
##  [1] "year"                     "songtitle"               
##  [3] "artistname"               "songid"                  
##  [5] "artistid"                 "timesignature"           
##  [7] "timesignature_confidence" "loudness"                
##  [9] "tempo"                    "tempo_confidence"        
## [11] "key"                      "key_confidence"          
## [13] "energy"                   "pitch"                   
## [15] "timbre_0_min"             "timbre_0_max"            
## [17] "timbre_1_min"             "timbre_1_max"            
## [19] "timbre_2_min"             "timbre_2_max"            
## [21] "timbre_3_min"             "timbre_3_max"            
## [23] "timbre_4_min"             "timbre_4_max"            
## [25] "timbre_5_min"             "timbre_5_max"            
## [27] "timbre_6_min"             "timbre_6_max"            
## [29] "timbre_7_min"             "timbre_7_max"            
## [31] "timbre_8_min"             "timbre_8_max"            
## [33] "timbre_9_min"             "timbre_9_max"            
## [35] "timbre_10_min"            "timbre_10_max"           
## [37] "timbre_11_min"            "timbre_11_max"           
## [39] "top10"
y.dep <- 39
x.indep <- c(6:7,9:38)
x.indep
##  [1]  6  7  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29
## [24] 30 31 32 33 34 35 36 37 38
modelh2 <- h2o.glm( y = y.dep, x = x.indep, training_frame = train.h2o, family = "binomial")
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |=======                                                          |  10%
  |                                                                       
  |=================================================================| 100%

Measure the performance of Model 2.

h2o.performance(model=modelh2,newdata=test.h2o)
## H2OBinomialMetrics: glm
## 
## MSE:  0.09922606
## RMSE:  0.3150017
## LogLoss:  0.3228213
## Mean Per-Class Error:  0.2490554
## AUC:  0.8431933
## Gini:  0.6863867
## R^2:  0.2548191
## Null Deviance:  326.0801
## Residual Deviance:  240.8247
## AIC:  306.8247
## 
## Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:
##          0  1    Error     Rate
## 0      280 34 0.108280  =34/314
## 1       23 36 0.389831   =23/59
## Totals 303 70 0.152815  =57/373
## 
## Maximum Metrics: Maximum metrics at their respective thresholds
##                         metric threshold    value idx
## 1                       max f1  0.254391 0.558140  69
## 2                       max f2  0.113031 0.647208 157
## 3                 max f0point5  0.413999 0.596026  22
## 4                 max accuracy  0.446250 0.876676  18
## 5                max precision  0.811739 1.000000   0
## 6                   max recall  0.037682 1.000000 283
## 7              max specificity  0.811739 1.000000   0
## 8             max absolute_mcc  0.254391 0.469060  69
## 9   max min_per_class_accuracy  0.141051 0.716561 131
## 10 max mean_per_class_accuracy  0.113031 0.761821 157
## 
## Gains/Lift Table: Extract with `h2o.gainsLift(<model>, <data>)` or `h2o.gainsLift(<model>, valid=<T/F>, xval=<T/F>)`
dfmodelh2 <- as.data.frame(h2o.varimp(modelh2))
dfmodelh2
##                       names coefficients sign
## 1                     pitch  0.700331511  NEG
## 2              timbre_1_min  0.510270513  POS
## 3              timbre_0_max  0.402059546  NEG
## 4              timbre_6_min  0.333316236  NEG
## 5             timbre_11_min  0.331647383  NEG
## 6              timbre_3_max  0.252425901  NEG
## 7             timbre_11_max  0.227500308  POS
## 8              timbre_4_max  0.210663865  POS
## 9              timbre_0_min  0.208516163  POS
## 10             timbre_5_min  0.202748055  NEG
## 11             timbre_4_min  0.197246582  POS
## 12            timbre_10_max  0.172729619  POS
## 13         tempo_confidence  0.167523934  POS
## 14 timesignature_confidence  0.167398830  POS
## 15             timbre_7_min  0.142450727  NEG
## 16             timbre_8_max  0.093377516  POS
## 17            timbre_10_min  0.090333426  POS
## 18            timesignature  0.085851625  POS
## 19             timbre_7_max  0.083948442  NEG
## 20           key_confidence  0.079657073  POS
## 21             timbre_6_max  0.076426046  POS
## 22             timbre_2_min  0.071957831  NEG
## 23             timbre_9_max  0.071393189  POS
## 24             timbre_8_min  0.070225578  POS
## 25                      key  0.061394702  POS
## 26             timbre_3_min  0.048384697  POS
## 27             timbre_1_max  0.044721121  NEG
## 28                   energy  0.039698433  POS
## 29             timbre_5_max  0.039469064  POS
## 30             timbre_2_max  0.018461133  POS
## 31                    tempo  0.013279926  POS
## 32             timbre_9_min  0.005282143  NEG
## 33                                    NA <NA>

h2o.auc(h2o.performance(modelh2,test.h2o)) 
## [1] 0.8431933

You can make the following observations:

  • The AUC metric is 0.8431933.
  • Inspecting the coefficient of the variable energy, Model 2 suggests that songs with high energy levels tend to be more popular. This is as per expectation.
  • As H2O orders variables by significance, the variable energy is not significant in this model.

You can conclude that Model 2 is not ideal for this use , as energy is not significant.

CreateModel 3: Keep loudness but omit energy

colnames(train.h2o)
##  [1] "year"                     "songtitle"               
##  [3] "artistname"               "songid"                  
##  [5] "artistid"                 "timesignature"           
##  [7] "timesignature_confidence" "loudness"                
##  [9] "tempo"                    "tempo_confidence"        
## [11] "key"                      "key_confidence"          
## [13] "energy"                   "pitch"                   
## [15] "timbre_0_min"             "timbre_0_max"            
## [17] "timbre_1_min"             "timbre_1_max"            
## [19] "timbre_2_min"             "timbre_2_max"            
## [21] "timbre_3_min"             "timbre_3_max"            
## [23] "timbre_4_min"             "timbre_4_max"            
## [25] "timbre_5_min"             "timbre_5_max"            
## [27] "timbre_6_min"             "timbre_6_max"            
## [29] "timbre_7_min"             "timbre_7_max"            
## [31] "timbre_8_min"             "timbre_8_max"            
## [33] "timbre_9_min"             "timbre_9_max"            
## [35] "timbre_10_min"            "timbre_10_max"           
## [37] "timbre_11_min"            "timbre_11_max"           
## [39] "top10"
y.dep <- 39
x.indep <- c(6:12,14:38)
x.indep
##  [1]  6  7  8  9 10 11 12 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29
## [24] 30 31 32 33 34 35 36 37 38
modelh3 <- h2o.glm( y = y.dep, x = x.indep, training_frame = train.h2o, family = "binomial")
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |========                                                         |  12%
  |                                                                       
  |=================================================================| 100%
perfh3<-h2o.performance(model=modelh3,newdata=test.h2o)
perfh3
## H2OBinomialMetrics: glm
## 
## MSE:  0.0978859
## RMSE:  0.3128672
## LogLoss:  0.3178367
## Mean Per-Class Error:  0.264925
## AUC:  0.8492389
## Gini:  0.6984778
## R^2:  0.2648836
## Null Deviance:  326.0801
## Residual Deviance:  237.1062
## AIC:  303.1062
## 
## Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:
##          0  1    Error     Rate
## 0      286 28 0.089172  =28/314
## 1       26 33 0.440678   =26/59
## Totals 312 61 0.144772  =54/373
## 
## Maximum Metrics: Maximum metrics at their respective thresholds
##                         metric threshold    value idx
## 1                       max f1  0.273799 0.550000  60
## 2                       max f2  0.125503 0.663265 155
## 3                 max f0point5  0.435479 0.628931  24
## 4                 max accuracy  0.435479 0.882038  24
## 5                max precision  0.821606 1.000000   0
## 6                   max recall  0.038328 1.000000 280
## 7              max specificity  0.821606 1.000000   0
## 8             max absolute_mcc  0.435479 0.471426  24
## 9   max min_per_class_accuracy  0.173693 0.745763 120
## 10 max mean_per_class_accuracy  0.125503 0.775073 155
## 
## Gains/Lift Table: Extract with `h2o.gainsLift(<model>, <data>)` or `h2o.gainsLift(<model>, valid=<T/F>, xval=<T/F>)`
dfmodelh3 <- as.data.frame(h2o.varimp(modelh3))
dfmodelh3
##                       names coefficients sign
## 1              timbre_0_max 1.216621e+00  NEG
## 2                  loudness 9.780973e-01  POS
## 3                     pitch 7.249788e-01  NEG
## 4              timbre_1_min 3.891197e-01  POS
## 5              timbre_6_min 3.689193e-01  NEG
## 6             timbre_11_min 3.086673e-01  NEG
## 7              timbre_3_max 3.025593e-01  NEG
## 8             timbre_11_max 2.459081e-01  POS
## 9              timbre_4_min 2.379749e-01  POS
## 10             timbre_4_max 2.157627e-01  POS
## 11             timbre_0_min 1.859531e-01  POS
## 12             timbre_5_min 1.846128e-01  NEG
## 13 timesignature_confidence 1.729658e-01  POS
## 14             timbre_7_min 1.431871e-01  NEG
## 15            timbre_10_max 1.366703e-01  POS
## 16            timbre_10_min 1.215954e-01  POS
## 17         tempo_confidence 1.183698e-01  POS
## 18             timbre_2_min 1.019149e-01  NEG
## 19           key_confidence 9.109701e-02  POS
## 20             timbre_7_max 8.987908e-02  NEG
## 21             timbre_6_max 6.935132e-02  POS
## 22             timbre_8_max 6.878241e-02  POS
## 23            timesignature 6.120105e-02  POS
## 24                      key 5.814805e-02  POS
## 25             timbre_8_min 5.759228e-02  POS
## 26             timbre_1_max 2.930285e-02  NEG
## 27             timbre_9_max 2.843755e-02  POS
## 28             timbre_3_min 2.380245e-02  POS
## 29             timbre_2_max 1.917035e-02  POS
## 30             timbre_5_max 1.715813e-02  POS
## 31                    tempo 1.364418e-02  NEG
## 32             timbre_9_min 8.463143e-05  NEG
## 33                                    NA <NA>
h2o.sensitivity(perfh3,0.5)
## Warning in h2o.find_row_by_threshold(object, t): Could not find exact
## threshold: 0.5 for this set of metrics; using closest threshold found:
## 0.501855569251422. Run `h2o.predict` and apply your desired threshold on a
## probability column.
## [[1]]
## [1] 0.2033898
h2o.auc(perfh3)
## [1] 0.8492389

You can make the following observations:

  • The AUC metric is 0.8492389.
  • From the confusion matrix, the model correctly predicts that 33 songs will be top 10 hits (true positives). However, it has 26 false positives (songs that the model predicted would be Top 10 hits, but ended up not being Top 10 hits).
  • Loudness has a positive coefficient estimate, meaning that this model predicts that songs with heavier instrumentation tend to be more popular. This is the same conclusion from Model 2.
  • Loudness is significant in this model.

Overall, Model 3 predicts a higher number of top 10 hits with an accuracy rate that is acceptable. To choose the best fit for production runs, record labels should consider the following factors:

  • Desired model accuracy at a given threshold
  • Number of correct predictions for top10 hits
  • Tolerable number of false positives or false negatives

Next, make predictions using Model 3 on the test dataset.

predict.regh <- h2o.predict(modelh3, test.h2o)
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |=================================================================| 100%
print(predict.regh)
##   predict        p0          p1
## 1       0 0.9654739 0.034526052
## 2       0 0.9654748 0.034525236
## 3       0 0.9635547 0.036445318
## 4       0 0.9343579 0.065642149
## 5       0 0.9978334 0.002166601
## 6       0 0.9779949 0.022005078
## 
## [373 rows x 3 columns]
predict.regh$predict
##   predict
## 1       0
## 2       0
## 3       0
## 4       0
## 5       0
## 6       0
## 
## [373 rows x 1 column]
dpr<-as.data.frame(predict.regh)
#Rename the predicted column 
colnames(dpr)[colnames(dpr) == 'predict'] <- 'predict_top10'
table(dpr$predict_top10)
## 
##   0   1 
## 312  61

The first set of output results specifies the probabilities associated with each predicted observation.  For example, observation 1 is 96.54739% likely to not be a Top 10 hit, and 3.4526052% likely to be a Top 10 hit (predict=1 indicates Top 10 hit and predict=0 indicates not a Top 10 hit).  The second set of results list the actual predictions made.  From the third set of results, this model predicts that 61 songs will be top 10 hits.

Compute the baseline accuracy, by assuming that the baseline predicts the most frequent outcome, which is that most songs are not Top 10 hits.

table(BillboardTest$top10)
## 
##   0   1 
## 314  59

Now observe that the baseline model would get 314 observations correct, and 59 wrong, for an accuracy of 314/(314+59) = 0.8418231.

It seems that Model 3, with an accuracy of 0.8552, provides you with a small improvement over the baseline model. But is this model useful for record labels?

View the two models from an investment perspective:

  • A production company is interested in investing in songs that are more likely to make it to the Top 10. The company’s objective is to minimize the risk of financial losses attributed to investing in songs that end up unpopular.
  • How many songs does Model 3 correctly predict as a Top 10 hit in 2010? Looking at the confusion matrix, you see that it predicts 33 top 10 hits correctly at an optimal threshold, which is more than half the number
  • It will be more useful to the record label if you can provide the production company with a list of songs that are highly likely to end up in the Top 10.
  • The baseline model is not useful, as it simply does not label any song as a hit.

Considering the three models built so far, you can conclude that Model 3 proves to be the best investment choice for the record label.

GBM model

H2O provides you with the ability to explore other learning models, such as GBM and deep learning. Explore building a model using the GBM technique, using the built-in h2o.gbm function.

Before you do this, you need to convert the target variable to a factor for multinomial classification techniques.

train.h2o$top10=as.factor(train.h2o$top10)
gbm.modelh <- h2o.gbm(y=y.dep, x=x.indep, training_frame = train.h2o, ntrees = 500, max_depth = 4, learn_rate = 0.01, seed = 1122,distribution="multinomial")
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |===                                                              |   5%
  |                                                                       
  |=====                                                            |   7%
  |                                                                       
  |======                                                           |   9%
  |                                                                       
  |=======                                                          |  10%
  |                                                                       
  |======================                                           |  33%
  |                                                                       
  |=====================================                            |  56%
  |                                                                       
  |====================================================             |  79%
  |                                                                       
  |================================================================ |  98%
  |                                                                       
  |=================================================================| 100%
perf.gbmh<-h2o.performance(gbm.modelh,test.h2o)
perf.gbmh
## H2OBinomialMetrics: gbm
## 
## MSE:  0.09860778
## RMSE:  0.3140188
## LogLoss:  0.3206876
## Mean Per-Class Error:  0.2120263
## AUC:  0.8630573
## Gini:  0.7261146
## 
## Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:
##          0  1    Error     Rate
## 0      266 48 0.152866  =48/314
## 1       16 43 0.271186   =16/59
## Totals 282 91 0.171582  =64/373
## 
## Maximum Metrics: Maximum metrics at their respective thresholds
##                       metric threshold    value idx
## 1                     max f1  0.189757 0.573333  90
## 2                     max f2  0.130895 0.693717 145
## 3               max f0point5  0.327346 0.598802  26
## 4               max accuracy  0.442757 0.876676  14
## 5              max precision  0.802184 1.000000   0
## 6                 max recall  0.049990 1.000000 284
## 7            max specificity  0.802184 1.000000   0
## 8           max absolute_mcc  0.169135 0.496486 104
## 9 max min_per_class_accuracy  0.169135 0.796610 104
## 10 max mean_per_class_accuracy  0.169135 0.805948 104
## 
## Gains/Lift Table: Extract with `h2o.gainsLift(<model>, <data>)` or `
h2o.sensitivity(perf.gbmh,0.5)
## Warning in h2o.find_row_by_threshold(object, t): Could not find exact
## threshold: 0.5 for this set of metrics; using closest threshold found:
## 0.501205344484314. Run `h2o.predict` and apply your desired threshold on a
## probability column.
## [[1]]
## [1] 0.1355932
h2o.auc(perf.gbmh)
## [1] 0.8630573

This model correctly predicts 43 top 10 hits, which is 10 more than the number predicted by Model 3. Moreover, the AUC metric is higher than the one obtained from Model 3.

As seen above, H2O’s API provides the ability to obtain key statistical measures required to analyze the models easily, using several built-in functions. The record label can experiment with different parameters to arrive at the model that predicts the maximum number of Top 10 hits at the desired level of accuracy and threshold.

H2O also allows you to experiment with deep learning models. Deep learning models have the ability to learn features implicitly, but can be more expensive computationally.

Now, create a deep learning model with the h2o.deeplearning function, using the same training and test datasets created before. The time taken to run this model depends on the type of EC2 instance chosen for this purpose.  For models that require more computation, consider using accelerated computing instances such as the P2 instance type.

system.time(
  dlearning.modelh <- h2o.deeplearning(y = y.dep,
                                      x = x.indep,
                                      training_frame = train.h2o,
                                      epoch = 250,
                                      hidden = c(250,250),
                                      activation = "Rectifier",
                                      seed = 1122,
                                      distribution="multinomial"
  )
)
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |===                                                              |   4%
  |                                                                       
  |=====                                                            |   8%
  |                                                                       
  |========                                                         |  12%
  |                                                                       
  |==========                                                       |  16%
  |                                                                       
  |=============                                                    |  20%
  |                                                                       
  |================                                                 |  24%
  |                                                                       
  |==================                                               |  28%
  |                                                                       
  |=====================                                            |  32%
  |                                                                       
  |=======================                                          |  36%
  |                                                                       
  |==========================                                       |  40%
  |                                                                       
  |=============================                                    |  44%
  |                                                                       
  |===============================                                  |  48%
  |                                                                       
  |==================================                               |  52%
  |                                                                       
  |====================================                             |  56%
  |                                                                       
  |=======================================                          |  60%
  |                                                                       
  |==========================================                       |  64%
  |                                                                       
  |============================================                     |  68%
  |                                                                       
  |===============================================                  |  72%
  |                                                                       
  |=================================================                |  76%
  |                                                                       
  |====================================================             |  80%
  |                                                                       
  |=======================================================          |  84%
  |                                                                       
  |=========================================================        |  88%
  |                                                                       
  |============================================================     |  92%
  |                                                                       
  |==============================================================   |  96%
  |                                                                       
  |=================================================================| 100%
##    user  system elapsed 
##   1.216   0.020 166.508
perf.dl<-h2o.performance(model=dlearning.modelh,newdata=test.h2o)
perf.dl
## H2OBinomialMetrics: deeplearning
## 
## MSE:  0.1678359
## RMSE:  0.4096778
## LogLoss:  1.86509
## Mean Per-Class Error:  0.3433013
## AUC:  0.7568822
## Gini:  0.5137644
## 
## Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:
##          0  1    Error     Rate
## 0      290 24 0.076433  =24/314
## 1       36 23 0.610169   =36/59
## Totals 326 47 0.160858  =60/373
## 
## Maximum Metrics: Maximum metrics at their respective thresholds
##                       metric threshold    value idx
## 1                     max f1  0.826267 0.433962  46
## 2                     max f2  0.000000 0.588235 239
## 3               max f0point5  0.999929 0.511811  16
## 4               max accuracy  0.999999 0.865952  10
## 5              max precision  1.000000 1.000000   0
## 6                 max recall  0.000000 1.000000 326
## 7            max specificity  1.000000 1.000000   0
## 8           max absolute_mcc  0.999929 0.363219  16
## 9 max min_per_class_accuracy  0.000004 0.662420 145
## 10 max mean_per_class_accuracy  0.000000 0.685334 224
## 
## Gains/Lift Table: Extract with `h2o.gainsLift(<model>, <data>)` or `h2o.gainsLift(<model>, valid=<T/F>, xval=<T/F>)`
h2o.sensitivity(perf.dl,0.5)
## Warning in h2o.find_row_by_threshold(object, t): Could not find exact
## threshold: 0.5 for this set of metrics; using closest threshold found:
## 0.496293348880151. Run `h2o.predict` and apply your desired threshold on a
## probability column.
## [[1]]
## [1] 0.3898305
h2o.auc(perf.dl)
## [1] 0.7568822

The AUC metric for this model is 0.7568822, which is less than what you got from the earlier models. I recommend further experimentation using different hyper parameters, such as the learning rate, epoch or the number of hidden layers.

H2O’s built-in functions provide many key statistical measures that can help measure model performance. Here are some of these key terms.

Metric Description
Sensitivity Measures the proportion of positives that have been correctly identified. It is also called the true positive rate, or recall.
Specificity Measures the proportion of negatives that have been correctly identified. It is also called the true negative rate.
Threshold Cutoff point that maximizes specificity and sensitivity. While the model may not provide the highest prediction at this point, it would not be biased towards positives or negatives.
Precision The fraction of the documents retrieved that are relevant to the information needed, for example, how many of the positively classified are relevant
AUC

Provides insight into how well the classifier is able to separate the two classes. The implicit goal is to deal with situations where the sample distribution is highly skewed, with a tendency to overfit to a single class.

0.90 – 1 = excellent (A)

0.8 – 0.9 = good (B)

0.7 – 0.8 = fair (C)

.6 – 0.7 = poor (D)

0.5 – 0.5 = fail (F)

Here’s a summary of the metrics generated from H2O’s built-in functions for the three models that produced useful results.

Metric Model 3 GBM Model Deep Learning Model

Accuracy

(max)

0.882038

(t=0.435479)

0.876676

(t=0.442757)

0.865952

(t=0.999999)

Precision

(max)

1.0

(t=0.821606)

1.0

(t=0802184)

1.0

(t=1.0)

Recall

(max)

1.0 1.0

1.0

(t=0)

Specificity

(max)

1.0 1.0

1.0

(t=1)

Sensitivity

 

0.2033898 0.1355932

0.3898305

(t=0.5)

AUC 0.8492389 0.8630573 0.756882

Note: ‘t’ denotes threshold.

Your options at this point could be narrowed down to Model 3 and the GBM model, based on the AUC and accuracy metrics observed earlier.  If the slightly lower accuracy of the GBM model is deemed acceptable, the record label can choose to go to production with the GBM model, as it can predict a higher number of Top 10 hits.  The AUC metric for the GBM model is also higher than that of Model 3.

Record labels can experiment with different learning techniques and parameters before arriving at a model that proves to be the best fit for their business. Because deep learning models can be computationally expensive, record labels can choose more powerful EC2 instances on AWS to run their experiments faster.

Conclusion

In this post, I showed how the popular music industry can use analytics to predict the type of songs that make the Top 10 Billboard charts. By running H2O’s scalable machine learning platform on AWS, data scientists can easily experiment with multiple modeling techniques and interactively query the data using Amazon Athena, without having to manage the underlying infrastructure. This helps record labels make critical decisions on the type of artists and songs to promote in a timely fashion, thereby increasing sales and revenue.

If you have questions or suggestions, please comment below.


Additional Reading

Learn how to build and explore a simple geospita simple GEOINT application using SparkR.


About the Authors

gopalGopal Wunnava is a Partner Solution Architect with the AWS GSI Team. He works with partners and customers on big data engagements, and is passionate about building analytical solutions that drive business capabilities and decision making. In his spare time, he loves all things sports and movies related and is fond of old classics like Asterix, Obelix comics and Hitchcock movies.

 

 

Bob Strahan, a Senior Consultant with AWS Professional Services, contributed to this post.

 

 

timeShift(GrafanaBuzz, 1w) Issue 17

Post Syndicated from Blogs on Grafana Labs Blog original https://grafana.com/blog/2017/10/13/timeshiftgrafanabuzz-1w-issue-17/

It’s been a busy week here at Grafana Labs. While we’ve been working on GrafanaCon EU preparations here at the NYC office, the Stockholm office has been diligently working to release Grafana 4.6-beta-1. We’re really excited about this latest release and look forward to your feedback on the new features.


Latest Release

Grafana 4.6-beta-1 is now available! Grafana v4.6 brings many enhancements to Annotations, Cloudwatch and Prometheus. It also adds support for Postgres as a metric and table data source!

To see more details on what’s in the newest version, please see the release notes.

Download Grafana 4.6.0-beta-1 Now


From the Blogosphere

Using Kafka and Grafana to Monitor Meteorological Conditions: Oliver was looking for a way to track historical mountain conditions around the UK, but only had available data for the last 24 hours. It seemed like a perfect job for Kafka. This post discusses how to get going with Kafka very easily, store the data in Graphite and visualize the data in Grafana.

Web Interfaces for your Syslog Server – An Overview: System administrators often prefer to use the command line, but complex queries can be completed much faster with logs indexed in a database and a web interface. This article provides a run-down of various GUI-based tools available for your syslog server.

JEE Performance with JMeter, Prometheus and Grafana. Complete Project from Scratch: This comprehensive article walks you through the steps of monitoring JEE application performance from scratch. We start with making implementation decisions, then how to collect data, visualization and dashboarding configuration, and conclude with alerting. Buckle up; it’s a long article, with a ton of information.


Early Bird Tickets Now Available

Early bird tickets are going fast, so take advantage of the discounted price before they’re gone! We will be announcing the first block of speakers in the coming week.

There’s still time to submit a talk. We’ll accept submissions through the end of October. We’re accepting technical and non-technical talks of all sizes. Submit a CFP.

Get Your Early Bird Ticket Now


Grafana Plugins

This week we add the Prometheus Alertmanager Data Source to our growing list of plugins, lots of updates to the GLPI Data source, and have a urgent bugfix for the WorldMap Panel. To update plugins from on-prem Grafana, use the Grafana-cli tool, or with 1 click if you are using Hosted Grafana.

NEW PLUGIN

Prometheus Alertmanager Data Source – This new data source lets you show data from the Prometheus Alertmanager in Grafana. The Alertmanager handles alerts sent by client applications such as the Prometheus server. With this data source, you can show data in Table form or as a SingleStat.

Install Now

UPDATED PLUGIN

WorldMap Panel – A new version with an urgent bugfix for Elasticsearch users:

  • A fix for Geohash maps after a breaking change in Grafana 4.5.0.
  • Last Geohash as center for the map – it centers the map on the last geohash position received. Useful for real time tracking (with auto refresh on in Grafana).

Update

UPDATED PLUGIN

GLPI App – Lots of fixes in the new version:

  • Compatibility with GLPI 9.2
  • Autofill the Timerange field based on the query
  • When adding new query, add by default a ticket query instead of undefined
  • Correct values in hover tooltip
  • Can have element count by hour of the day with the panel histogram

Update


Contributions of the week:

Each week we highlight some of the important contributions from our amazing open source community. Thank you for helping make Grafana better!


Grafana Labs is Hiring!

We are passionate about open source software and thrive on tackling complex challenges to build the future. We ship code from every corner of the globe and love working with the community. If this sounds exciting, you’re in luck – WE’RE HIRING!

Check out our Open Positions


New Annotation Function

In addition to being able to add annotations easily in the graph panel, you can also create ranges as shown above. Give 4.6.0-beta-1 a try and give us your feedback.

We Need Your Help!

Do you have a graph that you love because the data is beautiful or because the graph provides interesting information? Please get in touch. Tweet or send us an email with a screenshot, and we’ll tell you about this fun experiment.

Tell Me More


What do you think?

We want to keep these articles interesting and relevant, so please tell us how we’re doing. Submit a comment on this article below, or post something at our community forum. Help us make these weekly roundups better!

Follow us on Twitter, like us on Facebook, and join the Grafana Labs community.

Introducing Gluon: a new library for machine learning from AWS and Microsoft

Post Syndicated from Ana Visneski original https://aws.amazon.com/blogs/aws/introducing-gluon-a-new-library-for-machine-learning-from-aws-and-microsoft/

Post by Dr. Matt Wood

Today, AWS and Microsoft announced Gluon, a new open source deep learning interface which allows developers to more easily and quickly build machine learning models, without compromising performance.

Gluon Logo

Gluon provides a clear, concise API for defining machine learning models using a collection of pre-built, optimized neural network components. Developers who are new to machine learning will find this interface more familiar to traditional code, since machine learning models can be defined and manipulated just like any other data structure. More seasoned data scientists and researchers will value the ability to build prototypes quickly and utilize dynamic neural network graphs for entirely new model architectures, all without sacrificing training speed.

Gluon is available in Apache MXNet today, a forthcoming Microsoft Cognitive Toolkit release, and in more frameworks over time.

Neural Networks vs Developers
Machine learning with neural networks (including ‘deep learning’) has three main components: data for training; a neural network model, and an algorithm which trains the neural network. You can think of the neural network in a similar way to a directed graph; it has a series of inputs (which represent the data), which connect to a series of outputs (the prediction), through a series of connected layers and weights. During training, the algorithm adjusts the weights in the network based on the error in the network output. This is the process by which the network learns; it is a memory and compute intensive process which can take days.

Deep learning frameworks such as Caffe2, Cognitive Toolkit, TensorFlow, and Apache MXNet are, in part, an answer to the question ‘how can we speed this process up? Just like query optimizers in databases, the more a training engine knows about the network and the algorithm, the more optimizations it can make to the training process (for example, it can infer what needs to be re-computed on the graph based on what else has changed, and skip the unaffected weights to speed things up). These frameworks also provide parallelization to distribute the computation process, and reduce the overall training time.

However, in order to achieve these optimizations, most frameworks require the developer to do some extra work: specifically, by providing a formal definition of the network graph, up-front, and then ‘freezing’ the graph, and just adjusting the weights.

The network definition, which can be large and complex with millions of connections, usually has to be constructed by hand. Not only are deep learning networks unwieldy, but they can be difficult to debug and it’s hard to re-use the code between projects.

The result of this complexity can be difficult for beginners and is a time-consuming task for more experienced researchers. At AWS, we’ve been experimenting with some ideas in MXNet around new, flexible, more approachable ways to define and train neural networks. Microsoft is also a contributor to the open source MXNet project, and were interested in some of these same ideas. Based on this, we got talking, and found we had a similar vision: to use these techniques to reduce the complexity of machine learning, making it accessible to more developers.

Enter Gluon: dynamic graphs, rapid iteration, scalable training
Gluon introduces four key innovations.

  1. Friendly API: Gluon networks can be defined using a simple, clear, concise code – this is easier for developers to learn, and much easier to understand than some of the more arcane and formal ways of defining networks and their associated weighted scoring functions.
  2. Dynamic networks: the network definition in Gluon is dynamic: it can bend and flex just like any other data structure. This is in contrast to the more common, formal, symbolic definition of a network which the deep learning framework has to effectively carve into stone in order to be able to effectively optimizing computation during training. Dynamic networks are easier to manage, and with Gluon, developers can easily ‘hybridize’ between these fast symbolic representations and the more friendly, dynamic ‘imperative’ definitions of the network and algorithms.
  3. The algorithm can define the network: the model and the training algorithm are brought much closer together. Instead of separate definitions, the algorithm can adjust the network dynamically during definition and training. Not only does this mean that developers can use standard programming loops, and conditionals to create these networks, but researchers can now define even more sophisticated algorithms and models which were not possible before. They are all easier to create, change, and debug.
  4. High performance operators for training: which makes it possible to have a friendly, concise API and dynamic graphs, without sacrificing training speed. This is a huge step forward in machine learning. Some frameworks bring a friendly API or dynamic graphs to deep learning, but these previous methods all incur a cost in terms of training speed. As with other areas of software, abstraction can slow down computation since it needs to be negotiated and interpreted at run time. Gluon can efficiently blend together a concise API with the formal definition under the hood, without the developer having to know about the specific details or to accommodate the compiler optimizations manually.

The team here at AWS, and our collaborators at Microsoft, couldn’t be more excited to bring these improvements to developers through Gluon. We’re already seeing quite a bit of excitement from developers and researchers alike.

Getting started with Gluon
Gluon is available today in Apache MXNet, with support coming for the Microsoft Cognitive Toolkit in a future release. We’re also publishing the front-end interface and the low-level API specifications so it can be included in other frameworks in the fullness of time.

You can get started with Gluon today. Fire up the AWS Deep Learning AMI with a single click and jump into one of 50 fully worked, notebook examples. If you’re a contributor to a machine learning framework, check out the interface specs on GitHub.

-Dr. Matt Wood

AWS Developer Tools Expands Integration to Include GitHub

Post Syndicated from Balaji Iyer original https://aws.amazon.com/blogs/devops/aws-developer-tools-expands-integration-to-include-github/

AWS Developer Tools is a set of services that include AWS CodeCommit, AWS CodePipeline, AWS CodeBuild, and AWS CodeDeploy. Together, these services help you securely store and maintain version control of your application’s source code and automatically build, test, and deploy your application to AWS or your on-premises environment. These services are designed to enable developers and IT professionals to rapidly and safely deliver software.

As part of our continued commitment to extend the AWS Developer Tools ecosystem to third-party tools and services, we’re pleased to announce AWS CodeStar and AWS CodeBuild now integrate with GitHub. This will make it easier for GitHub users to set up a continuous integration and continuous delivery toolchain as part of their release process using AWS Developer Tools.

In this post, I will walk through the following:

Prerequisites:

You’ll need an AWS account, a GitHub account, an Amazon EC2 key pair, and administrator-level permissions for AWS Identity and Access Management (IAM), AWS CodeStar, AWS CodeBuild, AWS CodePipeline, Amazon EC2, Amazon S3.

 

Integrating GitHub with AWS CodeStar

AWS CodeStar enables you to quickly develop, build, and deploy applications on AWS. Its unified user interface helps you easily manage your software development activities in one place. With AWS CodeStar, you can set up your entire continuous delivery toolchain in minutes, so you can start releasing code faster.

When AWS CodeStar launched in April of this year, it used AWS CodeCommit as the hosted source repository. You can now choose between AWS CodeCommit or GitHub as the source control service for your CodeStar projects. In addition, your CodeStar project dashboard lets you centrally track GitHub activities, including commits, issues, and pull requests. This makes it easy to manage project activity across the components of your CI/CD toolchain. Adding the GitHub dashboard view will simplify development of your AWS applications.

In this section, I will show you how to use GitHub as the source provider for your CodeStar projects. I’ll also show you how to work with recent commits, issues, and pull requests in the CodeStar dashboard.

Sign in to the AWS Management Console and from the Services menu, choose CodeStar. In the CodeStar console, choose Create a new project. You should see the Choose a project template page.

CodeStar Project

Choose an option by programming language, application category, or AWS service. I am going to choose the Ruby on Rails web application that will be running on Amazon EC2.

On the Project details page, you’ll now see the GitHub option. Type a name for your project, and then choose Connect to GitHub.

Project details

You’ll see a message requesting authorization to connect to your GitHub repository. When prompted, choose Authorize, and then type your GitHub account password.

Authorize

This connects your GitHub identity to AWS CodeStar through OAuth. You can always review your settings by navigating to your GitHub application settings.

Installed GitHub Apps

You’ll see AWS CodeStar is now connected to GitHub:

Create project

You can choose a public or private repository. GitHub offers free accounts for users and organizations working on public and open source projects and paid accounts that offer unlimited private repositories and optional user management and security features.

In this example, I am going to choose the public repository option. Edit the repository description, if you like, and then choose Next.

Review your CodeStar project details, and then choose Create Project. On Choose an Amazon EC2 Key Pair, choose Create Project.

Key Pair

On the Review project details page, you’ll see Edit Amazon EC2 configuration. Choose this link to configure instance type, VPC, and subnet options. AWS CodeStar requires a service role to create and manage AWS resources and IAM permissions. This role will be created for you when you select the AWS CodeStar would like permission to administer AWS resources on your behalf check box.

Choose Create Project. It might take a few minutes to create your project and resources.

Review project details

When you create a CodeStar project, you’re added to the project team as an owner. If this is the first time you’ve used AWS CodeStar, you’ll be asked to provide the following information, which will be shown to others:

  • Your display name.
  • Your email address.

This information is used in your AWS CodeStar user profile. User profiles are not project-specific, but they are limited to a single AWS region. If you are a team member in projects in more than one region, you’ll have to create a user profile in each region.

User settings

User settings

Choose Next. AWS CodeStar will create a GitHub repository with your configuration settings (for example, https://github.com/biyer/ruby-on-rails-service).

When you integrate your integrated development environment (IDE) with AWS CodeStar, you can continue to write and develop code in your preferred environment. The changes you make will be included in the AWS CodeStar project each time you commit and push your code.

IDE

After setting up your IDE, choose Next to go to the CodeStar dashboard. Take a few minutes to familiarize yourself with the dashboard. You can easily track progress across your entire software development process, from your backlog of work items to recent code deployments.

Dashboard

After the application deployment is complete, choose the endpoint that will display the application.

Pipeline

This is what you’ll see when you open the application endpoint:

The Commit history section of the dashboard lists the commits made to the Git repository. If you choose the commit ID or the Open in GitHub option, you can use a hotlink to your GitHub repository.

Commit history

Your AWS CodeStar project dashboard is where you and your team view the status of your project resources, including the latest commits to your project, the state of your continuous delivery pipeline, and the performance of your instances. This information is displayed on tiles that are dedicated to a particular resource. To see more information about any of these resources, choose the details link on the tile. The console for that AWS service will open on the details page for that resource.

Issues

You can also filter issues based on their status and the assigned user.

Filter

AWS CodeBuild Now Supports Building GitHub Pull Requests

CodeBuild is a fully managed build service that compiles source code, runs tests, and produces software packages that are ready to deploy. With CodeBuild, you don’t need to provision, manage, and scale your own build servers. CodeBuild scales continuously and processes multiple builds concurrently, so your builds are not left waiting in a queue. You can use prepackaged build environments to get started quickly or you can create custom build environments that use your own build tools.

We recently announced support for GitHub pull requests in AWS CodeBuild. This functionality makes it easier to collaborate across your team while editing and building your application code with CodeBuild. You can use the AWS CodeBuild or AWS CodePipeline consoles to run AWS CodeBuild. You can also automate the running of AWS CodeBuild by using the AWS Command Line Interface (AWS CLI), the AWS SDKs, or the AWS CodeBuild Plugin for Jenkins.

AWS CodeBuild

In this section, I will show you how to trigger a build in AWS CodeBuild with a pull request from GitHub through webhooks.

Open the AWS CodeBuild console at https://console.aws.amazon.com/codebuild/. Choose Create project. If you already have a CodeBuild project, you can choose Edit project, and then follow along. CodeBuild can connect to AWS CodeCommit, S3, BitBucket, and GitHub to pull source code for builds. For Source provider, choose GitHub, and then choose Connect to GitHub.

Configure

After you’ve successfully linked GitHub and your CodeBuild project, you can choose a repository in your GitHub account. CodeBuild also supports connections to any public repository. You can review your settings by navigating to your GitHub application settings.

GitHub Apps

On Source: What to Build, for Webhook, select the Rebuild every time a code change is pushed to this repository check box.

Note: You can select this option only if, under Repository, you chose Use a repository in my account.

Source

In Environment: How to build, for Environment image, select Use an image managed by AWS CodeBuild. For Operating system, choose Ubuntu. For Runtime, choose Base. For Version, choose the latest available version. For Build specification, you can provide a collection of build commands and related settings, in YAML format (buildspec.yml) or you can override the build spec by inserting build commands directly in the console. AWS CodeBuild uses these commands to run a build. In this example, the output is the string “hello.”

Environment

On Artifacts: Where to put the artifacts from this build project, for Type, choose No artifacts. (This is also the type to choose if you are just running tests or pushing a Docker image to Amazon ECR.) You also need an AWS CodeBuild service role so that AWS CodeBuild can interact with dependent AWS services on your behalf. Unless you already have a role, choose Create a role, and for Role name, type a name for your role.

Artifacts

In this example, leave the advanced settings at their defaults.

If you expand Show advanced settings, you’ll see options for customizing your build, including:

  • A build timeout.
  • A KMS key to encrypt all the artifacts that the builds for this project will use.
  • Options for building a Docker image.
  • Elevated permissions during your build action (for example, accessing Docker inside your build container to build a Dockerfile).
  • Resource options for the build compute type.
  • Environment variables (built-in or custom). For more information, see Create a Build Project in the AWS CodeBuild User Guide.

Advanced settings

You can use the AWS CodeBuild console to create a parameter in Amazon EC2 Systems Manager. Choose Create a parameter, and then follow the instructions in the dialog box. (In that dialog box, for KMS key, you can optionally specify the ARN of an AWS KMS key in your account. Amazon EC2 Systems Manager uses this key to encrypt the parameter’s value during storage and decrypt during retrieval.)

Create parameter

Choose Continue. On the Review page, either choose Save and build or choose Save to run the build later.

Choose Start build. When the build is complete, the Build logs section should display detailed information about the build.

Logs

To demonstrate a pull request, I will fork the repository as a different GitHub user, make commits to the forked repo, check in the changes to a newly created branch, and then open a pull request.

Pull request

As soon as the pull request is submitted, you’ll see CodeBuild start executing the build.

Build

GitHub sends an HTTP POST payload to the webhook’s configured URL (highlighted here), which CodeBuild uses to download the latest source code and execute the build phases.

Build project

If you expand the Show all checks option for the GitHub pull request, you’ll see that CodeBuild has completed the build, all checks have passed, and a deep link is provided in Details, which opens the build history in the CodeBuild console.

Pull request

Summary:

In this post, I showed you how to use GitHub as the source provider for your CodeStar projects and how to work with recent commits, issues, and pull requests in the CodeStar dashboard. I also showed you how you can use GitHub pull requests to automatically trigger a build in AWS CodeBuild — specifically, how this functionality makes it easier to collaborate across your team while editing and building your application code with CodeBuild.


About the author:

Balaji Iyer is an Enterprise Consultant for the Professional Services Team at Amazon Web Services. In this role, he has helped several customers successfully navigate their journey to AWS. His specialties include architecting and implementing highly scalable distributed systems, serverless architectures, large scale migrations, operational security, and leading strategic AWS initiatives. Before he joined Amazon, Balaji spent more than a decade building operating systems, big data analytics solutions, mobile services, and web applications. In his spare time, he enjoys experiencing the great outdoors and spending time with his family.

 

timeShift(GrafanaBuzz, 1w) Issue 16

Post Syndicated from Blogs on Grafana Labs Blog original https://grafana.com/blog/2017/10/06/timeshiftgrafanabuzz-1w-issue-16/

Welcome to another issue of TimeShift. In addition to the roundup of articles and plugin updates, we had a big announcement this week – Early Bird tickets to GrafanaCon EU are now available! We’re also accepting CFPs through the end of October, so if you have a topic in mind, don’t wait until the last minute, please send it our way. Speakers who are selected will receive a comped ticket to the conference.


Early Bird Tickets Now Available

We’ve released a limited number of Early Bird tickets before General Admission tickets are available. Take advantage of this discount before they’re sold out!

Get Your Early Bird Ticket Now

Interested in speaking at GrafanaCon? We’re looking for technical and non-tecnical talks of all sizes. Submit a CFP Now.


From the Blogosphere

Get insights into your Azure Cosmos DB: partition heatmaps, OMS, and More: Microsoft recently announced the ability to access a subset of Azure Cosmos DB metrics via Azure Monitor API. Grafana Labs built an Azure Monitor Plugin for Grafana 4.5 to visualize the data.

How to monitor Docker for Mac/Windows: Brian was tired of guessing about the performance of his development machines and test environment. Here, he shows how to monitor Docker with Prometheus to get a better understanding of a dev environment in his quest to monitor all the things.

Prometheus and Grafana to Monitor 10,000 servers: This article covers enokido’s process of choosing a monitoring platform. He identifies three possible solutions, outlines the pros and cons of each, and discusses why he chose Prometheus.

GitLab Monitoring: It’s fascinating to see Grafana dashboards with production data from companies around the world. For instance, we’ve previously highlighted the huge number of dashboards Wikimedia publicly shares. This week, we found that GitLab also has public dashboards to explore.

Monitoring a Docker Swarm Cluster with cAdvisor, InfluxDB and Grafana | The Laboratory: It’s important to know the state of your applications in a scalable environment such as Docker Swarm. This video covers an overview of Docker, VM’s vs. containers, orchestration and how to monitor Docker Swarm.

Introducing Telemetry: Actionable Time Series Data from Counters: Learn how to use counters from mulitple disparate sources, devices, operating systems, and applications to generate actionable time series data.

ofp_sniffer Branch 1.2 (docker/influxdb/grafana) Upcoming Features: This video demo shows off some of the upcoming features for OFP_Sniffer, an OpenFlow sniffer to help network troubleshooting in production networks.


Grafana Plugins

Plugin authors add new features and bugfixes all the time, so it’s important to always keep your plugins up to date. To update plugins from on-prem Grafana, use the Grafana-cli tool, if you are using Hosted Grafana, you can update with 1 click! If you have questions or need help, hit up our community site, where the Grafana team and members of the community are happy to help.

UPDATED PLUGIN

PNP for Nagios Data Source – The latest release for the PNP data source has some fixes and adds a mathematical factor option.

Update

UPDATED PLUGIN

Google Calendar Data Source – This week, there was a small bug fix for the Google Calendar annotations data source.

Update

UPDATED PLUGIN

BT Plugins – Our friends at BT have been busy. All of the BT plugins in our catalog received and update this week. The plugins are the Status Dot Panel, the Peak Report Panel, the Trend Box Panel and the Alarm Box Panel.

Changes include:

  • Custom dashboard links now work in Internet Explorer.
  • The Peak Report panel no longer supports click-to-sort.
  • The Status Dot panel tooltips now look like Grafana tooltips.


This week’s MVC (Most Valuable Contributor)

Each week we highlight some of the important contributions from our amazing open source community. This week, we’d like to recognize a contributor who did a lot of work to improve Prometheus support.

pdoan017
Thanks to Alin Sinpaleanfor his Prometheus PR – that aligns the step and interval parameters. Alin got a lot of feedback from the Prometheus community and spent a lot of time and energy explaining, debating and iterating before the PR was ready.
Thank you!


Grafana Labs is Hiring!

We are passionate about open source software and thrive on tackling complex challenges to build the future. We ship code from every corner of the globe and love working with the community. If this sounds exciting, you’re in luck – WE’RE HIRING!

Check out our Open Positions


Tweet of the Week

We scour Twitter each week to find an interesting/beautiful dashboard and show it off! #monitoringLove

Wow – Excited to be a part of exploring data to find out how Mexico City is evolving.

We Need Your Help!

Do you have a graph that you love because the data is beautiful or because the graph provides interesting information? Please get in touch. Tweet or send us an email with a screenshot, and we’ll tell you about this fun experiment.

Tell Me More


What do you think?

That’s a wrap! How are we doing? Submit a comment on this article below, or post something at our community forum. Help us make these weekly roundups better!

Follow us on Twitter, like us on Facebook, and join the Grafana Labs community.

MPAA Reports Pirate Sites, Hosts and Ad-Networks to US Government

Post Syndicated from Ernesto original https://torrentfreak.com/mpaa-reports-pirate-sites-hosts-and-ad-networks-to-us-government-171004/

Responding to a request from the Office of the US Trade Representative (USTR), the MPAA has submitted an updated list of “notorious markets” that it says promote the illegal distribution of movies and TV-shows.

These annual submissions help to guide the U.S. Government’s position towards foreign countries when it comes to copyright enforcement.

What stands out in the MPAA’s latest overview is that it no longer includes offline markets, only sites and services that are available on the Internet. This suggests that online copyright infringement is seen as a priority.

The MPAA’s report includes more than two dozen alleged pirate sites in various categories. While this is not an exhaustive list, the movie industry specifically highlights some of the worst offenders in various categories.

“Content thieves take advantage of a wide constellation of easy-to-use online technologies, such as direct download and streaming, to create infringing sites and applications, often with the look and feel of legitimate content distributors, luring unsuspecting consumers into piracy,” the MPAA writes.

According to the MPAA, torrent sites remain popular, serving millions of torrents to tens of millions of users at any given time.

The Pirate Bay has traditionally been one of the main targets. Based on data from Alexa and SimilarWeb, the MPAA says that TPB has about 62 million unique visitors per month. The other torrent sites mentioned are 1337x.to, Rarbg.to, Rutracker.org, and Torrentz2.eu.

MPAA calls out torrent sites

The second highlighted category covers various linking and streaming sites. This includes the likes of Fmovies.is, Gostream.is, Primewire.ag, Kinogo.club, MeWatchSeries.to, Movie4k.tv and Repelis.tv.

Direct download sites and video hosting services also get a mention. Nowvideo.sx, Openload.co, Rapidgator.net, Uploaded.net and the Russian social network VK.com. Many of these services refuse to properly process takedown notices, the MPAA claims.

The last category is new and centers around piracy apps. These sites offer mobile applications that allow users to stream pirated content, such as IpPlayBox.tv, MoreTV, 3DBoBoVR, TVBrowser, and KuaiKa, which are particularly popular in Asia.

Aside from listing specific sites, the MPAA also draws the US Government’s attention to the streaming box problem. The report specifically mentions that Kodi-powered boxes are regularly abused for infringing purposes.

“An emerging global threat is streaming piracy which is enabled by piracy devices preloaded with software to illicitly stream movies and television programming and a burgeoning ecosystem of infringing add-ons,” the MPAA notes.

“The most popular software is an open source media player software, Kodi. Although Kodi is not itself unlawful, and does not host or link to unlicensed content, it can be easily configured to direct consumers toward unlicensed films and television shows.”

Pirate streaming boxes

There are more than 750 websites offering infringing devices, the Hollywood group notes, adding that the rapid growth of this problem is startling. Interestingly, the report mentions TVAddons.ag as a “piracy add-on repository,” noting that it’s currently offline. Whether the new TVAddons is also seen a problematic is unclear.

The MPAA also continues its trend of calling out third-party intermediaries, including hosting providers. These companies refuse to take pirate sites offline following complaints, even when the MPAA views them as blatantly violating the law.

“Hosting companies provide the essential infrastructure required to operate a website,” the MPAA writes. “Given the central role of hosting providers in the online ecosystem, it is very concerning that many refuse to take action upon being notified…”

The Hollywood group specifically mentions Private Layer and Netbrella as notorious markets. CDN provider CloudFlare is also named. As a US-based company, the latter can’t be included in the list. However, the MPAA explains that it is often used as an anonymization tool by sites and services that are mentioned in the report.

Another group of intermediaries that play a role in fueling piracy (mentioned for the first time) are advertising networks. The MPAA specifically calls out the Canadian company WWWPromoter, which works with sites such as Primewire.ag, Projectfreetv.at and 123movies.to

“The companies connecting advertisers to infringing websites and inadvertently contribute to the prevalence and prosperity of infringing sites by providing funding to the operators of these sites through advertising revenue,” the MPAA writes.

The MPAA’s full report is available here (pdf). The USTR will use this input above to make up its own list of notorious markets. This will help to identify current threats and call on foreign governments to take appropriate action.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

LOIC Download – Low Orbit Ion Cannon DDoS Booter

Post Syndicated from Darknet original https://www.darknet.org.uk/2017/10/loic-download-low-orbit-ion-cannon-ddos-booter/?utm_source=rss&utm_medium=social&utm_campaign=darknetfeed

LOIC Download – Low Orbit Ion Cannon DDoS Booter

LOIC Download below – Low Orbit Ion Cannon is an Open Source Stress Testing and Denial of Service (DoS or DDoS) attack application written in C#.

It’s an interesting tool in that it’s often used in what are usually classified as political cyber-terrorist attacks against large capitalistic organisations. The hivemind version gives average non-technical users a way to give their bandwidth as a way of supporting a cause they agree with.

Read the rest of LOIC Download – Low Orbit Ion Cannon DDoS Booter now! Only available at Darknet.

EFF Warns Against Abusive Lawsuits Targeting Kodi Add-on Repository

Post Syndicated from Ernesto original https://torrentfreak.com/eff-warns-against-abusive-lawsuits-targeting-kodi-add-on-repository-171002/

The popular Kodi add-on repository TVAddons was dragged into two seperate lawsuits in recent months, in both Canada and the United States.

TV broadcasters such as Bell, Rogers, and Dish accuse the platform of inducing or contributing to copyright infringement by making ‘pirate’ add-ons to the public.

TVAddons itself has always maintained its innocence. A site representative recently told us that they rely on the safe harbor protection laws, available both in the US and Canada, which they believed would shield them from copyright infringement liability for merely distributing add-ons.

“TV ADDONS is not a piracy site, it’s a platform for developers of open source add-ons for the Kodi media center. As a community platform filled with user-generated content, we have always acted in accordance with the law and swiftly complied whenever we received a DMCA takedown notice.”

While both cases are still in an early stage, TVAddons is receiving support from Electronic Frontier Foundation (EFF), who warn against abusive lawsuits targeting neutral add-on distributors.

According to the digital rights group, holding platforms such as TVAddons liable for infringement users may commit after they download an add-on from the site goes too far.

“The lawsuit against TVAddons seeks to skirt that important [safe harbor] protection by arguing that by merely hosting, distributing and promoting Kodi add-ons, the TVAddons administrator is liable for inducing or authorizing copyright infringements later committed using those add-ons.

“This argument, were it to succeed, would create new uncertainty and risk for distributors of any software that could be used to engage in copyright infringement,” EFF adds.

The US case, started by Dish Networks, tries to expand copyright liability according to EFF. This lawsuit also targets the developers of the Zem TV add-on. While the latter may have crossed a line, TVAddons should be protected by the DMCA’s safe harbor when they merely host third-party content.

“Vicarious copyright liability requires that the defendant have the ‘right and ability to supervise’ the conduct of the direct infringer, and benefit financially. Dish claims only that the TVAddons site made ZemTV ‘available for download.’ That’s not enough to show an ability to supervise,” EFF notes.

The complaint in question goes a bit further than the “download” argument alone though. It also accuses TVAddons’ operator of having induced and encouraged Zem TV’s developer to retransmit popular television programs, which is of a different order.

However, EFF informs TorrentFreak that this allegation is not specific enough for a complaint to survive a motion to dismiss. If TVAddons’ operator indeed took some purposeful, knowing action to induce copyright infringement, it should be spelled out, they say.

According to the digital rights group, the goal of the current cases is to expand the borders of copyright infringement liability, calling on copyright holders to stop such abusive lawsuits.

“These lawsuits by big TV incumbents seem to have a few goals: to expand the scope of secondary copyright infringement yet again, to force major Kodi add-on distributors off of the Internet, and to smear and discourage open source, freely configurable media players by focusing on the few bad actors in that ecosystem.

“The courts should reject these expansions of copyright liability, and TV networks should not target neutral platforms and technologies for abusive lawsuits,” EFF concludes.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

EFF: The War on General-Purpose Computing Turns on the Streaming Media Box Community

Post Syndicated from corbet original https://lwn.net/Articles/735166/rss

The EFF highlights
a number of attacks
against distributors of add-ons for the Kodi streaming media system.
These lawsuits by big TV incumbents seem to have a few goals: to
expand the scope of secondary copyright infringement yet again, to force
major Kodi add-on distributors off of the Internet, and to smear and
discourage open source, freely configurable media players by focusing on
the few bad actors in that ecosystem. The courts should reject these
expansions of copyright liability, and TV networks should not target
neutral platforms and technologies for abusive lawsuits.

A message from the (former) OSI President

Post Syndicated from corbet original https://lwn.net/Articles/735122/rss

Alison Randall has sent out a message to the community saying that she is
moving on from the presidency of the Open Source Initiative.
I’m incredibly proud of what the organization has accomplished in that
time, continuing stewardship of the open source license list, and growing
our individual membership and affiliate programs which provide a path for
the entire open source community to have a say in the governance of the
OSI.
” Her replacement will be Simon Phipps.

Microsoft Becomes Sponsor of Open Source Initiative

Post Syndicated from ris original https://lwn.net/Articles/734941/rss

The Open Source Initiative (OSI) has announced that Microsoft has
joined the organization as a Premium Sponsor.
Microsoft’s history with the OSI dates back to 2005 with the submission of the Microsoft Community License, then again in August of 2007 with the submission of the Microsoft Permissive License. For many in the open source software community, it was Microsoft’s release of .NET in 2014 under an open source license that may have first caught their attention. Microsoft has increasingly participated in open source projects and communities as users, contributors, and creators, and has released even more open source products like Visual Studio Code and Typescript.

Open Sourcing Vespa, Yahoo’s Big Data Processing and Serving Engine

Post Syndicated from ris original https://lwn.net/Articles/734926/rss

Oath, parent company of Yahoo, has announced
that it has released Vespa as an open source
project on GitHub.
Building applications increasingly means dealing with huge amounts of data. While developers can use the the Hadoop stack to store and batch process big data, and Storm to stream-process data, these technologies do not help with serving results to end users. Serving is challenging at large scale, especially when it is necessary to make computations quickly over data while a user is waiting, as with applications that feature search, recommendation, and personalization.

By releasing Vespa, we are making it easy for anyone to build applications
that can compute responses to user requests, over large datasets, at real
time and at internet scale – capabilities that up until now, have been
within reach of only a few large companies.” (Thanks to Paul Wise)

TVAddons and ZemTV Operators Named in US Lawsuit

Post Syndicated from Ernesto original https://torrentfreak.com/tvaddons-and-zemtv-operators-named-in-us-lawsuit-170926/

Earlier this year, American satellite and broadcast provider Dish Network targeted two well-known players in the third-party Kodi add-on ecosystem.

In a complaint filed in a federal court in Texas, add-on ZemTV and the TVAddons library were accused of copyright infringement. As a result, both are facing up to $150,000 for each offense.

Initially, the true identities of the defendants unknown and listed as John Does, but an amended complaint that was submitted yesterday reveal their alleged names and hometowns.

The Texas court previously granted subpoenas which allowed Dish to request information from the defendants’ accounts on services including Amazon, Github, Google, Twitter, Facebook and PayPal, which likely helped with the identification.

According to Dish ZemTV was developed by Shahjahan Durrani, who’s based in London, UK. He allegedly controlled and maintained the addon which was used to stream infringing broadcasts of Dish content.

“Durrani developed the ZemTV add-on and managed and operated the ZemTV service. Durrani used the aliases ‘Shani’ and ‘Shani_08′ to communicate with users of the ZemTV service,” the complaint reads.

The owner and operator of TVAddons is listed as Adam Lackman, who resides in Montreal, Canada. This doesn’t really come as a surprise, since Lackman is publicly listed as TVAddons’ owner on Linkedin and was previously named in a Canadian lawsuit.

While both defendants are named, the allegations against them haven’t changed substantially. Both face copyright infringement charges and potentially risk millions of dollars in damages.

Durrani directly infringed Dish’s copyrights by making the streams available, the plaintiffs note. Lackman subsequently profited from this and failed to take any action in response.

“Lackman had the legal right and actual ability to supervise and control this infringing activity because Lackman made the ZemTV add-on, which is necessary to access the ZemTV service, available for download on his websites.

“Lackman refused to take any action to stop the infringement of DISH’s exclusive rights in the programs transmitted through the ZemTV service,” the complaint adds.

TorrentFreak spoke to a TVAddons representative who refutes the copyright infringement allegations. The website sees itself as a platform for user-generated content and cites the DMCA’s safe harbor as a defense.

“TV ADDONS is not a piracy site, it’s a platform for developers of open source add-ons for the Kodi media center. As a community platform filled with user-generated content, we have always acted in accordance with the law and swiftly complied whenever we received a DMCA takedown notice.”

The representative states that it will be very difficult for them to defend themselves against a billion dollar company with unlimited resources, but hopes that the site will prevail.

The new TVAddons

After the original TVAddons.ag domain was seized in the Canadian lawsuit the site returned on TVaddons.co. However, hundreds of allegedly infringing add-ons are no longer listed.

The site previously relied on the DMCA to shield it from liability but apparently, that wasn’t enough. As a result, they now check all submitted add-ons carefully.

“Since complying with the law is clearly not enough to prevent frivolous legal action from being taken against you, we have been forced to implement a more drastic code vetting process,” the TVAddons representative says.

If it’s not entirely clear that an add-on is properly licensed, it won’t be submitted for the time being. This hampers innovation, according to TVAddons, and threatens many communities that rely on user-generated content.

“When you visit any given web site, how can you be certain that every piece of media you see is licensed by the website displaying it? You can assume, but it’s very difficult to be certain. That’s why the DMCA is critical to the existence of online communities.”

Now that both defendants have been named the case will move forward. This may eventually lead to an in-depth discovery process where Dish will try to find more proof that both were knowingly engaging in infringing activity.

Durrani and Lackman, on the other hand, will try to prove their innocence.

A copy of the amended complaint is available here (pdf).

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

[$] Safety-critical realtime with Linux

Post Syndicated from corbet original https://lwn.net/Articles/734694/rss

Doing realtime processing with a general-purpose operating-system like
Linux can be a challenge by itself, but safety-critical realtime processing
ups the ante considerably. During a session at Open Source Summit North
America, Wolfgang Maurer discussed the difficulties involved in this kind
of work and what Linux has to offer.

GitLab 10.0 Released

Post Syndicated from ris original https://lwn.net/Articles/734649/rss

GitLab 10.0 has been released. “With every monthly release of GitLab, we introduce new capabilities and improve our existing features. GitLab 10.0 is no exception and includes numerous new additions, such as the ability to automatically resolve outdated merge request discussions, improvements to subgroups, and an API for Wiki thanks to a contribution from our open source community.

timeShift(GrafanaBuzz, 1w) Issue 14

Post Syndicated from Blogs on Grafana Labs Blog original https://grafana.com/blog/2017/09/22/timeshiftgrafanabuzz-1w-issue-14/

Summer is officially in the rear-view mirror, but we at Grafana Labs are excited. Next week, the team will gather in Stockholm, Sweden where we’ll be discussing Grafana 5.0, GrafanaCon EU and setting other goals. If you’re attending Percona Live Europe 2017 in Dublin, be sure and catch Grafana developer, Daniel Lee on Tuesday, September 26. He’ll be showing off the new MySQL data source and a sneak peek of Grafana 5.0.

And with that – we hope you enjoy this issue of TimeShift!


Latest Release

Grafana 4.5.2 is now available! Various fixes to the Graphite data source, HTTP API, and templating.

To see details on what’s been fixed in the newest version, please see the release notes.

Download Grafana 4.5.2 Now


From the Blogosphere

A Monitoring Solution for Docker Hosts, Containers and Containerized Services: Stefan was searching for an open source, self-hosted monitoring solution. With an ever-growing number of open source TSDBs, Stefan outlines why he chose Prometheus and provides a rundown of how he’s monitoring his Docker hosts, containers and services.

Real-time API Performance Monitoring with ES, Beats, Logstash and Grafana: As APIs become a centerpiece for businesses, monitoring API performance is extremely important. Hiren recently configured real time API response time monitoring for a project and shares his implementation plan and configurations.

Monitoring SSL Certificate Expiry in GCP and Kubernetes: This article discusses how to use Prometheus and Grafana to automatically monitor SSL certificates in use by load balancers across GCP projects.

Node.js Performance Monitoring with Prometheus: This is a good primer for monitoring in general. It discusses what monitoring is, important signals to know, instrumentation, and things to consider when selecting a monitoring tool.

DIY Dashboard with Grafana and MariaDB: Mark was interested in testing out the new beta MySQL support in Grafana, so he wrote a short article on how he is using Grafana with MariaDB.

Collecting Temperature Data with Raspberry Pi Computers: Many of us use monitoring for tracking mission-critical systems, but setting up environment monitoring can be a fun way to improve your programming skills as well.


GrafanaCon EU CFP is Open

Have a big idea to share? A shorter talk or a demo you’d like to show off? We’re looking for technical and non-technical talks of all sizes. The proposals are rolling in, but we are happy to save a speaking slot for you!

I’d Like to Speak at GrafanaCon


Grafana Plugins

There were a lot of plugin updates to highlight this week, many of which were due to changes in Grafana 4.5. It’s important to keep your plugins up to date, since bug fixes and new features are added frequently. We’ve made the process of installing and updating plugins simple. On an on-prem instance, use the Grafana-cli, or on Hosted Grafana, install and update with 1-click.

NEW PLUGIN

Linksmart HDS Data Source – The LinkSmart Historical Data Store is a new Grafana data source plugin. LinkSmart is an open source IoT platform for developing IoT applications. IoT applications need to deal with large amounts of data produced by a growing number of sensors and other devices. The Historical Datastore is for storing, querying, and aggregating (time-series) sensor data.

Install Now

UPDATED PLUGIN

Simple JSON Data Source – This plugin received a bug fix for the query editor.

Update Now

UPDATED PLUGIN

Stagemonitor Elasticsearch App – Numerous small updates and the version updated to match the StageMonitor version number.

Update Now

UPDATED PLUGIN

Discrete Panel – Update to fix breaking change in Grafana 4.5.

Update Now

UPDATED PLUGIN

Status Dot Panel – Minor HTML Update in this version.

Update Now

UPDATED PLUGIN

Alarm Box Panel – This panel was updated to fix breaking changes in Grafana 4.5.

Update Now


This week’s MVC (Most Valuable Contributor)

Each week we highlight a contributor to Grafana or the surrounding ecosystem as a thank you for their participation in making open source software great.

Sven Klemm opened a PR for adding a new Postgres data source and has been very quick at implementing proposed changes. The Postgres data source is on our roadmap for Grafana 5.0 so this PR really helps. Thanks Sven!


Tweet of the Week

We scour Twitter each week to find an interesting/beautiful dashboard and show it off! #monitoringLove

Glad you’re finding Grafana useful! Curious about that annotation just before midnight 🙂

We Need Your Help

Last week we announced an experiment we were conducting, and need your help! Do you have a graph that you love because the data is beautiful or because the graph provides interesting information? Please get in touch. Tweet or send us an email with a screenshot, and we’ll tell you about this fun experiment.

I Want to Help


Grafana Labs is Hiring!

We are passionate about open source software and thrive on tackling complex challenges to build the future. We ship code from every corner of the globe and love working with the community. If this sounds exciting, you’re in luck – WE’RE HIRING!

Check out our Open Positions


What do you think?

What would you like to see here? Submit a comment on this article below, or post something at our community forum. Help us make these weekly roundups better!

Follow us on Twitter, like us on Facebook, and join the Grafana Labs community.