Tag Archives: 3.0

Don Jr.: I’ll bite

Post Syndicated from Robert Graham original http://blog.erratasec.com/2017/11/don-jr-ill-bite.html

So Don Jr. tweets the following, which is an excellent troll, so I thought I’d bite. I just got through debunking Democrat claims about NetNeutrality, so it seems like a good time to balance things out and debunk Trump nonsense.

The issue here is not which side is right. The issue here is whether you stand for truth, or whether you’ll seize any factoid that appears to support your side, regardless of the truthfulness of it. The ACLU obviously chose falsehoods, as I documented. In the following tweet, Don Jr. does the same.

It’s a preview of the hyperpartisan debates you are likely to have across the dinner table tomorrow, with each side trying to outdo the other in the falsehoods they’ll claim.

What we see in these numbers is a steady trend in the statistics since the Great Recession, with nothing in the graphs showing that Trump has influenced them one way or the other.

Stock markets at all time highs

This is true, but it’s obviously not due to Trump. The stock markets have been steadily rising since the Great Recession. Trump has done nothing substantive to change the market’s trajectory, nor has he inspired it to change direction.
To be fair to Don Jr., we’ve all been crediting (or blaming) presidents for changes in the stock market, despite the fact that they have almost no influence over it. Presidents don’t run the economy; believing they do is an inappropriate conceit. The most influence they’ve had is in harming it.

Lowest jobless claims since 73

Again, let’s graph this:

As we can see, jobless claims have been on a smooth downward trajectory since the Great Recession. It’s difficult to see here how President Trump has influenced these numbers.

6 Trillion added to the economy

What he’s referring to is that assets have risen in value, like the stock market, homes, gold, and even Bitcoin.
But this is a well-known fallacy known as Mercantilism: believing the “economy” is measured by the value of its assets. This was debunked by Adam Smith in his book “The Wealth of Nations”, where he showed instead that the “economy” is measured by how much it produces (GDP, or Gross Domestic Product), not by its assets.
GDP has grown at 3.0%, which is pretty good compared to the long term trend, and is better than Europe or Japan (though not as good as China). But Trump doesn’t deserve any credit for this — today’s rise in GDP is the result of stuff that happened years ago.
Assets have risen by $6 trillion, but that’s not a good thing. After all, when you sell your home for more money, the buyer has to pay more. So one person is better off and one is worse off, so the net effect is zero.
Actually, such asset price increases are a worrisome indicator: we are entering into bubble territory. They are the result of loose monetary policy, the low interest rates and “quantitative easing” designed under the Obama administration to stimulate the economy. That’s why all assets are rising in value. Normally, a rise in one asset means a fall in another, like selling gold to pay for houses. But because of loose monetary policy, all assets are increasing in price. The amazing rise in Bitcoin over the last year is as much a result of this bubble growing in all assets as it is of an exuberant belief in Bitcoin.
When this bubble collapses, which may happen during Trump’s term, it’ll really be the Obama administration who is to blame. I mean, if Trump is willing to take credit for the asset price bubble now, I’m willing to give it to him, as long as he accepts the blame when it crashes.

1.5 million fewer people on food stamps

As you’d expect, I’m going to debunk this with a graph: the numbers have been falling since the Great Recession. Indeed, in the comparable previous period under Obama, 1.9 million people got off food stamps, so Trump’s performance is slightly behind Obama’s rather than ahead of it. Of course, neither president is really responsible.

Consumer confidence through the roof

Again we are going to graph this number:

Again we find nothing in the graph that suggests President Trump is responsible for any change — it’s been improving steadily since the Great Recession.

One thing to note is that, technically, it’s not “through the roof” — it’s still quite a bit below the roof set during the dot-com era.

Lowest Unemployment rate in 17 years

Again, let’s simply graph it over time and look for Trump’s contribution. As we can see, there doesn’t appear to be anything special Trump has done — unemployment has been steadily improving since the Great Recession.
But here’s the thing: the “unemployment rate” only measures those looking for work, not those who have given up. The number that concerns people more is the “labor force participation rate”. The Great Recession kicked a lot of workers out of the economy.
Mostly this is because Baby Boomers are now retiring and leaving the workforce, and some have chosen to retire early rather than look for another job. But there are still other problems in our economy that contribute to this. President Trump has done nothing in particular to solve these problems.

Conclusion

As we see, Don Jr.’s tweet is a troll. When we look at the graphs of these indicators going back to the Great Recession, we don’t see how President Trump has influenced anything. The improvements this year are in line with the improvements last year, which are in turn in line with the improvements in the previous year.
To be fair, all parties credit their President with improvements during their term. President Obama’s supporters did the same thing. But at least right now, with these numbers, we can see that there’s no merit to anything in Don Jr’s tweet.
The hyperpartisan rancor in this country exists because neither side cares about the facts. We should care. We should care that these numbers suck, even if we are Republicans. Conversely, we should care that those NetNeutrality claims by Democrats suck, even if we are Democrats.

Now Available – Compute-Intensive C5 Instances for Amazon EC2

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/now-available-compute-intensive-c5-instances-for-amazon-ec2/

I’m thrilled to announce that the new compute-intensive C5 instances are available today in six sizes for launch in three AWS regions!

These instances are designed for compute-heavy applications like batch processing, distributed analytics, high-performance computing (HPC), ad serving, highly scalable multiplayer gaming, and video encoding. The new instances offer a 25% price/performance improvement over the C4 instances, with over 50% for some workloads. They also have additional memory per vCPU and, for code that can make use of the new AVX-512 instructions, twice the performance for vector and floating-point workloads.

Over the years we have been working non-stop to provide our customers with the best possible networking, storage, and compute performance, with a long-term focus on offloading many types of work to dedicated hardware designed and built by AWS. The C5 instance type incorporates the latest generation of our hardware offloads, and also takes another big step forward with the addition of a new hypervisor that runs hand-in-glove with our hardware. The new hypervisor allows us to give you access to all of the processing power provided by the host hardware, while also making performance even more consistent and further raising the bar on security. We’ll be sharing many technical details about it at AWS re:Invent.

The New Instances
The C5 instances are available in six sizes:

Instance Name | vCPUs | RAM     | EBS Bandwidth   | Network Bandwidth
c5.large      | 2     | 4 GiB   | Up to 2.25 Gbps | Up to 10 Gbps
c5.xlarge     | 4     | 8 GiB   | Up to 2.25 Gbps | Up to 10 Gbps
c5.2xlarge    | 8     | 16 GiB  | Up to 2.25 Gbps | Up to 10 Gbps
c5.4xlarge    | 16    | 32 GiB  | 2.25 Gbps       | Up to 10 Gbps
c5.9xlarge    | 36    | 72 GiB  | 4.5 Gbps        | 10 Gbps
c5.18xlarge   | 72    | 144 GiB | 9 Gbps          | 25 Gbps

Each vCPU is a hardware hyperthread on a 3.0 GHz Intel Xeon Platinum 8000-series processor. This custom processor, optimized for EC2, gives you full control over the C-states on the two largest sizes, allowing you to run a single core at up to 3.5 GHz using Intel Turbo Boost Technology.

As you can see from the table, the four smallest instance sizes offer substantially more EBS and network bandwidth than the previous generation of compute-intensive instances.

Because all networking and storage functionality is implemented in hardware, C5 instances require HVM AMIs that include drivers for the Elastic Network Adapter (ENA) and NVMe. The latest Amazon Linux, Microsoft Windows, Ubuntu, RHEL, CentOS, SLES, Debian, and FreeBSD AMIs all support C5 instances. If you are doing machine learning inferencing, or other compute-intensive work, be sure to check out the most recent version of the Intel Math Kernel Library. It has been optimized for the Intel® Xeon® Platinum processor and has the potential to greatly accelerate your work.

In order to remain compatible with instances that use the Xen hypervisor, the device names for EBS volumes will continue to use the existing /dev/sd and /dev/xvd prefixes. The device name that you provide when you attach a volume to an instance is not used because the NVMe driver assigns its own device name (read Amazon EBS and NVMe to learn more):

The nvme command displays additional information about each volume (install it using sudo yum -y install nvme-cli if necessary):

The SN field in the output can be mapped to an EBS volume ID by inserting a “-” after the “vol” prefix (sadly, the NVMe SN field is not long enough to store the entire ID). Here’s a simple script that uses this information to create an EBS snapshot of each attached volume:

$ sudo nvme list | \
  awk '/dev/ {print(gensub("vol", "vol-", 1, $2))}' | \
  xargs -n 1 aws ec2 create-snapshot --volume-id

With a little more work (and a lot of testing), you could create a script that expands EBS volumes that are getting full.

Getting to C5
As I mentioned earlier, our effort to offload work to hardware accelerators has been underway for quite some time. Here’s a recap:

CC1 – Launched in 2010, the CC1 was designed to support scale-out HPC applications. It was the first EC2 instance to support 10 Gbps networking and one of the first to support HVM virtualization. The network fabric that we designed for the CC1 (based on our own switch hardware) has become the standard for all AWS data centers.

C3 – Launched in 2013, the C3 introduced Enhanced Networking and uses dedicated hardware accelerators to support the software defined network inside of each Virtual Private Cloud (VPC). Hardware virtualization removes the I/O stack from the hypervisor in favor of direct access by the guest OS, resulting in higher performance and reduced variability.

C4 – Launched in 2015, the C4 instances are EBS Optimized by default via a dedicated network connection, and also offload EBS processing (including CPU-intensive crypto operations for encrypted EBS volumes) to a hardware accelerator.

C5 – Launched today, the C5 instances are powered by a hypervisor that allows practically all of the resources of the host CPU to be devoted to customer instances. The ENA networking and the NVMe interface to EBS are both powered by hardware accelerators. The instances do not require (or support) the Xen paravirtual networking or block device drivers, both of which have been removed in order to increase efficiency.

Going forward, we’ll use this hypervisor to power other instance types and plan to share additional technical details in a set of AWS re:Invent sessions.

Launch a C5 Today
You can launch C5 instances today in the US East (Northern Virginia), US West (Oregon), and EU (Ireland) Regions in On-Demand and Spot form (Reserved Instances are also available), with additional Regions in the works.

One quick note before I go: The current NVMe driver is not optimized for high-performance sequential workloads and we don’t recommend the use of C5 instances in conjunction with sc1 or st1 volumes. We are aware of this issue and have been working to optimize the driver for this important use case.

Jeff;

Predict Billboard Top 10 Hits Using RStudio, H2O and Amazon Athena

Post Syndicated from Gopal Wunnava original https://aws.amazon.com/blogs/big-data/predict-billboard-top-10-hits-using-rstudio-h2o-and-amazon-athena/

Success in the popular music industry is typically measured in terms of the number of Top 10 hits artists have to their credit. The music industry is a highly competitive multi-billion dollar business, and record labels incur various costs in exchange for a percentage of the profits from sales and concert tickets.

Predicting the success of an artist’s release in the popular music industry can be difficult. One release may be extremely popular, resulting in widespread play on TV, radio and social media, while another single may turn out quite unpopular, and therefore unprofitable. Record labels need to be selective in their decision making, and predictive analytics can help them with decision making around the type of songs and artists they need to promote.

In this walkthrough, you leverage H2O.ai, Amazon Athena, and RStudio to make predictions on whether a song might make it to the Top 10 Billboard charts. You explore the GLM, GBM, and deep learning modeling techniques using H2O’s rapid, distributed and easy-to-use open source parallel processing engine. RStudio is a popular IDE, licensed either commercially or under AGPLv3, for working with R. This is ideal if you don’t want to connect to a server via SSH and use code editors such as vi to do analytics. RStudio is available in a desktop version, or a server version that allows you to access R via a web browser. RStudio’s Notebooks feature is used to demonstrate the execution of code and output. In addition, this post showcases how you can leverage Athena for query and interactive analysis during the modeling phase. A working knowledge of statistics and machine learning would be helpful to interpret the analysis being performed in this post.

Walkthrough

Your goal is to predict whether a song will make it to the Top 10 Billboard charts. For this purpose, you will use multiple modeling techniques, namely GLM, GBM, and deep learning, and choose the model that is the best fit.

This solution involves the following steps:

  • Install and configure RStudio with Athena
  • Log in to RStudio
  • Install R packages
  • Connect to Athena
  • Create a dataset
  • Create models

Install and configure RStudio with Athena

Use the following AWS CloudFormation stack to install, configure, and connect RStudio on an Amazon EC2 instance with Athena.

Launching this stack creates all required resources and prerequisites:

  • Amazon EC2 instance with Amazon Linux (minimum size of t2.large is recommended)
  • Provisioning of the EC2 instance in an existing VPC and public subnet
  • Installation of Java 8
  • Assignment of an IAM role to the EC2 instance with the required permissions for accessing Athena and Amazon S3
  • Security group allowing access to the RStudio and SSH ports from the internet (I recommend restricting access to these ports)
  • S3 staging bucket required for Athena (referenced within RStudio as ATHENABUCKET)
  • RStudio username and password
  • Setup logs in Amazon CloudWatch Logs (if needed for additional troubleshooting)
  • Amazon EC2 Systems Manager agent, which makes it easy to manage and patch the instance

All AWS resources are created in the US-East-1 Region. To avoid cross-region data transfer fees, launch the CloudFormation stack in the same region. To check the availability of Athena in other regions, see Region Table.

Log in to RStudio

The instance security group has been automatically configured to allow incoming connections on the RStudio port 8787 from any source internet address. You can edit the security group to restrict source IP access. If you have trouble connecting, ensure that port 8787 isn’t blocked by subnet network ACLs or by your outgoing proxy/firewall.

  1. In the CloudFormation stack, choose Outputs, Value, and then open the RStudio URL. You might need to wait for a few minutes until the instance has been launched.
  2. Log in to RStudio with the username and password you provided during setup.

Install R packages

Next, install the required R packages from the RStudio console. You can download the R notebook file containing just the code.

#install pacman – a handy package manager for managing installs
if("pacman" %in% rownames(installed.packages()) == FALSE)
{install.packages("pacman")}  
library(pacman)
p_load(h2o,rJava,RJDBC,awsjavasdk)
h2o.init(nthreads = -1)
##  Connection successful!
## 
## R is connected to the H2O cluster: 
##     H2O cluster uptime:         2 hours 42 minutes 
##     H2O cluster version:        3.10.4.6 
##     H2O cluster version age:    4 months and 4 days !!! 
##     H2O cluster name:           H2O_started_from_R_rstudio_hjx881 
##     H2O cluster total nodes:    1 
##     H2O cluster total memory:   3.30 GB 
##     H2O cluster total cores:    4 
##     H2O cluster allowed cores:  4 
##     H2O cluster healthy:        TRUE 
##     H2O Connection ip:          localhost 
##     H2O Connection port:        54321 
##     H2O Connection proxy:       NA 
##     H2O Internal Security:      FALSE 
##     R Version:                  R version 3.3.3 (2017-03-06)
## Warning in h2o.clusterInfo(): 
## Your H2O cluster version is too old (4 months and 4 days)!
## Please download and install the latest version from http://h2o.ai/download/
#install aws sdk if not present (pre-requisite for using Athena with an IAM role)
if (!aws_sdk_present()) {
  install_aws_sdk()
}

load_sdk()
## NULL

Connect to Athena

Next, establish a connection to Athena from RStudio, using an IAM role associated with your EC2 instance. Use ATHENABUCKET to specify the S3 staging directory.

URL <- 'https://s3.amazonaws.com/athena-downloads/drivers/AthenaJDBC41-1.0.1.jar'
fil <- basename(URL)
#download the file into current working directory
if (!file.exists(fil)) download.file(URL, fil)
#verify that the file has been downloaded successfully
list.files()
## [1] "AthenaJDBC41-1.0.1.jar"
drv <- JDBC(driverClass="com.amazonaws.athena.jdbc.AthenaDriver", fil, identifier.quote="'")

con <- jdbcConnection <- dbConnect(drv, 'jdbc:awsathena://athena.us-east-1.amazonaws.com:443/',
                                   s3_staging_dir=Sys.getenv("ATHENABUCKET"),
                                   aws_credentials_provider_class="com.amazonaws.auth.DefaultAWSCredentialsProviderChain")

Verify the connection. The results returned depend on your specific Athena setup.

con
## <JDBCConnection>
dbListTables(con)
##  [1] "gdelt"               "wikistats"           "elb_logs_raw_native"
##  [4] "twitter"             "twitter2"            "usermovieratings"   
##  [7] "eventcodes"          "events"              "billboard"          
## [10] "billboardtop10"      "elb_logs"            "gdelthist"          
## [13] "gdeltmaster"         "twitter"             "twitter3"

Create a dataset

For this analysis, you use a sample dataset combining information from Billboard and Wikipedia with Echo Nest data in the Million Songs Dataset. Upload this dataset into your own S3 bucket. The table below provides a description of the fields used in this dataset.

Field                           | Description
year                            | Year that the song was released
songtitle                       | Title of the song
artistname                      | Name of the song artist
songid                          | Unique identifier for the song
artistid                        | Unique identifier for the song artist
timesignature                   | Variable estimating the time signature of the song
timesignature_confidence        | Confidence in the estimate for the timesignature
loudness                        | Continuous variable indicating the average amplitude of the audio in decibels
tempo                           | Variable indicating the estimated beats per minute of the song
tempo_confidence                | Confidence in the estimate for tempo
key                             | Variable with twelve levels indicating the estimated key of the song (C, C#, ..., B)
key_confidence                  | Confidence in the estimate for key
energy                          | Variable that represents the overall acoustic energy of the song, using a mix of features such as loudness
pitch                           | Continuous variable that indicates the pitch of the song
timbre_0_min thru timbre_11_min | Variables that indicate the minimum values over all segments for each of the twelve values in the timbre vector
timbre_0_max thru timbre_11_max | Variables that indicate the maximum values over all segments for each of the twelve values in the timbre vector
top10                           | Indicator for whether or not the song made it to the Top 10 of the Billboard charts (1 if it was in the Top 10, and 0 if not)

Create an Athena table based on the dataset

In the Athena console, select the default database, sampledb, or create a new database.

Run the following create table statement.

create external table if not exists billboard
(
year int,
songtitle string,
artistname string,
songID string,
artistID string,
timesignature int,
timesignature_confidence double,
loudness double,
tempo double,
tempo_confidence double,
key int,
key_confidence double,
energy double,
pitch double,
timbre_0_min double,
timbre_0_max double,
timbre_1_min double,
timbre_1_max double,
timbre_2_min double,
timbre_2_max double,
timbre_3_min double,
timbre_3_max double,
timbre_4_min double,
timbre_4_max double,
timbre_5_min double,
timbre_5_max double,
timbre_6_min double,
timbre_6_max double,
timbre_7_min double,
timbre_7_max double,
timbre_8_min double,
timbre_8_max double,
timbre_9_min double,
timbre_9_max double,
timbre_10_min double,
timbre_10_max double,
timbre_11_min double,
timbre_11_max double,
Top10 int
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 's3://aws-bigdata-blog/artifacts/predict-billboard/data'
;

Inspect the table definition for the ‘billboard’ table that you have created. If you chose a database other than sampledb, replace that value with your choice.

dbGetQuery(con, "show create table sampledb.billboard")
##                                      createtab_stmt
## 1       CREATE EXTERNAL TABLE `sampledb.billboard`(
## 2                                       `year` int,
## 3                               `songtitle` string,
## 4                              `artistname` string,
## 5                                  `songid` string,
## 6                                `artistid` string,
## 7                              `timesignature` int,
## 8                `timesignature_confidence` double,
## 9                                `loudness` double,
## 10                                  `tempo` double,
## 11                       `tempo_confidence` double,
## 12                                       `key` int,
## 13                         `key_confidence` double,
## 14                                 `energy` double,
## 15                                  `pitch` double,
## 16                           `timbre_0_min` double,
## 17                           `timbre_0_max` double,
## 18                           `timbre_1_min` double,
## 19                           `timbre_1_max` double,
## 20                           `timbre_2_min` double,
## 21                           `timbre_2_max` double,
## 22                           `timbre_3_min` double,
## 23                           `timbre_3_max` double,
## 24                           `timbre_4_min` double,
## 25                           `timbre_4_max` double,
## 26                           `timbre_5_min` double,
## 27                           `timbre_5_max` double,
## 28                           `timbre_6_min` double,
## 29                           `timbre_6_max` double,
## 30                           `timbre_7_min` double,
## 31                           `timbre_7_max` double,
## 32                           `timbre_8_min` double,
## 33                           `timbre_8_max` double,
## 34                           `timbre_9_min` double,
## 35                           `timbre_9_max` double,
## 36                          `timbre_10_min` double,
## 37                          `timbre_10_max` double,
## 38                          `timbre_11_min` double,
## 39                          `timbre_11_max` double,
## 40                                     `top10` int)
## 41                             ROW FORMAT DELIMITED 
## 42                         FIELDS TERMINATED BY ',' 
## 43                            STORED AS INPUTFORMAT 
## 44       'org.apache.hadoop.mapred.TextInputFormat' 
## 45                                     OUTPUTFORMAT 
## 46  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
## 47                                        LOCATION
## 48    's3://aws-bigdata-blog/artifacts/predict-billboard/data'
## 49                                  TBLPROPERTIES (
## 50            'transient_lastDdlTime'='1505484133')

Run a sample query

Next, run a sample query to obtain a list of all songs from Janet Jackson that made it to the Billboard Top 10 charts.

dbGetQuery(con, " SELECT songtitle,artistname,top10   FROM sampledb.billboard WHERE lower(artistname) =     'janet jackson' AND top10 = 1")
##                       songtitle    artistname top10
## 1                       Runaway Janet Jackson     1
## 2               Because Of Love Janet Jackson     1
## 3                         Again Janet Jackson     1
## 4                            If Janet Jackson     1
## 5  Love Will Never Do (Without You) Janet Jackson 1
## 6                     Black Cat Janet Jackson     1
## 7               Come Back To Me Janet Jackson     1
## 8                       Alright Janet Jackson     1
## 9                      Escapade Janet Jackson     1
## 10                Rhythm Nation Janet Jackson     1

Determine how many songs in this dataset are specifically from the year 2010.

dbGetQuery(con, " SELECT count(*)   FROM sampledb.billboard WHERE year = 2010")
##   _col0
## 1   373

The sample dataset provides certain song properties of interest that can be analyzed to gauge their impact on a song’s overall popularity. Look at one such property, timesignature, and determine the value that is most frequent among songs in the database. Timesignature is a measure of the number of beats and the type of note involved.

Running the query directly may result in an error, as shown in the commented lines below. This error is a result of trying to retrieve a large result set over a JDBC connection, which can cause out-of-memory issues at the client level. To address this, reduce the fetch size and run again.

#t<-dbGetQuery(con, " SELECT timesignature FROM sampledb.billboard")
#Note:  Running the preceding query results in the following error: 
#Error in .jcall(rp, "I", "fetch", stride, block): java.sql.SQLException: The requested #fetchSize is more than the allowed value in Athena. Please reduce the fetchSize and try #again. Refer to the Athena documentation for valid fetchSize values.
# Use the dbSendQuery function, reduce the fetch size, and run again
r <- dbSendQuery(con, " SELECT timesignature     FROM sampledb.billboard")
dftimesignature<- fetch(r, n=-1, block=100)
dbClearResult(r)
## [1] TRUE
table(dftimesignature)
## dftimesignature
##    0    1    3    4    5    7 
##   10  143  503 6787  112   19
nrow(dftimesignature)
## [1] 7574

From the results, observe that 6787 songs have a timesignature of 4.

Next, determine the song with the highest tempo.

dbGetQuery(con, " SELECT songtitle,artistname,tempo   FROM sampledb.billboard WHERE tempo = (SELECT max(tempo) FROM sampledb.billboard) ")
##                   songtitle      artistname   tempo
## 1 Wanna Be Startin' Somethin' Michael Jackson 244.307

Create the training dataset

Your model needs to be trained such that it can learn and make accurate predictions. Split the data into training and test datasets, and create the training dataset first. This dataset contains all observations from the year 2009 and earlier. You may face the same JDBC connection issue pointed out earlier, so this query also uses a reduced fetch size.

#BillboardTrain <- dbGetQuery(con, "SELECT * FROM sampledb.billboard WHERE year <= 2009")
#Running the preceding query results in the following error:-
#Error in .verify.JDBC.result(r, "Unable to retrieve JDBC result set for ", : Unable to retrieve #JDBC result set for SELECT * FROM sampledb.billboard WHERE year <= 2009 (Internal error)
#Follow the same approach as before to address this issue.

r <- dbSendQuery(con, "SELECT * FROM sampledb.billboard WHERE year <= 2009")
BillboardTrain <- fetch(r, n=-1, block=100)
dbClearResult(r)
## [1] TRUE
BillboardTrain[1:2,c(1:3,6:10)]
##   year           songtitle artistname timesignature
## 1 2009 The Awkward Goodbye    Athlete             3
## 2 2009        Rubik's Cube    Athlete             3
##   timesignature_confidence loudness   tempo tempo_confidence
## 1                    0.732   -6.320  89.614   0.652
## 2                    0.906   -9.541 117.742   0.542
nrow(BillboardTrain)
## [1] 7201

Create the test dataset

BillboardTest <- dbGetQuery(con, "SELECT * FROM sampledb.billboard where year = 2010")
BillboardTest[1:2,c(1:3,11:15)]
##   year              songtitle        artistname key
## 1 2010 This Is the House That Doubt Built A Day to Remember  11
## 2 2010        Sticks & Bricks A Day to Remember  10
##   key_confidence    energy pitch timbre_0_min
## 1          0.453 0.9666556 0.024        0.002
## 2          0.469 0.9847095 0.025        0.000
nrow(BillboardTest)
## [1] 373

Convert the training and test datasets into H2O dataframes

train.h2o <- as.h2o(BillboardTrain)
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |=================================================================| 100%
test.h2o <- as.h2o(BillboardTest)
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |=================================================================| 100%

Inspect the column names in your H2O dataframes.

colnames(train.h2o)
##  [1] "year"                     "songtitle"               
##  [3] "artistname"               "songid"                  
##  [5] "artistid"                 "timesignature"           
##  [7] "timesignature_confidence" "loudness"                
##  [9] "tempo"                    "tempo_confidence"        
## [11] "key"                      "key_confidence"          
## [13] "energy"                   "pitch"                   
## [15] "timbre_0_min"             "timbre_0_max"            
## [17] "timbre_1_min"             "timbre_1_max"            
## [19] "timbre_2_min"             "timbre_2_max"            
## [21] "timbre_3_min"             "timbre_3_max"            
## [23] "timbre_4_min"             "timbre_4_max"            
## [25] "timbre_5_min"             "timbre_5_max"            
## [27] "timbre_6_min"             "timbre_6_max"            
## [29] "timbre_7_min"             "timbre_7_max"            
## [31] "timbre_8_min"             "timbre_8_max"            
## [33] "timbre_9_min"             "timbre_9_max"            
## [35] "timbre_10_min"            "timbre_10_max"           
## [37] "timbre_11_min"            "timbre_11_max"           
## [39] "top10"

Create models

You need to designate the independent and dependent variables prior to applying your modeling algorithms. Because you’re trying to predict the ‘top10’ field, this would be your dependent variable and everything else would be independent.

Create your first model using GLM. Because GLM works best with numeric data, you create your model by dropping non-numeric variables. You only use the variables in the dataset that describe the numerical attributes of the song in the logistic regression model. You won’t use these variables:  “year”, “songtitle”, “artistname”, “songid”, or “artistid”.

y.dep <- 39
x.indep <- c(6:38)
x.indep
##  [1]  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28
## [24] 29 30 31 32 33 34 35 36 37 38

Create Model 1: All numeric variables

Create Model 1 with the training dataset, using GLM as the modeling algorithm and H2O’s built-in h2o.glm function.

modelh1 <- h2o.glm( y = y.dep, x = x.indep, training_frame = train.h2o, family = "binomial")
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |=====                                                            |   8%
  |                                                                       
  |=================================================================| 100%

Measure the performance of Model 1, using H2O’s built-in performance function.

h2o.performance(model=modelh1,newdata=test.h2o)
## H2OBinomialMetrics: glm
## 
## MSE:  0.09924684
## RMSE:  0.3150347
## LogLoss:  0.3220267
## Mean Per-Class Error:  0.2380168
## AUC:  0.8431394
## Gini:  0.6862787
## R^2:  0.254663
## Null Deviance:  326.0801
## Residual Deviance:  240.2319
## AIC:  308.2319
## 
## Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:
##          0   1    Error     Rate
## 0      255  59 0.187898  =59/314
## 1       17  42 0.288136   =17/59
## Totals 272 101 0.203753  =76/373
## 
## Maximum Metrics: Maximum metrics at their respective thresholds
##                         metric threshold    value idx
## 1                       max f1  0.192772 0.525000 100
## 2                       max f2  0.124912 0.650510 155
## 3                 max f0point5  0.416258 0.612903  23
## 4                 max accuracy  0.416258 0.879357  23
## 5                max precision  0.813396 1.000000   0
## 6                   max recall  0.037579 1.000000 282
## 7              max specificity  0.813396 1.000000   0
## 8             max absolute_mcc  0.416258 0.455251  23
## 9   max min_per_class_accuracy  0.161402 0.738854 125
## 10 max mean_per_class_accuracy  0.124912 0.765006 155
## 
## Gains/Lift Table: Extract with `h2o.gainsLift(<model>, <data>)` or ` 
h2o.auc(h2o.performance(modelh1,test.h2o)) 
## [1] 0.8431394

The AUC metric provides insight into how well the classifier is able to separate the two classes. In this case, the value of 0.8431394 indicates that the classification is good. (A value of 0.5 indicates a worthless test, while a value of 1.0 indicates a perfect test.)
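
To visualize what this AUC summarizes, you can plot the ROC curve behind it. The following is a minimal sketch using H2O’s h2o.fpr and h2o.tpr helpers, which return the false-positive and true-positive rates at every threshold H2O evaluated:

perfh1 <- h2o.performance(modelh1, newdata = test.h2o)
#build the ROC curve from the per-threshold rates
roc <- data.frame(fpr = h2o.fpr(perfh1)$fpr, tpr = h2o.tpr(perfh1)$tpr)
plot(roc$fpr, roc$tpr, type = "l",
     xlab = "False positive rate", ylab = "True positive rate")
#a worthless classifier (AUC = 0.5) falls on this diagonal
abline(0, 1, lty = 2)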

Next, inspect the coefficients of the variables in the dataset.

dfmodelh1 <- as.data.frame(h2o.varimp(modelh1))
dfmodelh1
##                       names coefficients sign
## 1              timbre_0_max  1.290938663  NEG
## 2                  loudness  1.262941934  POS
## 3                     pitch  0.616995941  NEG
## 4              timbre_1_min  0.422323735  POS
## 5              timbre_6_min  0.349016024  NEG
## 6                    energy  0.348092062  NEG
## 7             timbre_11_min  0.307331997  NEG
## 8              timbre_3_max  0.302225619  NEG
## 9             timbre_11_max  0.243632060  POS
## 10             timbre_4_min  0.224233951  POS
## 11             timbre_4_max  0.204134342  POS
## 12             timbre_5_min  0.199149324  NEG
## 13             timbre_0_min  0.195147119  POS
## 14 timesignature_confidence  0.179973904  POS
## 15         tempo_confidence  0.144242598  POS
## 16            timbre_10_max  0.137644568  POS
## 17             timbre_7_min  0.126995955  NEG
## 18            timbre_10_min  0.123851179  POS
## 19             timbre_7_max  0.100031481  NEG
## 20             timbre_2_min  0.096127636  NEG
## 21           key_confidence  0.083115820  POS
## 22             timbre_6_max  0.073712419  POS
## 23            timesignature  0.067241917  POS
## 24             timbre_8_min  0.061301881  POS
## 25             timbre_8_max  0.060041698  POS
## 26                      key  0.056158445  POS
## 27             timbre_3_min  0.050825116  POS
## 28             timbre_9_max  0.033733561  POS
## 29             timbre_2_max  0.030939072  POS
## 30             timbre_9_min  0.020708113  POS
## 31             timbre_1_max  0.014228818  NEG
## 32                    tempo  0.008199861  POS
## 33             timbre_5_max  0.004837870  POS
## 34                                    NA <NA>

Typically, songs with heavier instrumentation tend to be louder (have higher values in the variable “loudness”) and more energetic (have higher values in the variable “energy”). This knowledge is helpful for interpreting the modeling results.

You can make the following observations from the results:

  • The coefficient estimates for the confidence values associated with the time signature, key, and tempo variables are positive. This suggests that higher confidence leads to a higher predicted probability of a Top 10 hit.
  • The coefficient estimate for loudness is positive, meaning that mainstream listeners prefer louder songs with heavier instrumentation.
  • The coefficient estimate for energy is negative, meaning that mainstream listeners prefer songs that are less energetic, which are those songs with light instrumentation.

These coefficients lead to contradictory conclusions for Model 1. This could be due to multicollinearity issues. Inspect the correlation between the variables “loudness” and “energy” in the training set.

cor(train.h2o$loudness,train.h2o$energy)
## [1] 0.7399067

This number indicates that these two variables are highly correlated, and Model 1 does indeed suffer from multicollinearity. Typically, values from -1.0 to -0.5 or from 0.5 to 1.0 indicate strong correlation, while values from -0.1 to 0.1 indicate weak correlation. To avoid this correlation issue, omit one of these two variables and re-create the models.
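
Loudness and energy may not be the only strongly correlated pair. As a quick screen (a sketch, assuming the numeric predictors sit in columns 6 through 38, as above), you can pull the predictors into R and list every pair whose correlation magnitude exceeds 0.5:

#screen all numeric predictors for strongly correlated pairs
num_df <- as.data.frame(train.h2o[, 6:38])
cmat <- cor(num_df)
#report each variable pair with |correlation| > 0.5, excluding the diagonal
high <- which(abs(cmat) > 0.5 & upper.tri(cmat), arr.ind = TRUE)
data.frame(var1 = rownames(cmat)[high[, 1]],
           var2 = colnames(cmat)[high[, 2]],
           correlation = cmat[high])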

You build two variations of the original model:

  • Model 2, in which you keep “energy” and omit “loudness”
  • Model 3, in which you keep “loudness” and omit “energy”

You compare these two models and choose the model with a better fit for this use case.

Create Model 2: Keep energy and omit loudness

colnames(train.h2o)
##  [1] "year"                     "songtitle"               
##  [3] "artistname"               "songid"                  
##  [5] "artistid"                 "timesignature"           
##  [7] "timesignature_confidence" "loudness"                
##  [9] "tempo"                    "tempo_confidence"        
## [11] "key"                      "key_confidence"          
## [13] "energy"                   "pitch"                   
## [15] "timbre_0_min"             "timbre_0_max"            
## [17] "timbre_1_min"             "timbre_1_max"            
## [19] "timbre_2_min"             "timbre_2_max"            
## [21] "timbre_3_min"             "timbre_3_max"            
## [23] "timbre_4_min"             "timbre_4_max"            
## [25] "timbre_5_min"             "timbre_5_max"            
## [27] "timbre_6_min"             "timbre_6_max"            
## [29] "timbre_7_min"             "timbre_7_max"            
## [31] "timbre_8_min"             "timbre_8_max"            
## [33] "timbre_9_min"             "timbre_9_max"            
## [35] "timbre_10_min"            "timbre_10_max"           
## [37] "timbre_11_min"            "timbre_11_max"           
## [39] "top10"
y.dep <- 39
x.indep <- c(6:7,9:38)
x.indep
##  [1]  6  7  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29
## [24] 30 31 32 33 34 35 36 37 38
modelh2 <- h2o.glm( y = y.dep, x = x.indep, training_frame = train.h2o, family = "binomial")
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |=======                                                          |  10%
  |                                                                       
  |=================================================================| 100%

Measure the performance of Model 2.

h2o.performance(model=modelh2,newdata=test.h2o)
## H2OBinomialMetrics: glm
## 
## MSE:  0.09922606
## RMSE:  0.3150017
## LogLoss:  0.3228213
## Mean Per-Class Error:  0.2490554
## AUC:  0.8431933
## Gini:  0.6863867
## R^2:  0.2548191
## Null Deviance:  326.0801
## Residual Deviance:  240.8247
## AIC:  306.8247
## 
## Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:
##          0  1    Error     Rate
## 0      280 34 0.108280  =34/314
## 1       23 36 0.389831   =23/59
## Totals 303 70 0.152815  =57/373
## 
## Maximum Metrics: Maximum metrics at their respective thresholds
##                         metric threshold    value idx
## 1                       max f1  0.254391 0.558140  69
## 2                       max f2  0.113031 0.647208 157
## 3                 max f0point5  0.413999 0.596026  22
## 4                 max accuracy  0.446250 0.876676  18
## 5                max precision  0.811739 1.000000   0
## 6                   max recall  0.037682 1.000000 283
## 7              max specificity  0.811739 1.000000   0
## 8             max absolute_mcc  0.254391 0.469060  69
## 9   max min_per_class_accuracy  0.141051 0.716561 131
## 10 max mean_per_class_accuracy  0.113031 0.761821 157
## 
## Gains/Lift Table: Extract with `h2o.gainsLift(<model>, <data>)` or `h2o.gainsLift(<model>, valid=<T/F>, xval=<T/F>)`
dfmodelh2 <- as.data.frame(h2o.varimp(modelh2))
dfmodelh2
##                       names coefficients sign
## 1                     pitch  0.700331511  NEG
## 2              timbre_1_min  0.510270513  POS
## 3              timbre_0_max  0.402059546  NEG
## 4              timbre_6_min  0.333316236  NEG
## 5             timbre_11_min  0.331647383  NEG
## 6              timbre_3_max  0.252425901  NEG
## 7             timbre_11_max  0.227500308  POS
## 8              timbre_4_max  0.210663865  POS
## 9              timbre_0_min  0.208516163  POS
## 10             timbre_5_min  0.202748055  NEG
## 11             timbre_4_min  0.197246582  POS
## 12            timbre_10_max  0.172729619  POS
## 13         tempo_confidence  0.167523934  POS
## 14 timesignature_confidence  0.167398830  POS
## 15             timbre_7_min  0.142450727  NEG
## 16             timbre_8_max  0.093377516  POS
## 17            timbre_10_min  0.090333426  POS
## 18            timesignature  0.085851625  POS
## 19             timbre_7_max  0.083948442  NEG
## 20           key_confidence  0.079657073  POS
## 21             timbre_6_max  0.076426046  POS
## 22             timbre_2_min  0.071957831  NEG
## 23             timbre_9_max  0.071393189  POS
## 24             timbre_8_min  0.070225578  POS
## 25                      key  0.061394702  POS
## 26             timbre_3_min  0.048384697  POS
## 27             timbre_1_max  0.044721121  NEG
## 28                   energy  0.039698433  POS
## 29             timbre_5_max  0.039469064  POS
## 30             timbre_2_max  0.018461133  POS
## 31                    tempo  0.013279926  POS
## 32             timbre_9_min  0.005282143  NEG
## 33                                    NA <NA>

h2o.auc(h2o.performance(modelh2,test.h2o)) 
## [1] 0.8431933

You can make the following observations:

  • The AUC metric is 0.8431933.
  • Inspecting the coefficient of the variable energy, Model 2 suggests that songs with high energy levels tend to be more popular. This is as per expectation.
  • H2O orders variables by significance, and energy appears near the bottom of the list; it is not significant in this model.

You can conclude that Model 2 is not ideal for this use case, as energy is not significant.

Create Model 3: Keep loudness but omit energy

colnames(train.h2o)
##  [1] "year"                     "songtitle"               
##  [3] "artistname"               "songid"                  
##  [5] "artistid"                 "timesignature"           
##  [7] "timesignature_confidence" "loudness"                
##  [9] "tempo"                    "tempo_confidence"        
## [11] "key"                      "key_confidence"          
## [13] "energy"                   "pitch"                   
## [15] "timbre_0_min"             "timbre_0_max"            
## [17] "timbre_1_min"             "timbre_1_max"            
## [19] "timbre_2_min"             "timbre_2_max"            
## [21] "timbre_3_min"             "timbre_3_max"            
## [23] "timbre_4_min"             "timbre_4_max"            
## [25] "timbre_5_min"             "timbre_5_max"            
## [27] "timbre_6_min"             "timbre_6_max"            
## [29] "timbre_7_min"             "timbre_7_max"            
## [31] "timbre_8_min"             "timbre_8_max"            
## [33] "timbre_9_min"             "timbre_9_max"            
## [35] "timbre_10_min"            "timbre_10_max"           
## [37] "timbre_11_min"            "timbre_11_max"           
## [39] "top10"
y.dep <- 39
x.indep <- c(6:12,14:38)
x.indep
##  [1]  6  7  8  9 10 11 12 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29
## [24] 30 31 32 33 34 35 36 37 38
modelh3 <- h2o.glm( y = y.dep, x = x.indep, training_frame = train.h2o, family = "binomial")
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |========                                                         |  12%
  |                                                                       
  |=================================================================| 100%
perfh3<-h2o.performance(model=modelh3,newdata=test.h2o)
perfh3
## H2OBinomialMetrics: glm
## 
## MSE:  0.0978859
## RMSE:  0.3128672
## LogLoss:  0.3178367
## Mean Per-Class Error:  0.264925
## AUC:  0.8492389
## Gini:  0.6984778
## R^2:  0.2648836
## Null Deviance:  326.0801
## Residual Deviance:  237.1062
## AIC:  303.1062
## 
## Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:
##          0  1    Error     Rate
## 0      286 28 0.089172  =28/314
## 1       26 33 0.440678   =26/59
## Totals 312 61 0.144772  =54/373
## 
## Maximum Metrics: Maximum metrics at their respective thresholds
##                         metric threshold    value idx
## 1                       max f1  0.273799 0.550000  60
## 2                       max f2  0.125503 0.663265 155
## 3                 max f0point5  0.435479 0.628931  24
## 4                 max accuracy  0.435479 0.882038  24
## 5                max precision  0.821606 1.000000   0
## 6                   max recall  0.038328 1.000000 280
## 7              max specificity  0.821606 1.000000   0
## 8             max absolute_mcc  0.435479 0.471426  24
## 9   max min_per_class_accuracy  0.173693 0.745763 120
## 10 max mean_per_class_accuracy  0.125503 0.775073 155
## 
## Gains/Lift Table: Extract with `h2o.gainsLift(<model>, <data>)` or `h2o.gainsLift(<model>, valid=<T/F>, xval=<T/F>)`
dfmodelh3 <- as.data.frame(h2o.varimp(modelh3))
dfmodelh3
##                       names coefficients sign
## 1              timbre_0_max 1.216621e+00  NEG
## 2                  loudness 9.780973e-01  POS
## 3                     pitch 7.249788e-01  NEG
## 4              timbre_1_min 3.891197e-01  POS
## 5              timbre_6_min 3.689193e-01  NEG
## 6             timbre_11_min 3.086673e-01  NEG
## 7              timbre_3_max 3.025593e-01  NEG
## 8             timbre_11_max 2.459081e-01  POS
## 9              timbre_4_min 2.379749e-01  POS
## 10             timbre_4_max 2.157627e-01  POS
## 11             timbre_0_min 1.859531e-01  POS
## 12             timbre_5_min 1.846128e-01  NEG
## 13 timesignature_confidence 1.729658e-01  POS
## 14             timbre_7_min 1.431871e-01  NEG
## 15            timbre_10_max 1.366703e-01  POS
## 16            timbre_10_min 1.215954e-01  POS
## 17         tempo_confidence 1.183698e-01  POS
## 18             timbre_2_min 1.019149e-01  NEG
## 19           key_confidence 9.109701e-02  POS
## 20             timbre_7_max 8.987908e-02  NEG
## 21             timbre_6_max 6.935132e-02  POS
## 22             timbre_8_max 6.878241e-02  POS
## 23            timesignature 6.120105e-02  POS
## 24                      key 5.814805e-02  POS
## 25             timbre_8_min 5.759228e-02  POS
## 26             timbre_1_max 2.930285e-02  NEG
## 27             timbre_9_max 2.843755e-02  POS
## 28             timbre_3_min 2.380245e-02  POS
## 29             timbre_2_max 1.917035e-02  POS
## 30             timbre_5_max 1.715813e-02  POS
## 31                    tempo 1.364418e-02  NEG
## 32             timbre_9_min 8.463143e-05  NEG
## 33                                    NA <NA>
h2o.sensitivity(perfh3,0.5)
## Warning in h2o.find_row_by_threshold(object, t): Could not find exact
## threshold: 0.5 for this set of metrics; using closest threshold found:
## 0.501855569251422. Run `h2o.predict` and apply your desired threshold on a
## probability column.
## [[1]]
## [1] 0.2033898
h2o.auc(perfh3)
## [1] 0.8492389

You can make the following observations:

  • The AUC metric is 0.8492389.
  • From the confusion matrix, the model correctly predicts that 33 songs will be Top 10 hits (true positives). However, it has 28 false positives (songs that the model predicted would be Top 10 hits, but ended up not being Top 10 hits) and 26 false negatives (actual Top 10 hits that the model missed).
  • Loudness has a positive coefficient estimate, meaning that this model predicts that songs with heavier instrumentation tend to be more popular. This is the same conclusion from Model 2.
  • Loudness is significant in this model.

Overall, Model 3 has the highest AUC and accuracy of the three GLM variants and predicts Top 10 hits at an acceptable accuracy rate. To choose the best fit for production runs, record labels should consider the following factors (see the sketch after this list):

  • Desired model accuracy at a given threshold
  • Number of correct predictions for Top 10 hits
  • Tolerable number of false positives or false negatives
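
One way to weigh these factors is to inspect the confusion matrix at several candidate thresholds instead of only the F1-optimal one. Here is a minimal sketch; the threshold values are illustrative, not tuned:

#compare the error trade-offs at a few illustrative probability cutoffs
for (t in c(0.2, 0.3, 0.4)) {
  print(h2o.confusionMatrix(perfh3, thresholds = t))
}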

Next, make predictions using Model 3 on the test dataset.

predict.regh <- h2o.predict(modelh3, test.h2o)
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |=================================================================| 100%
print(predict.regh)
##   predict        p0          p1
## 1       0 0.9654739 0.034526052
## 2       0 0.9654748 0.034525236
## 3       0 0.9635547 0.036445318
## 4       0 0.9343579 0.065642149
## 5       0 0.9978334 0.002166601
## 6       0 0.9779949 0.022005078
## 
## [373 rows x 3 columns]
predict.regh$predict
##   predict
## 1       0
## 2       0
## 3       0
## 4       0
## 5       0
## 6       0
## 
## [373 rows x 1 column]
dpr<-as.data.frame(predict.regh)
#Rename the predicted column 
colnames(dpr)[colnames(dpr) == 'predict'] <- 'predict_top10'
table(dpr$predict_top10)
## 
##   0   1 
## 312  61

The first set of output results specifies the probabilities associated with each predicted observation. For example, observation 1 is 96.54739% likely to not be a Top 10 hit, and 3.4526052% likely to be a Top 10 hit (predict=1 indicates a Top 10 hit and predict=0 indicates not a Top 10 hit). The second set of results lists the actual predictions made. From the third set of results, this model predicts that 61 songs will be Top 10 hits.
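
If the default cutoff produces more false positives than the record label can tolerate, you can apply your own cutoff to the p1 probability column, as the earlier warning message suggested. A minimal sketch (0.45 is an illustrative cutoff, not a tuned value):

#classify a song as a hit only when its predicted probability exceeds the cutoff
dpr$strict_top10 <- ifelse(dpr$p1 >= 0.45, 1, 0)
table(dpr$strict_top10)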

Compute the baseline accuracy by assuming that the baseline model always predicts the most frequent outcome, which is that a song is not a Top 10 hit.

table(BillboardTest$top10)
## 
##   0   1 
## 314  59

Now observe that the baseline model would get 314 observations correct, and 59 wrong, for an accuracy of 314/(314+59) = 0.8418231.

It seems that Model 3, with an accuracy of 0.8552, provides you with a small improvement over the baseline model. But is this model useful for record labels?
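
As a quick check, both accuracy figures can be reproduced from the counts shown above; the Model 3 figure uses the confusion matrix at the F1-optimal threshold:

#baseline: always predict "not a Top 10 hit"
baseline_acc <- 314 / (314 + 59)
#Model 3: (true negatives + true positives) / total observations
model3_acc <- (286 + 33) / 373
c(baseline = baseline_acc, model3 = model3_acc)
##  baseline    model3 
## 0.8418231 0.8552279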

View the two models from an investment perspective:

  • A production company is interested in investing in songs that are more likely to make it to the Top 10. The company’s objective is to minimize the risk of financial losses attributed to investing in songs that end up unpopular.
  • How many songs does Model 3 correctly predict as a Top 10 hit in 2010? Looking at the confusion matrix, you see that it predicts 33 Top 10 hits correctly at an optimal threshold, which is more than half of the 59 songs that actually made the Top 10 that year.
  • It will be more useful to the record label if you can provide the production company with a list of songs that are highly likely to end up in the Top 10.
  • The baseline model is not useful, as it simply does not label any song as a hit.

Considering the three models built so far, you can conclude that Model 3 proves to be the best investment choice for the record label.

GBM model

H2O provides you with the ability to explore other learning models, such as GBM and deep learning. Explore building a model using the GBM technique, using the built-in h2o.gbm function.

Before you do this, you need to convert the target variable to a factor for multinomial classification techniques.

train.h2o$top10 <- as.factor(train.h2o$top10)
gbm.modelh <- h2o.gbm(y = y.dep, x = x.indep, training_frame = train.h2o, ntrees = 500, max_depth = 4, learn_rate = 0.01, seed = 1122, distribution = "multinomial")
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |===                                                              |   5%
  |                                                                       
  |=====                                                            |   7%
  |                                                                       
  |======                                                           |   9%
  |                                                                       
  |=======                                                          |  10%
  |                                                                       
  |======================                                           |  33%
  |                                                                       
  |=====================================                            |  56%
  |                                                                       
  |====================================================             |  79%
  |                                                                       
  |================================================================ |  98%
  |                                                                       
  |=================================================================| 100%
perf.gbmh<-h2o.performance(gbm.modelh,test.h2o)
perf.gbmh
## H2OBinomialMetrics: gbm
## 
## MSE:  0.09860778
## RMSE:  0.3140188
## LogLoss:  0.3206876
## Mean Per-Class Error:  0.2120263
## AUC:  0.8630573
## Gini:  0.7261146
## 
## Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:
##          0  1    Error     Rate
## 0      266 48 0.152866  =48/314
## 1       16 43 0.271186   =16/59
## Totals 282 91 0.171582  =64/373
## 
## Maximum Metrics: Maximum metrics at their respective thresholds
##                       metric threshold    value idx
## 1                     max f1  0.189757 0.573333  90
## 2                     max f2  0.130895 0.693717 145
## 3               max f0point5  0.327346 0.598802  26
## 4               max accuracy  0.442757 0.876676  14
## 5              max precision  0.802184 1.000000   0
## 6                 max recall  0.049990 1.000000 284
## 7            max specificity  0.802184 1.000000   0
## 8           max absolute_mcc  0.169135 0.496486 104
## 9 max min_per_class_accuracy  0.169135 0.796610 104
## 10 max mean_per_class_accuracy  0.169135 0.805948 104
## 
## Gains/Lift Table: Extract with `h2o.gainsLift(<model>, <data>)` or `
h2o.sensitivity(perf.gbmh,0.5)
## Warning in h2o.find_row_by_threshold(object, t): Could not find exact
## threshold: 0.5 for this set of metrics; using closest threshold found:
## 0.501205344484314. Run `h2o.predict` and apply your desired threshold on a
## probability column.
## [[1]]
## [1] 0.1355932
h2o.auc(perf.gbmh)
## [1] 0.8630573

This model correctly predicts 43 top 10 hits, which is 10 more than the number predicted by Model 3. Moreover, the AUC metric is higher than the one obtained from Model 3.

As seen above, H2O’s API provides the ability to obtain key statistical measures required to analyze the models easily, using several built-in functions. The record label can experiment with different parameters to arrive at the model that predicts the maximum number of Top 10 hits at the desired level of accuracy and threshold.
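
For example, h2o.grid can drive that experimentation. Here is a minimal sketch of a GBM hyperparameter grid; the grid values and the grid_id are illustrative, not tuned:

#small, illustrative hyperparameter grid for GBM
gbm.grid <- h2o.grid("gbm",
                     grid_id = "billboard_gbm_grid",
                     x = x.indep, y = y.dep,
                     training_frame = train.h2o,
                     hyper_params = list(ntrees = c(250, 500),
                                         max_depth = c(3, 4, 5),
                                         learn_rate = c(0.01, 0.05)),
                     seed = 1122)
#rank the candidate models by AUC and evaluate the best one on the test set
sorted.grid <- h2o.getGrid("billboard_gbm_grid", sort_by = "auc", decreasing = TRUE)
best.gbm <- h2o.getModel(sorted.grid@model_ids[[1]])
h2o.auc(h2o.performance(best.gbm, newdata = test.h2o))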

H2O also allows you to experiment with deep learning models. Deep learning models have the ability to learn features implicitly, but can be more expensive computationally.

Now, create a deep learning model with the h2o.deeplearning function, using the same training and test datasets created before. The time taken to run this model depends on the type of EC2 instance chosen for this purpose.  For models that require more computation, consider using accelerated computing instances such as the P2 instance type.

system.time(
  dlearning.modelh <- h2o.deeplearning(y = y.dep,
                                      x = x.indep,
                                      training_frame = train.h2o,
                                      epochs = 250,                # passes over the training data
                                      hidden = c(250, 250),        # two hidden layers of 250 neurons each
                                      activation = "Rectifier",
                                      seed = 1122,                 # for reproducibility
                                      distribution = "multinomial"
  )
)
## 
  |=================================================================| 100%
##    user  system elapsed 
##   1.216   0.020 166.508
perf.dl<-h2o.performance(model=dlearning.modelh,newdata=test.h2o)
perf.dl
## H2OBinomialMetrics: deeplearning
## 
## MSE:  0.1678359
## RMSE:  0.4096778
## LogLoss:  1.86509
## Mean Per-Class Error:  0.3433013
## AUC:  0.7568822
## Gini:  0.5137644
## 
## Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:
##          0  1    Error     Rate
## 0      290 24 0.076433  =24/314
## 1       36 23 0.610169   =36/59
## Totals 326 47 0.160858  =60/373
## 
## Maximum Metrics: Maximum metrics at their respective thresholds
##                       metric threshold    value idx
## 1                     max f1  0.826267 0.433962  46
## 2                     max f2  0.000000 0.588235 239
## 3               max f0point5  0.999929 0.511811  16
## 4               max accuracy  0.999999 0.865952  10
## 5              max precision  1.000000 1.000000   0
## 6                 max recall  0.000000 1.000000 326
## 7            max specificity  1.000000 1.000000   0
## 8           max absolute_mcc  0.999929 0.363219  16
## 9 max min_per_class_accuracy  0.000004 0.662420 145
## 10 max mean_per_class_accuracy  0.000000 0.685334 224
## 
## Gains/Lift Table: Extract with `h2o.gainsLift(<model>, <data>)` or `h2o.gainsLift(<model>, valid=<T/F>, xval=<T/F>)`
h2o.sensitivity(perf.dl,0.5)
## Warning in h2o.find_row_by_threshold(object, t): Could not find exact
## threshold: 0.5 for this set of metrics; using closest threshold found:
## 0.496293348880151. Run `h2o.predict` and apply your desired threshold on a
## probability column.
## [[1]]
## [1] 0.3898305
h2o.auc(perf.dl)
## [1] 0.7568822

The AUC metric for this model is 0.7568822, which is lower than what you got from the earlier models. I recommend further experimentation with different hyperparameters, such as the learning rate, the number of epochs, or the number and size of hidden layers.
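
The walkthrough above uses H2O's R API, but the same experimentation can be scripted through H2O's Python package. The following is a minimal grid-search sketch, assuming frames and column lists analogous to the R session's train.h2o, test.h2o, x.indep, and y.dep:

import h2o
from h2o.grid.grid_search import H2OGridSearch
from h2o.estimators.deeplearning import H2ODeepLearningEstimator

h2o.init()

# Hyperparameter combinations to try; adjust the ranges to your data
hyper_params = {
    "hidden": [[100, 100], [250, 250]],  # layer sizes
    "epochs": [50, 250],                 # passes over the training data
    "rate":   [0.005, 0.001],            # learning rate (used with adaptive_rate=False)
}

grid = H2OGridSearch(
    model=H2ODeepLearningEstimator(activation="Rectifier",
                                   adaptive_rate=False,
                                   seed=1122),
    hyper_params=hyper_params)
grid.train(x=x_indep, y=y_dep, training_frame=train_h2o)

# Rank the trained models by AUC on the held-out test frame
for m in grid.get_grid(sort_by="auc", decreasing=True).models[:3]:
    print(m.model_id, m.model_performance(test_data=test_h2o).auc())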

H2O’s built-in functions provide many key statistical measures that can help measure model performance. Here are some of these key terms.

Sensitivity – Measures the proportion of positives that have been correctly identified. It is also called the true positive rate, or recall.

Specificity – Measures the proportion of negatives that have been correctly identified. It is also called the true negative rate.

Threshold – The cutoff point that maximizes specificity and sensitivity together. While the model may not provide the highest prediction at this point, it is not biased towards positives or negatives.

Precision – The fraction of the items retrieved that are relevant to the information needed; for example, how many of the positively classified records are relevant.

AUC – Provides insight into how well the classifier is able to separate the two classes. The implicit goal is to deal with situations where the sample distribution is highly skewed, with a tendency to overfit to a single class. A common grading scale:

0.9 – 1.0 = excellent (A)

0.8 – 0.9 = good (B)

0.7 – 0.8 = fair (C)

0.6 – 0.7 = poor (D)

0.5 – 0.6 = fail (F)
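
As a quick illustration, these measures can be computed by hand from the GBM confusion matrix shown earlier; a short Python sketch:

# Values from the GBM confusion matrix at the F1-optimal threshold:
# 266 true negatives, 48 false positives, 16 false negatives, 43 true positives
tn, fp, fn, tp = 266, 48, 16, 43

sensitivity = tp / (tp + fn)                   # true positive rate (recall)
specificity = tn / (tn + fp)                   # true negative rate
precision   = tp / (tp + fp)                   # fraction of predicted positives that are real
accuracy    = (tp + tn) / (tp + tn + fp + fn)

print(f"sensitivity={sensitivity:.4f}  specificity={specificity:.4f}")
print(f"precision={precision:.4f}  accuracy={accuracy:.4f}")

Note that these values are taken at the F1-optimal threshold of the confusion matrix, whereas h2o.sensitivity(perf.gbmh, 0.5) reports the rate at a fixed 0.5 threshold, which is why it returns the much lower 0.1355932.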

Here’s a summary of the metrics generated from H2O’s built-in functions for the three models that produced useful results.

Metric                Model 3                 GBM Model               Deep Learning Model
Accuracy (max)        0.882038 (t=0.435479)   0.876676 (t=0.442757)   0.865952 (t=0.999999)
Precision (max)       1.0 (t=0.821606)        1.0 (t=0.802184)        1.0 (t=1.0)
Recall (max)          1.0                     1.0                     1.0 (t=0)
Specificity (max)     1.0                     1.0                     1.0 (t=1)
Sensitivity (t=0.5)   0.2033898               0.1355932               0.3898305
AUC                   0.8492389               0.8630573               0.756882

Note: ‘t’ denotes threshold.

Your options at this point could be narrowed down to Model 3 and the GBM model, based on the AUC and accuracy metrics observed earlier.  If the slightly lower accuracy of the GBM model is deemed acceptable, the record label can choose to go to production with the GBM model, as it can predict a higher number of Top 10 hits.  The AUC metric for the GBM model is also higher than that of Model 3.

Record labels can experiment with different learning techniques and parameters before arriving at a model that proves to be the best fit for their business. Because deep learning models can be computationally expensive, record labels can choose more powerful EC2 instances on AWS to run their experiments faster.

Conclusion

In this post, I showed how the popular music industry can use analytics to predict the type of songs that make the Top 10 Billboard charts. By running H2O’s scalable machine learning platform on AWS, data scientists can easily experiment with multiple modeling techniques and interactively query the data using Amazon Athena, without having to manage the underlying infrastructure. This helps record labels make critical decisions on the type of artists and songs to promote in a timely fashion, thereby increasing sales and revenue.

If you have questions or suggestions, please comment below.


Additional Reading

Learn how to build and explore a simple GEOINT application using SparkR.


About the Authors

Gopal Wunnava is a Partner Solution Architect with the AWS GSI Team. He works with partners and customers on big data engagements, and is passionate about building analytical solutions that drive business capabilities and decision making. In his spare time, he loves all things sports and movies, and is fond of old classics like Asterix and Obelix comics and Hitchcock movies.

Bob Strahan, a Senior Consultant with AWS Professional Services, contributed to this post.


Now Available – Amazon Linux AMI 2017.09

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/now-available-amazon-linux-ami-2017-09/

I’m happy to announce that the latest version of the Amazon Linux AMI (2017.09) is now available in all AWS Regions for all current-generation EC2 instances. The AMI contains a supported and maintained Linux image that is designed to provide a stable, secure, high performance environment for applications running on EC2.

Easy Upgrade
You can upgrade your existing instances by running two commands and then rebooting:

$ sudo yum clean all
$ sudo yum update

Lots of Goodies
The AMI contains many new features, many of which were added in response to requests from our customers. Here’s a summary:

Kernel 4.9.51 – Based on the 4.9 stable kernel series, this kernel includes the ENA 1.3.0 driver along with support for TCP Bottleneck Bandwidth and RTT (BBR). Read my post, Elastic Network Adapter – High-Performance Network Interface for Amazon EC2 to learn more about ENA. Read the Release Notes to learn how to enable BBR.

Amazon SSM Agent – The Amazon SSM Agent is now installed by default. This means that you can now use EC2 Run Command to configure and run scripts on your instances with no further setup. To learn more, read Executing Commands Using Systems Manager Run Command or Manage Instances at Scale Without SSH Access Using EC2 Run Command.

Python 3.6 – The newest version of Python is now included and can be managed via virtualenv and alternatives. You can install Python 3.6 like this:

$ sudo yum install python36 python36-virtualenv python36-pip
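
For example, assuming the python36-virtualenv package provides the usual virtualenv command and the python36 package installs a python3.6 binary, you can create and activate an isolated environment like this:

$ virtualenv -p python3.6 my-env
$ source my-env/bin/activate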

Ruby 2.4 – The latest version of Ruby in the 2.4 series is now available. Install it like this:

$ sudo yum install ruby24

OpenSSL – The AMI now uses OpenSSL 1.0.2k.

HTTP/2 – The HTTP/2 protocol is now supported by the AMI’s httpd24, nginx, and curl packages.

Relational Databases – Postgres 9.6 and MySQL 5.7 are now available, and can be installed like this:

$ sudo yum install postgresql96
$ sudo yum install mysql57

OpenMPI – The OpenMPI package has been upgraded from 1.6.4 to 2.1.1. OpenMPI compatibility packages are available and can be used to build and run older OpenMPI applications.

And More – Other updated packages include Squid 3.5, Nginx 1.12, Tomcat 8.5, and GCC 6.4.

Launch it Today
You can use this AMI to launch EC2 instances in all AWS Regions today. It is available for EBS-backed and Instance Store-backed instances and supports HVM and PV modes.

Jeff;

Evergreen 3.0.0 released

Post Syndicated from ris original https://lwn.net/Articles/735379/rss

The Evergreen community has announced the release of Evergreen 3.0.0, software for libraries. This release includes community support of the web staff client for production use, serials and offline circulation modules for the web staff client, improvements to the display of headings in the public catalog browse list, and more.

Creating a Cost-Efficient Amazon ECS Cluster for Scheduled Tasks

Post Syndicated from Nathan Taber original https://aws.amazon.com/blogs/compute/creating-a-cost-efficient-amazon-ecs-cluster-for-scheduled-tasks/

Madhuri Peri
Sr. DevOps Consultant

When you use Amazon Relational Database Service (Amazon RDS), depending on the logging levels on the RDS instances and the volume of transactions, you could generate a lot of log data. To ensure that everything is running smoothly, many customers search for log error patterns using different log aggregation and visualization systems, such as Amazon Elasticsearch Service, Splunk, or another tool of their choice. A module needs to periodically retrieve the RDS logs using the SDK, and then send them to Amazon S3. From there, you can stream them to your log aggregation tool.

One option is writing an AWS Lambda function to retrieve the log files. However, depending on the volume of log files retrieved and transferred, the function could exceed Lambda's execution time limit in many cases. Another approach is launching an Amazon EC2 instance that runs this job periodically. However, this would require you to run an EC2 instance continuously, which is not an optimal use of time or money.

Using the new Amazon CloudWatch integration with Amazon EC2 Container Service, you can trigger this job to run in a container on an existing Amazon ECS cluster. Additionally, this allows you to reduce costs by running containers on a fleet of Spot Instances.

In this post, I will show you how to use the new scheduled tasks (cron) feature in Amazon ECS and launch tasks using CloudWatch events, while leveraging Spot Fleet to maximize availability and cost optimization for containerized workloads.

Architecture

The following diagram shows how the various components described schedule a task that retrieves log files from Amazon RDS database instances, and deposits the logs into an S3 bucket.

Amazon ECS cluster container instances use Spot Fleet, which is a perfect match for a workload that can run whenever capacity is available. This reduces cluster costs.

The task definition defines which Docker image to retrieve from the Amazon EC2 Container Registry (Amazon ECR) repository and run on the Amazon ECS cluster.

The container image has Python code functions to make AWS API calls using boto3. It iterates over the RDS database instances, retrieves the logs, and deposits them in the S3 bucket. Many customers choose these logs to be delivered to their centralized log-store. CloudWatch Events defines the schedule for when the container task has to be launched.
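In outline, the retrieval loop looks like the following sketch. This is illustrative only: the working implementation is rdslogsshipper.py, cloned in the walkthrough below, and the S3_BUCKET environment variable here is an assumed stand-in for the bucket name that the task definition passes in.

import os
import boto3

rds = boto3.client("rds")
s3  = boto3.client("s3")
bucket = os.environ["S3_BUCKET"]  # assumed: set via the task definition

for db in rds.describe_db_instances()["DBInstances"]:
    db_id = db["DBInstanceIdentifier"]
    for log in rds.describe_db_log_files(DBInstanceIdentifier=db_id)["DescribeDBLogFiles"]:
        name = log["LogFileName"]
        data, marker = "", "0"
        while True:
            # Log files are downloaded in portions, using a marker to page through them
            chunk = rds.download_db_log_file_portion(
                DBInstanceIdentifier=db_id, LogFileName=name, Marker=marker)
            data += chunk.get("LogFileData") or ""
            if not chunk["AdditionalDataPending"]:
                break
            marker = chunk["Marker"]
        # Deposit the assembled log file in the S3 bucket
        s3.put_object(Bucket=bucket, Key="%s/%s" % (db_id, name), Body=data.encode("utf-8"))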

Walkthrough

To provide the basic framework, we have built an AWS CloudFormation template that creates the following resources:

  • Amazon ECR repository for storing the Docker image to be used in the task definition
  • S3 bucket that holds the transferred logs
  • Task definition, with image name and S3 bucket as environment variables provided via input parameter
  • CloudWatch Events rule
  • Amazon ECS cluster
  • Amazon ECS container instances using Spot Fleet
  • IAM roles required for the container instance profiles

Before you begin

Ensure that Git, Docker, and the AWS CLI are installed on your computer.

In your AWS account, instantiate one Amazon Aurora instance using the console. For more information, see Creating an Amazon Aurora DB Cluster.

Implementation Steps

  1. Clone the code from GitHub that performs RDS API calls to retrieve the log files.
    git clone https://github.com/awslabs/aws-ecs-scheduled-tasks.git
  2. Build and tag the image.
    cd aws-ecs-scheduled-tasks/container-code/src && ls

    Dockerfile		rdslogsshipper.py	requirements.txt

    docker build -t rdslogsshipper .

    Sending build context to Docker daemon 9.728 kB
    Step 1 : FROM python:3
     ---> 41397f4f2887
    Step 2 : WORKDIR /usr/src/app
     ---> Using cache
     ---> 59299c020e7e
    Step 3 : COPY requirements.txt ./
     ---> 8c017e931c3b
    Removing intermediate container df09e1bed9f2
    Step 4 : COPY rdslogsshipper.py /usr/src/app
     ---> 099a49ca4325
    Removing intermediate container 1b1da24a6699
    Step 5 : RUN pip install --no-cache-dir -r requirements.txt
     ---> Running in 3ed98b30901d
    Collecting boto3 (from -r requirements.txt (line 1))
      Downloading boto3-1.4.6-py2.py3-none-any.whl (128kB)
    Collecting botocore (from -r requirements.txt (line 2))
      Downloading botocore-1.6.7-py2.py3-none-any.whl (3.6MB)
    Collecting s3transfer<0.2.0,>=0.1.10 (from boto3->-r requirements.txt (line 1))
      Downloading s3transfer-0.1.10-py2.py3-none-any.whl (54kB)
    Collecting jmespath<1.0.0,>=0.7.1 (from boto3->-r requirements.txt (line 1))
      Downloading jmespath-0.9.3-py2.py3-none-any.whl
    Collecting python-dateutil<3.0.0,>=2.1 (from botocore->-r requirements.txt (line 2))
      Downloading python_dateutil-2.6.1-py2.py3-none-any.whl (194kB)
    Collecting docutils>=0.10 (from botocore->-r requirements.txt (line 2))
      Downloading docutils-0.14-py3-none-any.whl (543kB)
    Collecting six>=1.5 (from python-dateutil<3.0.0,>=2.1->botocore->-r requirements.txt (line 2))
      Downloading six-1.10.0-py2.py3-none-any.whl
    Installing collected packages: six, python-dateutil, docutils, jmespath, botocore, s3transfer, boto3
    Successfully installed boto3-1.4.6 botocore-1.6.7 docutils-0.14 jmespath-0.9.3 python-dateutil-2.6.1 s3transfer-0.1.10 six-1.10.0
     ---> f892d3cb7383
    Removing intermediate container 3ed98b30901d
    Step 6 : COPY . .
     ---> ea7550c04fea
    Removing intermediate container b558b3ebd406
    Successfully built ea7550c04fea
  3. Run the CloudFormation stack and get the names for the Amazon ECR repo and S3 bucket. In the stack, choose Outputs.
  4. Open the ECS console and choose Repositories. The rdslogs repo has been created. Choose View Push Commands and follow the instructions to connect to the repository and push the image for the code that you built in Step 2.
  5. Associate the CloudWatch scheduled task with the created Amazon ECS Task Definition, using a new CloudWatch event rule that is scheduled to run at intervals. The following rule is scheduled to run every 15 minutes:
    aws --profile default --region us-west-2 events put-rule --name demo-ecs-task-rule  --schedule-expression "rate(15 minutes)"

    {
        "RuleArn": "arn:aws:events:us-west-2:12345678901:rule/demo-ecs-task-rule"
    }
  6. CloudWatch requires IAM permissions to place a task on the Amazon ECS cluster when the CloudWatch event rule is executed, in addition to an IAM role that can be assumed by CloudWatch Events. This is done in three steps:
    1. Create the IAM role to be assumed by CloudWatch.
      aws --profile default --region us-west-2 iam create-role --role-name Test-Role --assume-role-policy-document file://event-role.json

      {
          "Role": {
              "AssumeRolePolicyDocument": {
                  "Version": "2012-10-17", 
                  "Statement": [
                      {
                          "Action": "sts:AssumeRole", 
                          "Effect": "Allow", 
                          "Principal": {
                              "Service": "events.amazonaws.com"
                          }
                      }
                  ]
              }, 
              "RoleId": "AROAIRYYLDCVZCUACT7FS", 
              "CreateDate": "2017-07-14T22:44:52.627Z", 
              "RoleName": "Test-Role", 
              "Path": "/", 
              "Arn": "arn:aws:iam::12345678901:role/Test-Role"
          }
      }

      The following is an example of the event-role.json file used earlier:

      {
          "Version": "2012-10-17",
          "Statement": [
              {
                  "Effect": "Allow",
                  "Principal": {
                    "Service": "events.amazonaws.com"
                  },
                  "Action": "sts:AssumeRole"
              }
          ]
      }
    2. Create the IAM policy defining the ECS cluster and task definition. You need to get these values from the CloudFormation outputs and resources.
      aws --profile default --region us-west-2 iam create-policy --policy-name test-policy --policy-document file://event-policy.json

      {
          "Policy": {
              "PolicyName": "test-policy", 
              "CreateDate": "2017-07-14T22:51:20.293Z", 
              "AttachmentCount": 0, 
              "IsAttachable": true, 
              "PolicyId": "ANPAI7XDIQOLTBUMDWGJW", 
              "DefaultVersionId": "v1", 
              "Path": "/", 
              "Arn": "arn:aws:iam::123455678901:policy/test-policy", 
              "UpdateDate": "2017-07-14T22:51:20.293Z"
          }
      }

      The following is an example of the event-policy.json file used earlier:

      {
          "Version": "2012-10-17",
          "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "ecs:RunTask"
                ],
                "Resource": [
                    "arn:aws:ecs:*::task-definition/"
                ],
                "Condition": {
                    "ArnLike": {
                        "ecs:cluster": "arn:aws:ecs:*::cluster/"
                    }
                }
            }
          ]
      }
    3. Attach the IAM policy to the role.
      aws --profile default --region us-west-2 iam attach-role-policy --role-name Test-Role --policy-arn arn:aws:iam::1234567890:policy/test-policy
  7. Associate the CloudWatch rule created earlier to place the task on the ECS cluster. The following command shows an example. Replace the AWS account ID and region with your settings.
    aws events put-targets --rule demo-ecs-task-rule --targets "Id"="1","Arn"="arn:aws:ecs:us-west-2:12345678901:cluster/test-cwe-blog-ecsCluster-15HJFWCH4SP67","EcsParameters"={"TaskDefinitionArn"="arn:aws:ecs:us-west-2:12345678901:task-definition/test-cwe-blog-taskdef:8"},"RoleArn"="arn:aws:iam::12345678901:role/Test-Role"

    {
        "FailedEntries": [], 
        "FailedEntryCount": 0
    }

That’s it. The log transfer task now runs on the defined schedule.

To test this, open the Amazon ECS console, select the Amazon ECS cluster that you created, and then choose Tasks, Run New Task. Select the task definition created by the CloudFormation template, and the cluster should be selected automatically. As this runs, the S3 bucket should be populated with the RDS logs for the instance.

Conclusion

In this post, you’ve seen that the choices for workloads that need to run at a scheduled time include Lambda with CloudWatch Events or EC2 with cron. However, sometimes the job runs longer than Lambda’s execution time limit, or a continuously running EC2 instance is not cost-effective.

In such cases, you can schedule the tasks on an ECS cluster using CloudWatch rules. In addition, you can use a Spot Fleet cluster with Amazon ECS for cost-conscious workloads that do not have hard requirements on execution time or instance availability in the Spot Fleet. For more information, see Powering your Amazon ECS Cluster with Amazon EC2 Spot Instances and Scheduled Events.

If you have questions or suggestions, please comment below.

Announcing Intel Clear Containers 3.0

Post Syndicated from ris original https://lwn.net/Articles/734648/rss

The Clear Containers team at Intel has announced the release of Clear Containers 3.0. “Completely rewritten and refactored, Clear Containers 3.0 uses Go language instead of C and introduces many new components and features. The 3.0 release of Clear Containers brings better integration into the container ecosystem and an ability to leverage code used for namespace based containers.”

Delivering Graphics Apps with Amazon AppStream 2.0

Post Syndicated from Deepak Suryanarayanan original https://aws.amazon.com/blogs/compute/delivering-graphics-apps-with-amazon-appstream-2-0/

Sahil Bahri, Sr. Product Manager, Amazon AppStream 2.0

Do you need to provide a workstation class experience for users who run graphics apps? With Amazon AppStream 2.0, you can stream graphics apps from AWS to a web browser running on any supported device. AppStream 2.0 offers a choice of GPU instance types. The range includes the newly launched Graphics Design instance, which allows you to offer a fast, fluid user experience at a fraction of the cost of using a graphics workstation, without upfront investments or long-term commitments.

In this post, I discuss the Graphics Design instance type in detail, and how you can use it to deliver a graphics application such as Siemens NX―a popular CAD/CAM application that we have been testing on AppStream 2.0 with engineers from Siemens PLM.

Graphics Instance Types on AppStream 2.0

First, a quick recap on the GPU instance types available with AppStream 2.0. In July, 2017, we launched graphics support for AppStream 2.0 with two new instance types that Jeff Barr discussed on the AWS Blog:

  • Graphics Desktop
  • Graphics Pro

Many customers in industries such as engineering, media, entertainment, and oil and gas are using these instances to deliver high-performance graphics applications to their users. These instance types are based on dedicated NVIDIA GPUs and can run the most demanding graphics applications, including those that rely on CUDA graphics API libraries.

Last week, we added a new lower-cost instance type: Graphics Design. This instance type is a great fit for engineers, 3D modelers, and designers who use graphics applications that rely on the hardware acceleration of DirectX, OpenGL, or OpenCL APIs, such as Siemens NX, Autodesk AutoCAD, or Adobe Photoshop. The Graphics Design instance is based on AMD’s FirePro S7150x2 Server GPUs and equipped with AMD Multiuser GPU technology. The instance type uses virtualized GPUs to achieve lower costs, and is available in four instance sizes to scale and match the requirements of your applications.

Instance                          vCPUs   Instance RAM (GiB)   GPU Memory (GiB)
stream.graphics-design.large      2       7.5                  1
stream.graphics-design.xlarge     4       15.3                 2
stream.graphics-design.2xlarge    8       30.5                 4
stream.graphics-design.4xlarge    16      61                   8

The following table compares all three graphics instance types on AppStream 2.0, along with example applications you could use with each.

                                   Graphics Design        Graphics Desktop    Graphics Pro
Number of instance sizes           4                      1                   3
GPU memory range                   1–8 GiB                4 GiB               8–32 GiB
vCPU range                         2–16                   8                   16–32
Memory range                       7.5–61 GiB             15 GiB              122–488 GiB
GPU                                AMD FirePro S7150x2    NVIDIA GRID K520    NVIDIA Tesla M60
Price range (N. Virginia Region)   $0.25–$2.00/hour       $0.50/hour          $2.05–$8.20/hour

Example applications: Adobe Premiere Pro, Autodesk Revit, and Siemens NX (Graphics Design); AVEVA E3D and SOLIDWORKS (Graphics Desktop); Autodesk Maya, Landmark DecisionSpace, and Schlumberger Petrel (Graphics Pro).

Example graphics instance set up with Siemens NX

In this section, I walk through setting up Siemens NX with Graphics Design instances on AppStream 2.0. After setup is complete, users are able to access NX from within their browser and to access their design files from a file share. You can also use these steps to set up and test your own graphics applications on AppStream 2.0. Here’s the workflow:

  1. Create a file share to load and save design files.
  2. Create an AppStream 2.0 image with Siemens NX installed.
  3. Create an AppStream 2.0 fleet and stack.
  4. Invite users to access Siemens NX through a browser.
  5. Validate the setup.

To learn more about AppStream 2.0 concepts and set up, see the previous post Scaling Your Desktop Application Streams with Amazon AppStream 2.0. For a deeper review of all the setup and maintenance steps, see Amazon AppStream 2.0 Developer Guide.

Step 1: Create a file share to load and save design files

To launch and configure the file server

  1. Open the EC2 console and choose Launch Instance.
  2. Scroll to the Microsoft Windows Server 2016 Base Image and choose Select.
  3. Choose an instance type and size for your file server (I chose the general purpose m4.large instance). Choose Next: Configure Instance Details.
  4. Select a VPC and subnet. You launch AppStream 2.0 resources in the same VPC. Choose Next: Add Storage.
  5. If necessary, adjust the size of your EBS volume. Choose Review and Launch, Launch.
  6. On the Instances page, give your file server a name, such as My File Server.
  7. Ensure that the security group associated with the file server instance allows for incoming traffic from the security group that you select for your AppStream 2.0 fleets or image builders. You can use the default security group and select the same group while creating the image builder and fleet in later steps.

Log in to the file server using a remote access client such as Microsoft Remote Desktop. For more information about connecting to an EC2 Windows instance, see Connect to Your Windows Instance.

To enable file sharing

  1. Create a new folder (such as C:\My Graphics Files) and upload the files that you want to make available to your users.
  2. From the Windows control panel, enable network discovery.
  3. Choose Server Manager, File and Storage Services, Volumes.
  4. Scroll to Shares and choose Start the Add Roles and Features Wizard. Go through the wizard to install the File Server and Share role.
  5. From the left navigation menu, choose Shares.
  6. Choose Start the New Share Wizard to set up your folder as a file share.
  7. Open the context (right-click) menu on the share and choose Properties, Permissions, Customize Permissions.
  8. Choose Permissions, Add. Add Read and Execute permissions for everyone on the network.

Step 2:  Create an AppStream 2.0 image with Siemens NX installed

To connect to the image builder and install applications

  1. Open the AppStream 2.0 management console and choose Images, Image Builder, Launch Image Builder.
  2. Create a graphics design image builder in the same VPC as your file server.
  3. From the Image builder tab, select your image builder and choose Connect. This opens a new browser tab and displays a desktop to log in to.
  4. Log in to your image builder as ImageBuilderAdmin.
  5. Launch the Image Assistant.
  6. Download and install Siemens NX and other applications on the image builder. I added Blender and Firefox, but you could replace these with your own applications.
  7. To verify the user experience, you can test the application performance on the instance.

Before you finish creating the image, you must mount the file share by enabling a few Microsoft Windows services.

To mount the file share

  1. Open services.msc and check the following services:
  • DNS Client
  • Function Discovery Resource Publication
  • SSDP Discovery
  • UPnP Device Host
  2. If any of the preceding services have Startup Type set to Manual, open the context (right-click) menu on the service and choose Start. Otherwise, open the context (right-click) menu on the service and choose Properties. For Startup Type, choose Manual, Apply. To start the service, choose Start.
  3. From the Windows control panel, enable network discovery.
  4. Create a batch script that mounts a file share from the storage server set up earlier. The file share is mounted automatically when a user connects to the AppStream 2.0 environment.

Logon Script Location: C:\Users\Public\logon.bat

Script Contents:

:loop
net use H: \\path\to\network\share
PING localhost -n 30 >NUL
IF NOT EXIST H:\ GOTO loop

  5. Open gpedit.msc and choose User Configuration, Windows Settings, Scripts. Set logon.bat as the user logon script.
  6. Next, create a batch script that makes the mounted drive visible to the user.

Logon Script Location: C:\Users\Public\startup.bat

Script Contents:
REG DELETE "HKEY_LOCAL_MACHINE\Software\Microsoft\Windows\CurrentVersion\Policies\Explorer" /v "NoDrives" /f

  7. Open Task Scheduler and choose Create Task.
  8. Choose General, provide a task name, and then choose Change User or Group.
  9. For Enter the object name to select, enter SYSTEM and choose Check Names, OK.
  10. Choose Triggers, New. For Begin the task, choose At startup. Under Advanced Settings, change Delay task for to 5 minutes. Choose OK.
  11. Choose Actions, New. Under Settings, for Program/script, enter C:\Users\Public\startup.bat. Choose OK.
  12. Choose Conditions. Under Power, clear the Start the task only if the computer is on AC power check box. Choose OK.
  13. To view your scheduled task, choose Task Scheduler Library. Close Task Scheduler when you are done.

Step 3:  Create an AppStream 2.0 fleet and stack

To create a fleet and stack

  1. In the AppStream 2.0 management console, choose Fleets, Create Fleet.
  2. Give the fleet a name, such as Graphics-Demo-Fleet, that uses the newly created image and the same VPC as your file server.
  3. Choose Stacks, Create Stack. Give the stack a name, such as Graphics-Demo-Stack.
  4. After the stack is created, select it and choose Actions, Associate Fleet. Associate the stack with the fleet you created in step 1.

Step 4:  Invite users to access Siemens NX through a browser

To invite users

  1. Choose User Pools, Create User to create users.
  2. Enter a name and email address for each user.
  3. Select the users just created, and choose Actions, Assign Stack to provide access to the stack created in step 2. You can also provide access using SAML 2.0 and connect to your Active Directory if necessary. For more information, see the Enabling Identity Federation with AD FS 3.0 and Amazon AppStream 2.0 post.

Your user receives an email invitation to set up an account and use a web portal to access the applications that you have included in your stack.

Step 5:  Validate the setup

Time for a test drive with Siemens NX on AppStream 2.0!

  1. Open the link for the AppStream 2.0 web portal shared through the email invitation. The web portal opens in your default browser. You must sign in with the temporary password and set a new password. After that, you are taken to your app catalog.
  2. Launch Siemens NX and interact with it using the demo files available in the shared storage folder – My Graphics Files. 

The Siemens PLM team recorded a video of NX running on AppStream 2.0.

Summary

In this post, I discussed the GPU instances available for delivering rich graphics applications to users in a web browser. While I demonstrated a simple setup, you can scale this out to launch a production environment with users signing in using Active Directory credentials,  accessing persistent storage with Amazon S3, and using other commonly requested features reviewed in the Amazon AppStream 2.0 Launch Recap – Domain Join, Simple Network Setup, and Lots More post.

To learn more about AppStream 2.0 and capabilities added this year, see Amazon AppStream 2.0 Resources.

The 4.13 kernel is out

Post Syndicated from corbet original https://lwn.net/Articles/732793/rss

Linus has released the 4.13 kernel, right on schedule. Headline features in this release include kernel hardening via structure layout randomization, native TLS protocol support, better huge-page swapping, improved handling of writeback errors, better asynchronous I/O support, better power management via next-interrupt prediction, the elimination of the DocBook toolchain for formatted documentation, and more. There is one other change that is called out explicitly in the announcement: “The change in question is simply changing the default cifs behavior: instead of defaulting to SMB 1.0 (which you really should not use: just google for ‘stop using SMB1’ or similar), the default cifs mount now defaults to a rather more modern SMB 3.0.”

Raspbian Stretch has arrived for Raspberry Pi

Post Syndicated from Simon Long original https://www.raspberrypi.org/blog/raspbian-stretch/

It’s now just under two years since we released the Jessie version of Raspbian. Those of you who know that Debian run their releases on a two-year cycle will therefore have been wondering when we might be releasing the next version, codenamed Stretch. Well, wonder no longer – Raspbian Stretch is available for download today!


Debian releases are named after characters from Disney Pixar’s Toy Story trilogy. In case, like me, you were wondering: Stretch is a purple octopus from Toy Story 3. Hi, Stretch!

The differences between Jessie and Stretch are mostly under-the-hood optimisations, and you really shouldn’t notice any differences in day-to-day use of the desktop and applications. (If you’re really interested, the technical details are in the Debian release notes here.)

However, we’ve made a few small changes to our image that are worth mentioning.

New versions of applications

Version 3.0.1 of Sonic Pi is included – this includes a lot of new functionality in terms of input/output. See the Sonic Pi release notes for more details of exactly what has changed.


The Chromium web browser has been updated to version 60, the most recent stable release. This offers improved memory usage and more efficient code, so you may notice it running slightly faster than before. The visual appearance has also been changed very slightly.


Bluetooth audio

In Jessie, we used PulseAudio to provide support for audio over Bluetooth, but integrating this with the ALSA architecture used for other audio sources was clumsy. For Stretch, we are using the bluez-alsa package to make Bluetooth audio work with ALSA itself. PulseAudio is therefore no longer installed by default, and the volume plugin on the taskbar will no longer start and stop PulseAudio. From a user point of view, everything should still work exactly as before – the only change is that if you still wish to use PulseAudio for some other reason, you will need to install it yourself.

Better handling of other usernames

The default user account in Raspbian has always been called ‘pi’, and a lot of the desktop applications assume that this is the current user. This has been changed for Stretch, so now applications like Raspberry Pi Configuration no longer assume this to be the case. This means, for example, that the option to automatically log in as the ‘pi’ user will now automatically log in with the name of the current user instead.

One other change is how sudo is handled. By default, the ‘pi’ user is set up with passwordless sudo access. We are no longer assuming this to be the case, so now desktop applications which require sudo access will prompt for the password rather than simply failing to work if a user without passwordless sudo uses them.

Scratch 2 SenseHAT extension

In the last Jessie release, we added the offline version of Scratch 2. While Scratch 2 itself hasn’t changed for this release, we have added a new extension to allow the SenseHAT to be used with Scratch 2. Look under ‘More Blocks’ and choose ‘Add an Extension’ to load the extension.

This works with either a physical SenseHAT or with the SenseHAT emulator. If a SenseHAT is connected, the extension will control that in preference to the emulator.


Fix for Broadpwn exploit

A couple of months ago, a vulnerability was discovered in the firmware of the BCM43xx wireless chipset which is used on Pi 3 and Pi Zero W; this potentially allows an attacker to take over the chip and execute code on it. The Stretch release includes a patch that addresses this vulnerability.

There is also the usual set of minor bug fixes and UI improvements – I’ll leave you to spot those!

How to get Raspbian Stretch

As this is a major version upgrade, we recommend using a clean image; these are available from the Downloads page on our site as usual.

Upgrading an existing Jessie image is possible, but is not guaranteed to work in every circumstance. If you wish to try upgrading a Jessie image to Stretch, we strongly recommend taking a backup first – we can accept no responsibility for loss of data from a failed update.

To upgrade, first modify the files /etc/apt/sources.list and /etc/apt/sources.list.d/raspi.list. In both files, change every occurrence of the word ‘jessie’ to ‘stretch’. (Both files will require sudo to edit.)
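
Alternatively, a single command can make both edits (assuming the stock file locations):

sudo sed -i 's/jessie/stretch/g' /etc/apt/sources.list /etc/apt/sources.list.d/raspi.list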

Then open a terminal window and execute

sudo apt-get update
sudo apt-get -y dist-upgrade

Answer ‘yes’ to any prompts. There may also be a point at which the install pauses while a page of information is shown on the screen – hold the ‘space’ key to scroll through all of this and then hit ‘q’ to continue.

Finally, if you are not using PulseAudio for anything other than Bluetooth audio, remove it from the image by entering

sudo apt-get -y purge pulseaudio*

The post Raspbian Stretch has arrived for Raspberry Pi appeared first on Raspberry Pi.

Supreme Administrative Court, three-judge panel: the revocation of BiBiT’s license was unlawful

Post Syndicated from nellyo original https://nellyo.wordpress.com/2017/08/16/cem_bbt-2/

As is already known, in September 2016 the Council for Electronic Media (СЕМ) revoked the television licenses of two companies – TV Seven (ТВ Седем) and Balkan Bulgarian Television (Балкан Българска Телевизия).

On 7 August 2017, a five-judge panel of the Supreme Administrative Court (ВАС) upheld the revocation of TV Seven’s licenses for two programmes. That ruling is final.

On 14 August 2017, a three-judge panel of the court, in Decision No. 10470, ruled on СЕМ’s decision concerning the license of BiBiT EAD (БиБиТи ЕАД), a commercial media service provider holding Individual License No. ЛРР-01-3-016-01 for the provision of an audiovisual service named News 7.

On the legal ground adopted by СЕМ – false declarations – the court writes the following:

In the present case it is more than obvious that the matter at hand concerns not a refusal to issue a license, but the termination of one already issued. Termination and revocation of a license, as separate regulatory powers of СЕМ, are governed by Articles 121 and 122 of the Radio and Television Act (ЗРТ); in that sense there is clear and specific statutory regulation of the two hypotheses, and they are not to be derived by way of interpretation. Neither of the two provisions envisages the opening of insolvency proceedings as a ground for revoking or terminating an already issued license for the provision of an audiovisual service.

And, quite logically:

Circumstances that subsequently arise in the licensee’s legal sphere cannot be equated with a false declaration at the time of applying for the license. A declaration is a document of an official character that certifies facts and circumstances as of a past or present moment. Article 111(1)(6) ЗРТ expressly requires applicants to declare “that there are no” such circumstances, not that no such circumstances will arise. A declaration of the existence of specific circumstances does not have the character of a promise for the future.

The court:

SETS ASIDE Decision No. РД-05-143 of 13.09.2016 of the Council for Electronic Media, by which individual license No. ЛЛР-01-3-016-01 for the provision of an audiovisual service named News 7, issued to Balkan Bulgarian Television EAD, is revoked and terminated.

THE DECISION is subject to appeal before a five-judge panel of the Supreme Administrative Court within 14 days of the day the parties are notified that it has been drawn up.

Some media outlets have inaccurately assumed that the ruling on TV Seven, which is indeed final, also applies to BiBiT.

Filed under: BG Law Making, BG Media, BG Regulator, Media Law

timeShift(GrafanaBuzz, 1w) Issue 5

Post Syndicated from Blogs on Grafana Labs Blog original https://grafana.com/blog/2017/07/21/timeshiftgrafanabuzz-1w-issue-5/

We cover a lot of ground in this week’s timeShift. From diving into building your own plugin, finding the right dashboard, configuration options in the alerting feature, to monitoring your local weather, there’s something for everyone. Are you writing an article about Grafana, or have you come across an article you found interesting? Please get in touch, we’ll add it to our roundup.


From the Blogosphere

  • Going open-source in monitoring, part III: 10 most useful Grafana dashboards to monitor Kubernetes and services: We have hundreds of pre-made dashboards ready for you to install into your on-prem or hosted Grafana, but not every one will fit your specific monitoring needs. In part three of the series, Sergey discusses his experiences with finding useful dashboards and shows off ten of the best dashboards you can install for monitoring Kubernetes clusters and the services deployed on them.

  • Using AWS Lambda and API gateway for server-less Grafana adapters: Sometimes you’ll want to visualize metrics from a data source that may not yet be supported in Grafana natively. With the plugin functionality introduced in Grafana 3.0, anyone can create their own data sources. Using the SimpleJson data source, Jonas describes how he used AWS Lambda and AWS API gateway to write data source adapters for Grafana.

  • How to Use Grafana to Monitor JMeter Non-GUI Results – Part 2: A few issues ago we listed an article for using Grafana to monitor JMeter Non-GUI results, which required a number of non-trivial steps to complete. This article shows off an easier way to accomplish this that doesn’t require any additional configuration of InfluxDB.

  • Programming your Personal Weather Chart: It’s always great to see Grafana used outside of the typical DevOps use case. This article runs you through the steps to create your own weather chart and show off your local weather stats in Grafana. BONUS: Rob shows off a magic mirror he created, which can display this data.

  • vSphere Performance data – Part 6 – The Dashboard(s): This 6-part series goes into a ton of detail and walks you through the various methods of retrieving vSphere performance data, storing the data in a TSDB, and creating dashboards for the metrics. Part 6 deals specifically with Grafana, but I highly recommend reading all of the articles, as it chronicles the journey of metrics exploration, storage, and visualization from someone who had no prior experience with time series data.

  • Alerting in Grafana: Alerting in Grafana is a fairly new feature and one that we’re continuing to iterate on. We’re soon adding additional data source support, new notification channels, clustering, silencing rules, and more. This article steps you through all the configuration options to get you to your first alert.


Plugins and Dashboards

It can seem like work slows during July and August, but we’re still seeing a lot of activity in the community. This week we have a new graph panel to show off that gives you some unique looking dashboards, and an update to the Zabbix data source, which adds some really great features. You can install both of the plugins now on your on-prem Grafana via our cli, or with one-click on GrafanaCloud.

NEW PLUGIN

Bubble Chart Panel This super-cool looking panel groups your tag values into clusters of circles. The size of the circle represents the aggregated value of the time series data. There are also multiple color schemes to make those bubbles POP (pun intended)! Currently it works against OpenTSDB and Bosun, so give it a try!

Install Now

UPDATED PLUGIN

Zabbix Alex has been hard at work, making improvements on the Zabbix App for Grafana. This update adds annotations, template variables, alerting and more. Thanks Alex! If you’d like to try out the app, head over to http://play.grafana-zabbix.org/dashboard/db/zabbix-db-mysql?orgId=2

Install 3.5.1 Now


This week’s MVC (Most Valuable Contributor)

Open source software can’t thrive without the contributions from the community. Each week we’ll recognize a Grafana contributor and thank them for all of their PRs, bug reports and feedback.

mk-dhia (Dhia)
Thank you so much for your improvements to the Elasticsearch data source!


Tweet of the Week

We scour Twitter each week to find an interesting/beautiful dashboard and show it off! #monitoringLove

This week’s tweet comes from @geek_dave

Great looking dashboard Dave! And thank you for adding new features and keeping it updated. It’s creators like you who make the dashboard repository so awesome!


Upcoming Events

We love when people talk about Grafana at meetups and conferences.

Monday, July 24, 2017 – 7:30pm | Google Campus Warsaw


Ząbkowska 27/31, Warsaw, Poland

Iot & HOME AUTOMATION #3 openHAB, InfluxDB, Grafana:
If you are interested in topics of the internet of things and home automation, this might be a good occasion to meet people similar to you. If you are into it, we will also show you how we can all work together on our common projects.

RSVP


Tell us how we’re doing.

We’d love your feedback on what kind of content you like, length, format, etc – so please keep the comments coming! You can submit a comment on this article below, or post something at our community forum. Help us make this better.

Follow us on Twitter, like us on Facebook, and join the Grafana Labs community.

Burner laptops for DEF CON

Post Syndicated from Robert Graham original http://blog.erratasec.com/2017/07/burner-laptops-for-def-con.html

Hacker summer camp (Defcon, Blackhat, BSidesLV) is upon us, so I thought I’d write up some quick notes about bringing a “burner” laptop. Chrome is your best choice in terms of security, but I need Windows/Linux tools, so I got a Windows laptop.

I chose the Asus e200ha for $199 from Amazon with free (and fast) shipping. There are similar notebooks with roughly the same hardware and price from other manufacturers (HP, Dell, etc.), so I’m not sure how this compares against those other ones. However, it fits my needs as a “burner” laptop, namely:

  • cheap
  • lasts 10 hours easily on battery
  • weighs 2.2 pounds (1 kilogram)
  • 11.6 inch and thin

Some other specs are:

  • 4 gigs of RAM
  • 32 gigs of eMMC flash memory
  • quad core 1.44 GHz Intel Atom CPU
  • Windows 10
  • free Microsoft Office 365 for one year
  • good, large keyboard
  • good, large touchpad
  • USB 3.0
  • microSD
  • WiFi ac
  • no fans, completely silent

There are compromises, of course.

  • The Atom CPU is slow, though it’s only noticeable when churning through heavy webpages. Adblocking addons or Brave are a necessity. Most things are usably fast, such as using Microsoft Word.
  • Crappy sound and video, though VLC does a fine job playing movies with headphones on the airplane. Using in bright sunlight will be difficult.
  • micro-HDMI, keep in mind if intending to do presos from it, you’ll need an HDMI adapter
  • It has limited storage, 32 gigs in theory, about half that usable.
  • It uses a special compressed Windows 10 install that you can’t actually upgrade without a completely new install. It doesn’t have the latest Windows 10 Creators Update. I lost a gig thinking I could compress system files.

Copying files across the 802.11ac WiFi to the disk was quite fast, several hundred megabits-per-second. The eMMC isn’t as fast as an SSD, but its a lot faster than typical SD card speeds.

The first thing I did once I got the notebook was to install the free VeraCrypt full disk encryption. The CPU has AES acceleration, so it’s fast. There is a problem with the keyboard driver during boot that makes it really hard to enter long passwords — you have to carefully type one key at a time to prevent extra keystrokes from being entered.

You can’t really install Linux on this computer, but you can use virtual machines. I installed VirtualBox and downloaded the Kali VM. I had some problems attaching USB devices to the VM. First of all, VirtualBox requires a separate downloaded extension to get USB working. Second, it conflicts with USBpcap that I installed for Wireshark.

It comes with one year of free Office 365. Obviously, Microsoft is hoping to hook the user into a longer term commitment, but in practice next year at this time I’d get another burner $200 laptop rather than spend $99 on extending the Office 365 license.

Let’s talk about the CPU. It’s Intel’s “Atom” processor, not their mainstream (Core i3 etc.) processor. Even though it has roughly the same clock speed as the processor in an 11-inch MacBook Air and twice the cores, it’s noticeably and painfully slower. This is especially noticeable on ad-heavy web pages, while other things seem to work just fine. It has hardware acceleration for most video formats, though I had trouble getting Netflix to work.

The tradeoff for a slow CPU is phenomenal battery life. It seems to last forever on battery. It’s really pretty cool.

Conclusion

A Chromebook is likely more secure, but for my needs, this $200 is perfect.

New – API & CloudFormation Support for Amazon CloudWatch Dashboards

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/new-api-cloudformation-support-for-amazon-cloudwatch-dashboards/

We launched CloudWatch Dashboards a couple of years ago. In the post that I wrote for the launch, I showed you how to interactively create a dashboard that displayed chosen CloudWatch metrics in graphical form. After the launch, we added additional features including a full screen mode, a dark theme, control over the range of the Y axis, simplified renaming, persistent storage, and new visualization options.

New API & CLI
While console support is wonderful for interactive use, many customers have asked us to support programmatic creation and manipulation of dashboards and the widgets within. They would like to dynamically build and maintain dashboards, adding and removing widgets as the corresponding AWS resources are created and destroyed. Other customers are interested in setting up and maintaining a consistent set of dashboards across two or more AWS accounts.

I am happy to announce that API, CLI, and AWS CloudFormation support for CloudWatch Dashboards is available now and that you can start using it today!

There are four new API functions (and equivalent CLI commands):

ListDashboards / aws cloudwatch list-dashboards – Fetch a list of all dashboards within an account, or a subset that share a common prefix.

GetDashboard / aws cloudwatch get-dashboard – Fetch details for a single dashboard.

PutDashboard / aws cloudwatch put-dashboard – Create a new dashboard or update an existing one.

DeleteDashboards / aws cloudwatch delete-dashboards – Delete one or more dashboards.

Dashboard Concepts
I want to show you how to use these functions and commands. Before I dive in, I should review a couple of important dashboard concepts and attributes.

Global – Dashboards are part of an AWS account, and are not associated with a specific AWS Region. Each account can have up to 500 dashboards.

Named – Each dashboard has a name that is unique within the AWS account. Names can be up to 255 characters long.

Grid Model – Each dashboard is composed of a grid of cells. The grid is 24 cells across and as tall as necessary. Each widget on the dashboard is positioned at a particular set of grid coordinates, and has a size that spans an integral number of grid cells.

Widgets (Visualizations) – Each widget can display text or a set of CloudWatch metrics. Text is specified using Markdown; metrics can be displayed as single values, line charts, or stacked area charts. Each dashboard can have up to 100 widgets. Widgets that display metrics can also be associated with a CloudWatch Alarm.

Dashboards have a JSON representation that you can now see and edit from within the console. Simply click on the Action menu and choose View/edit source.

Here’s the shape of a dashboard’s source (a minimal example with a single metric widget; the instance ID is a placeholder):
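
{
    "widgets": [
        {
            "type": "metric",
            "x": 0,
            "y": 0,
            "width": 6,
            "height": 6,
            "properties": {
                "view": "timeSeries",
                "stacked": false,
                "metrics": [
                    ["AWS/EC2", "CPUUtilization", "InstanceId", "i-0123456789abcdef0"]
                ],
                "period": 300,
                "stat": "Average",
                "region": "us-east-1",
                "title": "CPU"
            }
        }
    ]
}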

You can use this JSON as a starting point for your own applications. As you can see, there’s an entry in the widgets array for each widget on the dashboard; each entry describes one widget, starting with its type, position, and size.

Creating a Dashboard Using the API
Let’s say I want to create a dashboard that has a widget for each of my EC2 instances in a particular region. I’ll use Python and the AWS SDK for Python, and start as follows (excuse the amateur nature of my code):

import boto3
import json

# Clients for CloudWatch (dashboards) and EC2 (instance inventory).
cw  = boto3.client("cloudwatch")
ec2 = boto3.client("ec2")

x, y          = 0, 0   # current widget position on the grid
width, height = 3, 3   # size of each widget, in grid cells
max_width     = 12     # wrap to a new row beyond this column
widgets       = []     # accumulates one widget definition per instance

Then I simply iterate over the instances, creating a widget dictionary for each one, and appending it to the widgets array:

instances = ec2.describe_instances()
for r in instances['Reservations']:
    for i in r['Instances']:

        # One graph per instance, charting NetworkIn and NetworkOut.
        # In the metrics array, '.' repeats the value from the row above.
        widget = {'type'      : 'metric',
                  'x'         : x,
                  'y'         : y,
                  'height'    : height,
                  'width'     : width,
                  'properties': {'view'    : 'timeSeries',
                                 'stacked' : False,
                                 'metrics' : [['AWS/EC2', 'NetworkIn', 'InstanceId', i['InstanceId']],
                                              ['.',       'NetworkOut', '.',         '.']
                                             ],
                                 'period'  : 300,
                                 'stat'    : 'Average',
                                 'region'  : 'us-east-1',
                                 'title'   : i['InstanceId']
                                }
                 }

        widgets.append(widget)

Still inside the inner loop, right after appending each widget, I update the position (x and y) to form a grid (if I don’t specify positions, the widgets will be laid out left to right, top to bottom):

        # Move one widget-width to the right; wrap to the next row when full.
        x += width
        if x + width > max_width:
            x = 0
            y += height

After I have processed all of the instances, I create a JSON version of the widget array:

body   = {'widgets' : widgets}
body_j = json.dumps(body)

And I create or update my dashboard:

cw.put_dashboard(DashboardName = "EC2_Networking",
                 DashboardBody = body_j)

I run the code, and get the following dashboard:

The CloudWatch team recommends that dashboards created programmatically include a text widget indicating that the dashboard was generated automatically, along with a link to the source code or CloudFormation template that did the work. This will discourage users from making manual, out-of-band changes to the dashboards.

As I mentioned earlier, each metric widget can also be associated with a CloudWatch Alarm. You can create the alarms programmatically or by using a CloudFormation template such as the Sample CPU Utilization Alarm. If you decide to do this, the alarm threshold will be displayed in the widget. To learn more about this, read Tara Walker’s recent post, Amazon CloudWatch Launches Alarms on Dashboards.

Going one step further, I could use CloudWatch Events and a Lambda function to track the creation and deletion of certain resources and update a dashboard in concert with the changes. To learn how to do this, read Keeping CloudWatch Dashboards up to Date Using AWS Lambda.
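
To give a flavour of the approach (this is my sketch, not the code from that post), a Lambda function triggered by a CloudWatch Events rule for EC2 instance state-change notifications could rebuild the dashboard with the same logic as the script above:

import boto3
import json

cw  = boto3.client("cloudwatch")
ec2 = boto3.client("ec2")

def handler(event, context):
    # Fired whenever an instance changes state; rebuild the widget list
    # from the instances that exist right now, then overwrite the
    # dashboard in a single PutDashboard call.
    widgets = []
    x = y = 0
    for r in ec2.describe_instances()['Reservations']:
        for i in r['Instances']:
            widgets.append({'type': 'metric', 'x': x, 'y': y,
                            'width': 3, 'height': 3,
                            'properties': {'view': 'timeSeries', 'stacked': False,
                                           'metrics': [['AWS/EC2', 'NetworkIn', 'InstanceId', i['InstanceId']],
                                                       ['.', 'NetworkOut', '.', '.']],
                                           'period': 300, 'stat': 'Average',
                                           'region': 'us-east-1',
                                           'title': i['InstanceId']}})
            x += 3
            if x + 3 > 12:
                x, y = 0, y + 3
    cw.put_dashboard(DashboardName='EC2_Networking',
                     DashboardBody=json.dumps({'widgets': widgets}))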

Accessing a Dashboard Using the CLI
I can also access and manipulate my dashboards from the command line. For example, I can generate a simple list:

$ aws cloudwatch list-dashboards --output table
----------------------------------------------
|               ListDashboards               |
+--------------------------------------------+
||             DashboardEntries             ||
|+-----------------+----------------+-------+|
||  DashboardName  | LastModified   | Size  ||
|+-----------------+----------------+-------+|
||  Disk-Metrics   |  1496405221.0  |  316  ||
||  EC2_Networking |  1498090434.0  |  2830 ||
||  Main-Metrics   |  1498085173.0  |  234  ||
|+-----------------+----------------+-------+|

And I can get rid of the Disk-Metrics dashboard:

$ aws cloudwatch delete-dashboards --dashboard-names Disk-Metrics

I can also retrieve the JSON that defines a dashboard:
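
For example (output representative and abbreviated; note that DashboardBody comes back as a JSON string, not a nested object):

$ aws cloudwatch get-dashboard --dashboard-name EC2_Networking
{
    "DashboardArn": "arn:aws:cloudwatch::xxxxxxxxxxxx:dashboard/EC2_Networking",
    "DashboardBody": "{\"widgets\": [...]}",
    "DashboardName": "EC2_Networking"
}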

Creating a Dashboard Using CloudFormation
Dashboards can also be specified in CloudFormation templates. Here’s a simple template in YAML (the DashboardBody is still specified in JSON):

Resources:
  MyDashboard:
    Type: "AWS::CloudWatch::Dashboard"
    Properties:
      DashboardName: SampleDashboard
      DashboardBody: '{"widgets":[{"type":"text","x":0,"y":0,"width":6,"height":6,"properties":{"markdown":"Hi there from CloudFormation"}}]}'

I place the template in a file and then create a stack using the console or the CLI:

$ aws cloudformation create-stack --stack-name MyDashboard --template-body file://dash.yaml
{
    "StackId": "arn:aws:cloudformation:us-east-1:xxxxxxxxxxxx:stack/MyDashboard/a2a3fb20-5708-11e7-8ffd-500c21311262"
}
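
Since the stack now owns the dashboard, later changes are best made by editing the template and updating the stack, rather than by editing the dashboard in the console; assuming the same file name, that looks like:

$ aws cloudformation update-stack --stack-name MyDashboard --template-body file://dash.yaml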

Here’s the dashboard:

Available Now
This feature is available now and you can start using it today. You can create 3 dashboards with up to 50 metrics per dashboard at no charge; additional dashboards are priced at $3 per month, as listed on the CloudWatch Pricing page. You can make up to 1 million calls to the new API functions each month at no charge; beyond that, you pay $0.01 for every 1,000 calls.

Jeff;

Pi-powered hands-on statistical model at the Royal Society

Post Syndicated from Janina Ander original https://www.raspberrypi.org/blog/royal-society-galton-board/

Physics! Particles! Statistical modelling! Quantum theory! How can non-scientists understand any of it? Well, students from Durham University are here to help you wrap your head around it all – and to our delight, they’re using the power of the Raspberry Pi to do it!

At the Royal Society’s Summer Science Exhibition, taking place in London from 4-9 July, the students are presenting a Pi-based experiment demonstrating the importance of statistics in their field of research.

Modelling the invisible – Summer Science Exhibition 2017

The Royal Society Summer Science Exhibition 2017 features 22 exhibits of cutting-edge, hands-on UK science, along with special events and talks. You can meet the scientists behind the research. Find out more about the exhibition at our website: https://royalsociety.org/science-events-and-lectures/2017/summer-science-exhibition/

Ramona, Matthew, and their colleagues are particle physicists keen to bring their science to those of us whose heads start to hurt as soon as we hear the word ‘subatomic’. In their work, they create computer models of subatomic particles to make predictions about real-world particles. Their models help scientists to design better experiments and to improve sensor calibrations. If this doesn’t sound straightforward to you, never fear – this group of scientists has set out to show exactly how statistical models are useful.

The Galton board model

They’ve built a Pi-powered Galton board, also called a bean machine (much less intimidating, I think). This is an upright board, shaped like an upside-down funnel, with nails hammered into it. Drop a ball in at the top, and it will randomly bounce off the nails on its way down. How the nails are spread out determines where a ball is most likely to land at the bottom of the board.

If you’re having trouble picturing this, you can try out an online Galton board. Go ahead, I’ll wait.

You’re back? All clear? Great!

Now, if you drop 100 balls down the board and collect them at the bottom, the result might look something like this:

Galton board (image by Antoine Taveneaux, CC BY-SA 3.0)

The distribution of the balls is determined by the locations of the nails in the board. This means that, if you don’t know where the nails are, you can look at the distribution of the balls to figure out where the nails are most likely to be. And you’ll be able to do all this using … statistics!!!
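
If you’d rather poke at this in code than in a browser, here’s a minimal sketch in Python (my own, not the students’ code) of a fair board: every nail sends the ball one slot left or right with equal probability, so with enough balls the counts approach the familiar bell curve.

import random
from collections import Counter

ROWS  = 10      # rows of nails on the board
BALLS = 1000    # balls to drop

def drop_ball():
    # At each row the ball bounces right (1) or left (0); its final
    # slot is simply the number of rightward bounces.
    return sum(random.choice((0, 1)) for _ in range(ROWS))

bins = Counter(drop_ball() for _ in range(BALLS))
for slot in range(ROWS + 1):
    print('{:2d} {}'.format(slot, '#' * (bins[slot] // 10)))

Change the probability in drop_ball and the pile of balls shifts, which is exactly the sense in which the final distribution encodes where the nails are.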

Statistical models

Similarly, how particles behave is determined by the laws of physics – think of the particles as balls, and laws of physics as nails. Physicists can observe the behaviour of particles to learn about laws of physics, and create statistical models simulating the laws of physics to predict the behaviour of particles.

I can hear you say, “Alright, thanks for the info, but how does the Raspberry Pi come into this?” Don’t worry – I’m getting to that.

Modelling the invisible – the interactive exhibit

As I said, Ramona and the other physicists have not created a regular old Galton board. Instead, this one records where the balls land using a Raspberry Pi, and other portable Pis around the exhibition space can access the records of the experimental results. These Pis in turn run Galton board simulators, and visitors can use them to recreate a virtual Galton board that produces the same results as the physical one. Then, they can check whether their model board does, in fact, look like the one the physicists built. In this way, people directly experience the relationship between statistical models and experimental results.

Hurrah for science!

The other exhibit the Durham students will be showing is a demo dark matter detector! So if you decide to visit the Summer Science Exhibition, you will also have the chance to learn about the very boundaries of human understanding of the cosmos.

The Pi in museums

At the Raspberry Pi Foundation, education is our mission, and of course we love museums. It is always a pleasure to see our computers incorporated into exhibits: the Pi-powered visual theremin teaches visitors about music; the Museum in a Box uses Pis to engage people in hands-on encounters with exhibits; and this Pi is itself a museum piece! If you want to learn more about Raspberry Pis and museums, you can listen to this interview with Pi Towers’ social media maestro Alex Bate.

It’s amazing that our tech is used to educate people in areas beyond computer science. If you’ve created a Pi-powered educational project, please share it with us in the comments.

The post Pi-powered hands-on statistical model at the Royal Society appeared first on Raspberry Pi.

Calibre 3.0 released

Post Syndicated from corbet original https://lwn.net/Articles/725588/rss

Version 3.0 of the calibre electronic-book reader has been released. “It has been almost three years since calibre 2.0. In that time lots has happened. The biggest new feature, which was in development for almost that entire period, is a completely re-written calibre Content server. The Content server allows you to wirelessly browse your calibre books on any modern phone/tablet and even read the books right in your phone browser.” Other additions include support for high-DPI screens and support for multiple icon themes.