Tag Archives: ads

Epic Games Sues Man Over Bitcoin Mining Fortnite ‘Cheat’

Post Syndicated from Ernesto original https://torrentfreak.com/epic-games-sues-man-over-bitcoin-mining-fortnite-cheat-171019/

A few weeks ago, Epic Games released Fortnite’s free-to-play “Battle Royale” game mode for the PC and other platforms, generating massive interest among gamers.

The release also attracted attention from thousands of cheaters, many of whom were subsequently banned. In addition, Epic Games went a step further by taking several cheaters to court over copyright infringement.

This week the North Carolina-based game developer continued its a war against cheaters. In a new lawsuit, it targets two other cheaters who promoted their hacks through YouTube videos.

One of the defendants is a Swedish resident, Mr. Josefson. He created a cheat and promoted it in various videos, adding instructions on how to download and install it. In common with the previous defendants, he is being sued for copyright infringement.

The second cheater listed in the complaint, a Russian man named Mr. Yakovenko, is more unique. This man also promoted his Fortnite cheats through a series of YouTube videos, but they weren’t very effective.

When Epic downloaded the ‘cheat’ to see how it works, all they got was a Bitcoin miner.

“Epic downloaded the purported cheat from the links provided in Yakovenko’s YouTube videos. While the ‘cheat’ does not appear to be a functional Fortnite cheat, it functions as a bitcoin miner that infects the user’s computer with a virus that causes the user’s computer to mine bitcoin for the benefit of an unknown third party,” the complaint reads.

Epic ‘cheat’

Despite the non-working cheat, Epic Games maintains that Yakovenko created a cheat for Fortnite’s Battle Royale game mode, pointing to a YouTube video he posted last month.

“The First Yakovenko video and associated post contained instructions on how to download and install the cheat and showed full screen gameplay using the purported cheat,” the complaint reads.

All the videos have since been removed following takedown notices from Epic. Through the lawsuit, the game developer now hopes to get compensation for the damages it suffered.

In addition to the copyright infringement claims the two men are also accused of trademark infringement, unfair competition, and breach of contract.

There’s little doubt that Epic Games is doing its best to hold cheaters accountable. However, the problem is not easy to contain. A simple search for Fortnite Hack or Fortnite Cheat still yields tens of thousands of results, with new videos being added continuously.

A copy of the full complaint against Josefson and Yakovenko is available here (pdf).

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

Getting Ready for AWS re:Invent 2017

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/getting-ready-for-aws-reinvent-2017/

With just 40 days remaining before AWS re:Invent begins, my colleagues and I want to share some tips that will help you to make the most of your time in Las Vegas. As always, our focus is on training and education, mixed in with some after-hours fun and recreation for balance.

Locations, Locations, Locations
The re:Invent Campus will span the length of the Las Vegas strip, with events taking place at the MGM Grand, Aria, Mirage, Venetian, Palazzo, the Sands Expo Hall, the Linq Lot, and the Encore. Each venue will host tracks devoted to specific topics:

MGM Grand – Business Apps, Enterprise, Security, Compliance, Identity, Windows.

Aria – Analytics & Big Data, Alexa, Container, IoT, AI & Machine Learning, and Serverless.

Mirage – Bootcamps, Certifications & Certification Exams.

Venetian / Palazzo / Sands Expo Hall – Architecture, AWS Marketplace & Service Catalog, Compute, Content Delivery, Database, DevOps, Mobile, Networking, and Storage.

Linq Lot – Alexa Hackathons, Gameday, Jam Sessions, re:Play Party, Speaker Meet & Greets.

EncoreBookable meeting space.

If your interests span more than one topic, plan to take advantage of the re:Invent shuttles that will be making the rounds between the venues.

Lots of Content
The re:Invent Session Catalog is now live and you should start to choose the sessions of interest to you now.

With more than 1100 sessions on the agenda, planning is essential! Some of the most popular “deep dive” sessions will be run more than once and others will be streamed to overflow rooms at other venues. We’ve analyzed a lot of data, run some simulations, and are doing our best to provide you with multiple opportunities to build an action-packed schedule.

We’re just about ready to let you reserve seats for your sessions (follow me and/or @awscloud on Twitter for a heads-up). Based on feedback from earlier years, we have fine-tuned our seat reservation model. This year, 75% of the seats for each session will be reserved and the other 25% are for walk-up attendees. We’ll start to admit walk-in attendees 10 minutes before the start of the session.

Las Vegas never sleeps and neither should you! This year we have a host of late-night sessions, workshops, chalk talks, and hands-on labs to keep you busy after dark.

To learn more about our plans for sessions and content, watch the Get Ready for re:Invent 2017 Content Overview video.

Have Fun
After you’ve had enough training and learning for the day, plan to attend the Pub Crawl, the re:Play party, the Tatonka Challenge (two locations this year), our Hands-On LEGO Activities, and the Harley Ride. Stay fit with our 4K Run, Spinning Challenge, Fitness Bootcamps, and Broomball (a longstanding Amazon tradition).

See You in Vegas
As always, I am looking forward to meeting as many AWS users and blog readers as possible. Never hesitate to stop me and to say hello!

Jeff;

 

 

Implementing Default Directory Indexes in Amazon S3-backed Amazon CloudFront Origins Using [email protected]

Post Syndicated from Ronnie Eichler original https://aws.amazon.com/blogs/compute/implementing-default-directory-indexes-in-amazon-s3-backed-amazon-cloudfront-origins-using-lambdaedge/

With the recent launch of [email protected], it’s now possible for you to provide even more robust functionality to your static websites. Amazon CloudFront is a content distribution network service. In this post, I show how you can use [email protected] along with the CloudFront origin access identity (OAI) for Amazon S3 and still provide simple URLs (such as www.example.com/about/ instead of www.example.com/about/index.html).

Background

Amazon S3 is a great platform for hosting a static website. You don’t need to worry about managing servers or underlying infrastructure—you just publish your static to content to an S3 bucket. S3 provides a DNS name such as <bucket-name>.s3-website-<AWS-region>.amazonaws.com. Use this name for your website by creating a CNAME record in your domain’s DNS environment (or Amazon Route 53) as follows:

www.example.com -> <bucket-name>.s3-website-<AWS-region>.amazonaws.com

You can also put CloudFront in front of S3 to further scale the performance of your site and cache the content closer to your users. CloudFront can enable HTTPS-hosted sites, by either using a custom Secure Sockets Layer (SSL) certificate or a managed certificate from AWS Certificate Manager. In addition, CloudFront also offers integration with AWS WAF, a web application firewall. As you can see, it’s possible to achieve some robust functionality by using S3, CloudFront, and other managed services and not have to worry about maintaining underlying infrastructure.

One of the key concerns that you might have when implementing any type of WAF or CDN is that you want to force your users to go through the CDN. If you implement CloudFront in front of S3, you can achieve this by using an OAI. However, in order to do this, you cannot use the HTTP endpoint that is exposed by S3’s static website hosting feature. Instead, CloudFront must use the S3 REST endpoint to fetch content from your origin so that the request can be authenticated using the OAI. This presents some challenges in that the REST endpoint does not support redirection to a default index page.

CloudFront does allow you to specify a default root object (index.html), but it only works on the root of the website (such as http://www.example.com > http://www.example.com/index.html). It does not work on any subdirectory (such as http://www.example.com/about/). If you were to attempt to request this URL through CloudFront, CloudFront would do a S3 GetObject API call against a key that does not exist.

Of course, it is a bad user experience to expect users to always type index.html at the end of every URL (or even know that it should be there). Until now, there has not been an easy way to provide these simpler URLs (equivalent to the DirectoryIndex Directive in an Apache Web Server configuration) to users through CloudFront. Not if you still want to be able to restrict access to the S3 origin using an OAI. However, with the release of [email protected], you can use a JavaScript function running on the CloudFront edge nodes to look for these patterns and request the appropriate object key from the S3 origin.

Solution

In this example, you use the compute power at the CloudFront edge to inspect the request as it’s coming in from the client. Then re-write the request so that CloudFront requests a default index object (index.html in this case) for any request URI that ends in ‘/’.

When a request is made against a web server, the client specifies the object to obtain in the request. You can use this URI and apply a regular expression to it so that these URIs get resolved to a default index object before CloudFront requests the object from the origin. Use the following code:

'use strict';
exports.handler = (event, context, callback) => {
    
    // Extract the request from the CloudFront event that is sent to [email protected] 
    var request = event.Records[0].cf.request;

    // Extract the URI from the request
    var olduri = request.uri;

    // Match any '/' that occurs at the end of a URI. Replace it with a default index
    var newuri = olduri.replace(/\/$/, '\/index.html');
    
    // Log the URI as received by CloudFront and the new URI to be used to fetch from origin
    console.log("Old URI: " + olduri);
    console.log("New URI: " + newuri);
    
    // Replace the received URI with the URI that includes the index page
    request.uri = newuri;
    
    // Return to CloudFront
    return callback(null, request);

};

To get started, create an S3 bucket to be the origin for CloudFront:

Create bucket

On the other screens, you can just accept the defaults for the purposes of this walkthrough. If this were a production implementation, I would recommend enabling bucket logging and specifying an existing S3 bucket as the destination for access logs. These logs can be useful if you need to troubleshoot issues with your S3 access.

Now, put some content into your S3 bucket. For this walkthrough, create two simple webpages to demonstrate the functionality:  A page that resides at the website root, and another that is in a subdirectory.

<s3bucketname>/index.html

<!doctype html>
<html>
    <head>
        <meta charset="utf-8">
        <title>Root home page</title>
    </head>
    <body>
        <p>Hello, this page resides in the root directory.</p>
    </body>
</html>

<s3bucketname>/subdirectory/index.html

<!doctype html>
<html>
    <head>
        <meta charset="utf-8">
        <title>Subdirectory home page</title>
    </head>
    <body>
        <p>Hello, this page resides in the /subdirectory/ directory.</p>
    </body>
</html>

When uploading the files into S3, you can accept the defaults. You add a bucket policy as part of the CloudFront distribution creation that allows CloudFront to access the S3 origin. You should now have an S3 bucket that looks like the following:

Root of bucket

Subdirectory in bucket

Next, create a CloudFront distribution that your users will use to access the content. Open the CloudFront console, and choose Create Distribution. For Select a delivery method for your content, under Web, choose Get Started.

On the next screen, you set up the distribution. Below are the options to configure:

  • Origin Domain Name:  Select the S3 bucket that you created earlier.
  • Restrict Bucket Access: Choose Yes.
  • Origin Access Identity: Create a new identity.
  • Grant Read Permissions on Bucket: Choose Yes, Update Bucket Policy.
  • Object Caching: Choose Customize (I am changing the behavior to avoid having CloudFront cache objects, as this could affect your ability to troubleshoot while implementing the Lambda code).
    • Minimum TTL: 0
    • Maximum TTL: 0
    • Default TTL: 0

You can accept all of the other defaults. Again, this is a proof-of-concept exercise. After you are comfortable that the CloudFront distribution is working properly with the origin and Lambda code, you can re-visit the preceding values and make changes before implementing it in production.

CloudFront distributions can take several minutes to deploy (because the changes have to propagate out to all of the edge locations). After that’s done, test the functionality of the S3-backed static website. Looking at the distribution, you can see that CloudFront assigns a domain name:

CloudFront Distribution Settings

Try to access the website using a combination of various URLs:

http://<domainname>/:  Works

› curl -v http://d3gt20ea1hllb.cloudfront.net/
*   Trying 54.192.192.214...
* TCP_NODELAY set
* Connected to d3gt20ea1hllb.cloudfront.net (54.192.192.214) port 80 (#0)
> GET / HTTP/1.1
> Host: d3gt20ea1hllb.cloudfront.net
> User-Agent: curl/7.51.0
> Accept: */*
>
< HTTP/1.1 200 OK
< ETag: "cb7e2634fe66c1fd395cf868087dd3b9"
< Accept-Ranges: bytes
< Server: AmazonS3
< X-Cache: Miss from cloudfront
< X-Amz-Cf-Id: -D2FSRwzfcwyKZKFZr6DqYFkIf4t7HdGw2MkUF5sE6YFDxRJgi0R1g==
< Content-Length: 209
< Content-Type: text/html
< Last-Modified: Wed, 19 Jul 2017 19:21:16 GMT
< Via: 1.1 6419ba8f3bd94b651d416054d9416f1e.cloudfront.net (CloudFront), 1.1 iad6-proxy-3.amazon.com:80 (Cisco-WSA/9.1.2-010)
< Connection: keep-alive
<
<!doctype html>
<html>
    <head>
        <meta charset="utf-8">
        <title>Root home page</title>
    </head>
    <body>
        <p>Hello, this page resides in the root directory.</p>
    </body>
</html>
* Curl_http_done: called premature == 0
* Connection #0 to host d3gt20ea1hllb.cloudfront.net left intact

This is because CloudFront is configured to request a default root object (index.html) from the origin.

http://<domainname>/subdirectory/:  Doesn’t work

› curl -v http://d3gt20ea1hllb.cloudfront.net/subdirectory/
*   Trying 54.192.192.214...
* TCP_NODELAY set
* Connected to d3gt20ea1hllb.cloudfront.net (54.192.192.214) port 80 (#0)
> GET /subdirectory/ HTTP/1.1
> Host: d3gt20ea1hllb.cloudfront.net
> User-Agent: curl/7.51.0
> Accept: */*
>
< HTTP/1.1 200 OK
< ETag: "d41d8cd98f00b204e9800998ecf8427e"
< x-amz-server-side-encryption: AES256
< Accept-Ranges: bytes
< Server: AmazonS3
< X-Cache: Miss from cloudfront
< X-Amz-Cf-Id: Iqf0Gy8hJLiW-9tOAdSFPkL7vCWBrgm3-1ly5tBeY_izU82ftipodA==
< Content-Length: 0
< Content-Type: application/x-directory
< Last-Modified: Wed, 19 Jul 2017 19:21:24 GMT
< Via: 1.1 6419ba8f3bd94b651d416054d9416f1e.cloudfront.net (CloudFront), 1.1 iad6-proxy-3.amazon.com:80 (Cisco-WSA/9.1.2-010)
< Connection: keep-alive
<
* Curl_http_done: called premature == 0
* Connection #0 to host d3gt20ea1hllb.cloudfront.net left intact

If you use a tool such like cURL to test this, you notice that CloudFront and S3 are returning a blank response. The reason for this is that the subdirectory does exist, but it does not resolve to an S3 object. Keep in mind that S3 is an object store, so there are no real directories. User interfaces such as the S3 console present a hierarchical view of a bucket with folders based on the presence of forward slashes, but behind the scenes the bucket is just a collection of keys that represent stored objects.

http://<domainname>/subdirectory/index.html:  Works

› curl -v http://d3gt20ea1hllb.cloudfront.net/subdirectory/index.html
*   Trying 54.192.192.130...
* TCP_NODELAY set
* Connected to d3gt20ea1hllb.cloudfront.net (54.192.192.130) port 80 (#0)
> GET /subdirectory/index.html HTTP/1.1
> Host: d3gt20ea1hllb.cloudfront.net
> User-Agent: curl/7.51.0
> Accept: */*
>
< HTTP/1.1 200 OK
< Date: Thu, 20 Jul 2017 20:35:15 GMT
< ETag: "ddf87c487acf7cef9d50418f0f8f8dae"
< Accept-Ranges: bytes
< Server: AmazonS3
< X-Cache: RefreshHit from cloudfront
< X-Amz-Cf-Id: bkh6opXdpw8pUomqG3Qr3UcjnZL8axxOH82Lh0OOcx48uJKc_Dc3Cg==
< Content-Length: 227
< Content-Type: text/html
< Last-Modified: Wed, 19 Jul 2017 19:21:45 GMT
< Via: 1.1 3f2788d309d30f41de96da6f931d4ede.cloudfront.net (CloudFront), 1.1 iad6-proxy-3.amazon.com:80 (Cisco-WSA/9.1.2-010)
< Connection: keep-alive
<
<!doctype html>
<html>
    <head>
        <meta charset="utf-8">
        <title>Subdirectory home page</title>
    </head>
    <body>
        <p>Hello, this page resides in the /subdirectory/ directory.</p>
    </body>
</html>
* Curl_http_done: called premature == 0
* Connection #0 to host d3gt20ea1hllb.cloudfront.net left intact

This request works as expected because you are referencing the object directly. Now, you implement the [email protected] function to return the default index.html page for any subdirectory. Looking at the example JavaScript code, here’s where the magic happens:

var newuri = olduri.replace(/\/$/, '\/index.html');

You are going to use a JavaScript regular expression to match any ‘/’ that occurs at the end of the URI and replace it with ‘/index.html’. This is the equivalent to what S3 does on its own with static website hosting. However, as I mentioned earlier, you can’t rely on this if you want to use a policy on the bucket to restrict it so that users must access the bucket through CloudFront. That way, all requests to the S3 bucket must be authenticated using the S3 REST API. Because of this, you implement a [email protected] function that takes any client request ending in ‘/’ and append a default ‘index.html’ to the request before requesting the object from the origin.

In the Lambda console, choose Create function. On the next screen, skip the blueprint selection and choose Author from scratch, as you’ll use the sample code provided.

Next, configure the trigger. Choosing the empty box shows a list of available triggers. Choose CloudFront and select your CloudFront distribution ID (created earlier). For this example, leave Cache Behavior as * and CloudFront Event as Origin Request. Select the Enable trigger and replicate box and choose Next.

Lambda Trigger

Next, give the function a name and a description. Then, copy and paste the following code:

'use strict';
exports.handler = (event, context, callback) => {
    
    // Extract the request from the CloudFront event that is sent to [email protected] 
    var request = event.Records[0].cf.request;

    // Extract the URI from the request
    var olduri = request.uri;

    // Match any '/' that occurs at the end of a URI. Replace it with a default index
    var newuri = olduri.replace(/\/$/, '\/index.html');
    
    // Log the URI as received by CloudFront and the new URI to be used to fetch from origin
    console.log("Old URI: " + olduri);
    console.log("New URI: " + newuri);
    
    // Replace the received URI with the URI that includes the index page
    request.uri = newuri;
    
    // Return to CloudFront
    return callback(null, request);

};

Next, define a role that grants permissions to the Lambda function. For this example, choose Create new role from template, Basic Edge Lambda permissions. This creates a new IAM role for the Lambda function and grants the following permissions:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            "Resource": [
                "arn:aws:logs:*:*:*"
            ]
        }
    ]
}

In a nutshell, these are the permissions that the function needs to create the necessary CloudWatch log group and log stream, and to put the log events so that the function is able to write logs when it executes.

After the function has been created, you can go back to the browser (or cURL) and re-run the test for the subdirectory request that failed previously:

› curl -v http://d3gt20ea1hllb.cloudfront.net/subdirectory/
*   Trying 54.192.192.202...
* TCP_NODELAY set
* Connected to d3gt20ea1hllb.cloudfront.net (54.192.192.202) port 80 (#0)
> GET /subdirectory/ HTTP/1.1
> Host: d3gt20ea1hllb.cloudfront.net
> User-Agent: curl/7.51.0
> Accept: */*
>
< HTTP/1.1 200 OK
< Date: Thu, 20 Jul 2017 21:18:44 GMT
< ETag: "ddf87c487acf7cef9d50418f0f8f8dae"
< Accept-Ranges: bytes
< Server: AmazonS3
< X-Cache: Miss from cloudfront
< X-Amz-Cf-Id: rwFN7yHE70bT9xckBpceTsAPcmaadqWB9omPBv2P6WkIfQqdjTk_4w==
< Content-Length: 227
< Content-Type: text/html
< Last-Modified: Wed, 19 Jul 2017 19:21:45 GMT
< Via: 1.1 3572de112011f1b625bb77410b0c5cca.cloudfront.net (CloudFront), 1.1 iad6-proxy-3.amazon.com:80 (Cisco-WSA/9.1.2-010)
< Connection: keep-alive
<
<!doctype html>
<html>
    <head>
        <meta charset="utf-8">
        <title>Subdirectory home page</title>
    </head>
    <body>
        <p>Hello, this page resides in the /subdirectory/ directory.</p>
    </body>
</html>
* Curl_http_done: called premature == 0
* Connection #0 to host d3gt20ea1hllb.cloudfront.net left intact

You have now configured a way for CloudFront to return a default index page for subdirectories in S3!

Summary

In this post, you used [email protected] to be able to use CloudFront with an S3 origin access identity and serve a default root object on subdirectory URLs. To find out some more about this use-case, see [email protected] integration with CloudFront in our documentation.

If you have questions or suggestions, feel free to comment below. For troubleshooting or implementation help, check out the Lambda forum.

Google Asked to Delist Pirate Movie Sites, ISPs Asked to Block Them

Post Syndicated from Andy original https://torrentfreak.com/google-asked-to-delist-pirate-movie-sites-isps-asked-to-block-them-171018/

After seizing several servers operated by popular private music tracker What.cd, last November French police went after a much bigger target.

Boasting millions of regular visitors, Zone-Telechargement (Zone-Download) was ranked the 11th most-visited website in the whole of the country. The site offered direct downloads of a wide variety of pirated content, including films, series, games, and music. Until the French Gendarmerie shut it down, that is.

After being founded in 2011 and enjoying huge growth following the 2012 raids against Megaupload, the Zone-Telechargement ‘brand’ was still popular with French users, despite the closure of the platform. It, therefore, came as no surprise that the site was quickly cloned by an unknown party and relaunched as Zone-Telechargement.ws.

The site has been doing extremely well following its makeover. To the annoyance of copyright holders, SimilarWeb reports the platform as France’s 37th most popular site with around 58 million visitors per month. That’s a huge achievement in less than 12 months.

Now, however, the site is receiving more unwanted attention. PCInpact says it has received information that several movie-focused organizations including the French National Film Center are requesting tough action against the site.

The National Federation of Film Distributors, the Video Publishing Union, the Association of Independent Producers and the Producers Union are all demanding the blocking of Zone-Telechargement by several local ISPs, alongside its delisting from search results.

The publication mentions four Internet service providers – Free, Numericable, Bouygues Telecom, and Orange – plus Google on the search engine front. At this stage, other search companies, such as Microsoft’s Bing, are not reported as part of the action.

In addition to Zone-Telechargement, several other ‘pirate’ sites (Papystreaming.org, Sokrostream.cc and Zonetelechargement.su, another site playing on the popular brand) are included in the legal process. All are described as “structurally infringing” by the complaining movie outfits, PCInpact notes.

The legal proceedings against the sites are based in Article 336-2 of the Intellectual Property Code. It’s ground already trodden by movie companies who following a 2011 complaint, achieved victory in 2013 against several Allostreaming-linked sites.

In that case, the High Court of Paris ordered ISPs, several of which appear in the current action, to “implement all appropriate means including blocking” to prevent access to the infringing sites.

The Court also ordered Google, Microsoft, and Yahoo to “take all necessary measures to prevent the occurrence on their services of any results referring to any of the sites” on their platforms.

Also of interest is that the action targets a service called DL-Protecte.com, which according to local anti-piracy agency HADOPI, makes it difficult for rightsholders to locate infringing content while at the same time generates more revenue for pirate sites.

A judgment is expected in “several months.”

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

Amazon Redshift Dense Compute (DC2) Nodes Deliver Twice the Performance as DC1 at the Same Price

Post Syndicated from Quaseer Mujawar original https://aws.amazon.com/blogs/big-data/amazon-redshift-dense-compute-dc2-nodes-deliver-twice-the-performance-as-dc1-at-the-same-price/

Amazon Redshift makes analyzing exabyte-scale data fast, simple, and cost-effective. It delivers advanced data warehousing capabilities, including parallel execution, compressed columnar storage, and end-to-end encryption as a fully managed service, for less than $1,000/TB/year. With Amazon Redshift Spectrum, you can run SQL queries directly against exabytes of unstructured data in Amazon S3 for $5/TB scanned.

Today, we are making our Dense Compute (DC) family faster and more cost-effective with new second-generation Dense Compute (DC2) nodes at the same price as our previous generation DC1. DC2 is designed for demanding data warehousing workloads that require low latency and high throughput. DC2 features powerful Intel E5-2686 v4 (Broadwell) CPUs, fast DDR4 memory, and NVMe-based solid state disks.

We’ve tuned Amazon Redshift to take advantage of the better CPU, network, and disk on DC2 nodes, providing up to twice the performance of DC1 at the same price. Our DC2.8xlarge instances now provide twice the memory per slice of data and an optimized storage layout with 30 percent better storage utilization.

Customer successes

Several flagship customers, ranging from fast growing startups to large Fortune 100 companies, previewed the new DC2 node type. In their tests, DC2 provided up to twice the performance as DC1. Our preview customers saw faster ETL (extract, transform, and load) jobs, higher query throughput, better concurrency, faster reports, and shorter data-to-insights—all at the same cost as DC1. DC2.8xlarge customers also noted that their databases used up to 30 percent less disk space due to our optimized storage format, reducing their costs.

4Cite Marketing, one of America’s fastest growing private companies, uses Amazon Redshift to analyze customer data and determine personalized product recommendations for retailers. “Amazon Redshift’s new DC2 node is giving us a 100 percent performance increase, allowing us to provide faster insights for our retailers, more cost-effectively, to drive incremental revenue,” said Jim Finnerty, 4Cite’s senior vice president of product.

BrandVerity, a Seattle-based brand protection and compliance‎ company, provides solutions to monitor, detect, and mitigate online brand, trademark, and compliance abuse. “We saw a 70 percent performance boost with the DC2 nodes for running Redshift Spectrum queries. As a result, we can analyze far more data for our customers and deliver results much faster,” said Hyung-Joon Kim, principal software engineer at BrandVerity.

“Amazon Redshift is at the core of our operations and our marketing automation tools,” said Jarno Kartela, head of analytics and chief data scientist at DNA Plc, one of the leading Finnish telecommunications groups and Finland’s largest cable operator and pay TV provider. “We saw a 52 percent performance gain in moving to Amazon Redshift’s DC2 nodes. We can now run queries in half the time, allowing us to provide more analytics power and reduce time-to-insight for our analytics and marketing automation users.”

You can read about their experiences on our Customer Success page.

Get started

You can try the new node type using our getting started guide. Just choose dc2.large or dc2.8xlarge in the Amazon Redshift console:

If you have a DC1.large Amazon Redshift cluster, you can restore to a new DC2.large cluster using an existing snapshot. To migrate from DS2.xlarge, DS2.8xlarge, or DC1.8xlarge Amazon Redshift clusters, you can use the resize operation to move data to your new DC2 cluster. For more information, see Clusters and Nodes in Amazon Redshift.

To get the latest Amazon Redshift feature announcements, check out our What’s New page, and subscribe to the RSS feed.

Want to Learn More About AWS CloudHSM and Hardware Key Management? Register for and Attend this October 25 Tech Talk: “CloudHSM – Secure, Scalable Key Storage in AWS”

Post Syndicated from Craig Liebendorfer original https://aws.amazon.com/blogs/security/want-to-learn-more-about-aws-cloudhsm-and-hardware-key-management-register-for-and-attend-this-october-25-tech-talk-cloudhsm-secure-scalable-key-storage-in-aws/

AWS Online Tech Talks banner

As part of the AWS Online Tech Talks series, AWS will present CloudHSM – Secure, Scalable Key Storage in AWS on Wednesday, October 25. This tech talk will start at 9:00 A.M. Pacific Time and end at 9:40 A.M. Pacific Time.

Applications handling confidential or sensitive data are subject to corporate or regulatory requirements and therefore need validated control of encryption keys and cryptographic operations. AWS CloudHSM brings to your AWS resources the security and control of traditional HSMs. This Tech Talk will show how you can leverage CloudHSM to build scalable, reliable applications without sacrificing either security or performance. Attend this Tech Talk to learn how you can use CloudHSM to quickly and easily build secure, compliant, fast, and flexible applications.

You also will:

  • Learn about the challenges CloudHSM can help you address.
  • Understand how CloudHSM can secure your workloads and data.
  • Learn how to transfer and modernize workloads.

This tech talk is free. Register today.

– Craig

More Raspberry Pi labs in West Africa

Post Syndicated from Rachel Churcher original https://www.raspberrypi.org/blog/pi-based-ict-west-africa/

Back in May 2013, we heard from Dominique Laloux about an exciting project to bring Raspberry Pi labs to schools in rural West Africa. Until 2012, 75 percent of teachers there had never used a computer. The project has been very successful, and Dominique has been in touch again to bring us the latest news.

A view of the inside of the new Pi lab building

Preparing the new Pi labs building in Kuma Tokpli, Togo

Growing the project

Thanks to the continuing efforts of a dedicated team of teachers, parents and other supporters, the Centre Informatique de Kuma, now known as INITIC (from the French ‘INItiation aux TIC’), runs two Raspberry Pi labs in schools in Togo, and plans to open a third in December. The second lab was opened last year in Kpalimé, a town in the Plateaux Region in the west of the country.

Student using a Raspberry Pi computer

Using the new Raspberry Pi labs in Kpalimé, Togo

More than 400 students used the new lab intensively during the last school year. Dominique tells us more:

“The report made in early July by the seven teachers who accompanied the students was nothing short of amazing: the young people covered a very impressive number of concepts and skills, from the GUI and the file system, to a solid introduction to word processing and spreadsheets, and many other skills. The lab worked exactly as expected. Its 21 Raspberry Pis worked flawlessly, with the exception of a couple of SD cards that needed re-cloning, and a couple of old screens that needed to be replaced. All the Raspberry Pis worked without a glitch. They are so reliable!”

The teachers and students have enjoyed access to a range of software and resources, all running on Raspberry Pi 2s and 3s.

“Our current aim is to introduce the students to ICT using the Raspberry Pis, rather than introducing them to programming and electronics (a step that will certainly be considered later). We use Ubuntu Mate along with a large selection of applications, from LibreOffice, Firefox, GIMP, Audacity, and Calibre, to special maths, science, and geography applications. There are also special applications such as GnuCash and GanttProject, as well as logic games including PyChess. Since December, students also have access to a local server hosting Kiwix, Wiktionary (a local copy of Wikipedia in four languages), several hundred videos, and several thousand books. They really love it!”

Pi lab upgrade

This summer, INITIC upgraded the equipment in their Pi lab in Kuma Adamé, which has been running since 2014. 21 older model Raspberry Pis were replaced with Pi 2s and 3s, to bring this lab into line with the others, and encourage co-operation between the different locations.

“All 21 first-generation Raspberry Pis worked flawlessly for three years, despite the less-than-ideal conditions in which they were used — tropical conditions, dust, frequent power outages, etc. I brought them all back to Brussels, and they all still work fine. The rationale behind the upgrade was to bring more computing power to the lab, and also to have the same equipment in our two Raspberry Pi labs (and in other planned installations).”

Students and teachers using the upgraded Pi labs in Kuma Adamé

Students and teachers using the upgraded Pi lab in Kuma Adamé

An upgrade of the organisation’s first lab, installed in 2012 in Kuma Tokpli, will be completed in December. This lab currently uses ‘retired’ laptops, which will be replaced with Raspberry Pis and peripherals. INITIC, in partnership with the local community, is also constructing a new building to house the upgraded technology, and the organisation’s third Raspberry Pi lab.

Reliable tech

Dominique has been very impressed with the performance of the Raspberry Pis since 2014.

“Our experience of three years, in two very different contexts, clearly demonstrates that the Raspberry Pi is a very convincing alternative to more ‘conventional’ computers for introducing young students to ICT where resources are scarce. I wish I could convince more communities in the world to invest in such ‘low cost, low consumption, low maintenance’ infrastructure. It really works!”

He goes on to explain that:

“Our goal now is to build at least one new Raspberry Pi lab in another Togolese school each year. That will, of course, depend on how successful we are at gathering the funds necessary for each installation, but we are confident we can convince enough friends to give us the financial support needed for our action.”

A desk with Raspberry Pis and peripherals

Reliable Raspberry Pis in the labs at Kpalimé

Get involved

We are delighted to see the Raspberry Pi being used to bring information technology to new teachers, students, and communities in Togo – it’s wonderful to see this project becoming established and building on its achievements. The mission of the Raspberry Pi Foundation is to put the power of digital making into the hands of people all over the world. Therefore, projects like this, in which people use our tech to fulfil this mission in places with few resources, are wonderful to us.

More information about INITIC and its projects can be found on its website. If you are interested in helping the organisation to meet its goals, visit the How to help page. And if you are involved with a project like this, bringing ICT, computer science, and coding to new places, please tell us about it in the comments below.

The post More Raspberry Pi labs in West Africa appeared first on Raspberry Pi.

Abandon Proactive Copyright Filters, Huge Coalition Tells EU Heavyweights

Post Syndicated from Andy original https://torrentfreak.com/abandon-proactive-copyright-filters-huge-coalition-tells-eu-heavyweights-171017/

Last September, EU Commission President Jean-Claude Juncker announced plans to modernize copyright law in Europe.

The proposals (pdf) are part of the Digital Single Market reforms, which have been under development for the past several years.

One of the proposals is causing significant concern. Article 13 would require some online service providers to become ‘Internet police’, proactively detecting and filtering allegedly infringing copyright works, uploaded to their platforms by users.

Currently, users are generally able to share whatever they like but should a copyright holder take exception to their upload, mechanisms are available for that content to be taken down. It’s envisioned that proactive filtering, whereby user uploads are routinely scanned and compared to a database of existing protected content, will prevent content becoming available in the first place.

These proposals are of great concern to digital rights groups, who believe that such filters will not only undermine users’ rights but will also place unfair burdens on Internet platforms, many of which will struggle to fund such a program. Yesterday, in the latest wave of opposition to Article 13, a huge coalition of international rights groups came together to underline their concerns.

Headed up by Civil Liberties Union for Europe (Liberties) and European Digital Rights (EDRi), the coalition is formed of dozens of influential groups, including Electronic Frontier Foundation (EFF), Human Rights Watch, Reporters without Borders, and Open Rights Group (ORG), to name just a few.

In an open letter to European Commission President Jean-Claude Juncker, President of the European Parliament Antonio Tajani, President of the European Council Donald Tusk and a string of others, the groups warn that the proposals undermine the trust established between EU member states.

“Fundamental rights, justice and the rule of law are intrinsically linked and constitute
core values on which the EU is founded,” the letter begins.

“Any attempt to disregard these values undermines the mutual trust between member states required for the EU to function. Any such attempt would also undermine the commitments made by the European Union and national governments to their citizens.”

Those citizens, the letter warns, would have their basic rights undermined, should the new proposals be written into EU law.

“Article 13 of the proposal on Copyright in the Digital Single Market include obligations on internet companies that would be impossible to respect without the imposition of excessive restrictions on citizens’ fundamental rights,” it notes.

A major concern is that by placing new obligations on Internet service providers that allow users to upload content – think YouTube, Facebook, Twitter and Instagram – they will be forced to err on the side of caution. Should there be any concern whatsoever that content might be infringing, fair use considerations and exceptions will be abandoned in favor of staying on the right side of the law.

“Article 13 appears to provoke such legal uncertainty that online services will have no other option than to monitor, filter and block EU citizens’ communications if they are to have any chance of staying in business,” the letter warns.

But while the potential problems for service providers and users are numerous, the groups warn that Article 13 could also be illegal since it contradicts case law of the Court of Justice.

According to the E-Commerce Directive, platforms are already required to remove infringing content, once they have been advised it exists. The new proposal, should it go ahead, would force the monitoring of uploads, something which goes against the ‘no general obligation to monitor‘ rules present in the Directive.

“The requirement to install a system for filtering electronic communications has twice been rejected by the Court of Justice, in the cases Scarlet Extended (C70/10) and Netlog/Sabam (C 360/10),” the rights groups warn.

“Therefore, a legislative provision that requires internet companies to install a filtering system would almost certainly be rejected by the Court of Justice because it would contravene the requirement that a fair balance be struck between the right to intellectual property on the one hand, and the freedom to conduct business and the right to freedom of expression, such as to receive or impart information, on the other.”

Specifically, the groups note that the proactive filtering of content would violate freedom of expression set out in Article 11 of the Charter of Fundamental Rights. That being the case, the groups expect national courts to disapply it and the rule to be annulled by the Court of Justice.

The latest protests against Article 13 come in the wake of large-scale objections earlier in the year, voicing similar concerns. However, despite the groups’ fears, they have powerful adversaries, each determined to stop the flood of copyrighted content currently being uploaded to the Internet.

Front and center in support of Article 13 is the music industry and its current hot-topic, the so-called Value Gap(1,2,3). The industry feels that platforms like YouTube are able to avoid paying expensive licensing fees (for music in particular) by exploiting the safe harbor protections of the DMCA and similar legislation.

They believe that proactively filtering uploads would significantly help to diminish this problem, which may very well be the case. But at what cost to the general public and the platforms they rely upon? Citizens and scholars feel that freedoms will be affected and it’s likely the outcry will continue.

The ball is now with the EU, whose members will soon have to make what could be the most important decision in recent copyright history. The rights groups, who are urging for Article 13 to be deleted, are clear where they stand.

The full letter is available here (pdf)

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

Spinrilla Wants RIAA Case Thrown Out Over ‘Lies’ About ‘Hidden’ Piracy Data

Post Syndicated from Ernesto original https://torrentfreak.com/spinrilla-wants-riaa-case-thrown-out-over-lies-about-hidden-piracy-data-171016/

Earlier this year, a group of well-known labels targeted Spinrilla, a popular hip-hop mixtape site and app which serves millions of users.

The coalition of record labels, including Sony Music, Warner Bros. Records, and Universal Music Group, filed a lawsuit against the service over alleged copyright infringements.

While the discovery process is still ongoing, Spinrilla recently informed the court that the record labels have “just about derailed” the entire case. The company has submitted a motion for sanctions, which is currently sealed, but additional information submitted to the court this week reveals what’s going on.

When the labels filed their original complaint they listed 210 tracks, without providing the allegedly infringing URLs. These weren’t shared during the early stages of the discovery process either, forcing the site to manually search for potentially infringing links.

Then, early October, Spinrilla received a massive spreadsheet with over 2,000 tracks, including the infringing URLs. This data came from the RIAA and supported the long list of infringements in the amended complaint submitted around the same time.

The spreadsheet would have made the discovery process much easier for Spinrilla. In a supplemental brief supporting a motion for sanctions, Spinrilla accuses the labels of hiding the piracy data from them and lying about it, “derailing” the case in the process.

“Significantly, Plaintiffs used that lie to convince the Court they should be allowed to add about 1,900 allegedly infringed sound recordings to their original list of 210. Later, Plaintiffs repeated that lie to convince the Court to give them time to add even more sound recordings to their list.”

vbcn

Spinrilla says they were forced to go down an expensive and unnecessary rabbit hole to find the infringing files, even though the RIAA data was available all along.

“By hiding and lying about the RIAA data, Plaintiffs forced Defendants to spend precious time and money fumbling through discovery. Not knowing that Plaintiffs had the RIAA data,” the company writes.

The hip-hop mixtape site argues that the alleged wrongdoing is severe enough to have the entire complaint dismissed, as the ultimate sanction.

“It is without exaggeration to say that by hiding the RIAA spreadsheets and that underlying data, Defendants have been severely prejudiced. The Complaint should be dismissed with prejudice and, if it is, Plaintiffs can only blame themselves,” Spinrilla concludes.

The stakes are certainly high in this case. With well over 2,000 infringing tracks listed in the amended complaint, the hip-hop mixtape site faces statutory damages as high as $300 million, at least in theory.

Spinrilla’s supplement brief in further support of the motion for sanctions is available here (pdf).

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

PureVPN Explains How it Helped the FBI Catch a Cyberstalker

Post Syndicated from Andy original https://torrentfreak.com/purevpn-explains-how-it-helped-the-fbi-catch-a-cyberstalker-171016/

Early October, Ryan S. Lin, 24, of Newton, Massachusetts, was arrested on suspicion of conducting “an extensive cyberstalking campaign” against a 24-year-old Massachusetts woman, as well as her family members and friends.

The Department of Justice described Lin’s offenses as a “multi-faceted” computer hacking and cyberstalking campaign. Launched in April 2016 when he began hacking into the victim’s online accounts, Lin allegedly obtained personal photographs and sensitive information about her medical and sexual histories and distributed that information to hundreds of other people.

Details of what information the FBI compiled on Lin can be found in our earlier report but aside from his alleged crimes (which are both significant and repugnant), it was PureVPN’s involvement in the case that caused the most controversy.

In a report compiled by an FBI special agent, it was revealed that the Hong Kong-based company’s logs helped the authorities net the alleged criminal.

“Significantly, PureVPN was able to determine that their service was accessed by the same customer from two originating IP addresses: the RCN IP address from the home Lin was living in at the time, and the software company where Lin was employed at the time,” the agent’s affidavit reads.

Among many in the privacy community, this revelation was met with disappointment. On the PureVPN website the company claims to carry no logs and on a general basis, it’s expected that so-called “no-logging” VPN providers should provide people with some anonymity, at least as far as their service goes. Now, several days after the furor, the company has responded to its critics.

In a fairly lengthy statement, the company begins by confirming that it definitely doesn’t log what websites a user views or what content he or she downloads.

“PureVPN did not breach its Privacy Policy and certainly did not breach your trust. NO browsing logs, browsing habits or anything else was, or ever will be shared,” the company writes.

However, that’s only half the problem. While it doesn’t log user activity (what sites people visit or content they download), it does log the IP addresses that customers use to access the PureVPN service. These, given the right circumstances, can be matched to external activities thanks to logs carried by other web companies.

PureVPN talks about logs held by Google’s Gmail service to illustrate its point.

“A network log is automatically generated every time a user visits a website. For the sake of this example, let’s say a user logged into their Gmail account. Every time they accessed Gmail, the email provider created a network log,” the company explains.

“If you are using a VPN, Gmail’s network log would contain the IP provided by PureVPN. This is one half of the picture. Now, if someone asks Google who accessed the user’s account, Google would state that whoever was using this IP, accessed the account.

“If the user was connected to PureVPN, it would be a PureVPN IP. The inquirer [in the Lin case, the FBI] would then share timestamps and network logs acquired from Google and ask them to be compared with the network logs maintained by the VPN provider.”

Now, if PureVPN carried no logs – literally no logs – it would not be able to help with this kind of inquiry. That was the case last year when the FBI approached Private Internet Access for information and the company was unable to assist.

However, as is made pretty clear by PureVPN’s explanation, the company does log user IP addresses and timestamps which reveal when a user was logged on to the service. It doesn’t matter that PureVPN doesn’t log what the user allegedly did online, since the third-party service already knows that information to the precise second.

Following the example, GMail knows that a user sent an email at 10:22am on Monday October 16 from a PureVPN IP address. So, if PureVPN is approached by the FBI, the company can confirm that User X was using the same IP address at exactly the same time, and his home IP address was XXX.XX.XXX.XX. Effectively, the combined logs link one IP address to the other and the user is revealed. It’s that simple.

It is for this reason that in TorrentFreak’s annual summary of no-logging VPN providers, the very first question we ask every single company reads as follows:

Do you keep ANY logs which would allow you to match an IP-address and a time stamp to a user/users of your service? If so, what information do you hold and for how long?

Clearly, if a company says “yes we log incoming IP addresses and associated timestamps”, any claim to total user anonymity is ended right there and then.

While not completely useless (a logging service will still stop the prying eyes of ISPs and similar surveillance, while also defeating throttling and site-blocking), if you’re a whistle-blower with a job or even your life to protect, this level of protection is entirely inadequate.

The take-home points from this controversy are numerous, but perhaps the most important is for people to read and understand VPN provider logging policies.

Secondly, and just as importantly, VPN providers need to be extremely clear about the information they log. Not tracking browsing or downloading activities is all well and good, but if home IP addresses and timestamps are stored, this needs to be made clear to the customer.

Finally, VPN users should not be evil. There are plenty of good reasons to stay anonymous online but cyberstalking, death threats and ruining people’s lives are not included. Fortunately, the FBI have offline methods for catching this type of offender, and long may that continue.

PureVPN’s blog post is available here.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

Manufacturing Astro Pi case replicas

Post Syndicated from Janina Ander original https://www.raspberrypi.org/blog/astro-pi-case-guest-post/

Tim Rowledge produces and sells wonderful replicas of the cases which our Astro Pis live in aboard the International Space Station. Here is the story of how he came to do this. Over to you, Tim!

When the Astro Pi case was first revealed a couple of years ago, the collective outpouring of ‘Squee!’ it elicited may have been heard on board the ISS itself. People wanted to buy it or build it at home, and someone wanted to know whether it would blend. (There’s always one.)

The complete Astro Pi

The Sense HAT and its Pi tucked snugly in the original Astro Pi flight case — gorgeous, isn’t it?

Replicating the Astro Pi case

Some months later the STL files for printing your own Astro Pi case were released, and people jumped at the chance to use them. Soon reports appeared saying you had to make quite a few attempts before getting a good print — normal for any complex 3D-printing project. A fellow member of my local makerspace successfully made a couple of cases, but it took a lot of time, filament, and post-print finishing work. And of course, a plastic Astro Pi case simply doesn’t look or feel like the original made of machined aluminium — or ‘aluminum’, as they tend to say over here in North America.

Batch of tops of Astro Pi case replicas by Tim Rowledge

A batch of tops designed by Tim

I wanted to build an Astro Pi case which would more closely match the original. Fortunately, someone else at my makerspace happens to have some serious CNC machining equipment at his small manufacturing company. Therefore, I focused on creating a case design that could be produced with his three-axis device. This meant simplifying some parts to avoid expensive, slow, complex multi-fixture work. It took us a while, but we ended up with a design we can efficiently make using his machine.

Lasered Astro Pi case replica by Tim Rowledge

Tim’s first lasered case

And the resulting case looks really, really like the original — in fact, upon receiving one of the final prototypes, Eben commented:

“I have to say, at first glance they look spectacular: unless you hold them side by side with the originals, it’s hard to pinpoint what’s changed. I’m looking forward to seeing one built up and then seeing them in the wild.”

Inside the Astro Pi case

Making just the bare case is nice, but there are other parts required to recreate a complete Astro Pi unit. Thus I got my local electronics company to design a small HAT to provide much the same support the mezzanine board offers: an RTC and nice, clean connections to the six buttons. We also added well-labelled, grouped pads for all the other GPIO lines, along with space for an ADC. If you’re making your own Astro Pi replica, you might like the Switchboard.

The electronics supply industry just loves to offer *some* of what you need, so that one supplier never has everything: we had to obtain the required stand-offs, screws, spacers, and JST wires from assorted other sources. Jeff at my nearby Industrial Paint & Plastics took on the laser engraving of our cases, leaving out copyrighted logos etcetera.

Lasering the top of an Astro Pi case replica by Tim Rowledge

Lasering the top of a case

Get your own Astro Pi case

Should you like to buy one of our Astro Pi case kits, pop over to www.astropicase.com, and we’ll get it on its way to you pronto. If you’re an institutional or corporate customer, the fully built option might make more sense for you — ordering the Pi and other components, and having a staff member assemble it all, may well be more work than is sensible.

Astro Pi case replica Tim Rowledge

Tim’s first full Astro Pi case replica, complete with shiny APEM buttons

To put the kit together yourself, all you need to do is add a Pi, Sense HAT, Camera Module, and RTC battery, and choose your buttons. An illustrated manual explains the process step by step. Our version of the Astro Pi case uses the same APEM buttons as the units in orbit, and whilst they are expensive, just clicking them is a source of great joy. It comes in a nice travel case too.

Tim Rowledge holding up a PCB

This is Tim. Thanks, Tim!

Take part in Astro Pi

If having an Astro Pi replica is not enough for you, this is your chance: the 2017-18 Astro Pi challenge is open! Do you know a teenager who might be keen to design a experiment to run on the Astro Pis in space? Are you one yourself? You have until 29 October to send us your Mission Space Lab entry and become part of the next generation of space scientists? Head over to the Astro Pi website to find out more.

The post Manufacturing Astro Pi case replicas appeared first on Raspberry Pi.

Netflix Expands Content Protection Team to Reduce Piracy

Post Syndicated from Ernesto original https://torrentfreak.com/netflix-expands-content-protection-team-to-reduce-piracy-171015/

There is little doubt that, in the United States and many other countries, Netflix has become the standard for watching movies on the Internet.

Despite the widespread availability, however, Netflix originals are widely pirated. Episodes from House of Cards, Narcos, and Orange is the New Black are downloaded and streamed millions of times through unauthorized platforms.

The streaming giant is obviously not happy with this situation and has ramped up its anti-piracy efforts in recent years. Since last year the company has sent out over a million takedown requests to Google alone and this volume continues to expand.

This growth coincides with an expansion of the company’s internal anti-piracy division. A new job posting shows that Netflix is expanding this team with a Copyright and Content Protection Coordinator. The ultimate goal is to reduce piracy to a fringe activity.

“The growing Global Copyright & Content Protection Group is looking to expand its team with the addition of a coordinator,” the job listing reads.

“He or she will be tasked with supporting the Netflix Global Copyright & Content Protection Group in its internal tactical take down efforts with the goal of reducing online piracy to a socially unacceptable fringe activity.”

Among other things, the new coordinator will evaluate new technological solutions to tackle piracy online.

More old-fashioned takedown efforts are also part of the job. This includes monitoring well-known content platforms, search engines and social network sites for pirated content.

“Day to day scanning of Facebook, YouTube, Twitter, Periscope, Google Search, Bing Search, VK, DailyMotion and all other platforms (including live platforms) used for piracy,” is listed as one of the main responsibilities.

Netflix’ Copyright and Content Protection Coordinator Job

The coordinator is further tasked with managing Facebook’s Rights Manager and YouTube’s Content-ID system, to prevent circumvention of these piracy filters. Experience with fingerprinting technologies and other anti-piracy tools will be helpful in this regard.

Netflix doesn’t do all the copyright enforcement on its own though. The company works together with other media giants in the recently launched “Alliance for Creativity and Entertainment” that is spearheaded by the MPAA.

In addition, the company also uses the takedown services of external anti-piracy outfits to target more traditional infringement sources, such as cyberlockers and piracy streaming sites. The coordinator has to keep an eye on these as well.

“Liaise with our vendors on manual takedown requests on linking sites and hosting sites and gathering data on pirate streaming sites, cyberlockers and usenet platforms.”

The above shows that Netflix is doing its best to prevent piracy from getting out of hand. It’s definitely taking the issue more seriously than a few years ago when the company didn’t have much original content.

The switch from being merely a distribution platform to becoming a major content producer and copyright holder has changed the stakes. Netflix hasn’t won the war on piracy, it’s just getting started.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

‘Pirate’ EBook Site Refuses Point Blank to Cooperate With BREIN

Post Syndicated from Andy original https://torrentfreak.com/pirate-ebook-site-refuses-point-blank-to-cooperate-with-brein-171015/

Dutch anti-piracy group BREIN is probably best known for its legal action against The Pirate Bay but the outfit also tackles many other forms of piracy.

A prime example is the case it pursued against a seller of fully-loaded Kodi boxes in the Netherlands. The subsequent landmark ruling from the European Court of Justice will reverberate around Europe for years to come.

Behind the scenes, however, BREIN persistently tries to take much smaller operations offline, and not without success. Earlier this year it revealed it had taken down 231 illegal sites and services includes 84 linking sites, 63 streaming portals, and 34 torrent sites. Some of these shut down completely and others were forced to leave their hosting providers.

Much of this work flies under the radar but some current action, against an eBook site, is now being thrust into the public eye.

For more than five years, EBoek.info (eBook) has serviced Internet users looking to obtain comic books in Dutch. The site informs TorrentFreak it provides a legitimate service, targeted at people who have purchased a hard copy but also want their comics in digital format.

“EBoek.info is a site about comic books in the Dutch language. Besides some general information about the books, people who have legally obtained a hard copy of the books can find a link to an NZB file which enables them to download a digital version of the books they already have,” site representative ‘Zala’ says.

For those out of the loop, NZB files are a bit like Usenet’s version of .torrent files. They contain no copyrighted content themselves but do provide software clients with information on where to find specific content, so it can be downloaded to a user’s machine.

“BREIN claims that this is illegal as it is impossible for us to verify if our visitor is telling the truth [about having purchased a copy],” Zala reveals.

Speaking with TorrentFreak, BREIN chief Tim Kuik says there’s no question that offering downloads like this is illegal.

“It is plain and simple: the site makes links to unauthorized digital copies available to the general public and therefore is infringing copyright. It is distribution of the content without authorization of the rights holder,” Kuik says.

“The unauthorized copies are not private copies. The private copy exception does not apply to this kind of distribution. The private copy has not been made by the owner of the book himself for his own use. Someone else made the digital copy and is making it available to anyone who wants to download it provided he makes the unverified claim that he has a legal copy. This harms the normal exploitation of the
content.”

Zala says that BREIN has been trying to take his site offline for many years but more recently, the platform has utilized the services of Cloudflare, partly as a form of shield. As readers may be aware, a site behind Cloudflare has its originating IP addresses hidden from the public, not to mention BREIN, who values that kind of information. According to the operator, however, BREIN managed to obtain the information from the CDN provider.

“BREIN has tried for years to take our site offline. Recently, however, Cloudflare was so friendly to give them our IP address,” Zala notes.

A text copy of an email reportedly sent by BREIN to EBoek’s web host and seen by TF appears to confirm that Cloudflare handed over the information as suggested. Among other things, the email has BREIN informing the host that “The IP we got back from Cloudflare is XXX.XXX.XX.33.”

This means that BREIN was able to place direct pressure on EBoek.info’s web host, so only time will tell if that bears any fruit for the anti-piracy group. In the meantime, however, EBoek has decided to go public over its battle with BREIN.

“We have received a request from Stichting BREIN via our hosting provider to take EBoek.info offline,” the site informed its users yesterday.

Interestingly, it also appears that BREIN doesn’t appreciate that the operators of EBoek have failed to make their identities publicly known on their platform.

“The site operates anonymously which also is unlawful. Consumer protection requires that the owner/operator of a site identifies himself,” Kuik says.

According to EBoek, the anti-piracy outfit told the site’s web host that as a “commercial online service”, EBoek is required under EU law to display its “correct and complete business information” including names, addresses, and other information. But perhaps unsurprisingly, the site doesn’t want to play ball.

“In my opinion, you are confusing us with Facebook. They are a foreign commercial company with a European branch in Ireland, and therefore are subject to Irish legislation,” Zala says in an open letter to BREIN.

“Eboek.info, on the other hand, is a foreign hobby club with no commercial purpose, whose administrators have no connection with any country in the European Union. As administrators, we follow the laws of our country of residence which do not oblige us to disclose our identity through our website.

“The fact that Eboek is visible in the Netherlands does not just mean that we are going to adapt to Dutch rules, just as we don’t adapt the site to the rules of Saudi Arabia or China or wherever we are available.”

In a further snub to the anti-piracy group, EBoek says that all visitors to the site have to communicate with its operators via its guestbook, which is publicly visible.

“We see no reason to make an exception for Stichting BREIN,” the site notes.

What makes the situation more complex is that EBoek isn’t refusing dialog completely. The site says it doesn’t want to talk to BREIN but will speak to BREIN’s customers – the publishers of the comic books in question – noting that to date no complaints from publishers have ever been received.

While the parties argue about lines of communication, BREIN insists that following this year’s European Court of Justice decision in the GS Media case, a link to a known infringing work represents copyright infringement. In this case, an NZB file – which links to a location on Usenet – would generally fit the bill.

But despite focusing on the Dutch market, the operators of EBoek say the ruling doesn’t apply to them as they’re outside of the ECJ’s jurisdiction and aren’t commercially motivated. Refusing point blank to take their site offline, EBoek’s operators say that BREIN can do its worst, nothing will have much effect.

“[W]hat’s the worst thing that can happen? That our web host hands [BREIN] our address and IP data. In that case, it will turn out that…we are actually far away,” Zala says.

“[In the case the site goes offline], we’ll just put a backup on another server and, in this case, won’t make use of the ‘services’ of Cloudflare, the provider that apparently put BREIN on the right track.”

The question of jurisdiction is indeed an interesting one, particularly given BREIN’s focus in the Netherlands. But Kuik is clear – it is the area where the content is made available that matters.

“The law of the country where the content is made available applies. In this case the EU and amongst others the Netherlands,” Kuik concludes.

To be continued…..

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

Hollywood Giants Sue Kodi-powered ‘TickBox TV’ Over Piracy

Post Syndicated from Ernesto original https://torrentfreak.com/hollywood-giants-sue-kodi-powered-tickbox-tv-over-piracy-171014/

Online streaming piracy is booming and many people use dedicated media players to bring this content to their regular TVs.

The bare hardware is not illegal and neither is media player software such as Kodi. When these devices are loaded with copyright-infringing addons, however, they turn into an unprecedented piracy threat.

It becomes even more problematic when the sellers of these devices market their products as pirate tools. This is exactly what TickBox TV does, according to Hollywood’s major movie studios, Netflix, and Amazon.

TickBox is a Georgia-based provider of set-top boxes that allow users to stream a variety of popular media. The company’s devices use the Kodi media player and come with instructions on how to add various add-ons.

In a complaint filed in a California federal court yesterday, Universal, Columbia Pictures, Disney, 20th Century Fox, Paramount Pictures, Warner Bros, Amazon, and Netflix accuse Tickbox of inducing and contributing to copyright infringement.

“TickBox sells ‘TickBox TV,’ a computer hardware device that TickBox urges its customers to use as a tool for the mass infringement of Plaintiffs’ copyrighted motion pictures and television shows,” the complaint, picked up by THR, reads.

While the device itself does not host any infringing content, users are informed where they can find it.

The movie and TV studios stress that Tickbox’s marketing highlights its infringing uses with statements such as “if you’re tired of wasting money with online streaming services like Netflix, Hulu or Amazon Prime.”

Sick of paying high monthly fees?

“TickBox promotes the use of TickBox TV for overwhelmingly, if not exclusively, infringing purposes, and that is how its customers use TickBox TV. TickBox advertises TickBox TV as a substitute for authorized and legitimate distribution channels such as cable television or video-on-demand services like Amazon Prime and Netflix,” the studios’ lawyers write.

The complaint explains in detail how TickBox works. When users first boot up their device they are prompted to download the “TickBox TV Player” software. This comes with an instruction video guiding people to infringing streams.

“The TickBox TV instructional video urges the customer to use the ‘Select Your Theme’ button on the start-up menu for downloading addons. The ‘Themes’ are curated collections of popular addons that link to unauthorized streams of motion pictures and television shows.”

“Some of the most popular addons currently distributed — which are available through TickBox TV — are titled ‘Elysium,’ ‘Bob,’ and ‘Covenant’,” the complaint adds, showing screenshots of the interface.

Covenant

The movie and TV studios, which are the founding members of the recently launched ACE anti-piracy initiative, want TickBox to stop selling their devices. In addition, they demand compensation for the damages they’ve suffered. Requesting the maximum statutory damages of $150,000 per copyright infringement, this can run into the millions.

The involvement of Amazon, albeit the content division, is notable since the online store itself sells dozens of similar streaming devices, some of which even list “infringing” addons.

The TickBox lawsuit is the first case in the United States where a group of major Hollywood players is targeting a streaming device. Earlier this year various Hollywood insiders voiced concerns about the piracy streaming epidemic and if this case goes their way, it probably won’t be the last.

A copy of the full complaint is available here (pdf)

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

Popular Zer0day Torrent Tracker Taken Offline By Mass Copyright Complaint

Post Syndicated from Andy original https://torrentfreak.com/popular-zer0day-torrent-tracker-taken-offline-by-mass-copyright-complaint-171014/

In January 2016, a BitTorrent enthusiast decided to launch a stand-alone tracker, purely for fun.

The Zer0day platform, which hosts no torrents, is a tracker in the purest sense, directing traffic between peers, no matter what content is involved and no matter where people are in the world.

With this type of tracker in short supply, it was soon utilized by The Pirate Bay and the now-defunct ExtraTorrent. By August 2016, it was tracking almost four million peers and a million torrents, a considerable contribution to the BitTorrent ecosystem.

After handling many ups and downs associated with a service of this type, the tracker eventually made it to the end of 2016 intact. This year it grew further still and by the end of September was tracking an impressive 5.5 million peers spread over 1.2 million torrents. Soon after, however, the tracker disappeared from the Internet without warning.

In an effort to find out what had happened, TorrentFreak contacted Zer0day’s operator who told us a familiar story. Without any warning at all, the site’s host pulled the plug on the service, despite having been paid 180 euros for hosting just a week earlier.

“We’re hereby informing you of the termination of your dedicated server due to a breach of our terms of service,” the host informed Zer0day.

“Hosting trackers on our servers that distribute infringing and copyrighted content is prohibited. This server was found to distribute such content. Should we identify additional similar activity in your services, we will be forced to close your account.”

While hosts tend not to worry too much about what their customers are doing, this one had just received a particularly lengthy complaint. Sent by the head of anti-piracy at French collecting society SCPP, it laid out the group’s problems with the Zer0day tracker.

“SCPP has been responsible for the collective management and protection of sound recordings and music videos producers’ rights since 1985. SCPP counts more than 2,600 members including the majority of independent French producers, in addition to independent European producers, and the major international companies: Sony, Universal and Warner,” the complaints reads.

“SCPP administers a catalog of 7,200,000 sound tracks and 77,000 music videos. SCPP is empowered by its members to take legal action in order to put an end to any infringements of the producers’ rights set out in Article L335-4 of the French Intellectual Property Code…..punishable by a three-year prison sentence or a fine of €300,000.”

Noting that it works on behalf of a number of labels and distributors including BMG, Sony Music, Universal Music, Warner Music and others, SCPP listed countless dozens of albums under its protection, each allegedly tracked by the Zer0day platform.

“It has come to our attention that these music albums are illegally being communicated to the public (made available for download) by various users of the BitTorrent-Network,” the complaint reads.

Noting that Zer0day is involved in the process, the anti-piracy outfit presented dozens of hash codes relating to protected works, demanding that the site stop facilitation of infringement on each and every one of them.

“We have proof that your tracker udp://tracker.zer0day.to:1337/announce provided peers of the BitTorrent-Network with information regarding these torrents, to be specific IP Addresses of peers that were offering without authorization the full albums for download, and that this information enabled peers to download files that contain the sound recordings to which our members producers have the exclusive rights.

“These sound recordings are thus being illegally communicated to the public, and your tracker is enabling the seeders to do so.”

Rather than take the hashes down from the tracker, SCPP actually demanded that Zer0day create a permanent blacklist within 24 hours, to ensure the corresponding torrents wouldn’t be tracked again.

“You should understand that this letter constitutes a notice to you that you may be liable for the infringing activity occurring on your service. In addition, if you ignore this notice, you may also be liable for any resulting infringement,” the complaint added.

But despite all the threats, SCPP didn’t receive the response they’d demanded since the operator of the site refused to take any action.

“Obviously, ‘info hashes’ are not copyrightable nor point to specific copyrighted content, or even have any meaning. Further, I cannot verify that request strings parameters (‘info hashes’) you sent me contain copyrighted material,” he told SCPP.

“Like the website says; for content removal kindly ask the indexing site to remove the listing and the .torrent file. Also, tracker software does not have an option to block request strings parameters (‘info hashes’).”

The net effect of non-compliance with SCPP was fairly dramatic and swift. Zer0day’s host took down the whole tracker instead and currently it remains offline. Whether it reappears depends on the site’s operator finding a suitable web host, but at the moment he says he has no idea where one will appear from.

“Currently I’m searching for some virtual private server as a temporary home for the tracker,” he concludes.

As mentioned in an earlier article detailing the problems sites like Zer0day.to face, trackers aren’t absolutely essential for the functioning of BitTorrent transfers. Nevertheless, their existence certainly improves matters for file-sharers so when they go down, millions can be affected.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

Backblaze Release 5.1 – RMM Compatibility for Mass Deployments

Post Syndicated from Yev original https://www.backblaze.com/blog/rmm-for-mass-deployments/

diagram of Backblaze remote monitoring and management

Introducing Backblaze Computer Backup Release 5.1

This is a relatively minor release in terms of the core Backblaze Computer Backup service functionality, but is a big deal for Backblaze for Business as we’ve updated our Mac and PC clients to be RMM (Remote Monitoring and Management) compatible.

What Is New?

  • Updated Mac and PC clients to better handle large file uploads
  • Updated PC downloader to improve stability
  • Added RMM support for PC and Mac clients

What Is RMM?

RMM stands for “Remote Monitoring and Management.” It’s a way to administer computers that might be distributed geographically, without having access to the actual machine. If you are a systems administrator working with anywhere from a few distributed computers to a few thousand, you’re familiar with RMM and how it makes life easier.

The new clients allow administrators to deploy Backblaze Computer Backup through most “silent” installation/mass deployment tools. Two popular RMM tools are Munki and Jamf. We’ve written up knowledge base articles for both of these.

munki logo jamf logo
Learn more about Munki Learn more about Jamf

Do I Need To Use RMM Tools?

No — unless you are a systems administrator or someone who is deploying Backblaze to a lot of people all at once, you do not have to worry about RMM support.

Release Version Number:

Mac:  5.1.0
PC:  5.1.0

Availability:

October 12, 2017

Upgrade Methods:

  • “Check for Updates” on the Backblaze Client (right click on the Backblaze icon and then select “Check for Updates”)
  • Download from: https://secure.backblaze.com/update.htm
  • Auto-update will begin in a couple of weeks
Mac backup update PC backup update
Updating Backblaze on Mac Updating Backblaze on Windows

Questions:

If you have any questions, please contact Backblaze Support at www.backblaze.com/help.

The post Backblaze Release 5.1 – RMM Compatibility for Mass Deployments appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Predict Billboard Top 10 Hits Using RStudio, H2O and Amazon Athena

Post Syndicated from Gopal Wunnava original https://aws.amazon.com/blogs/big-data/predict-billboard-top-10-hits-using-rstudio-h2o-and-amazon-athena/

Success in the popular music industry is typically measured in terms of the number of Top 10 hits artists have to their credit. The music industry is a highly competitive multi-billion dollar business, and record labels incur various costs in exchange for a percentage of the profits from sales and concert tickets.

Predicting the success of an artist’s release in the popular music industry can be difficult. One release may be extremely popular, resulting in widespread play on TV, radio and social media, while another single may turn out quite unpopular, and therefore unprofitable. Record labels need to be selective in their decision making, and predictive analytics can help them with decision making around the type of songs and artists they need to promote.

In this walkthrough, you leverage H2O.ai, Amazon Athena, and RStudio to make predictions on whether a song might make it to the Top 10 Billboard charts. You explore the GLM, GBM, and deep learning modeling techniques using H2O’s rapid, distributed and easy-to-use open source parallel processing engine. RStudio is a popular IDE, licensed either commercially or under AGPLv3, for working with R. This is ideal if you don’t want to connect to a server via SSH and use code editors such as vi to do analytics. RStudio is available in a desktop version, or a server version that allows you to access R via a web browser. RStudio’s Notebooks feature is used to demonstrate the execution of code and output. In addition, this post showcases how you can leverage Athena for query and interactive analysis during the modeling phase. A working knowledge of statistics and machine learning would be helpful to interpret the analysis being performed in this post.

Walkthrough

Your goal is to predict whether a song will make it to the Top 10 Billboard charts. For this purpose, you will be using multiple modeling techniques―namely GLM, GBM and deep learning―and choose the model that is the best fit.

This solution involves the following steps:

  • Install and configure RStudio with Athena
  • Log in to RStudio
  • Install R packages
  • Connect to Athena
  • Create a dataset
  • Create models

Install and configure RStudio with Athena

Use the following AWS CloudFormation stack to install, configure, and connect RStudio on an Amazon EC2 instance with Athena.

Launching this stack creates all required resources and prerequisites:

  • Amazon EC2 instance with Amazon Linux (minimum size of t2.large is recommended)
  • Provisioning of the EC2 instance in an existing VPC and public subnet
  • Installation of Java 8
  • Assignment of an IAM role to the EC2 instance with the required permissions for accessing Athena and Amazon S3
  • Security group allowing access to the RStudio and SSH ports from the internet (I recommend restricting access to these ports)
  • S3 staging bucket required for Athena (referenced within RStudio as ATHENABUCKET)
  • RStudio username and password
  • Setup logs in Amazon CloudWatch Logs (if needed for additional troubleshooting)
  • Amazon EC2 Systems Manager agent, which makes it easy to manage and patch

All AWS resources are created in the US-East-1 Region. To avoid cross-region data transfer fees, launch the CloudFormation stack in the same region. To check the availability of Athena in other regions, see Region Table.

Log in to RStudio

The instance security group has been automatically configured to allow incoming connections on the RStudio port 8787 from any source internet address. You can edit the security group to restrict source IP access. If you have trouble connecting, ensure that port 8787 isn’t blocked by subnet network ACLS or by your outgoing proxy/firewall.

  1. In the CloudFormation stack, choose Outputs, Value, and then open the RStudio URL. You might need to wait for a few minutes until the instance has been launched.
  2. Log in to RStudio with the and password you provided during setup.

Install R packages

Next, install the required R packages from the RStudio console. You can download the R notebook file containing just the code.

#install pacman – a handy package manager for managing installs
if("pacman" %in% rownames(installed.packages()) == FALSE)
{install.packages("pacman")}  
library(pacman)
p_load(h2o,rJava,RJDBC,awsjavasdk)
h2o.init(nthreads = -1)
##  Connection successful!
## 
## R is connected to the H2O cluster: 
##     H2O cluster uptime:         2 hours 42 minutes 
##     H2O cluster version:        3.10.4.6 
##     H2O cluster version age:    4 months and 4 days !!! 
##     H2O cluster name:           H2O_started_from_R_rstudio_hjx881 
##     H2O cluster total nodes:    1 
##     H2O cluster total memory:   3.30 GB 
##     H2O cluster total cores:    4 
##     H2O cluster allowed cores:  4 
##     H2O cluster healthy:        TRUE 
##     H2O Connection ip:          localhost 
##     H2O Connection port:        54321 
##     H2O Connection proxy:       NA 
##     H2O Internal Security:      FALSE 
##     R Version:                  R version 3.3.3 (2017-03-06)
## Warning in h2o.clusterInfo(): 
## Your H2O cluster version is too old (4 months and 4 days)!
## Please download and install the latest version from http://h2o.ai/download/
#install aws sdk if not present (pre-requisite for using Athena with an IAM role)
if (!aws_sdk_present()) {
  install_aws_sdk()
}

load_sdk()
## NULL

Connect to Athena

Next, establish a connection to Athena from RStudio, using an IAM role associated with your EC2 instance. Use ATHENABUCKET to specify the S3 staging directory.

URL <- 'https://s3.amazonaws.com/athena-downloads/drivers/AthenaJDBC41-1.0.1.jar'
fil <- basename(URL)
#download the file into current working directory
if (!file.exists(fil)) download.file(URL, fil)
#verify that the file has been downloaded successfully
list.files()
## [1] "AthenaJDBC41-1.0.1.jar"
drv <- JDBC(driverClass="com.amazonaws.athena.jdbc.AthenaDriver", fil, identifier.quote="'")

con <- jdbcConnection <- dbConnect(drv, 'jdbc:awsathena://athena.us-east-1.amazonaws.com:443/',
                                   s3_staging_dir=Sys.getenv("ATHENABUCKET"),
                                   aws_credentials_provider_class="com.amazonaws.auth.DefaultAWSCredentialsProviderChain")

Verify the connection. The results returned depend on your specific Athena setup.

con
## <JDBCConnection>
dbListTables(con)
##  [1] "gdelt"               "wikistats"           "elb_logs_raw_native"
##  [4] "twitter"             "twitter2"            "usermovieratings"   
##  [7] "eventcodes"          "events"              "billboard"          
## [10] "billboardtop10"      "elb_logs"            "gdelthist"          
## [13] "gdeltmaster"         "twitter"             "twitter3"

Create a dataset

For this analysis, you use a sample dataset combining information from Billboard and Wikipedia with Echo Nest data in the Million Songs Dataset. Upload this dataset into your own S3 bucket. The table below provides a description of the fields used in this dataset.

Field Description
year Year that song was released
songtitle Title of the song
artistname Name of the song artist
songid Unique identifier for the song
artistid Unique identifier for the song artist
timesignature Variable estimating the time signature of the song
timesignature_confidence Confidence in the estimate for the timesignature
loudness Continuous variable indicating the average amplitude of the audio in decibels
tempo Variable indicating the estimated beats per minute of the song
tempo_confidence Confidence in the estimate for tempo
key Variable with twelve levels indicating the estimated key of the song (C, C#, B)
key_confidence Confidence in the estimate for key
energy Variable that represents the overall acoustic energy of the song, using a mix of features such as loudness
pitch Continuous variable that indicates the pitch of the song
timbre_0_min thru timbre_11_min Variables that indicate the minimum values over all segments for each of the twelve values in the timbre vector
timbre_0_max thru timbre_11_max Variables that indicate the maximum values over all segments for each of the twelve values in the timbre vector
top10 Indicator for whether or not the song made it to the Top 10 of the Billboard charts (1 if it was in the top 10, and 0 if not)

Create an Athena table based on the dataset

In the Athena console, select the default database, sampled, or create a new database.

Run the following create table statement.

create external table if not exists billboard
(
year int,
songtitle string,
artistname string,
songID string,
artistID string,
timesignature int,
timesignature_confidence double,
loudness double,
tempo double,
tempo_confidence double,
key int,
key_confidence double,
energy double,
pitch double,
timbre_0_min double,
timbre_0_max double,
timbre_1_min double,
timbre_1_max double,
timbre_2_min double,
timbre_2_max double,
timbre_3_min double,
timbre_3_max double,
timbre_4_min double,
timbre_4_max double,
timbre_5_min double,
timbre_5_max double,
timbre_6_min double,
timbre_6_max double,
timbre_7_min double,
timbre_7_max double,
timbre_8_min double,
timbre_8_max double,
timbre_9_min double,
timbre_9_max double,
timbre_10_min double,
timbre_10_max double,
timbre_11_min double,
timbre_11_max double,
Top10 int
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 's3://aws-bigdata-blog/artifacts/predict-billboard/data'
;

Inspect the table definition for the ‘billboard’ table that you have created. If you chose a database other than sampledb, replace that value with your choice.

dbGetQuery(con, "show create table sampledb.billboard")
##                                      createtab_stmt
## 1       CREATE EXTERNAL TABLE `sampledb.billboard`(
## 2                                       `year` int,
## 3                               `songtitle` string,
## 4                              `artistname` string,
## 5                                  `songid` string,
## 6                                `artistid` string,
## 7                              `timesignature` int,
## 8                `timesignature_confidence` double,
## 9                                `loudness` double,
## 10                                  `tempo` double,
## 11                       `tempo_confidence` double,
## 12                                       `key` int,
## 13                         `key_confidence` double,
## 14                                 `energy` double,
## 15                                  `pitch` double,
## 16                           `timbre_0_min` double,
## 17                           `timbre_0_max` double,
## 18                           `timbre_1_min` double,
## 19                           `timbre_1_max` double,
## 20                           `timbre_2_min` double,
## 21                           `timbre_2_max` double,
## 22                           `timbre_3_min` double,
## 23                           `timbre_3_max` double,
## 24                           `timbre_4_min` double,
## 25                           `timbre_4_max` double,
## 26                           `timbre_5_min` double,
## 27                           `timbre_5_max` double,
## 28                           `timbre_6_min` double,
## 29                           `timbre_6_max` double,
## 30                           `timbre_7_min` double,
## 31                           `timbre_7_max` double,
## 32                           `timbre_8_min` double,
## 33                           `timbre_8_max` double,
## 34                           `timbre_9_min` double,
## 35                           `timbre_9_max` double,
## 36                          `timbre_10_min` double,
## 37                          `timbre_10_max` double,
## 38                          `timbre_11_min` double,
## 39                          `timbre_11_max` double,
## 40                                     `top10` int)
## 41                             ROW FORMAT DELIMITED 
## 42                         FIELDS TERMINATED BY ',' 
## 43                            STORED AS INPUTFORMAT 
## 44       'org.apache.hadoop.mapred.TextInputFormat' 
## 45                                     OUTPUTFORMAT 
## 46  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
## 47                                        LOCATION
## 48    's3://aws-bigdata-blog/artifacts/predict-billboard/data'
## 49                                  TBLPROPERTIES (
## 50            'transient_lastDdlTime'='1505484133')

Run a sample query

Next, run a sample query to obtain a list of all songs from Janet Jackson that made it to the Billboard Top 10 charts.

dbGetQuery(con, " SELECT songtitle,artistname,top10   FROM sampledb.billboard WHERE lower(artistname) =     'janet jackson' AND top10 = 1")
##                       songtitle    artistname top10
## 1                       Runaway Janet Jackson     1
## 2               Because Of Love Janet Jackson     1
## 3                         Again Janet Jackson     1
## 4                            If Janet Jackson     1
## 5  Love Will Never Do (Without You) Janet Jackson 1
## 6                     Black Cat Janet Jackson     1
## 7               Come Back To Me Janet Jackson     1
## 8                       Alright Janet Jackson     1
## 9                      Escapade Janet Jackson     1
## 10                Rhythm Nation Janet Jackson     1

Determine how many songs in this dataset are specifically from the year 2010.

dbGetQuery(con, " SELECT count(*)   FROM sampledb.billboard WHERE year = 2010")
##   _col0
## 1   373

The sample dataset provides certain song properties of interest that can be analyzed to gauge the impact to the song’s overall popularity. Look at one such property, timesignature, and determine the value that is the most frequent among songs in the database. Timesignature is a measure of the number of beats and the type of note involved.

Running the query directly may result in an error, as shown in the commented lines below. This error is a result of trying to retrieve a large result set over a JDBC connection, which can cause out-of-memory issues at the client level. To address this, reduce the fetch size and run again.

#t<-dbGetQuery(con, " SELECT timesignature FROM sampledb.billboard")
#Note:  Running the preceding query results in the following error: 
#Error in .jcall(rp, "I", "fetch", stride, block): java.sql.SQLException: The requested #fetchSize is more than the allowed value in Athena. Please reduce the fetchSize and try #again. Refer to the Athena documentation for valid fetchSize values.
# Use the dbSendQuery function, reduce the fetch size, and run again
r <- dbSendQuery(con, " SELECT timesignature     FROM sampledb.billboard")
dftimesignature<- fetch(r, n=-1, block=100)
dbClearResult(r)
## [1] TRUE
table(dftimesignature)
## dftimesignature
##    0    1    3    4    5    7 
##   10  143  503 6787  112   19
nrow(dftimesignature)
## [1] 7574

From the results, observe that 6787 songs have a timesignature of 4.

Next, determine the song with the highest tempo.

dbGetQuery(con, " SELECT songtitle,artistname,tempo   FROM sampledb.billboard WHERE tempo = (SELECT max(tempo) FROM sampledb.billboard) ")
##                   songtitle      artistname   tempo
## 1 Wanna Be Startin' Somethin' Michael Jackson 244.307

Create the training dataset

Your model needs to be trained such that it can learn and make accurate predictions. Split the data into training and test datasets, and create the training dataset first.  This dataset contains all observations from the year 2009 and earlier. You may face the same JDBC connection issue pointed out earlier, so this query uses a fetch size.

#BillboardTrain <- dbGetQuery(con, "SELECT * FROM sampledb.billboard WHERE year <= 2009")
#Running the preceding query results in the following error:-
#Error in .verify.JDBC.result(r, "Unable to retrieve JDBC result set for ", : Unable to retrieve #JDBC result set for SELECT * FROM sampledb.billboard WHERE year <= 2009 (Internal error)
#Follow the same approach as before to address this issue.

r <- dbSendQuery(con, "SELECT * FROM sampledb.billboard WHERE year <= 2009")
BillboardTrain <- fetch(r, n=-1, block=100)
dbClearResult(r)
## [1] TRUE
BillboardTrain[1:2,c(1:3,6:10)]
##   year           songtitle artistname timesignature
## 1 2009 The Awkward Goodbye    Athlete             3
## 2 2009        Rubik's Cube    Athlete             3
##   timesignature_confidence loudness   tempo tempo_confidence
## 1                    0.732   -6.320  89.614   0.652
## 2                    0.906   -9.541 117.742   0.542
nrow(BillboardTrain)
## [1] 7201

Create the test dataset

BillboardTest <- dbGetQuery(con, "SELECT * FROM sampledb.billboard where year = 2010")
BillboardTest[1:2,c(1:3,11:15)]
##   year              songtitle        artistname key
## 1 2010 This Is the House That Doubt Built A Day to Remember  11
## 2 2010        Sticks & Bricks A Day to Remember  10
##   key_confidence    energy pitch timbre_0_min
## 1          0.453 0.9666556 0.024        0.002
## 2          0.469 0.9847095 0.025        0.000
nrow(BillboardTest)
## [1] 373

Convert the training and test datasets into H2O dataframes

train.h2o <- as.h2o(BillboardTrain)
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |=================================================================| 100%
test.h2o <- as.h2o(BillboardTest)
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |=================================================================| 100%

Inspect the column names in your H2O dataframes.

colnames(train.h2o)
##  [1] "year"                     "songtitle"               
##  [3] "artistname"               "songid"                  
##  [5] "artistid"                 "timesignature"           
##  [7] "timesignature_confidence" "loudness"                
##  [9] "tempo"                    "tempo_confidence"        
## [11] "key"                      "key_confidence"          
## [13] "energy"                   "pitch"                   
## [15] "timbre_0_min"             "timbre_0_max"            
## [17] "timbre_1_min"             "timbre_1_max"            
## [19] "timbre_2_min"             "timbre_2_max"            
## [21] "timbre_3_min"             "timbre_3_max"            
## [23] "timbre_4_min"             "timbre_4_max"            
## [25] "timbre_5_min"             "timbre_5_max"            
## [27] "timbre_6_min"             "timbre_6_max"            
## [29] "timbre_7_min"             "timbre_7_max"            
## [31] "timbre_8_min"             "timbre_8_max"            
## [33] "timbre_9_min"             "timbre_9_max"            
## [35] "timbre_10_min"            "timbre_10_max"           
## [37] "timbre_11_min"            "timbre_11_max"           
## [39] "top10"

Create models

You need to designate the independent and dependent variables prior to applying your modeling algorithms. Because you’re trying to predict the ‘top10’ field, this would be your dependent variable and everything else would be independent.

Create your first model using GLM. Because GLM works best with numeric data, you create your model by dropping non-numeric variables. You only use the variables in the dataset that describe the numerical attributes of the song in the logistic regression model. You won’t use these variables:  “year”, “songtitle”, “artistname”, “songid”, or “artistid”.

y.dep <- 39
x.indep <- c(6:38)
x.indep
##  [1]  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28
## [24] 29 30 31 32 33 34 35 36 37 38

Create Model 1: All numeric variables

Create Model 1 with the training dataset, using GLM as the modeling algorithm and H2O’s built-in h2o.glm function.

modelh1 <- h2o.glm( y = y.dep, x = x.indep, training_frame = train.h2o, family = "binomial")
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |=====                                                            |   8%
  |                                                                       
  |=================================================================| 100%

Measure the performance of Model 1, using H2O’s built-in performance function.

h2o.performance(model=modelh1,newdata=test.h2o)
## H2OBinomialMetrics: glm
## 
## MSE:  0.09924684
## RMSE:  0.3150347
## LogLoss:  0.3220267
## Mean Per-Class Error:  0.2380168
## AUC:  0.8431394
## Gini:  0.6862787
## R^2:  0.254663
## Null Deviance:  326.0801
## Residual Deviance:  240.2319
## AIC:  308.2319
## 
## Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:
##          0   1    Error     Rate
## 0      255  59 0.187898  =59/314
## 1       17  42 0.288136   =17/59
## Totals 272 101 0.203753  =76/373
## 
## Maximum Metrics: Maximum metrics at their respective thresholds
##                         metric threshold    value idx
## 1                       max f1  0.192772 0.525000 100
## 2                       max f2  0.124912 0.650510 155
## 3                 max f0point5  0.416258 0.612903  23
## 4                 max accuracy  0.416258 0.879357  23
## 5                max precision  0.813396 1.000000   0
## 6                   max recall  0.037579 1.000000 282
## 7              max specificity  0.813396 1.000000   0
## 8             max absolute_mcc  0.416258 0.455251  23
## 9   max min_per_class_accuracy  0.161402 0.738854 125
## 10 max mean_per_class_accuracy  0.124912 0.765006 155
## 
## Gains/Lift Table: Extract with `h2o.gainsLift(<model>, <data>)` or ` 
h2o.auc(h2o.performance(modelh1,test.h2o)) 
## [1] 0.8431394

The AUC metric provides insight into how well the classifier is able to separate the two classes. In this case, the value of 0.8431394 indicates that the classification is good. (A value of 0.5 indicates a worthless test, while a value of 1.0 indicates a perfect test.)

Next, inspect the coefficients of the variables in the dataset.

dfmodelh1 <- as.data.frame(h2o.varimp(modelh1))
dfmodelh1
##                       names coefficients sign
## 1              timbre_0_max  1.290938663  NEG
## 2                  loudness  1.262941934  POS
## 3                     pitch  0.616995941  NEG
## 4              timbre_1_min  0.422323735  POS
## 5              timbre_6_min  0.349016024  NEG
## 6                    energy  0.348092062  NEG
## 7             timbre_11_min  0.307331997  NEG
## 8              timbre_3_max  0.302225619  NEG
## 9             timbre_11_max  0.243632060  POS
## 10             timbre_4_min  0.224233951  POS
## 11             timbre_4_max  0.204134342  POS
## 12             timbre_5_min  0.199149324  NEG
## 13             timbre_0_min  0.195147119  POS
## 14 timesignature_confidence  0.179973904  POS
## 15         tempo_confidence  0.144242598  POS
## 16            timbre_10_max  0.137644568  POS
## 17             timbre_7_min  0.126995955  NEG
## 18            timbre_10_min  0.123851179  POS
## 19             timbre_7_max  0.100031481  NEG
## 20             timbre_2_min  0.096127636  NEG
## 21           key_confidence  0.083115820  POS
## 22             timbre_6_max  0.073712419  POS
## 23            timesignature  0.067241917  POS
## 24             timbre_8_min  0.061301881  POS
## 25             timbre_8_max  0.060041698  POS
## 26                      key  0.056158445  POS
## 27             timbre_3_min  0.050825116  POS
## 28             timbre_9_max  0.033733561  POS
## 29             timbre_2_max  0.030939072  POS
## 30             timbre_9_min  0.020708113  POS
## 31             timbre_1_max  0.014228818  NEG
## 32                    tempo  0.008199861  POS
## 33             timbre_5_max  0.004837870  POS
## 34                                    NA <NA>

Typically, songs with heavier instrumentation tend to be louder (have higher values in the variable “loudness”) and more energetic (have higher values in the variable “energy”). This knowledge is helpful for interpreting the modeling results.

You can make the following observations from the results:

  • The coefficient estimates for the confidence values associated with the time signature, key, and tempo variables are positive. This suggests that higher confidence leads to a higher predicted probability of a Top 10 hit.
  • The coefficient estimate for loudness is positive, meaning that mainstream listeners prefer louder songs with heavier instrumentation.
  • The coefficient estimate for energy is negative, meaning that mainstream listeners prefer songs that are less energetic, which are those songs with light instrumentation.

These coefficients lead to contradictory conclusions for Model 1. This could be due to multicollinearity issues. Inspect the correlation between the variables “loudness” and “energy” in the training set.

cor(train.h2o$loudness,train.h2o$energy)
## [1] 0.7399067

This number indicates that these two variables are highly correlated, and Model 1 does indeed suffer from multicollinearity. Typically, you associate a value of -1.0 to -0.5 or 1.0 to 0.5 to indicate strong correlation, and a value of 0.1 to 0.1 to indicate weak correlation. To avoid this correlation issue, omit one of these two variables and re-create the models.

You build two variations of the original model:

  • Model 2, in which you keep “energy” and omit “loudness”
  • Model 3, in which you keep “loudness” and omit “energy”

You compare these two models and choose the model with a better fit for this use case.

Create Model 2: Keep energy and omit loudness

colnames(train.h2o)
##  [1] "year"                     "songtitle"               
##  [3] "artistname"               "songid"                  
##  [5] "artistid"                 "timesignature"           
##  [7] "timesignature_confidence" "loudness"                
##  [9] "tempo"                    "tempo_confidence"        
## [11] "key"                      "key_confidence"          
## [13] "energy"                   "pitch"                   
## [15] "timbre_0_min"             "timbre_0_max"            
## [17] "timbre_1_min"             "timbre_1_max"            
## [19] "timbre_2_min"             "timbre_2_max"            
## [21] "timbre_3_min"             "timbre_3_max"            
## [23] "timbre_4_min"             "timbre_4_max"            
## [25] "timbre_5_min"             "timbre_5_max"            
## [27] "timbre_6_min"             "timbre_6_max"            
## [29] "timbre_7_min"             "timbre_7_max"            
## [31] "timbre_8_min"             "timbre_8_max"            
## [33] "timbre_9_min"             "timbre_9_max"            
## [35] "timbre_10_min"            "timbre_10_max"           
## [37] "timbre_11_min"            "timbre_11_max"           
## [39] "top10"
y.dep <- 39
x.indep <- c(6:7,9:38)
x.indep
##  [1]  6  7  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29
## [24] 30 31 32 33 34 35 36 37 38
modelh2 <- h2o.glm( y = y.dep, x = x.indep, training_frame = train.h2o, family = "binomial")
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |=======                                                          |  10%
  |                                                                       
  |=================================================================| 100%

Measure the performance of Model 2.

h2o.performance(model=modelh2,newdata=test.h2o)
## H2OBinomialMetrics: glm
## 
## MSE:  0.09922606
## RMSE:  0.3150017
## LogLoss:  0.3228213
## Mean Per-Class Error:  0.2490554
## AUC:  0.8431933
## Gini:  0.6863867
## R^2:  0.2548191
## Null Deviance:  326.0801
## Residual Deviance:  240.8247
## AIC:  306.8247
## 
## Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:
##          0  1    Error     Rate
## 0      280 34 0.108280  =34/314
## 1       23 36 0.389831   =23/59
## Totals 303 70 0.152815  =57/373
## 
## Maximum Metrics: Maximum metrics at their respective thresholds
##                         metric threshold    value idx
## 1                       max f1  0.254391 0.558140  69
## 2                       max f2  0.113031 0.647208 157
## 3                 max f0point5  0.413999 0.596026  22
## 4                 max accuracy  0.446250 0.876676  18
## 5                max precision  0.811739 1.000000   0
## 6                   max recall  0.037682 1.000000 283
## 7              max specificity  0.811739 1.000000   0
## 8             max absolute_mcc  0.254391 0.469060  69
## 9   max min_per_class_accuracy  0.141051 0.716561 131
## 10 max mean_per_class_accuracy  0.113031 0.761821 157
## 
## Gains/Lift Table: Extract with `h2o.gainsLift(<model>, <data>)` or `h2o.gainsLift(<model>, valid=<T/F>, xval=<T/F>)`
dfmodelh2 <- as.data.frame(h2o.varimp(modelh2))
dfmodelh2
##                       names coefficients sign
## 1                     pitch  0.700331511  NEG
## 2              timbre_1_min  0.510270513  POS
## 3              timbre_0_max  0.402059546  NEG
## 4              timbre_6_min  0.333316236  NEG
## 5             timbre_11_min  0.331647383  NEG
## 6              timbre_3_max  0.252425901  NEG
## 7             timbre_11_max  0.227500308  POS
## 8              timbre_4_max  0.210663865  POS
## 9              timbre_0_min  0.208516163  POS
## 10             timbre_5_min  0.202748055  NEG
## 11             timbre_4_min  0.197246582  POS
## 12            timbre_10_max  0.172729619  POS
## 13         tempo_confidence  0.167523934  POS
## 14 timesignature_confidence  0.167398830  POS
## 15             timbre_7_min  0.142450727  NEG
## 16             timbre_8_max  0.093377516  POS
## 17            timbre_10_min  0.090333426  POS
## 18            timesignature  0.085851625  POS
## 19             timbre_7_max  0.083948442  NEG
## 20           key_confidence  0.079657073  POS
## 21             timbre_6_max  0.076426046  POS
## 22             timbre_2_min  0.071957831  NEG
## 23             timbre_9_max  0.071393189  POS
## 24             timbre_8_min  0.070225578  POS
## 25                      key  0.061394702  POS
## 26             timbre_3_min  0.048384697  POS
## 27             timbre_1_max  0.044721121  NEG
## 28                   energy  0.039698433  POS
## 29             timbre_5_max  0.039469064  POS
## 30             timbre_2_max  0.018461133  POS
## 31                    tempo  0.013279926  POS
## 32             timbre_9_min  0.005282143  NEG
## 33                                    NA <NA>

h2o.auc(h2o.performance(modelh2,test.h2o)) 
## [1] 0.8431933

You can make the following observations:

  • The AUC metric is 0.8431933.
  • Inspecting the coefficient of the variable energy, Model 2 suggests that songs with high energy levels tend to be more popular. This is as per expectation.
  • As H2O orders variables by significance, the variable energy is not significant in this model.

You can conclude that Model 2 is not ideal for this use , as energy is not significant.

CreateModel 3: Keep loudness but omit energy

colnames(train.h2o)
##  [1] "year"                     "songtitle"               
##  [3] "artistname"               "songid"                  
##  [5] "artistid"                 "timesignature"           
##  [7] "timesignature_confidence" "loudness"                
##  [9] "tempo"                    "tempo_confidence"        
## [11] "key"                      "key_confidence"          
## [13] "energy"                   "pitch"                   
## [15] "timbre_0_min"             "timbre_0_max"            
## [17] "timbre_1_min"             "timbre_1_max"            
## [19] "timbre_2_min"             "timbre_2_max"            
## [21] "timbre_3_min"             "timbre_3_max"            
## [23] "timbre_4_min"             "timbre_4_max"            
## [25] "timbre_5_min"             "timbre_5_max"            
## [27] "timbre_6_min"             "timbre_6_max"            
## [29] "timbre_7_min"             "timbre_7_max"            
## [31] "timbre_8_min"             "timbre_8_max"            
## [33] "timbre_9_min"             "timbre_9_max"            
## [35] "timbre_10_min"            "timbre_10_max"           
## [37] "timbre_11_min"            "timbre_11_max"           
## [39] "top10"
y.dep <- 39
x.indep <- c(6:12,14:38)
x.indep
##  [1]  6  7  8  9 10 11 12 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29
## [24] 30 31 32 33 34 35 36 37 38
modelh3 <- h2o.glm( y = y.dep, x = x.indep, training_frame = train.h2o, family = "binomial")
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |========                                                         |  12%
  |                                                                       
  |=================================================================| 100%
perfh3<-h2o.performance(model=modelh3,newdata=test.h2o)
perfh3
## H2OBinomialMetrics: glm
## 
## MSE:  0.0978859
## RMSE:  0.3128672
## LogLoss:  0.3178367
## Mean Per-Class Error:  0.264925
## AUC:  0.8492389
## Gini:  0.6984778
## R^2:  0.2648836
## Null Deviance:  326.0801
## Residual Deviance:  237.1062
## AIC:  303.1062
## 
## Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:
##          0  1    Error     Rate
## 0      286 28 0.089172  =28/314
## 1       26 33 0.440678   =26/59
## Totals 312 61 0.144772  =54/373
## 
## Maximum Metrics: Maximum metrics at their respective thresholds
##                         metric threshold    value idx
## 1                       max f1  0.273799 0.550000  60
## 2                       max f2  0.125503 0.663265 155
## 3                 max f0point5  0.435479 0.628931  24
## 4                 max accuracy  0.435479 0.882038  24
## 5                max precision  0.821606 1.000000   0
## 6                   max recall  0.038328 1.000000 280
## 7              max specificity  0.821606 1.000000   0
## 8             max absolute_mcc  0.435479 0.471426  24
## 9   max min_per_class_accuracy  0.173693 0.745763 120
## 10 max mean_per_class_accuracy  0.125503 0.775073 155
## 
## Gains/Lift Table: Extract with `h2o.gainsLift(<model>, <data>)` or `h2o.gainsLift(<model>, valid=<T/F>, xval=<T/F>)`
dfmodelh3 <- as.data.frame(h2o.varimp(modelh3))
dfmodelh3
##                       names coefficients sign
## 1              timbre_0_max 1.216621e+00  NEG
## 2                  loudness 9.780973e-01  POS
## 3                     pitch 7.249788e-01  NEG
## 4              timbre_1_min 3.891197e-01  POS
## 5              timbre_6_min 3.689193e-01  NEG
## 6             timbre_11_min 3.086673e-01  NEG
## 7              timbre_3_max 3.025593e-01  NEG
## 8             timbre_11_max 2.459081e-01  POS
## 9              timbre_4_min 2.379749e-01  POS
## 10             timbre_4_max 2.157627e-01  POS
## 11             timbre_0_min 1.859531e-01  POS
## 12             timbre_5_min 1.846128e-01  NEG
## 13 timesignature_confidence 1.729658e-01  POS
## 14             timbre_7_min 1.431871e-01  NEG
## 15            timbre_10_max 1.366703e-01  POS
## 16            timbre_10_min 1.215954e-01  POS
## 17         tempo_confidence 1.183698e-01  POS
## 18             timbre_2_min 1.019149e-01  NEG
## 19           key_confidence 9.109701e-02  POS
## 20             timbre_7_max 8.987908e-02  NEG
## 21             timbre_6_max 6.935132e-02  POS
## 22             timbre_8_max 6.878241e-02  POS
## 23            timesignature 6.120105e-02  POS
## 24                      key 5.814805e-02  POS
## 25             timbre_8_min 5.759228e-02  POS
## 26             timbre_1_max 2.930285e-02  NEG
## 27             timbre_9_max 2.843755e-02  POS
## 28             timbre_3_min 2.380245e-02  POS
## 29             timbre_2_max 1.917035e-02  POS
## 30             timbre_5_max 1.715813e-02  POS
## 31                    tempo 1.364418e-02  NEG
## 32             timbre_9_min 8.463143e-05  NEG
## 33                                    NA <NA>
h2o.sensitivity(perfh3,0.5)
## Warning in h2o.find_row_by_threshold(object, t): Could not find exact
## threshold: 0.5 for this set of metrics; using closest threshold found:
## 0.501855569251422. Run `h2o.predict` and apply your desired threshold on a
## probability column.
## [[1]]
## [1] 0.2033898
h2o.auc(perfh3)
## [1] 0.8492389

You can make the following observations:

  • The AUC metric is 0.8492389.
  • From the confusion matrix, the model correctly predicts that 33 songs will be top 10 hits (true positives). However, it has 26 false positives (songs that the model predicted would be Top 10 hits, but ended up not being Top 10 hits).
  • Loudness has a positive coefficient estimate, meaning that this model predicts that songs with heavier instrumentation tend to be more popular. This is the same conclusion from Model 2.
  • Loudness is significant in this model.

Overall, Model 3 predicts a higher number of top 10 hits with an accuracy rate that is acceptable. To choose the best fit for production runs, record labels should consider the following factors:

  • Desired model accuracy at a given threshold
  • Number of correct predictions for top10 hits
  • Tolerable number of false positives or false negatives

Next, make predictions using Model 3 on the test dataset.

predict.regh <- h2o.predict(modelh3, test.h2o)
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |=================================================================| 100%
print(predict.regh)
##   predict        p0          p1
## 1       0 0.9654739 0.034526052
## 2       0 0.9654748 0.034525236
## 3       0 0.9635547 0.036445318
## 4       0 0.9343579 0.065642149
## 5       0 0.9978334 0.002166601
## 6       0 0.9779949 0.022005078
## 
## [373 rows x 3 columns]
predict.regh$predict
##   predict
## 1       0
## 2       0
## 3       0
## 4       0
## 5       0
## 6       0
## 
## [373 rows x 1 column]
dpr<-as.data.frame(predict.regh)
#Rename the predicted column 
colnames(dpr)[colnames(dpr) == 'predict'] <- 'predict_top10'
table(dpr$predict_top10)
## 
##   0   1 
## 312  61

The first set of output results specifies the probabilities associated with each predicted observation.  For example, observation 1 is 96.54739% likely to not be a Top 10 hit, and 3.4526052% likely to be a Top 10 hit (predict=1 indicates Top 10 hit and predict=0 indicates not a Top 10 hit).  The second set of results list the actual predictions made.  From the third set of results, this model predicts that 61 songs will be top 10 hits.

Compute the baseline accuracy, by assuming that the baseline predicts the most frequent outcome, which is that most songs are not Top 10 hits.

table(BillboardTest$top10)
## 
##   0   1 
## 314  59

Now observe that the baseline model would get 314 observations correct, and 59 wrong, for an accuracy of 314/(314+59) = 0.8418231.

It seems that Model 3, with an accuracy of 0.8552, provides you with a small improvement over the baseline model. But is this model useful for record labels?

View the two models from an investment perspective:

  • A production company is interested in investing in songs that are more likely to make it to the Top 10. The company’s objective is to minimize the risk of financial losses attributed to investing in songs that end up unpopular.
  • How many songs does Model 3 correctly predict as a Top 10 hit in 2010? Looking at the confusion matrix, you see that it predicts 33 top 10 hits correctly at an optimal threshold, which is more than half the number
  • It will be more useful to the record label if you can provide the production company with a list of songs that are highly likely to end up in the Top 10.
  • The baseline model is not useful, as it simply does not label any song as a hit.

Considering the three models built so far, you can conclude that Model 3 proves to be the best investment choice for the record label.

GBM model

H2O provides you with the ability to explore other learning models, such as GBM and deep learning. Explore building a model using the GBM technique, using the built-in h2o.gbm function.

Before you do this, you need to convert the target variable to a factor for multinomial classification techniques.

train.h2o$top10=as.factor(train.h2o$top10)
gbm.modelh <- h2o.gbm(y=y.dep, x=x.indep, training_frame = train.h2o, ntrees = 500, max_depth = 4, learn_rate = 0.01, seed = 1122,distribution="multinomial")
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |===                                                              |   5%
  |                                                                       
  |=====                                                            |   7%
  |                                                                       
  |======                                                           |   9%
  |                                                                       
  |=======                                                          |  10%
  |                                                                       
  |======================                                           |  33%
  |                                                                       
  |=====================================                            |  56%
  |                                                                       
  |====================================================             |  79%
  |                                                                       
  |================================================================ |  98%
  |                                                                       
  |=================================================================| 100%
perf.gbmh<-h2o.performance(gbm.modelh,test.h2o)
perf.gbmh
## H2OBinomialMetrics: gbm
## 
## MSE:  0.09860778
## RMSE:  0.3140188
## LogLoss:  0.3206876
## Mean Per-Class Error:  0.2120263
## AUC:  0.8630573
## Gini:  0.7261146
## 
## Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:
##          0  1    Error     Rate
## 0      266 48 0.152866  =48/314
## 1       16 43 0.271186   =16/59
## Totals 282 91 0.171582  =64/373
## 
## Maximum Metrics: Maximum metrics at their respective thresholds
##                       metric threshold    value idx
## 1                     max f1  0.189757 0.573333  90
## 2                     max f2  0.130895 0.693717 145
## 3               max f0point5  0.327346 0.598802  26
## 4               max accuracy  0.442757 0.876676  14
## 5              max precision  0.802184 1.000000   0
## 6                 max recall  0.049990 1.000000 284
## 7            max specificity  0.802184 1.000000   0
## 8           max absolute_mcc  0.169135 0.496486 104
## 9 max min_per_class_accuracy  0.169135 0.796610 104
## 10 max mean_per_class_accuracy  0.169135 0.805948 104
## 
## Gains/Lift Table: Extract with `h2o.gainsLift(<model>, <data>)` or `
h2o.sensitivity(perf.gbmh,0.5)
## Warning in h2o.find_row_by_threshold(object, t): Could not find exact
## threshold: 0.5 for this set of metrics; using closest threshold found:
## 0.501205344484314. Run `h2o.predict` and apply your desired threshold on a
## probability column.
## [[1]]
## [1] 0.1355932
h2o.auc(perf.gbmh)
## [1] 0.8630573

This model correctly predicts 43 top 10 hits, which is 10 more than the number predicted by Model 3. Moreover, the AUC metric is higher than the one obtained from Model 3.

As seen above, H2O’s API provides the ability to obtain key statistical measures required to analyze the models easily, using several built-in functions. The record label can experiment with different parameters to arrive at the model that predicts the maximum number of Top 10 hits at the desired level of accuracy and threshold.

H2O also allows you to experiment with deep learning models. Deep learning models have the ability to learn features implicitly, but can be more expensive computationally.

Now, create a deep learning model with the h2o.deeplearning function, using the same training and test datasets created before. The time taken to run this model depends on the type of EC2 instance chosen for this purpose.  For models that require more computation, consider using accelerated computing instances such as the P2 instance type.

system.time(
  dlearning.modelh <- h2o.deeplearning(y = y.dep,
                                      x = x.indep,
                                      training_frame = train.h2o,
                                      epoch = 250,
                                      hidden = c(250,250),
                                      activation = "Rectifier",
                                      seed = 1122,
                                      distribution="multinomial"
  )
)
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |===                                                              |   4%
  |                                                                       
  |=====                                                            |   8%
  |                                                                       
  |========                                                         |  12%
  |                                                                       
  |==========                                                       |  16%
  |                                                                       
  |=============                                                    |  20%
  |                                                                       
  |================                                                 |  24%
  |                                                                       
  |==================                                               |  28%
  |                                                                       
  |=====================                                            |  32%
  |                                                                       
  |=======================                                          |  36%
  |                                                                       
  |==========================                                       |  40%
  |                                                                       
  |=============================                                    |  44%
  |                                                                       
  |===============================                                  |  48%
  |                                                                       
  |==================================                               |  52%
  |                                                                       
  |====================================                             |  56%
  |                                                                       
  |=======================================                          |  60%
  |                                                                       
  |==========================================                       |  64%
  |                                                                       
  |============================================                     |  68%
  |                                                                       
  |===============================================                  |  72%
  |                                                                       
  |=================================================                |  76%
  |                                                                       
  |====================================================             |  80%
  |                                                                       
  |=======================================================          |  84%
  |                                                                       
  |=========================================================        |  88%
  |                                                                       
  |============================================================     |  92%
  |                                                                       
  |==============================================================   |  96%
  |                                                                       
  |=================================================================| 100%
##    user  system elapsed 
##   1.216   0.020 166.508
perf.dl<-h2o.performance(model=dlearning.modelh,newdata=test.h2o)
perf.dl
## H2OBinomialMetrics: deeplearning
## 
## MSE:  0.1678359
## RMSE:  0.4096778
## LogLoss:  1.86509
## Mean Per-Class Error:  0.3433013
## AUC:  0.7568822
## Gini:  0.5137644
## 
## Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:
##          0  1    Error     Rate
## 0      290 24 0.076433  =24/314
## 1       36 23 0.610169   =36/59
## Totals 326 47 0.160858  =60/373
## 
## Maximum Metrics: Maximum metrics at their respective thresholds
##                       metric threshold    value idx
## 1                     max f1  0.826267 0.433962  46
## 2                     max f2  0.000000 0.588235 239
## 3               max f0point5  0.999929 0.511811  16
## 4               max accuracy  0.999999 0.865952  10
## 5              max precision  1.000000 1.000000   0
## 6                 max recall  0.000000 1.000000 326
## 7            max specificity  1.000000 1.000000   0
## 8           max absolute_mcc  0.999929 0.363219  16
## 9 max min_per_class_accuracy  0.000004 0.662420 145
## 10 max mean_per_class_accuracy  0.000000 0.685334 224
## 
## Gains/Lift Table: Extract with `h2o.gainsLift(<model>, <data>)` or `h2o.gainsLift(<model>, valid=<T/F>, xval=<T/F>)`
h2o.sensitivity(perf.dl,0.5)
## Warning in h2o.find_row_by_threshold(object, t): Could not find exact
## threshold: 0.5 for this set of metrics; using closest threshold found:
## 0.496293348880151. Run `h2o.predict` and apply your desired threshold on a
## probability column.
## [[1]]
## [1] 0.3898305
h2o.auc(perf.dl)
## [1] 0.7568822

The AUC metric for this model is 0.7568822, which is less than what you got from the earlier models. I recommend further experimentation using different hyper parameters, such as the learning rate, epoch or the number of hidden layers.

H2O’s built-in functions provide many key statistical measures that can help measure model performance. Here are some of these key terms.

Metric Description
Sensitivity Measures the proportion of positives that have been correctly identified. It is also called the true positive rate, or recall.
Specificity Measures the proportion of negatives that have been correctly identified. It is also called the true negative rate.
Threshold Cutoff point that maximizes specificity and sensitivity. While the model may not provide the highest prediction at this point, it would not be biased towards positives or negatives.
Precision The fraction of the documents retrieved that are relevant to the information needed, for example, how many of the positively classified are relevant
AUC

Provides insight into how well the classifier is able to separate the two classes. The implicit goal is to deal with situations where the sample distribution is highly skewed, with a tendency to overfit to a single class.

0.90 – 1 = excellent (A)

0.8 – 0.9 = good (B)

0.7 – 0.8 = fair (C)

.6 – 0.7 = poor (D)

0.5 – 0.5 = fail (F)

Here’s a summary of the metrics generated from H2O’s built-in functions for the three models that produced useful results.

Metric Model 3 GBM Model Deep Learning Model

Accuracy

(max)

0.882038

(t=0.435479)

0.876676

(t=0.442757)

0.865952

(t=0.999999)

Precision

(max)

1.0

(t=0.821606)

1.0

(t=0802184)

1.0

(t=1.0)

Recall

(max)

1.0 1.0

1.0

(t=0)

Specificity

(max)

1.0 1.0

1.0

(t=1)

Sensitivity

 

0.2033898 0.1355932

0.3898305

(t=0.5)

AUC 0.8492389 0.8630573 0.756882

Note: ‘t’ denotes threshold.

Your options at this point could be narrowed down to Model 3 and the GBM model, based on the AUC and accuracy metrics observed earlier.  If the slightly lower accuracy of the GBM model is deemed acceptable, the record label can choose to go to production with the GBM model, as it can predict a higher number of Top 10 hits.  The AUC metric for the GBM model is also higher than that of Model 3.

Record labels can experiment with different learning techniques and parameters before arriving at a model that proves to be the best fit for their business. Because deep learning models can be computationally expensive, record labels can choose more powerful EC2 instances on AWS to run their experiments faster.

Conclusion

In this post, I showed how the popular music industry can use analytics to predict the type of songs that make the Top 10 Billboard charts. By running H2O’s scalable machine learning platform on AWS, data scientists can easily experiment with multiple modeling techniques and interactively query the data using Amazon Athena, without having to manage the underlying infrastructure. This helps record labels make critical decisions on the type of artists and songs to promote in a timely fashion, thereby increasing sales and revenue.

If you have questions or suggestions, please comment below.


Additional Reading

Learn how to build and explore a simple geospita simple GEOINT application using SparkR.


About the Authors

gopalGopal Wunnava is a Partner Solution Architect with the AWS GSI Team. He works with partners and customers on big data engagements, and is passionate about building analytical solutions that drive business capabilities and decision making. In his spare time, he loves all things sports and movies related and is fond of old classics like Asterix, Obelix comics and Hitchcock movies.

 

 

Bob Strahan, a Senior Consultant with AWS Professional Services, contributed to this post.

 

 

"Responsible encryption" fallacies

Post Syndicated from Robert Graham original http://blog.erratasec.com/2017/10/responsible-encryption-fallacies.html

Deputy Attorney General Rod Rosenstein gave a speech recently calling for “Responsible Encryption” (aka. “Crypto Backdoors”). It’s full of dangerous ideas that need to be debunked.

The importance of law enforcement

The first third of the speech talks about the importance of law enforcement, as if it’s the only thing standing between us and chaos. It cites the 2016 Mirai attacks as an example of the chaos that will only get worse without stricter law enforcement.

But the Mira case demonstrated the opposite, how law enforcement is not needed. They made no arrests in the case. A year later, they still haven’t a clue who did it.

Conversely, we technologists have fixed the major infrastructure issues. Specifically, those affected by the DNS outage have moved to multiple DNS providers, including a high-capacity DNS provider like Google and Amazon who can handle such large attacks easily.

In other words, we the people fixed the major Mirai problem, and law-enforcement didn’t.

Moreover, instead being a solution to cyber threats, law enforcement has become a threat itself. The DNC didn’t have the FBI investigate the attacks from Russia likely because they didn’t want the FBI reading all their files, finding wrongdoing by the DNC. It’s not that they did anything actually wrong, but it’s more like that famous quote from Richelieu “Give me six words written by the most honest of men and I’ll find something to hang him by”. Give all your internal emails over to the FBI and I’m certain they’ll find something to hang you by, if they want.
Or consider the case of Andrew Auernheimer. He found AT&T’s website made public user accounts of the first iPad, so he copied some down and posted them to a news site. AT&T had denied the problem, so making the problem public was the only way to force them to fix it. Such access to the website was legal, because AT&T had made the data public. However, prosecutors disagreed. In order to protect the powerful, they twisted and perverted the law to put Auernheimer in jail.

It’s not that law enforcement is bad, it’s that it’s not the unalloyed good Rosenstein imagines. When law enforcement becomes the thing Rosenstein describes, it means we live in a police state.

Where law enforcement can’t go

Rosenstein repeats the frequent claim in the encryption debate:

Our society has never had a system where evidence of criminal wrongdoing was totally impervious to detection

Of course our society has places “impervious to detection”, protected by both legal and natural barriers.

An example of a legal barrier is how spouses can’t be forced to testify against each other. This barrier is impervious.

A better example, though, is how so much of government, intelligence, the military, and law enforcement itself is impervious. If prosecutors could gather evidence everywhere, then why isn’t Rosenstein prosecuting those guilty of CIA torture?

Oh, you say, government is a special exception. If that were the case, then why did Rosenstein dedicate a precious third of his speech discussing the “rule of law” and how it applies to everyone, “protecting people from abuse by the government”. It obviously doesn’t, there’s one rule of government and a different rule for the people, and the rule for government means there’s lots of places law enforcement can’t go to gather evidence.

Likewise, the crypto backdoor Rosenstein is demanding for citizens doesn’t apply to the President, Congress, the NSA, the Army, or Rosenstein himself.

Then there are the natural barriers. The police can’t read your mind. They can only get the evidence that is there, like partial fingerprints, which are far less reliable than full fingerprints. They can’t go backwards in time.

I mention this because encryption is a natural barrier. It’s their job to overcome this barrier if they can, to crack crypto and so forth. It’s not our job to do it for them.

It’s like the camera that increasingly comes with TVs for video conferencing, or the microphone on Alexa-style devices that are always recording. This suddenly creates evidence that the police want our help in gathering, such as having the camera turned on all the time, recording to disk, in case the police later gets a warrant, to peer backward in time what happened in our living rooms. The “nothing is impervious” argument applies here as well. And it’s equally bogus here. By not helping police by not recording our activities, we aren’t somehow breaking some long standing tradit

And this is the scary part. It’s not that we are breaking some ancient tradition that there’s no place the police can’t go (with a warrant). Instead, crypto backdoors breaking the tradition that never before have I been forced to help them eavesdrop on me, even before I’m a suspect, even before any crime has been committed. Sure, laws like CALEA force the phone companies to help the police against wrongdoers — but here Rosenstein is insisting I help the police against myself.

Balance between privacy and public safety

Rosenstein repeats the frequent claim that encryption upsets the balance between privacy/safety:

Warrant-proof encryption defeats the constitutional balance by elevating privacy above public safety.

This is laughable, because technology has swung the balance alarmingly in favor of law enforcement. Far from “Going Dark” as his side claims, the problem we are confronted with is “Going Light”, where the police state monitors our every action.

You are surrounded by recording devices. If you walk down the street in town, outdoor surveillance cameras feed police facial recognition systems. If you drive, automated license plate readers can track your route. If you make a phone call or use a credit card, the police get a record of the transaction. If you stay in a hotel, they demand your ID, for law enforcement purposes.

And that’s their stuff, which is nothing compared to your stuff. You are never far from a recording device you own, such as your mobile phone, TV, Alexa/Siri/OkGoogle device, laptop. Modern cars from the last few years increasingly have always-on cell connections and data recorders that record your every action (and location).

Even if you hike out into the country, when you get back, the FBI can subpoena your GPS device to track down your hidden weapon’s cache, or grab the photos from your camera.

And this is all offline. So much of what we do is now online. Of the photographs you own, fewer than 1% are printed out, the rest are on your computer or backed up to the cloud.

Your phone is also a GPS recorder of your exact position all the time, which if the government wins the Carpenter case, they police can grab without a warrant. Tagging all citizens with a recording device of their position is not “balance” but the premise for a novel more dystopic than 1984.

If suspected of a crime, which would you rather the police searched? Your person, houses, papers, and physical effects? Or your mobile phone, computer, email, and online/cloud accounts?

The balance of privacy and safety has swung so far in favor of law enforcement that rather than debating whether they should have crypto backdoors, we should be debating how to add more privacy protections.

“But it’s not conclusive”

Rosenstein defends the “going light” (“Golden Age of Surveillance”) by pointing out it’s not always enough for conviction. Nothing gives a conviction better than a person’s own words admitting to the crime that were captured by surveillance. This other data, while copious, often fails to convince a jury beyond a reasonable doubt.
This is nonsense. Police got along well enough before the digital age, before such widespread messaging. They solved terrorist and child abduction cases just fine in the 1980s. Sure, somebody’s GPS location isn’t by itself enough — until you go there and find all the buried bodies, which leads to a conviction. “Going dark” imagines that somehow, the evidence they’ve been gathering for centuries is going away. It isn’t. It’s still here, and matches up with even more digital evidence.
Conversely, a person’s own words are not as conclusive as you think. There’s always missing context. We quickly get back to the Richelieu “six words” problem, where captured communications are twisted to convict people, with defense lawyers trying to untwist them.

Rosenstein’s claim may be true, that a lot of criminals will go free because the other electronic data isn’t convincing enough. But I’d need to see that claim backed up with hard studies, not thrown out for emotional impact.

Terrorists and child molesters

You can always tell the lack of seriousness of law enforcement when they bring up terrorists and child molesters.
To be fair, sometimes we do need to talk about terrorists. There are things unique to terrorism where me may need to give government explicit powers to address those unique concerns. For example, the NSA buys mobile phone 0day exploits in order to hack terrorist leaders in tribal areas. This is a good thing.
But when terrorists use encryption the same way everyone else does, then it’s not a unique reason to sacrifice our freedoms to give the police extra powers. Either it’s a good idea for all crimes or no crimes — there’s nothing particular about terrorism that makes it an exceptional crime. Dead people are dead. Any rational view of the problem relegates terrorism to be a minor problem. More citizens have died since September 8, 2001 from their own furniture than from terrorism. According to studies, the hot water from the tap is more of a threat to you than terrorists.
Yes, government should do what they can to protect us from terrorists, but no, it’s not so bad of a threat that requires the imposition of a military/police state. When people use terrorism to justify their actions, it’s because they trying to form a military/police state.
A similar argument works with child porn. Here’s the thing: the pervs aren’t exchanging child porn using the services Rosenstein wants to backdoor, like Apple’s Facetime or Facebook’s WhatsApp. Instead, they are exchanging child porn using custom services they build themselves.
Again, I’m (mostly) on the side of the FBI. I support their idea of buying 0day exploits in order to hack the web browsers of visitors to the secret “PlayPen” site. This is something that’s narrow to this problem and doesn’t endanger the innocent. On the other hand, their calls for crypto backdoors endangers the innocent while doing effectively nothing to address child porn.
Terrorists and child molesters are a clichéd, non-serious excuse to appeal to our emotions to give up our rights. We should not give in to such emotions.

Definition of “backdoor”

Rosenstein claims that we shouldn’t call backdoors “backdoors”:

No one calls any of those functions [like key recovery] a “back door.”  In fact, those capabilities are marketed and sought out by many users.

He’s partly right in that we rarely refer to PGP’s key escrow feature as a “backdoor”.

But that’s because the term “backdoor” refers less to how it’s done and more to who is doing it. If I set up a recovery password with Apple, I’m the one doing it to myself, so we don’t call it a backdoor. If it’s the police, spies, hackers, or criminals, then we call it a “backdoor” — even it’s identical technology.

Wikipedia uses the key escrow feature of the 1990s Clipper Chip as a prime example of what everyone means by “backdoor“. By “no one”, Rosenstein is including Wikipedia, which is obviously incorrect.

Though in truth, it’s not going to be the same technology. The needs of law enforcement are different than my personal key escrow/backup needs. In particular, there are unsolvable problems, such as a backdoor that works for the “legitimate” law enforcement in the United States but not for the “illegitimate” police states like Russia and China.

I feel for Rosenstein, because the term “backdoor” does have a pejorative connotation, which can be considered unfair. But that’s like saying the word “murder” is a pejorative term for killing people, or “torture” is a pejorative term for torture. The bad connotation exists because we don’t like government surveillance. I mean, honestly calling this feature “government surveillance feature” is likewise pejorative, and likewise exactly what it is that we are talking about.

Providers

Rosenstein focuses his arguments on “providers”, like Snapchat or Apple. But this isn’t the question.

The question is whether a “provider” like Telegram, a Russian company beyond US law, provides this feature. Or, by extension, whether individuals should be free to install whatever software they want, regardless of provider.

Telegram is a Russian company that provides end-to-end encryption. Anybody can download their software in order to communicate so that American law enforcement can’t eavesdrop. They aren’t going to put in a backdoor for the U.S. If we succeed in putting backdoors in Apple and WhatsApp, all this means is that criminals are going to install Telegram.

If the, for some reason, the US is able to convince all such providers (including Telegram) to install a backdoor, then it still doesn’t solve the problem, as uses can just build their own end-to-end encryption app that has no provider. It’s like email: some use the major providers like GMail, others setup their own email server.

Ultimately, this means that any law mandating “crypto backdoors” is going to target users not providers. Rosenstein tries to make a comparison with what plain-old telephone companies have to do under old laws like CALEA, but that’s not what’s happening here. Instead, for such rules to have any effect, they have to punish users for what they install, not providers.

This continues the argument I made above. Government backdoors is not something that forces Internet services to eavesdrop on us — it forces us to help the government spy on ourselves.
Rosenstein tries to address this by pointing out that it’s still a win if major providers like Apple and Facetime are forced to add backdoors, because they are the most popular, and some terrorists/criminals won’t move to alternate platforms. This is false. People with good intentions, who are unfairly targeted by a police state, the ones where police abuse is rampant, are the ones who use the backdoored products. Those with bad intentions, who know they are guilty, will move to the safe products. Indeed, Telegram is already popular among terrorists because they believe American services are already all backdoored. 
Rosenstein is essentially demanding the innocent get backdoored while the guilty don’t. This seems backwards. This is backwards.

Apple is morally weak

The reason I’m writing this post is because Rosenstein makes a few claims that cannot be ignored. One of them is how he describes Apple’s response to government insistence on weakening encryption doing the opposite, strengthening encryption. He reasons this happens because:

Of course they [Apple] do. They are in the business of selling products and making money. 

We [the DoJ] use a different measure of success. We are in the business of preventing crime and saving lives. 

He swells in importance. His condescending tone ennobles himself while debasing others. But this isn’t how things work. He’s not some white knight above the peasantry, protecting us. He’s a beat cop, a civil servant, who serves us.

A better phrasing would have been:

They are in the business of giving customers what they want.

We are in the business of giving voters what they want.

Both sides are doing the same, giving people what they want. Yes, voters want safety, but they also want privacy. Rosenstein imagines that he’s free to ignore our demands for privacy as long has he’s fulfilling his duty to protect us. He has explicitly rejected what people want, “we use a different measure of success”. He imagines it’s his job to tell us where the balance between privacy and safety lies. That’s not his job, that’s our job. We, the people (and our representatives), make that decision, and it’s his job is to do what he’s told. His measure of success is how well he fulfills our wishes, not how well he satisfies his imagined criteria.

That’s why those of us on this side of the debate doubt the good intentions of those like Rosenstein. He criticizes Apple for wanting to protect our rights/freedoms, and declare they measure success differently.

They are willing to be vile

Rosenstein makes this argument:

Companies are willing to make accommodations when required by the government. Recent media reports suggest that a major American technology company developed a tool to suppress online posts in certain geographic areas in order to embrace a foreign government’s censorship policies. 

Let me translate this for you:

Companies are willing to acquiesce to vile requests made by police-states. Therefore, they should acquiesce to our vile police-state requests.

It’s Rosenstein who is admitting here is that his requests are those of a police-state.

Constitutional Rights

Rosenstein says:

There is no constitutional right to sell warrant-proof encryption.

Maybe. It’s something the courts will have to decide. There are many 1st, 2nd, 3rd, 4th, and 5th Amendment issues here.
The reason we have the Bill of Rights is because of the abuses of the British Government. For example, they quartered troops in our homes, as a way of punishing us, and as a way of forcing us to help in our own oppression. The troops weren’t there to defend us against the French, but to defend us against ourselves, to shoot us if we got out of line.

And that’s what crypto backdoors do. We are forced to be agents of our own oppression. The principles enumerated by Rosenstein apply to a wide range of even additional surveillance. With little change to his speech, it can equally argue why the constant TV video surveillance from 1984 should be made law.

Let’s go back and look at Apple. It is not some base company exploiting consumers for profit. Apple doesn’t have guns, they cannot make people buy their product. If Apple doesn’t provide customers what they want, then customers vote with their feet, and go buy an Android phone. Apple isn’t providing encryption/security in order to make a profit — it’s giving customers what they want in order to stay in business.
Conversely, if we citizens don’t like what the government does, tough luck, they’ve got the guns to enforce their edicts. We can’t easily vote with our feet and walk to another country. A “democracy” is far less democratic than capitalism. Apple is a minority, selling phones to 45% of the population, and that’s fine, the minority get the phones they want. In a Democracy, where citizens vote on the issue, those 45% are screwed, as the 55% impose their will unwanted onto the remainder.

That’s why we have the Bill of Rights, to protect the 49% against abuse by the 51%. Regardless whether the Supreme Court agrees the current Constitution, it is the sort right that might exist regardless of what the Constitution says. 

Obliged to speak the truth

Here is the another part of his speech that I feel cannot be ignored. We have to discuss this:

Those of us who swear to protect the rule of law have a different motivation.  We are obliged to speak the truth.

The truth is that “going dark” threatens to disable law enforcement and enable criminals and terrorists to operate with impunity.

This is not true. Sure, he’s obliged to say the absolute truth, in court. He’s also obliged to be truthful in general about facts in his personal life, such as not lying on his tax return (the sort of thing that can get lawyers disbarred).

But he’s not obliged to tell his spouse his honest opinion whether that new outfit makes them look fat. Likewise, Rosenstein knows his opinion on public policy doesn’t fall into this category. He can say with impunity that either global warming doesn’t exist, or that it’ll cause a biblical deluge within 5 years. Both are factually untrue, but it’s not going to get him fired.

And this particular claim is also exaggerated bunk. While everyone agrees encryption makes law enforcement’s job harder than with backdoors, nobody honestly believes it can “disable” law enforcement. While everyone agrees that encryption helps terrorists, nobody believes it can enable them to act with “impunity”.

I feel bad here. It’s a terrible thing to question your opponent’s character this way. But Rosenstein made this unavoidable when he clearly, with no ambiguity, put his integrity as Deputy Attorney General on the line behind the statement that “going dark threatens to disable law enforcement and enable criminals and terrorists to operate with impunity”. I feel it’s a bald face lie, but you don’t need to take my word for it. Read his own words yourself and judge his integrity.

Conclusion

Rosenstein’s speech includes repeated references to ideas like “oath”, “honor”, and “duty”. It reminds me of Col. Jessup’s speech in the movie “A Few Good Men”.

If you’ll recall, it was rousing speech, “you want me on that wall” and “you use words like honor as a punchline”. Of course, since he was violating his oath and sending two privates to death row in order to avoid being held accountable, it was Jessup himself who was crapping on the concepts of “honor”, “oath”, and “duty”.

And so is Rosenstein. He imagines himself on that wall, doing albeit terrible things, justified by his duty to protect citizens. He imagines that it’s he who is honorable, while the rest of us not, even has he utters bald faced lies to further his own power and authority.

We activists oppose crypto backdoors not because we lack honor, or because we are criminals, or because we support terrorists and child molesters. It’s because we value privacy and government officials who get corrupted by power. It’s not that we fear Trump becoming a dictator, it’s that we fear bureaucrats at Rosenstein’s level becoming drunk on authority — which Rosenstein demonstrably has. His speech is a long train of corrupt ideas pursuing the same object of despotism — a despotism we oppose.

In other words, we oppose crypto backdoors because it’s not a tool of law enforcement, but a tool of despotism.

Pirate Bay is Mining Cryptocurrency Again, No Opt Out

Post Syndicated from Ernesto original https://torrentfreak.com/pirate-bay-is-mining-cryptocurrency-again-no-opt-out-171011/

Last month The Pirate Bay caused some uproar by adding a Javascript-based cryptocurrency miner to its website.

The miner utilizes CPU power from visitors to generate Monero coins for the site, providing an extra source of revenue.

The Pirate Bay only tested the option briefly, but that was enough to inspire many others to follow suit. Now, a few weeks later, Pirate Bay has also turned on the miners again.

The miner is not directly embedded in the site’s core code but runs through an ad script. Many ad blockers and anti-malware tools are stopping these request, but people who don’t use any will see a clear spike in CPU usage when they access the site.

The Pirate Bay team previously said that they were testing the miner to see if it can replace ads. While there is some real revenue potential, for now, it’s running in addition to the regular banners. It’s unclear whether the current mining period is another test or if it will run permanently from now on.

The miner does appear to be throttled to a certain degree, so most users might not even notice that it’s running.

Pirate Bay load requests

Running a cryptocurrency miner such as the Coin-Hive script TPB is currently using is not without risk. Aside from user complaints, there is an issue that may make it harder for the site to operate in the future.

Last week we reported that CDN provider Cloudflare had suspended the account of torrent proxy site ProxyBunker, flagging its coin miner as malware. This means that The Pirate Bay now risks losing the Cloudflare service, which they rely on for DDoS protection, among other things.

Cloudflare’s suspension of ProxyBunker occurred even though the site provided users with an option to disable the miner. This functionality was implemented by Coinhive after the script was misused by some sites, which ran it without alerting their users.

The Pirate Bay currently has no opt-out option, nor has it informed users about the latest mining efforts. This could lead to another problem since Coinhive said it would crack down on customers who failed to keep users in the loop.

“We will verify this opt-in on our servers and will implement it in a way that it can not be circumvented. We will pledge to keep the opt-in intact at all times, without exceptions,” the Coinhive team previously noted.

The Pirate Bay team has not commented on the issue thus far. In theory, it’s possible that a rogue advertiser is responsible for the latest mining efforts. If that’s the case it will be disabled soon enough.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

5 Reasons Why AWS Leads the Cloud Market

Post Syndicated from Chris De Santis original https://www.anchor.com.au/blog/2017/10/5-reasons-aws-leads-cloud/

There is no doubt that in the cloud computing market, there is a lot of competition, but there is also a clear market leader. Amazon Web Services (AWS) leads the charge among other web services from similar tech giants such as Microsoft, IBM, and Google, but how did they get there and what’s taking so long for someone of the likes of Google to knock them off their pedestal?

5 Reasons Why AWS Leads the Cloud Market

Credit: Synergy Research Group

Recent research from Synergy Research shows that Amazon has a seemingly unbeatable lead. John Dinsdale, chief analyst at Synergy Research, told TechCrunch that, on paper, AWS is too far ahead of any competitor trying to gain a short-term advantage. The reason behind their spectacular lead is simple:

AWS was first.

If you start the race before everyone else and keep at the pace they’re running, you’re going to win, and that’s exactly what Amazon are doing. Yet, instead of sitting on their colossal market share like a throne, they’re continuing to rapidly innovate and differentiate.

Dinsdale continues to explain that AWS does five things continuously that allows them to stay on top of the cloud market:

  1. Invest considerable amounts in infrastructure
  2. Expand their fleet of services
  3. Execute it all well
  4. Grow its business with enterprises
  5. Has the full long-term backing of Amazon

What can we take from this?

Well, according to Dinsdale, the Amazon formula involves:

  • Investing in your innovation
  • Constantly broadening your product/service range
  • Perform with minimal error
  • Aim for the high-profile customers
  • Look to receive stable funding and support

The post 5 Reasons Why AWS Leads the Cloud Market appeared first on AWS Managed Services by Anchor.