Tag Archives: CIS

UK ‘Pirate’ Kodi Box Seller Handed a Suspended Prison Sentence

Post Syndicated from Andy original https://torrentfreak.com/uk-pirate-kodi-box-seller-handed-a-suspended-prison-sentence-171021/

After being raided by police and Trading Standards in 2015, Middlesbrough-based shopkeeper Brian ‘Tomo’ Thompson found himself in the spotlight.

Accused of selling “fully-loaded” Kodi boxes (those with ‘pirate’ addons installed), Thompson continued to protest his innocence.

“All I want to know is whether I am doing anything illegal. I know it’s a gray area but I want it in black and white,” he said last September.

Unlike other cases, where copyright holders took direct action, Thompson was prosecuted by his local council. At the time, he seemed prepared to martyr himself to test the limits of the law.

“This may have to go to the crown court and then it may go all the way to the European court, but I want to make a point with this and I want to make it easier for people to know what is legal and what isn’t,” he said. “I expect it go against me but at least I will know where I stand.”

In an opinion piece not long after this statement, we agreed with Thompson’s sentiment, noting that barring a miracle, the Middlesbrough man would indeed lose his case, probably in short order. But Thompson’s case turned out to be less than straightforward.

Thompson wasn’t charged with straightforward “making available” under the Copyrights, Designs and Patents Acts. If he had, there would’ve been no question that he’d been breaking law. This is due to a European Court of Justice decision in the BREIN v Filmspeler case earlier this year which determined that selling fully loaded boxes in the EU is illegal.

Instead, for reasons best known to the prosecution, ‘Tomo’ stood accused of two offenses under section 296ZB of the Copyright, Designs and Patents Act, which deals with devices and services designed to “circumvent technological measures”. It’s a different aspect of copyright law previously applied to cases where encryption has been broken on official products.

“A person commits an offense if he — in the course of a business — sells or lets for hire, any device, product or component which is primarily designed, produced, or adapted for the purpose of enabling or facilitating the circumvention of effective technological measures,” the law reads.

‘Tomo’ in his store

In January this year, Thompson entered his official ‘not guilty’ plea, setting up a potentially fascinating full trial in which we would’ve heard how ‘circumvention of technological measures’ could possibly relate to streaming illicit content from entirely unprotected far-flung sources.

Last month, however, Thompson suddenly had a change of heart, entering guilty pleas against one count of selling and one count of advertising devices for the purpose of enabling or facilitating the circumvention of effective technological measures.

That plea stomped on what could’ve been a really interesting trial, particularly since the Federation Against Copyright Theft’s own lawyer predicted it could be difficult and complex.

As a result, Thompson appeared at Teeside Crown Court on Friday for sentencing. Prosecutor Cameron Crowe said Thompson advertised and sold the ‘pirate’ devices for commercial gain, fully aware that they would be used to access infringing content and premium subscription services.

Crowe said that Thompson made around £40,000 from the devices while potentially costing Sky around £200,000 in lost subscription fees. When Thompson was raided in June 2015, a diary revealed he’d sold 159 devices in the previous four months, sales which generated £17,000 in revenue.

After his arrest, Thompson changed premises and continued to offer the devices for sale on social media.

Passing sentence, Judge Peter Armstrong told the 55-year-old businessman that he’d receive an 18-month prison term, suspended for two years.

“If anyone was under any illusion as to whether such devices as these, fully loaded Kodi boxes, were illegal or not, they can no longer be in any such doubt,” Judge Armstrong told the court, as reported by Gazette Live.

“I’ve come to the conclusion that in all the circumstances an immediate custodial sentence is not called for. But as a warning to others in future, they may not be so lucky.”

Also sentenced Friday was another local seller, Julian Allen, who sold devices to Thompson, among others. He was arrested following raids on his Geeky Kit businesses in 2015 and pleaded guilty this July to using or acquiring criminal property.

But despite making more than £135,000 from selling ‘pirate’ boxes, he too avoided jail, receiving a 21-month prison sentence suspended for two years instead.

While Thompson’s and Allen’s sentences are likely to be portrayed by copyright holders as a landmark moment, the earlier ruling from the European Court of Justice means that selling these kinds of devices for infringing purposes has always been illegal.

Perhaps the big surprise, given the dramatic lead up to both cases, is the relative leniency of their sentences. All that being said, however, a line has been drawn in the sand and other sellers should be aware.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

Cloudflare Counters MPAA and RIAA’s ‘Rehashed’ Piracy Complaints

Post Syndicated from Ernesto original https://torrentfreak.com/cloudflare-counters-mpaa-and-riaas-rehashed-piracy-complaints-171020/

A few weeks ago several copyright holder groups sent their annual “Notorious Markets” complaints to the U.S. Trade Representative (USTR).

While the recommendations usually include well-known piracy sites such as The Pirate Bay, third-party services are increasingly mentioned. MPAA and RIAA, for example, wrote that Cloudflare frustrates enforcement efforts by helping pirate sites to “hide”.

The CDN provider is not happy with these characterizations and this week submitted a rebuttal. Cloudflare’s General Counsel Doug Kramer says that the company was surprised to see these mentions. Not only because they “distort” reality, but also because they are pretty much identical to those leveled last year.

“Most surprising is that their comments were basically the same complaints they filed in 2016 and contain the same mistakes and distortions that we pointed out in our rebuttal comments from October, 2016.”

“Simply repeating the same mischaracterizations for a second year in a row does not convert them into facts, so we are compelled to reiterate our objections,” Kramer adds (pdf).

There is indeed quite a bit of overlap between the submissions from both years. In fact, several sections are copied word for word, such as the RIAA’s allegation below.

“In addition, more sites are now employing services of Cloudflare, a content delivery network and distributed domain name server service. BitTorrent sites, like many other pirate sites, are increasing [sic] turning to Cloudflare because routing their site through Cloudflare obfuscates the IP address of the actual hosting provider, masking the location of the site.”

The same can be said about the MPAA’s submission, which includes a lot of the same comments and sentences as last year. That wouldn’t be much of a problem if the information was correct, but according to Cloudflare, that’s not the case.

The two industry groups claim that the CDN provider makes it more difficult to track where pirate sites are hosted. However, Cloudflare argues the opposite.

Both RIAA and MPAA are part of the “Trusted Reporter” program and use it frequently, Cloudflare points out. This program allows rightsholders to easily obtain the actual IP-addresses of Cloudflare-hosted websites that engage in widespread copyright infringement.

Most importantly, according to Cloudflare, is that the company follows the letter of the law.

“Cloudflare does not make the process of enforcing intellectual property rights online any harder — or any easier. We follow all applicable laws and regulations,” Cloudflare explained in its submission last year.

In its 2017 rebuttal, the company reiterates this position once again. Kramer also points to a recent blog post from CEO Matthew Prince, which discusses free speech and censorship issues. The message is that vigilante justice is not the answer to piracy, and all relevant stakeholders should get together to discuss how to handle these issues going forward.

For now, however, the USTR should disregard the comments regarding Cloudflare as irrelevant and inaccurate, the company argues.

“We trust that USTR will once again agree with Cloudflare that complaints implying that Cloudflare is aiding illegal activities have no place whatsoever in USTR’s Notorious Markets inquiry. It would seem to distract from and dilute the message of that report to focus on companies that are working to make the internet more cybersecure,” Kramer concludes.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

Amazon QuickSight Adds Support for Combo Charts and Row-Level Security

Post Syndicated from Jose Kunnackal original https://aws.amazon.com/blogs/big-data/amazon-quicksight-adds-support-for-combo-charts-and-row-level-security/

We are excited to announce support for two new features in Amazon QuickSight: 1) Combo charts, the first visual type in QuickSight to support dual-axis visualization, and 2) Row-Level Security, which allows access control over data at the row level based on the user who is accessing QuickSight. Together, these features enable you to present more engaging and personalized dashboards in Amazon QuickSight, while enforcing stricter controls over data.

Combo charts

Amazon QuickSight now supports charts with bars and lines, which you can use to visualize metrics of different scale or numeric types. For example, you can view sales ($) and margin (%) figures for different product categories of a business on the same visual.

You can also add a field to group the bars by an additional category. Following the example above, a business might want to break up sales across product categories by state to understand the details better. Amazon QuickSight supports this as a clustered bar chart with a line:

Or, as a stacked bar chart with a line:

Row-Level Security

Today’s release also adds support for Row-Level Security (RLS) in Amazon QuickSight Enterprise Edition. RLS allows control over data at a row level based on the permissions that are associated with the user who is accessing the data. With RLS, owners of a dataset can ensure that consumers of dashboards and analyses based on the dataset only view slices of data that they are authorized to. This removes the need for dataset owners to prepare separate data sets and dashboards for users (or groups of users) with different levels of access within the data.

You can use RLS for any dataset (SPICE or direct query) by simply associating a set of user access rules. These user-specific rules can be managed in a dataset (which can also be SPICE or direct query), which is linked to the dataset that is to be restricted. Let’s walk through an example to see how this works.

Using the earlier business data example, let’s consider a situation where Susan and Jane are two users in the company who need access to different views of the same data. Susan manages sales for the state of California and should be granted access to all sales data related to the state. Jane, on the other hand, is a salesperson who covers the Aquatics, Exercise & Fitness, and Outdoors categories for Washington and Oregon.

To apply RLS for this use case, the administrator can create a new rules dataset with a username field and the specific fields that should be used to filter the data. Based on the user personas above, the rules dataset will look as follows

Username Category State
Jane Aquatics, Exercise & Fitness, Outdoors WA, OR
Susan CA

 

After creating the rules dataset in Amazon QuickSight, the administrator can link the dataset that contains sales data with this rules dataset via the new Permissions option.

After the administrator selects and links the dataset rules, the target dataset is now always filtered by the rules specified. This means that when Jane accesses the system, she sees data related to the states she covers and the categories she handles.

Similarly, Susan now sees all categories, but only for the state of California. 

With RLS in place, a data administrator no longer has to create multiple datasets to serve such use cases and can also use the same dashboards/analyses for multiple users. For more information about RLS and details about dataset rules configuration, see the Amazon QuickSight documentation.

Learn more: To learn more about these capabilities and start using them in your dashboards, see the Amazon QuickSight User Guide. 

Stay engaged: If you have questions or suggestions, you can post them on the Amazon QuickSight discussion forum. 

Not an Amazon QuickSight user?

To get started for FREE, see quicksight.aws.

 

Steal This Show S03E09: Learning To Love Your Panopticon

Post Syndicated from Ernesto original https://torrentfreak.com/steal-show-s03e09-learning-love-panopticon/

stslogo180If you enjoy this episode, consider becoming a patron and getting involved with the show. Check out Steal This Show’s Patreon campaign: support us and get all kinds of fantastic benefits!

In this episode we meet Diani Barreto from the Berlin Bureau of ExposeFacs. Launched in June 2014, ExposeFacts.org supports and encourages whistleblowers to disclose information that citizens need to make truly informed decisions in a democracy.

ExposeFacts aims to shed light on concealed activities that are relevant to human rights, corporate malfeasance, the environment, civil liberties and war.

Steal This Show aims to release bi-weekly episodes featuring insiders discussing copyright and file-sharing news. It complements our regular reporting by adding more room for opinion, commentary, and analysis.

The guests for our news discussions will vary, and we’ll aim to introduce voices from different backgrounds and persuasions. In addition to news, STS will also produce features interviewing some of the great innovators and minds.

Host: Jamie King

Guest: Diani Barreto

Produced by Jamie King
Edited & Mixed by Riley Byrne
Original Music by David Triana
Web Production by Siraje Amarniss

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

Using AWS Step Functions State Machines to Handle Workflow-Driven AWS CodePipeline Actions

Post Syndicated from Marcilio Mendonca original https://aws.amazon.com/blogs/devops/using-aws-step-functions-state-machines-to-handle-workflow-driven-aws-codepipeline-actions/

AWS CodePipeline is a continuous integration and continuous delivery service for fast and reliable application and infrastructure updates. It offers powerful integration with other AWS services, such as AWS CodeBuildAWS CodeDeployAWS CodeCommit, AWS CloudFormation and with third-party tools such as Jenkins and GitHub. These services make it possible for AWS customers to successfully automate various tasks, including infrastructure provisioning, blue/green deployments, serverless deployments, AMI baking, database provisioning, and release management.

Developers have been able to use CodePipeline to build sophisticated automation pipelines that often require a single CodePipeline action to perform multiple tasks, fork into different execution paths, and deal with asynchronous behavior. For example, to deploy a Lambda function, a CodePipeline action might first inspect the changes pushed to the code repository. If only the Lambda code has changed, the action can simply update the Lambda code package, create a new version, and point the Lambda alias to the new version. If the changes also affect infrastructure resources managed by AWS CloudFormation, the pipeline action might have to create a stack or update an existing one through the use of a change set. In addition, if an update is required, the pipeline action might enforce a safety policy to infrastructure resources that prevents the deletion and replacement of resources. You can do this by creating a change set and having the pipeline action inspect its changes before updating the stack. Change sets that do not conform to the policy are deleted.

This use case is a good illustration of workflow-driven pipeline actions. These are actions that run multiple tasks, deal with async behavior and loops, need to maintain and propagate state, and fork into different execution paths. Implementing workflow-driven actions directly in CodePipeline can lead to complex pipelines that are hard for developers to understand and maintain. Ideally, a pipeline action should perform a single task and delegate the complexity of dealing with workflow-driven behavior associated with that task to a state machine engine. This would make it possible for developers to build simpler, more intuitive pipelines and allow them to use state machine execution logs to visualize and troubleshoot their pipeline actions.

In this blog post, we discuss how AWS Step Functions state machines can be used to handle workflow-driven actions. We show how a CodePipeline action can trigger a Step Functions state machine and how the pipeline and the state machine are kept decoupled through a Lambda function. The advantages of using state machines include:

  • Simplified logic (complex tasks are broken into multiple smaller tasks).
  • Ease of handling asynchronous behavior (through state machine wait states).
  • Built-in support for choices and processing different execution paths (through state machine choices).
  • Built-in visualization and logging of the state machine execution.

The source code for the sample pipeline, pipeline actions, and state machine used in this post is available at https://github.com/awslabs/aws-codepipeline-stepfunctions.

Overview

This figure shows the components in the CodePipeline-Step Functions integration that will be described in this post. The pipeline contains two stages: a Source stage represented by a CodeCommit Git repository and a Prod stage with a single Deploy action that represents the workflow-driven action.

This action invokes a Lambda function (1) called the State Machine Trigger Lambda, which, in turn, triggers a Step Function state machine to process the request (2). The Lambda function sends a continuation token back to the pipeline (3) to continue its execution later and terminates. Seconds later, the pipeline invokes the Lambda function again (4), passing the continuation token received. The Lambda function checks the execution state of the state machine (5,6) and communicates the status to the pipeline. The process is repeated until the state machine execution is complete. Then the Lambda function notifies the pipeline that the corresponding pipeline action is complete (7). If the state machine has failed, the Lambda function will then fail the pipeline action and stop its execution (7). While running, the state machine triggers various Lambda functions to perform different tasks. The state machine and the pipeline are fully decoupled. Their interaction is handled by the Lambda function.

The Deploy State Machine

The sample state machine used in this post is a simplified version of the use case, with emphasis on infrastructure deployment. The state machine will follow distinct execution paths and thus have different outcomes, depending on:

  • The current state of the AWS CloudFormation stack.
  • The nature of the code changes made to the AWS CloudFormation template and pushed into the pipeline.

If the stack does not exist, it will be created. If the stack exists, a change set will be created and its resources inspected by the state machine. The inspection consists of parsing the change set results and detecting whether any resources will be deleted or replaced. If no resources are being deleted or replaced, the change set is allowed to be executed and the state machine completes successfully. Otherwise, the change set is deleted and the state machine completes execution with a failure as the terminal state.

Let’s dive into each of these execution paths.

Path 1: Create a Stack and Succeed Deployment

The Deploy state machine is shown here. It is triggered by the Lambda function using the following input parameters stored in an S3 bucket.

Create New Stack Execution Path

{
    "environmentName": "prod",
    "stackName": "sample-lambda-app",
    "templatePath": "infra/Lambda-template.yaml",
    "revisionS3Bucket": "codepipeline-us-east-1-418586629775",
    "revisionS3Key": "StepFunctionsDrivenD/CodeCommit/sjcmExZ"
}

Note that some values used here are for the use case example only. Account-specific parameters like revisionS3Bucket and revisionS3Key will be different when you deploy this use case in your account.

These input parameters are used by various states in the state machine and passed to the corresponding Lambda functions to perform different tasks. For example, stackName is used to create a stack, check the status of stack creation, and create a change set. The environmentName represents the environment (for example, dev, test, prod) to which the code is being deployed. It is used to prefix the name of stacks and change sets.

With the exception of built-in states such as wait and choice, each state in the state machine invokes a specific Lambda function.  The results received from the Lambda invocations are appended to the state machine’s original input. When the state machine finishes its execution, several parameters will have been added to its original input.

The first stage in the state machine is “Check Stack Existence”. It checks whether a stack with the input name specified in the stackName input parameter already exists. The output of the state adds a Boolean value called doesStackExist to the original state machine input as follows:

{
  "doesStackExist": true,
  "environmentName": "prod",
  "stackName": "sample-lambda-app",
  "templatePath": "infra/lambda-template.yaml",
  "revisionS3Bucket": "codepipeline-us-east-1-418586629775",
  "revisionS3Key": "StepFunctionsDrivenD/CodeCommit/sjcmExZ",
}

The following stage, “Does Stack Exist?”, is represented by Step Functions built-in choice state. It checks the value of doesStackExist to determine whether a new stack needs to be created (doesStackExist=true) or a change set needs to be created and inspected (doesStackExist=false).

If the stack does not exist, the states illustrated in green in the preceding figure are executed. This execution path creates the stack, waits until the stack is created, checks the status of the stack’s creation, and marks the deployment successful after the stack has been created. Except for “Stack Created?” and “Wait Stack Creation,” each of these stages invokes a Lambda function. “Stack Created?” and “Wait Stack Creation” are implemented by using the built-in choice state (to decide which path to follow) and the wait state (to wait a few seconds before proceeding), respectively. Each stage adds the results of their Lambda function executions to the initial input of the state machine, allowing future stages to process them.

Path 2: Safely Update a Stack and Mark Deployment as Successful

Safely Update a Stack and Mark Deployment as Successful Execution Path

If the stack indicated by the stackName parameter already exists, a different path is executed. (See the green states in the figure.) This path will create a change set and use wait and choice states to wait until the change set is created. Afterwards, a stage in the execution path will inspect  the resources affected before the change set is executed.

The inspection procedure represented by the “Inspect Change Set Changes” stage consists of parsing the resources affected by the change set and checking whether any of the existing resources are being deleted or replaced. The following is an excerpt of the algorithm, where changeSetChanges.Changes is the object representing the change set changes:

...
var RESOURCES_BEING_DELETED_OR_REPLACED = "RESOURCES-BEING-DELETED-OR-REPLACED";
var CAN_SAFELY_UPDATE_EXISTING_STACK = "CAN-SAFELY-UPDATE-EXISTING-STACK";
for (var i = 0; i < changeSetChanges.Changes.length; i++) {
    var change = changeSetChanges.Changes[i];
    if (change.Type == "Resource") {
        if (change.ResourceChange.Action == "Delete") {
            return RESOURCES_BEING_DELETED_OR_REPLACED;
        }
        if (change.ResourceChange.Action == "Modify") {
            if (change.ResourceChange.Replacement == "True") {
                return RESOURCES_BEING_DELETED_OR_REPLACED;
            }
        }
    }
}
return CAN_SAFELY_UPDATE_EXISTING_STACK;

The algorithm returns different values to indicate whether the change set can be safely executed (CAN_SAFELY_UPDATE_EXISTING_STACK or RESOURCES_BEING_DELETED_OR_REPLACED). This value is used later by the state machine to decide whether to execute the change set and update the stack or interrupt the deployment.

The output of the “Inspect Change Set” stage is shown here.

{
  "environmentName": "prod",
  "stackName": "sample-lambda-app",
  "templatePath": "infra/lambda-template.yaml",
  "revisionS3Bucket": "codepipeline-us-east-1-418586629775",
  "revisionS3Key": "StepFunctionsDrivenD/CodeCommit/sjcmExZ",
  "doesStackExist": true,
  "changeSetName": "prod-sample-lambda-app-change-set-545",
  "changeSetCreationStatus": "complete",
  "changeSetAction": "CAN-SAFELY-UPDATE-EXISTING-STACK"
}

At this point, these parameters have been added to the state machine’s original input:

  • changeSetName, which is added by the “Create Change Set” state.
  • changeSetCreationStatus, which is added by the “Get Change Set Creation Status” state.
  • changeSetAction, which is added by the “Inspect Change Set Changes” state.

The “Safe to Update Infra?” step is a choice state (its JSON spec follows) that simply checks the value of the changeSetAction parameter. If the value is equal to “CAN-SAFELY-UPDATE-EXISTING-STACK“, meaning that no resources will be deleted or replaced, the step will execute the change set by proceeding to the “Execute Change Set” state. The deployment is successful (the state machine completes its execution successfully).

"Safe to Update Infra?": {
      "Type": "Choice",
      "Choices": [
        {
          "Variable": "$.taskParams.changeSetAction",
          "StringEquals": "CAN-SAFELY-UPDATE-EXISTING-STACK",
          "Next": "Execute Change Set"
        }
      ],
      "Default": "Deployment Failed"
 }

Path 3: Reject Stack Update and Fail Deployment

Reject Stack Update and Fail Deployment Execution Path

If the changeSetAction parameter is different from “CAN-SAFELY-UPDATE-EXISTING-STACK“, the state machine will interrupt the deployment by deleting the change set and proceeding to the “Deployment Fail” step, which is a built-in Fail state. (Its JSON spec follows.) This state causes the state machine to stop in a failed state and serves to indicate to the Lambda function that the pipeline deployment should be interrupted in a fail state as well.

 "Deployment Failed": {
      "Type": "Fail",
      "Cause": "Deployment Failed",
      "Error": "Deployment Failed"
    }

In all three scenarios, there’s a state machine’s visual representation available in the AWS Step Functions console that makes it very easy for developers to identify what tasks have been executed or why a deployment has failed. Developers can also inspect the inputs and outputs of each state and look at the state machine Lambda function’s logs for details. Meanwhile, the corresponding CodePipeline action remains very simple and intuitive for developers who only need to know whether the deployment was successful or failed.

The State Machine Trigger Lambda Function

The Trigger Lambda function is invoked directly by the Deploy action in CodePipeline. The CodePipeline action must pass a JSON structure to the trigger function through the UserParameters attribute, as follows:

{
  "s3Bucket": "codepipeline-StepFunctions-sample",
  "stateMachineFile": "state_machine_input.json"
}

The s3Bucket parameter specifies the S3 bucket location for the state machine input parameters file. The stateMachineFile parameter specifies the file holding the input parameters. By being able to specify different input parameters to the state machine, we make the Trigger Lambda function and the state machine reusable across environments. For example, the same state machine could be called from a test and prod pipeline action by specifying a different S3 bucket or state machine input file for each environment.

The Trigger Lambda function performs two main tasks: triggering the state machine and checking the execution state of the state machine. Its core logic is shown here:

exports.index = function (event, context, callback) {
    try {
        console.log("Event: " + JSON.stringify(event));
        console.log("Context: " + JSON.stringify(context));
        console.log("Environment Variables: " + JSON.stringify(process.env));
        if (Util.isContinuingPipelineTask(event)) {
            monitorStateMachineExecution(event, context, callback);
        }
        else {
            triggerStateMachine(event, context, callback);
        }
    }
    catch (err) {
        failure(Util.jobId(event), callback, context.invokeid, err.message);
    }
}

Util.isContinuingPipelineTask(event) is a utility function that checks if the Trigger Lambda function is being called for the first time (that is, no continuation token is passed by CodePipeline) or as a continuation of a previous call. In its first execution, the Lambda function will trigger the state machine and send a continuation token to CodePipeline that contains the state machine execution ARN. The state machine ARN is exposed to the Lambda function through a Lambda environment variable called stateMachineArn. Here is the code that triggers the state machine:

function triggerStateMachine(event, context, callback) {
    var stateMachineArn = process.env.stateMachineArn;
    var s3Bucket = Util.actionUserParameter(event, "s3Bucket");
    var stateMachineFile = Util.actionUserParameter(event, "stateMachineFile");
    getStateMachineInputData(s3Bucket, stateMachineFile)
        .then(function (data) {
            var initialParameters = data.Body.toString();
            var stateMachineInputJSON = createStateMachineInitialInput(initialParameters, event);
            console.log("State machine input JSON: " + JSON.stringify(stateMachineInputJSON));
            return stateMachineInputJSON;
        })
        .then(function (stateMachineInputJSON) {
            return triggerStateMachineExecution(stateMachineArn, stateMachineInputJSON);
        })
        .then(function (triggerStateMachineOutput) {
            var continuationToken = { "stateMachineExecutionArn": triggerStateMachineOutput.executionArn };
            var message = "State machine has been triggered: " + JSON.stringify(triggerStateMachineOutput) + ", continuationToken: " + JSON.stringify(continuationToken);
            return continueExecution(Util.jobId(event), continuationToken, callback, message);
        })
        .catch(function (err) {
            console.log("Error triggering state machine: " + stateMachineArn + ", Error: " + err.message);
            failure(Util.jobId(event), callback, context.invokeid, err.message);
        })
}

The Trigger Lambda function fetches the state machine input parameters from an S3 file, triggers the execution of the state machine using the input parameters and the stateMachineArn environment variable, and signals to CodePipeline that the execution should continue later by passing a continuation token that contains the state machine execution ARN. In case any of these operations fail and an exception is thrown, the Trigger Lambda function will fail the pipeline immediately by signaling a pipeline failure through the putJobFailureResult CodePipeline API.

If the Lambda function is continuing a previous execution, it will extract the state machine execution ARN from the continuation token and check the status of the state machine, as shown here.

function monitorStateMachineExecution(event, context, callback) {
    var stateMachineArn = process.env.stateMachineArn;
    var continuationToken = JSON.parse(Util.continuationToken(event));
    var stateMachineExecutionArn = continuationToken.stateMachineExecutionArn;
    getStateMachineExecutionStatus(stateMachineExecutionArn)
        .then(function (response) {
            if (response.status === "RUNNING") {
                var message = "Execution: " + stateMachineExecutionArn + " of state machine: " + stateMachineArn + " is still " + response.status;
                return continueExecution(Util.jobId(event), continuationToken, callback, message);
            }
            if (response.status === "SUCCEEDED") {
                var message = "Execution: " + stateMachineExecutionArn + " of state machine: " + stateMachineArn + " has: " + response.status;
                return success(Util.jobId(event), callback, message);
            }
            // FAILED, TIMED_OUT, ABORTED
            var message = "Execution: " + stateMachineExecutionArn + " of state machine: " + stateMachineArn + " has: " + response.status;
            return failure(Util.jobId(event), callback, context.invokeid, message);
        })
        .catch(function (err) {
            var message = "Error monitoring execution: " + stateMachineExecutionArn + " of state machine: " + stateMachineArn + ", Error: " + err.message;
            failure(Util.jobId(event), callback, context.invokeid, message);
        });
}

If the state machine is in the RUNNING state, the Lambda function will send the continuation token back to the CodePipeline action. This will cause CodePipeline to call the Lambda function again a few seconds later. If the state machine has SUCCEEDED, then the Lambda function will notify the CodePipeline action that the action has succeeded. In any other case (FAILURE, TIMED-OUT, or ABORT), the Lambda function will fail the pipeline action.

This behavior is especially useful for developers who are building and debugging a new state machine because a bug in the state machine can potentially leave the pipeline action hanging for long periods of time until it times out. The Trigger Lambda function prevents this.

Also, by having the Trigger Lambda function as a means to decouple the pipeline and state machine, we make the state machine more reusable. It can be triggered from anywhere, not just from a CodePipeline action.

The Pipeline in CodePipeline

Our sample pipeline contains two simple stages: the Source stage represented by a CodeCommit Git repository and the Prod stage, which contains the Deploy action that invokes the Trigger Lambda function. When the state machine decides that the change set created must be rejected (because it replaces or deletes some the existing production resources), it fails the pipeline without performing any updates to the existing infrastructure. (See the failed Deploy action in red.) Otherwise, the pipeline action succeeds, indicating that the existing provisioned infrastructure was either created (first run) or updated without impacting any resources. (See the green Deploy stage in the pipeline on the left.)

The Pipeline in CodePipeline

The JSON spec for the pipeline’s Prod stage is shown here. We use the UserParameters attribute to pass the S3 bucket and state machine input file to the Lambda function. These parameters are action-specific, which means that we can reuse the state machine in another pipeline action.

{
  "name": "Prod",
  "actions": [
      {
          "inputArtifacts": [
              {
                  "name": "CodeCommitOutput"
              }
          ],
          "name": "Deploy",
          "actionTypeId": {
              "category": "Invoke",
              "owner": "AWS",
              "version": "1",
              "provider": "Lambda"
          },
          "outputArtifacts": [],
          "configuration": {
              "FunctionName": "StateMachineTriggerLambda",
              "UserParameters": "{\"s3Bucket\": \"codepipeline-StepFunctions-sample\", \"stateMachineFile\": \"state_machine_input.json\"}"
          },
          "runOrder": 1
      }
  ]
}

Conclusion

In this blog post, we discussed how state machines in AWS Step Functions can be used to handle workflow-driven actions. We showed how a Lambda function can be used to fully decouple the pipeline and the state machine and manage their interaction. The use of a state machine greatly simplified the associated CodePipeline action, allowing us to build a much simpler and cleaner pipeline while drilling down into the state machine’s execution for troubleshooting or debugging.

Here are two exercises you can complete by using the source code.

Exercise #1: Do not fail the state machine and pipeline action after inspecting a change set that deletes or replaces resources. Instead, create a stack with a different name (think of blue/green deployments). You can do this by creating a state machine transition between the “Safe to Update Infra?” and “Create Stack” stages and passing a new stack name as input to the “Create Stack” stage.

Exercise #2: Add wait logic to the state machine to wait until the change set completes its execution before allowing the state machine to proceed to the “Deployment Succeeded” stage. Use the stack creation case as an example. You’ll have to create a Lambda function (similar to the Lambda function that checks the creation status of a stack) to get the creation status of the change set.

Have fun and share your thoughts!

About the Author

Marcilio Mendonca is a Sr. Consultant in the Canadian Professional Services Team at Amazon Web Services. He has helped AWS customers design, build, and deploy best-in-class, cloud-native AWS applications using VMs, containers, and serverless architectures. Before he joined AWS, Marcilio was a Software Development Engineer at Amazon. Marcilio also holds a Ph.D. in Computer Science. In his spare time, he enjoys playing drums, riding his motorcycle in the Toronto GTA area, and spending quality time with his family.

Implementing Default Directory Indexes in Amazon S3-backed Amazon CloudFront Origins Using [email protected]

Post Syndicated from Ronnie Eichler original https://aws.amazon.com/blogs/compute/implementing-default-directory-indexes-in-amazon-s3-backed-amazon-cloudfront-origins-using-lambdaedge/

With the recent launch of [email protected], it’s now possible for you to provide even more robust functionality to your static websites. Amazon CloudFront is a content distribution network service. In this post, I show how you can use [email protected] along with the CloudFront origin access identity (OAI) for Amazon S3 and still provide simple URLs (such as www.example.com/about/ instead of www.example.com/about/index.html).

Background

Amazon S3 is a great platform for hosting a static website. You don’t need to worry about managing servers or underlying infrastructure—you just publish your static to content to an S3 bucket. S3 provides a DNS name such as <bucket-name>.s3-website-<AWS-region>.amazonaws.com. Use this name for your website by creating a CNAME record in your domain’s DNS environment (or Amazon Route 53) as follows:

www.example.com -> <bucket-name>.s3-website-<AWS-region>.amazonaws.com

You can also put CloudFront in front of S3 to further scale the performance of your site and cache the content closer to your users. CloudFront can enable HTTPS-hosted sites, by either using a custom Secure Sockets Layer (SSL) certificate or a managed certificate from AWS Certificate Manager. In addition, CloudFront also offers integration with AWS WAF, a web application firewall. As you can see, it’s possible to achieve some robust functionality by using S3, CloudFront, and other managed services and not have to worry about maintaining underlying infrastructure.

One of the key concerns that you might have when implementing any type of WAF or CDN is that you want to force your users to go through the CDN. If you implement CloudFront in front of S3, you can achieve this by using an OAI. However, in order to do this, you cannot use the HTTP endpoint that is exposed by S3’s static website hosting feature. Instead, CloudFront must use the S3 REST endpoint to fetch content from your origin so that the request can be authenticated using the OAI. This presents some challenges in that the REST endpoint does not support redirection to a default index page.

CloudFront does allow you to specify a default root object (index.html), but it only works on the root of the website (such as http://www.example.com > http://www.example.com/index.html). It does not work on any subdirectory (such as http://www.example.com/about/). If you were to attempt to request this URL through CloudFront, CloudFront would do a S3 GetObject API call against a key that does not exist.

Of course, it is a bad user experience to expect users to always type index.html at the end of every URL (or even know that it should be there). Until now, there has not been an easy way to provide these simpler URLs (equivalent to the DirectoryIndex Directive in an Apache Web Server configuration) to users through CloudFront. Not if you still want to be able to restrict access to the S3 origin using an OAI. However, with the release of [email protected], you can use a JavaScript function running on the CloudFront edge nodes to look for these patterns and request the appropriate object key from the S3 origin.

Solution

In this example, you use the compute power at the CloudFront edge to inspect the request as it’s coming in from the client. Then re-write the request so that CloudFront requests a default index object (index.html in this case) for any request URI that ends in ‘/’.

When a request is made against a web server, the client specifies the object to obtain in the request. You can use this URI and apply a regular expression to it so that these URIs get resolved to a default index object before CloudFront requests the object from the origin. Use the following code:

'use strict';
exports.handler = (event, context, callback) => {
    
    // Extract the request from the CloudFront event that is sent to [email protected] 
    var request = event.Records[0].cf.request;

    // Extract the URI from the request
    var olduri = request.uri;

    // Match any '/' that occurs at the end of a URI. Replace it with a default index
    var newuri = olduri.replace(/\/$/, '\/index.html');
    
    // Log the URI as received by CloudFront and the new URI to be used to fetch from origin
    console.log("Old URI: " + olduri);
    console.log("New URI: " + newuri);
    
    // Replace the received URI with the URI that includes the index page
    request.uri = newuri;
    
    // Return to CloudFront
    return callback(null, request);

};

To get started, create an S3 bucket to be the origin for CloudFront:

Create bucket

On the other screens, you can just accept the defaults for the purposes of this walkthrough. If this were a production implementation, I would recommend enabling bucket logging and specifying an existing S3 bucket as the destination for access logs. These logs can be useful if you need to troubleshoot issues with your S3 access.

Now, put some content into your S3 bucket. For this walkthrough, create two simple webpages to demonstrate the functionality:  A page that resides at the website root, and another that is in a subdirectory.

<s3bucketname>/index.html

<!doctype html>
<html>
    <head>
        <meta charset="utf-8">
        <title>Root home page</title>
    </head>
    <body>
        <p>Hello, this page resides in the root directory.</p>
    </body>
</html>

<s3bucketname>/subdirectory/index.html

<!doctype html>
<html>
    <head>
        <meta charset="utf-8">
        <title>Subdirectory home page</title>
    </head>
    <body>
        <p>Hello, this page resides in the /subdirectory/ directory.</p>
    </body>
</html>

When uploading the files into S3, you can accept the defaults. You add a bucket policy as part of the CloudFront distribution creation that allows CloudFront to access the S3 origin. You should now have an S3 bucket that looks like the following:

Root of bucket

Subdirectory in bucket

Next, create a CloudFront distribution that your users will use to access the content. Open the CloudFront console, and choose Create Distribution. For Select a delivery method for your content, under Web, choose Get Started.

On the next screen, you set up the distribution. Below are the options to configure:

  • Origin Domain Name:  Select the S3 bucket that you created earlier.
  • Restrict Bucket Access: Choose Yes.
  • Origin Access Identity: Create a new identity.
  • Grant Read Permissions on Bucket: Choose Yes, Update Bucket Policy.
  • Object Caching: Choose Customize (I am changing the behavior to avoid having CloudFront cache objects, as this could affect your ability to troubleshoot while implementing the Lambda code).
    • Minimum TTL: 0
    • Maximum TTL: 0
    • Default TTL: 0

You can accept all of the other defaults. Again, this is a proof-of-concept exercise. After you are comfortable that the CloudFront distribution is working properly with the origin and Lambda code, you can re-visit the preceding values and make changes before implementing it in production.

CloudFront distributions can take several minutes to deploy (because the changes have to propagate out to all of the edge locations). After that’s done, test the functionality of the S3-backed static website. Looking at the distribution, you can see that CloudFront assigns a domain name:

CloudFront Distribution Settings

Try to access the website using a combination of various URLs:

http://<domainname>/:  Works

› curl -v http://d3gt20ea1hllb.cloudfront.net/
*   Trying 54.192.192.214...
* TCP_NODELAY set
* Connected to d3gt20ea1hllb.cloudfront.net (54.192.192.214) port 80 (#0)
> GET / HTTP/1.1
> Host: d3gt20ea1hllb.cloudfront.net
> User-Agent: curl/7.51.0
> Accept: */*
>
< HTTP/1.1 200 OK
< ETag: "cb7e2634fe66c1fd395cf868087dd3b9"
< Accept-Ranges: bytes
< Server: AmazonS3
< X-Cache: Miss from cloudfront
< X-Amz-Cf-Id: -D2FSRwzfcwyKZKFZr6DqYFkIf4t7HdGw2MkUF5sE6YFDxRJgi0R1g==
< Content-Length: 209
< Content-Type: text/html
< Last-Modified: Wed, 19 Jul 2017 19:21:16 GMT
< Via: 1.1 6419ba8f3bd94b651d416054d9416f1e.cloudfront.net (CloudFront), 1.1 iad6-proxy-3.amazon.com:80 (Cisco-WSA/9.1.2-010)
< Connection: keep-alive
<
<!doctype html>
<html>
    <head>
        <meta charset="utf-8">
        <title>Root home page</title>
    </head>
    <body>
        <p>Hello, this page resides in the root directory.</p>
    </body>
</html>
* Curl_http_done: called premature == 0
* Connection #0 to host d3gt20ea1hllb.cloudfront.net left intact

This is because CloudFront is configured to request a default root object (index.html) from the origin.

http://<domainname>/subdirectory/:  Doesn’t work

› curl -v http://d3gt20ea1hllb.cloudfront.net/subdirectory/
*   Trying 54.192.192.214...
* TCP_NODELAY set
* Connected to d3gt20ea1hllb.cloudfront.net (54.192.192.214) port 80 (#0)
> GET /subdirectory/ HTTP/1.1
> Host: d3gt20ea1hllb.cloudfront.net
> User-Agent: curl/7.51.0
> Accept: */*
>
< HTTP/1.1 200 OK
< ETag: "d41d8cd98f00b204e9800998ecf8427e"
< x-amz-server-side-encryption: AES256
< Accept-Ranges: bytes
< Server: AmazonS3
< X-Cache: Miss from cloudfront
< X-Amz-Cf-Id: Iqf0Gy8hJLiW-9tOAdSFPkL7vCWBrgm3-1ly5tBeY_izU82ftipodA==
< Content-Length: 0
< Content-Type: application/x-directory
< Last-Modified: Wed, 19 Jul 2017 19:21:24 GMT
< Via: 1.1 6419ba8f3bd94b651d416054d9416f1e.cloudfront.net (CloudFront), 1.1 iad6-proxy-3.amazon.com:80 (Cisco-WSA/9.1.2-010)
< Connection: keep-alive
<
* Curl_http_done: called premature == 0
* Connection #0 to host d3gt20ea1hllb.cloudfront.net left intact

If you use a tool such like cURL to test this, you notice that CloudFront and S3 are returning a blank response. The reason for this is that the subdirectory does exist, but it does not resolve to an S3 object. Keep in mind that S3 is an object store, so there are no real directories. User interfaces such as the S3 console present a hierarchical view of a bucket with folders based on the presence of forward slashes, but behind the scenes the bucket is just a collection of keys that represent stored objects.

http://<domainname>/subdirectory/index.html:  Works

› curl -v http://d3gt20ea1hllb.cloudfront.net/subdirectory/index.html
*   Trying 54.192.192.130...
* TCP_NODELAY set
* Connected to d3gt20ea1hllb.cloudfront.net (54.192.192.130) port 80 (#0)
> GET /subdirectory/index.html HTTP/1.1
> Host: d3gt20ea1hllb.cloudfront.net
> User-Agent: curl/7.51.0
> Accept: */*
>
< HTTP/1.1 200 OK
< Date: Thu, 20 Jul 2017 20:35:15 GMT
< ETag: "ddf87c487acf7cef9d50418f0f8f8dae"
< Accept-Ranges: bytes
< Server: AmazonS3
< X-Cache: RefreshHit from cloudfront
< X-Amz-Cf-Id: bkh6opXdpw8pUomqG3Qr3UcjnZL8axxOH82Lh0OOcx48uJKc_Dc3Cg==
< Content-Length: 227
< Content-Type: text/html
< Last-Modified: Wed, 19 Jul 2017 19:21:45 GMT
< Via: 1.1 3f2788d309d30f41de96da6f931d4ede.cloudfront.net (CloudFront), 1.1 iad6-proxy-3.amazon.com:80 (Cisco-WSA/9.1.2-010)
< Connection: keep-alive
<
<!doctype html>
<html>
    <head>
        <meta charset="utf-8">
        <title>Subdirectory home page</title>
    </head>
    <body>
        <p>Hello, this page resides in the /subdirectory/ directory.</p>
    </body>
</html>
* Curl_http_done: called premature == 0
* Connection #0 to host d3gt20ea1hllb.cloudfront.net left intact

This request works as expected because you are referencing the object directly. Now, you implement the [email protected] function to return the default index.html page for any subdirectory. Looking at the example JavaScript code, here’s where the magic happens:

var newuri = olduri.replace(/\/$/, '\/index.html');

You are going to use a JavaScript regular expression to match any ‘/’ that occurs at the end of the URI and replace it with ‘/index.html’. This is the equivalent to what S3 does on its own with static website hosting. However, as I mentioned earlier, you can’t rely on this if you want to use a policy on the bucket to restrict it so that users must access the bucket through CloudFront. That way, all requests to the S3 bucket must be authenticated using the S3 REST API. Because of this, you implement a [email protected] function that takes any client request ending in ‘/’ and append a default ‘index.html’ to the request before requesting the object from the origin.

In the Lambda console, choose Create function. On the next screen, skip the blueprint selection and choose Author from scratch, as you’ll use the sample code provided.

Next, configure the trigger. Choosing the empty box shows a list of available triggers. Choose CloudFront and select your CloudFront distribution ID (created earlier). For this example, leave Cache Behavior as * and CloudFront Event as Origin Request. Select the Enable trigger and replicate box and choose Next.

Lambda Trigger

Next, give the function a name and a description. Then, copy and paste the following code:

'use strict';
exports.handler = (event, context, callback) => {
    
    // Extract the request from the CloudFront event that is sent to [email protected] 
    var request = event.Records[0].cf.request;

    // Extract the URI from the request
    var olduri = request.uri;

    // Match any '/' that occurs at the end of a URI. Replace it with a default index
    var newuri = olduri.replace(/\/$/, '\/index.html');
    
    // Log the URI as received by CloudFront and the new URI to be used to fetch from origin
    console.log("Old URI: " + olduri);
    console.log("New URI: " + newuri);
    
    // Replace the received URI with the URI that includes the index page
    request.uri = newuri;
    
    // Return to CloudFront
    return callback(null, request);

};

Next, define a role that grants permissions to the Lambda function. For this example, choose Create new role from template, Basic Edge Lambda permissions. This creates a new IAM role for the Lambda function and grants the following permissions:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            "Resource": [
                "arn:aws:logs:*:*:*"
            ]
        }
    ]
}

In a nutshell, these are the permissions that the function needs to create the necessary CloudWatch log group and log stream, and to put the log events so that the function is able to write logs when it executes.

After the function has been created, you can go back to the browser (or cURL) and re-run the test for the subdirectory request that failed previously:

› curl -v http://d3gt20ea1hllb.cloudfront.net/subdirectory/
*   Trying 54.192.192.202...
* TCP_NODELAY set
* Connected to d3gt20ea1hllb.cloudfront.net (54.192.192.202) port 80 (#0)
> GET /subdirectory/ HTTP/1.1
> Host: d3gt20ea1hllb.cloudfront.net
> User-Agent: curl/7.51.0
> Accept: */*
>
< HTTP/1.1 200 OK
< Date: Thu, 20 Jul 2017 21:18:44 GMT
< ETag: "ddf87c487acf7cef9d50418f0f8f8dae"
< Accept-Ranges: bytes
< Server: AmazonS3
< X-Cache: Miss from cloudfront
< X-Amz-Cf-Id: rwFN7yHE70bT9xckBpceTsAPcmaadqWB9omPBv2P6WkIfQqdjTk_4w==
< Content-Length: 227
< Content-Type: text/html
< Last-Modified: Wed, 19 Jul 2017 19:21:45 GMT
< Via: 1.1 3572de112011f1b625bb77410b0c5cca.cloudfront.net (CloudFront), 1.1 iad6-proxy-3.amazon.com:80 (Cisco-WSA/9.1.2-010)
< Connection: keep-alive
<
<!doctype html>
<html>
    <head>
        <meta charset="utf-8">
        <title>Subdirectory home page</title>
    </head>
    <body>
        <p>Hello, this page resides in the /subdirectory/ directory.</p>
    </body>
</html>
* Curl_http_done: called premature == 0
* Connection #0 to host d3gt20ea1hllb.cloudfront.net left intact

You have now configured a way for CloudFront to return a default index page for subdirectories in S3!

Summary

In this post, you used [email protected] to be able to use CloudFront with an S3 origin access identity and serve a default root object on subdirectory URLs. To find out some more about this use-case, see [email protected] integration with CloudFront in our documentation.

If you have questions or suggestions, feel free to comment below. For troubleshooting or implementation help, check out the Lambda forum.

How to Compete with Giants

Post Syndicated from Gleb Budman original https://www.backblaze.com/blog/how-to-compete-with-giants/

How to Compete with Giants

This post by Backblaze’s CEO and co-founder Gleb Budman is the sixth in a series about entrepreneurship. You can choose posts in the series from the list below:

  1. How Backblaze got Started: The Problem, The Solution, and the Stuff In-Between
  2. Building a Competitive Moat: Turning Challenges Into Advantages
  3. From Idea to Launch: Getting Your First Customers
  4. How to Get Your First 1,000 Customers
  5. Surviving Your First Year
  6. How to Compete with Giants

Use the Join button above to receive notification of new posts in this series.

Perhaps your business is competing in a brand new space free from established competitors. Most of us, though, start companies that compete with existing offerings from large, established companies. You need to come up with a better mousetrap — not the first mousetrap.

That’s the challenge Backblaze faced. In this post, I’d like to share some of the lessons I learned from that experience.

Backblaze vs. Giants

Competing with established companies that are orders of magnitude larger can be daunting. How can you succeed?

I’ll set the stage by offering a few sets of giants we compete with:

  • When we started Backblaze, we offered online backup in a market where companies had been offering “online backup” for at least a decade, and even the newer entrants had raised tens of millions of dollars.
  • When we built our storage servers, the alternatives were EMC, NetApp, and Dell — each of which had a market cap of over $10 billion.
  • When we introduced our cloud storage offering, B2, our direct competitors were Amazon, Google, and Microsoft. You might have heard of them.

What did we learn by competing with these giants on a bootstrapped budget? Let’s take a look.

Determine What Success Means

For a long time Apple considered Apple TV to be a hobby, not a real product worth focusing on, because it did not generate a billion in revenue. For a $10 billion per year revenue company, a new business that generates $50 million won’t move the needle and often isn’t worth putting focus on. However, for a startup, getting to $50 million in revenue can be the start of a wildly successful business.

Lesson Learned: Don’t let the giants set your success metrics.

The Advantages Startups Have

The giants have a lot of advantages: more money, people, scale, resources, access, etc. Following their playbook and attacking head-on means you’re simply outgunned. Common paths to failure are trying to build more features, enter more markets, outspend on marketing, and other similar approaches where scale and resources are the primary determinants of success.

But being a startup affords many advantages most giants would salivate over. As a nimble startup you can leverage those to succeed. Let’s breakdown nine competitive advantages we’ve used that you can too.

1. Drive Focus

It’s hard to build a $10 billion revenue business doing just one thing, and most giants have a broad portfolio of businesses, numerous products for each, and targeting a variety of customer segments in multiple markets. That adds complexity and distributes management attention.

Startups get the benefit of having everyone in the company be extremely focused, often on a singular mission, product, customer segment, and market. While our competitors sell everything from advertising to Zantac, and are investing in groceries and shipping, Backblaze has focused exclusively on cloud storage. This means all of our best people (i.e. everyone) is focused on our cloud storage business. Where is all of your focus going?

Lesson Learned: Align everyone in your company to a singular focus to dramatically out-perform larger teams.

2. Use Lack-of-Scale as an Advantage

You may have heard Paul Graham say “Do things that don’t scale.” There are a host of things you can do specifically because you don’t have the same scale as the giants. Use that as an advantage.

When we look for data center space, we have more options than our largest competitors because there are simply more spaces available with room for 100 cabinets than for 1,000 cabinets. With some searching, we can find data center space that is better/cheaper.

When a flood in Thailand destroyed factories, causing the world’s supply of hard drives to plummet and prices to triple, we started drive farming. The giants certainly couldn’t. It was a bit crazy, but it let us keep prices unchanged for our customers.

Our Chief Cloud Officer, Tim, used to work at Adobe. Because of their size, any new product needed to always launch in a multitude of languages and in global markets. Once launched, they had scale. But getting any new product launched was incredibly challenging.

Lesson Learned: Use lack-of-scale to exploit opportunities that are closed to giants.

3. Build a Better Product

This one is probably obvious. If you’re going to provide the same product, at the same price, to the same customers — why do it? Remember that better does not always mean more features. Here’s one way we built a better product that didn’t require being a bigger company.

All online backup services required customers to choose what to include in their backup. We found that this was complicated for users since they often didn’t know what needed to be backed up. We flipped the model to back up everything and allow users to exclude if they wanted to, but it was not required. This reduced the number of features/options, while making it easier and better for the user.

This didn’t require the resources of a huge company; it just required understanding customers a bit deeper and thinking about the solution differently. Building a better product is the most classic startup competitive advantage.

Lesson Learned: Dig deep with your customers to understand and deliver a better mousetrap.

4. Provide Better Service

How can you provide better service? Use your advantages. Escalations from your customer care folks to engineering can go through fewer hoops. Fixing an issue and shipping can be quicker. Access to real answers on Twitter or Facebook can be more effective.

A strategic decision we made was to have all customer support people as full-time employees in our headquarters. This ensures they are in close contact to the whole company for feedback to quickly go both ways.

Having a smaller team and fewer layers enables faster internal communication, which increases customer happiness. And the option to do things that don’t scale — such as help a customer in a unique situation — can go a long way in building customer loyalty.

Lesson Learned: Service your customers better by establishing clear internal communications.

5. Remove The Unnecessary

After determining that the industry standard EMC/NetApp/Dell storage servers would be too expensive to build our own cloud storage upon, we decided to build our own infrastructure. Many said we were crazy to compete with these multi-billion dollar companies and that it would be impossible to build a lower cost storage server. However, not only did it prove to not be impossible — it wasn’t even that hard.

One key trick? Remove the unnecessary. While EMC and others built servers to sell to other companies for a wide variety of use cases, Backblaze needed servers that only Backblaze would run, and for a single use case. As a result we could tailor the servers for our needs by removing redundancy from each server (since we would run redundant servers), and using lower-performance components (since we would get high-performance by running parallel servers).

What do your customers and use cases not need? This can trim costs and complexity while often improving the product for your use case.

Lesson Learned: Don’t think “what can we add” to what the giants offer — think “what can we remove.”

6. Be Easy

How many times have you visited a large company website, particularly one that’s not consumer-focused, only to leave saying, “Huh? I don’t understand what you do.” Keeping your website clear, and your product and pricing simple, will dramatically increase conversion and customer satisfaction. If you’re able to make it 2x easier and thus increasing your conversion by 2x, you’ve just allowed yourself to spend ½ as much acquiring a customer.

Providing unlimited data backup wasn’t specifically about providing more storage — it was about making it easier. Since users didn’t know how much data they needed to back up, charging per gigabyte meant they wouldn’t know the cost. Providing unlimited data backup meant they could just relax.

Customers love easy — and being smaller makes easy easier to deliver. Use that as an advantage in your website, marketing materials, pricing, product, and in every other customer interaction.

Lesson Learned: Ease-of-use isn’t a slogan: it’s a competitive advantage. Treat it as seriously as any other feature of your product

7. Don’t Be Afraid of Risk

Obviously unnecessary risks are unnecessary, and some risks aren’t worth taking. However, large companies that have given guidance to Wall Street with a $0.01 range on their earning-per-share are inherently going to be very risk-averse. Use risk-tolerance to open up opportunities, and adjust your tolerance level as you scale. In your first year, there are likely an infinite number of ways your business may vaporize; don’t be too worried about taking a risk that might have a 20% downside when the upside is hockey stick growth.

Using consumer-grade hard drives in our servers may have caused pain and suffering for us years down-the-line, but they were priced at approximately 50% of enterprise drives. Giants wouldn’t have considered the option. Turns out, the consumer drives performed great for us.

Lesson Learned: Use calculated risks as an advantage.

8. Be Open

The larger a company grows, the more it wants to hide information. Some of this is driven by regulatory requirements as a public company. But most of this is cultural. Sharing something might cause a problem, so let’s not. All external communication is treated as a critical press release, with rounds and rounds of editing by multiple teams and approvals. However, customers are often desperate for information. Moreover, sharing information builds trust, understanding, and advocates.

I started blogging at Backblaze before we launched. When we blogged about our Storage Pod and open-sourced the design, many thought we were crazy to share this information. But it was transformative for us, establishing Backblaze as a tech thought leader in storage and giving people a sense of how we were able to provide our service at such a low cost.

Over the years we’ve developed a culture of being open internally and externally, on our blog and with the press, and in communities such as Hacker News and Reddit. Often we’ve been asked, “why would you share that!?” — but it’s the continual openness that builds trust. And that culture of openness is incredibly challenging for the giants.

Lesson Learned: Overshare to build trust and brand where giants won’t.

9. Be Human

As companies scale, typically a smaller percent of founders and executives interact with customers. The people who build the company become more hidden, the language feels “corporate,” and customers start to feel they’re interacting with the cliche “faceless, nameless corporation.” Use your humanity to your advantage. From day one the Backblaze About page listed all the founders, and my email address. While contacting us shouldn’t be the first path for a customer support question, I wanted it to be clear that we stand behind the service we offer; if we’re doing something wrong — I want to know it.

To scale it’s important to have processes and procedures, but sometimes a situation falls outside of a well-established process. While we want our employees to follow processes, they’re still encouraged to be human and “try to do the right thing.” How to you strike this balance? Simon Sinek gives a good talk about it: make your employees feel safe. If employees feel safe they’ll be human.

If your customer is a consumer, they’ll appreciate being treated as a human. Even if your customer is a corporation, the purchasing decision-makers are still people.

Lesson Learned: Being human is the ultimate antithesis to the faceless corporation.

Build Culture to Sustain Your Advantages at Scale

Presumably the goal is not to always be competing with giants, but to one day become a giant. Does this mean you’ll lose all of these advantages? Some, yes — but not all. Some of these advantages are cultural, and if you build these into the culture from the beginning, and fight to keep them as you scale, you can keep them as you become a giant.

Tesla still comes across as human, with Elon Musk frequently interacting with people on Twitter. Apple continues to provide great service through their Genius Bar. And, worst case, if you lose these at scale, you’ll still have the other advantages of being a giant such as money, people, scale, resources, and access.

Of course, some new startup will be gunning for you with grand ambitions, so just be sure not to get complacent. 😉

The post How to Compete with Giants appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Abandon Proactive Copyright Filters, Huge Coalition Tells EU Heavyweights

Post Syndicated from Andy original https://torrentfreak.com/abandon-proactive-copyright-filters-huge-coalition-tells-eu-heavyweights-171017/

Last September, EU Commission President Jean-Claude Juncker announced plans to modernize copyright law in Europe.

The proposals (pdf) are part of the Digital Single Market reforms, which have been under development for the past several years.

One of the proposals is causing significant concern. Article 13 would require some online service providers to become ‘Internet police’, proactively detecting and filtering allegedly infringing copyright works, uploaded to their platforms by users.

Currently, users are generally able to share whatever they like but should a copyright holder take exception to their upload, mechanisms are available for that content to be taken down. It’s envisioned that proactive filtering, whereby user uploads are routinely scanned and compared to a database of existing protected content, will prevent content becoming available in the first place.

These proposals are of great concern to digital rights groups, who believe that such filters will not only undermine users’ rights but will also place unfair burdens on Internet platforms, many of which will struggle to fund such a program. Yesterday, in the latest wave of opposition to Article 13, a huge coalition of international rights groups came together to underline their concerns.

Headed up by Civil Liberties Union for Europe (Liberties) and European Digital Rights (EDRi), the coalition is formed of dozens of influential groups, including Electronic Frontier Foundation (EFF), Human Rights Watch, Reporters without Borders, and Open Rights Group (ORG), to name just a few.

In an open letter to European Commission President Jean-Claude Juncker, President of the European Parliament Antonio Tajani, President of the European Council Donald Tusk and a string of others, the groups warn that the proposals undermine the trust established between EU member states.

“Fundamental rights, justice and the rule of law are intrinsically linked and constitute
core values on which the EU is founded,” the letter begins.

“Any attempt to disregard these values undermines the mutual trust between member states required for the EU to function. Any such attempt would also undermine the commitments made by the European Union and national governments to their citizens.”

Those citizens, the letter warns, would have their basic rights undermined, should the new proposals be written into EU law.

“Article 13 of the proposal on Copyright in the Digital Single Market include obligations on internet companies that would be impossible to respect without the imposition of excessive restrictions on citizens’ fundamental rights,” it notes.

A major concern is that by placing new obligations on Internet service providers that allow users to upload content – think YouTube, Facebook, Twitter and Instagram – they will be forced to err on the side of caution. Should there be any concern whatsoever that content might be infringing, fair use considerations and exceptions will be abandoned in favor of staying on the right side of the law.

“Article 13 appears to provoke such legal uncertainty that online services will have no other option than to monitor, filter and block EU citizens’ communications if they are to have any chance of staying in business,” the letter warns.

But while the potential problems for service providers and users are numerous, the groups warn that Article 13 could also be illegal since it contradicts case law of the Court of Justice.

According to the E-Commerce Directive, platforms are already required to remove infringing content, once they have been advised it exists. The new proposal, should it go ahead, would force the monitoring of uploads, something which goes against the ‘no general obligation to monitor‘ rules present in the Directive.

“The requirement to install a system for filtering electronic communications has twice been rejected by the Court of Justice, in the cases Scarlet Extended (C70/10) and Netlog/Sabam (C 360/10),” the rights groups warn.

“Therefore, a legislative provision that requires internet companies to install a filtering system would almost certainly be rejected by the Court of Justice because it would contravene the requirement that a fair balance be struck between the right to intellectual property on the one hand, and the freedom to conduct business and the right to freedom of expression, such as to receive or impart information, on the other.”

Specifically, the groups note that the proactive filtering of content would violate freedom of expression set out in Article 11 of the Charter of Fundamental Rights. That being the case, the groups expect national courts to disapply it and the rule to be annulled by the Court of Justice.

The latest protests against Article 13 come in the wake of large-scale objections earlier in the year, voicing similar concerns. However, despite the groups’ fears, they have powerful adversaries, each determined to stop the flood of copyrighted content currently being uploaded to the Internet.

Front and center in support of Article 13 is the music industry and its current hot-topic, the so-called Value Gap(1,2,3). The industry feels that platforms like YouTube are able to avoid paying expensive licensing fees (for music in particular) by exploiting the safe harbor protections of the DMCA and similar legislation.

They believe that proactively filtering uploads would significantly help to diminish this problem, which may very well be the case. But at what cost to the general public and the platforms they rely upon? Citizens and scholars feel that freedoms will be affected and it’s likely the outcry will continue.

The ball is now with the EU, whose members will soon have to make what could be the most important decision in recent copyright history. The rights groups, who are urging for Article 13 to be deleted, are clear where they stand.

The full letter is available here (pdf)

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

PureVPN Explains How it Helped the FBI Catch a Cyberstalker

Post Syndicated from Andy original https://torrentfreak.com/purevpn-explains-how-it-helped-the-fbi-catch-a-cyberstalker-171016/

Early October, Ryan S. Lin, 24, of Newton, Massachusetts, was arrested on suspicion of conducting “an extensive cyberstalking campaign” against a 24-year-old Massachusetts woman, as well as her family members and friends.

The Department of Justice described Lin’s offenses as a “multi-faceted” computer hacking and cyberstalking campaign. Launched in April 2016 when he began hacking into the victim’s online accounts, Lin allegedly obtained personal photographs and sensitive information about her medical and sexual histories and distributed that information to hundreds of other people.

Details of what information the FBI compiled on Lin can be found in our earlier report but aside from his alleged crimes (which are both significant and repugnant), it was PureVPN’s involvement in the case that caused the most controversy.

In a report compiled by an FBI special agent, it was revealed that the Hong Kong-based company’s logs helped the authorities net the alleged criminal.

“Significantly, PureVPN was able to determine that their service was accessed by the same customer from two originating IP addresses: the RCN IP address from the home Lin was living in at the time, and the software company where Lin was employed at the time,” the agent’s affidavit reads.

Among many in the privacy community, this revelation was met with disappointment. On the PureVPN website the company claims to carry no logs and on a general basis, it’s expected that so-called “no-logging” VPN providers should provide people with some anonymity, at least as far as their service goes. Now, several days after the furor, the company has responded to its critics.

In a fairly lengthy statement, the company begins by confirming that it definitely doesn’t log what websites a user views or what content he or she downloads.

“PureVPN did not breach its Privacy Policy and certainly did not breach your trust. NO browsing logs, browsing habits or anything else was, or ever will be shared,” the company writes.

However, that’s only half the problem. While it doesn’t log user activity (what sites people visit or content they download), it does log the IP addresses that customers use to access the PureVPN service. These, given the right circumstances, can be matched to external activities thanks to logs carried by other web companies.

PureVPN talks about logs held by Google’s Gmail service to illustrate its point.

“A network log is automatically generated every time a user visits a website. For the sake of this example, let’s say a user logged into their Gmail account. Every time they accessed Gmail, the email provider created a network log,” the company explains.

“If you are using a VPN, Gmail’s network log would contain the IP provided by PureVPN. This is one half of the picture. Now, if someone asks Google who accessed the user’s account, Google would state that whoever was using this IP, accessed the account.

“If the user was connected to PureVPN, it would be a PureVPN IP. The inquirer [in the Lin case, the FBI] would then share timestamps and network logs acquired from Google and ask them to be compared with the network logs maintained by the VPN provider.”

Now, if PureVPN carried no logs – literally no logs – it would not be able to help with this kind of inquiry. That was the case last year when the FBI approached Private Internet Access for information and the company was unable to assist.

However, as is made pretty clear by PureVPN’s explanation, the company does log user IP addresses and timestamps which reveal when a user was logged on to the service. It doesn’t matter that PureVPN doesn’t log what the user allegedly did online, since the third-party service already knows that information to the precise second.

Following the example, GMail knows that a user sent an email at 10:22am on Monday October 16 from a PureVPN IP address. So, if PureVPN is approached by the FBI, the company can confirm that User X was using the same IP address at exactly the same time, and his home IP address was XXX.XX.XXX.XX. Effectively, the combined logs link one IP address to the other and the user is revealed. It’s that simple.

It is for this reason that in TorrentFreak’s annual summary of no-logging VPN providers, the very first question we ask every single company reads as follows:

Do you keep ANY logs which would allow you to match an IP-address and a time stamp to a user/users of your service? If so, what information do you hold and for how long?

Clearly, if a company says “yes we log incoming IP addresses and associated timestamps”, any claim to total user anonymity is ended right there and then.

While not completely useless (a logging service will still stop the prying eyes of ISPs and similar surveillance, while also defeating throttling and site-blocking), if you’re a whistle-blower with a job or even your life to protect, this level of protection is entirely inadequate.

The take-home points from this controversy are numerous, but perhaps the most important is for people to read and understand VPN provider logging policies.

Secondly, and just as importantly, VPN providers need to be extremely clear about the information they log. Not tracking browsing or downloading activities is all well and good, but if home IP addresses and timestamps are stored, this needs to be made clear to the customer.

Finally, VPN users should not be evil. There are plenty of good reasons to stay anonymous online but cyberstalking, death threats and ruining people’s lives are not included. Fortunately, the FBI have offline methods for catching this type of offender, and long may that continue.

PureVPN’s blog post is available here.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

Some notes on the KRACK attack

Post Syndicated from Robert Graham original http://blog.erratasec.com/2017/10/some-notes-on-krack-attack.html

This is my interpretation of the KRACK attacks paper that describes a way of decrypting encrypted WiFi traffic with an active attack.

tl;dr: Wow. Everyone needs to be afraid. (Well, worried — not panicked.) It means in practice, attackers can decrypt a lot of wifi traffic, with varying levels of difficulty depending on your precise network setup. My post last July about the DEF CON network being safe was in error.

Details

This is not a crypto bug but a protocol bug (a pretty obvious and trivial protocol bug).
When a client connects to the network, the access-point will at some point send a random “key” data to use for encryption. Because this packet may be lost in transmission, it can be repeated many times.
What the hacker does is just repeatedly sends this packet, potentially hours later. Each time it does so, it resets the “keystream” back to the starting conditions. The obvious patch that device vendors will make is to only accept the first such packet it receives, ignore all the duplicates.
At this point, the protocol bug becomes a crypto bug. We know how to break crypto when we have two keystreams from the same starting position. It’s not always reliable, but reliable enough that people need to be afraid.
Android, though, is the biggest danger. Rather than simply replaying the packet, a packet with key data of all zeroes can be sent. This allows attackers to setup a fake WiFi access-point and man-in-the-middle all traffic.
In a related case, the access-point/base-station can sometimes also be attacked, affecting the stream sent to the client.
Not only is sniffing possible, but in some limited cases, injection. This allows the traditional attack of adding bad code to the end of HTML pages in order to trick users into installing a virus.

This is an active attack, not a passive attack, so in theory, it’s detectable.

Who is vulnerable?

Everyone, pretty much.
The hacker only needs to be within range of your WiFi. Your neighbor’s teenage kid is going to be downloading and running the tool in order to eavesdrop on your packets.
The hacker doesn’t need to be logged into your network.
It affects all WPA1/WPA2, the personal one with passwords that we use in home, and the enterprise version with certificates we use in enterprises.
It can’t defeat SSL/TLS or VPNs. Thus, if you feel your laptop is safe surfing the public WiFi at airports, then your laptop is still safe from this attack. With Android, it does allow running tools like sslstrip, which can fool many users.
Your home network is vulnerable. Many devices will be using SSL/TLS, so are fine, like your Amazon echo, which you can continue to use without worrying about this attack. Other devices, like your Phillips lightbulbs, may not be so protected.

How can I defend myself?

Patch.
More to the point, measure your current vendors by how long it takes them to patch. Throw away gear by those vendors that took a long time to patch and replace it with vendors that took a short time.
High-end access-points that contains “WIPS” (WiFi Intrusion Prevention Systems) features should be able to detect this and block vulnerable clients from connecting to the network (once the vendor upgrades the systems, of course). Even low-end access-points, like the $30 ones you get for home, can easily be updated to prevent packet sequence numbers from going back to the start (i.e. from the keystream resetting back to the start).
At some point, you’ll need to run the attack against yourself, to make sure all your devices are secure. Since you’ll be constantly allowing random phones to connect to your network, you’ll need to check their vulnerability status before connecting them. You’ll need to continue doing this for several years.
Of course, if you are using SSL/TLS for everything, then your danger is mitigated. This is yet another reason why you should be using SSL/TLS for internal communications.
Most security vendors will add things to their products/services to defend you. While valuable in some cases, it’s not a defense. The defense is patching the devices you know about, and preventing vulnerable devices from attaching to your network.
If I remember correctly, DEF CON uses Aruba. Aruba contains WIPS functionality, which means by the time DEF CON roles around again next year, they should have the feature to deny vulnerable devices from connecting, and specifically to detect an attack in progress and prevent further communication.
However, for an attacker near an Android device using a low-powered WiFi, it’s likely they will be able to conduct man-in-the-middle without any WIPS preventing them.

‘Pirate’ EBook Site Refuses Point Blank to Cooperate With BREIN

Post Syndicated from Andy original https://torrentfreak.com/pirate-ebook-site-refuses-point-blank-to-cooperate-with-brein-171015/

Dutch anti-piracy group BREIN is probably best known for its legal action against The Pirate Bay but the outfit also tackles many other forms of piracy.

A prime example is the case it pursued against a seller of fully-loaded Kodi boxes in the Netherlands. The subsequent landmark ruling from the European Court of Justice will reverberate around Europe for years to come.

Behind the scenes, however, BREIN persistently tries to take much smaller operations offline, and not without success. Earlier this year it revealed it had taken down 231 illegal sites and services includes 84 linking sites, 63 streaming portals, and 34 torrent sites. Some of these shut down completely and others were forced to leave their hosting providers.

Much of this work flies under the radar but some current action, against an eBook site, is now being thrust into the public eye.

For more than five years, EBoek.info (eBook) has serviced Internet users looking to obtain comic books in Dutch. The site informs TorrentFreak it provides a legitimate service, targeted at people who have purchased a hard copy but also want their comics in digital format.

“EBoek.info is a site about comic books in the Dutch language. Besides some general information about the books, people who have legally obtained a hard copy of the books can find a link to an NZB file which enables them to download a digital version of the books they already have,” site representative ‘Zala’ says.

For those out of the loop, NZB files are a bit like Usenet’s version of .torrent files. They contain no copyrighted content themselves but do provide software clients with information on where to find specific content, so it can be downloaded to a user’s machine.

“BREIN claims that this is illegal as it is impossible for us to verify if our visitor is telling the truth [about having purchased a copy],” Zala reveals.

Speaking with TorrentFreak, BREIN chief Tim Kuik says there’s no question that offering downloads like this is illegal.

“It is plain and simple: the site makes links to unauthorized digital copies available to the general public and therefore is infringing copyright. It is distribution of the content without authorization of the rights holder,” Kuik says.

“The unauthorized copies are not private copies. The private copy exception does not apply to this kind of distribution. The private copy has not been made by the owner of the book himself for his own use. Someone else made the digital copy and is making it available to anyone who wants to download it provided he makes the unverified claim that he has a legal copy. This harms the normal exploitation of the
content.”

Zala says that BREIN has been trying to take his site offline for many years but more recently, the platform has utilized the services of Cloudflare, partly as a form of shield. As readers may be aware, a site behind Cloudflare has its originating IP addresses hidden from the public, not to mention BREIN, who values that kind of information. According to the operator, however, BREIN managed to obtain the information from the CDN provider.

“BREIN has tried for years to take our site offline. Recently, however, Cloudflare was so friendly to give them our IP address,” Zala notes.

A text copy of an email reportedly sent by BREIN to EBoek’s web host and seen by TF appears to confirm that Cloudflare handed over the information as suggested. Among other things, the email has BREIN informing the host that “The IP we got back from Cloudflare is XXX.XXX.XX.33.”

This means that BREIN was able to place direct pressure on EBoek.info’s web host, so only time will tell if that bears any fruit for the anti-piracy group. In the meantime, however, EBoek has decided to go public over its battle with BREIN.

“We have received a request from Stichting BREIN via our hosting provider to take EBoek.info offline,” the site informed its users yesterday.

Interestingly, it also appears that BREIN doesn’t appreciate that the operators of EBoek have failed to make their identities publicly known on their platform.

“The site operates anonymously which also is unlawful. Consumer protection requires that the owner/operator of a site identifies himself,” Kuik says.

According to EBoek, the anti-piracy outfit told the site’s web host that as a “commercial online service”, EBoek is required under EU law to display its “correct and complete business information” including names, addresses, and other information. But perhaps unsurprisingly, the site doesn’t want to play ball.

“In my opinion, you are confusing us with Facebook. They are a foreign commercial company with a European branch in Ireland, and therefore are subject to Irish legislation,” Zala says in an open letter to BREIN.

“Eboek.info, on the other hand, is a foreign hobby club with no commercial purpose, whose administrators have no connection with any country in the European Union. As administrators, we follow the laws of our country of residence which do not oblige us to disclose our identity through our website.

“The fact that Eboek is visible in the Netherlands does not just mean that we are going to adapt to Dutch rules, just as we don’t adapt the site to the rules of Saudi Arabia or China or wherever we are available.”

In a further snub to the anti-piracy group, EBoek says that all visitors to the site have to communicate with its operators via its guestbook, which is publicly visible.

“We see no reason to make an exception for Stichting BREIN,” the site notes.

What makes the situation more complex is that EBoek isn’t refusing dialog completely. The site says it doesn’t want to talk to BREIN but will speak to BREIN’s customers – the publishers of the comic books in question – noting that to date no complaints from publishers have ever been received.

While the parties argue about lines of communication, BREIN insists that following this year’s European Court of Justice decision in the GS Media case, a link to a known infringing work represents copyright infringement. In this case, an NZB file – which links to a location on Usenet – would generally fit the bill.

But despite focusing on the Dutch market, the operators of EBoek say the ruling doesn’t apply to them as they’re outside of the ECJ’s jurisdiction and aren’t commercially motivated. Refusing point blank to take their site offline, EBoek’s operators say that BREIN can do its worst, nothing will have much effect.

“[W]hat’s the worst thing that can happen? That our web host hands [BREIN] our address and IP data. In that case, it will turn out that…we are actually far away,” Zala says.

“[In the case the site goes offline], we’ll just put a backup on another server and, in this case, won’t make use of the ‘services’ of Cloudflare, the provider that apparently put BREIN on the right track.”

The question of jurisdiction is indeed an interesting one, particularly given BREIN’s focus in the Netherlands. But Kuik is clear – it is the area where the content is made available that matters.

“The law of the country where the content is made available applies. In this case the EU and amongst others the Netherlands,” Kuik concludes.

To be continued…..

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

Coaxing 2D platforming out of Unity

Post Syndicated from Eevee original https://eev.ee/blog/2017/10/13/coaxing-2d-platforming-out-of-unity/

An anonymous donor asked a question that I can’t even begin to figure out how to answer, but they also said anything else is fine, so here’s anything else.

I’ve been avoiding writing about game physics, since I want to save it for ✨ the book I’m writing ✨, but that book will almost certainly not touch on Unity. Here, then, is a brief run through some of the brick walls I ran into while trying to convince Unity to do 2D platforming.

This is fairly high-level — there are no blocks of code or helpful diagrams. I’m just getting this out of my head because it’s interesting. If you want more gritty details, I guess you’ll have to wait for ✨ the book ✨.

The setup

I hadn’t used Unity before. I hadn’t even used a “real” physics engine before. My games so far have mostly used LÖVE, a Lua-based engine. LÖVE includes box2d bindings, but for various reasons (not all of them good), I opted to avoid them and instead write my own physics completely from scratch. (How, you ask? ✨ Book ✨!)

I was invited to work on a Unity project, Chaos Composer, that someone else had already started. It had basic movement already implemented; I taught myself Unity’s physics system by hacking on it. It’s entirely possible that none of this is actually the best way to do anything, since I was really trying to reproduce my own homegrown stuff in Unity, but it’s the best I’ve managed to come up with.

Two recurring snags were that you can’t ask Unity to do multiple physics updates in a row, and sometimes getting the information I wanted was difficult. Working with my own code spoiled me a little, since I could invoke it at any time and ask it anything I wanted; Unity, on the other hand, is someone else’s black box with a rigid interface on top.

Also, wow, Googling for a lot of this was not quite as helpful as expected. A lot of what’s out there is just the first thing that works, and often that’s pretty hacky and imposes severe limits on the game design (e.g., “this won’t work with slopes”). Basic movement and collision are the first thing you do, which seems to me like the worst time to be locking yourself out of a lot of design options. I tried very (very, very, very) hard to minimize those kinds of constraints.

Problem 1: Movement

When I showed up, movement was already working. Problem solved!

Like any good programmer, I immediately set out to un-solve it. Given a “real” physics engine like Unity prominently features, you have two options: ⓐ treat the player as a physics object, or ⓑ don’t. The existing code went with option ⓑ, like I’d done myself with LÖVE, and like I’d seen countless people advise. Using a physics sim makes for bad platforming.

But… why? I believed it, but I couldn’t concretely defend it. I had to know for myself. So I started a blank project, drew some physics boxes, and wrote a dozen-line player controller.

Ah! Immediate enlightenment.

If the player was sliding down a wall, and I tried to move them into the wall, they would simply freeze in midair until I let go of the movement key. The trouble is that the physics sim works in terms of forces — moving the player involves giving them a nudge in some direction, like a giant invisible hand pushing them around the level. Surprise! If you press a real object against a real wall with your real hand, you’ll see the same effect — friction will cancel out gravity, and the object will stay in midair..

Platformer movement, as it turns out, doesn’t make any goddamn physical sense. What is air control? What are you pushing against? Nothing, really; we just have it because it’s nice to play with, because not having it is a nightmare.

I looked to see if there were any common solutions to this, and I only really found one: make all your walls frictionless.

Game development is full of hacks like this, and I… don’t like them. I can accept that minor hacks are necessary sometimes, but this one makes an early and widespread change to a fundamental system to “fix” something that was wrong in the first place. It also imposes an “invisible” requirement, something I try to avoid at all costs — if you forget to make a particular wall frictionless, you’ll never know unless you happen to try sliding down it.

And so, I swiftly returned to the existing code. It wasn’t too different from what I’d come up with for LÖVE: it applied gravity by hand, tracked the player’s velocity, computed the intended movement each frame, and moved by that amount. The interesting thing was that it used MovePosition, which schedules a movement for the next physics update and stops the movement if the player hits something solid.

It’s kind of a nice hybrid approach, actually; all the “physics” for conscious actors is done by hand, but the physics engine is still used for collision detection. It’s also used for collision rejection — if the player manages to wedge themselves several pixels into a solid object, for example, the physics engine will try to gently nudge them back out of it with no extra effort required on my part. I still haven’t figured out how to get that to work with my homegrown stuff, which is built to prevent overlap rather than to jiggle things out of it.

But wait, what about…

Our player is a dynamic body with rotation lock and no gravity. Why not just use a kinematic body?

I must be missing something, because I do not understand the point of kinematic bodies. I ran into this with Godot, too, which documented them the same way: as intended for use as players and other manually-moved objects. But by default, they don’t even collide with other kinematic bodies or static geometry. What? There’s a checkbox to turn this on, which I enabled, but then I found out that MovePosition doesn’t stop kinematic bodies when they hit something, so I would’ve had to cast along the intended path of movement to figure out when to stop, thus duplicating the same work the physics engine was about to do.

But that’s impossible anyway! Static geometry generally wants to be made of edge colliders, right? They don’t care about concave/convex. Imagine the player is standing on the ground near a wall and tries to move towards the wall. Both the ground and the wall are different edges from the same edge collider.

If you try to cast the player’s hitbox horizontally, parallel to the ground, you’ll only get one collision: the existing collision with the ground. Casting doesn’t distinguish between touching and hitting. And because Unity only reports one collision per collider, and because the ground will always show up first, you will never find out about the impending wall collision.

So you’re forced to either use raycasts for collision detection or decomposed polygons for world geometry, both of which are slightly worse tools for no real gain.

I ended up sticking with a dynamic body.


Oh, one other thing that doesn’t really fit anywhere else: keep track of units! If you’re adding something called “velocity” directly to something called “position”, something has gone very wrong. Acceleration is distance per time squared; velocity is distance per time; position is distance. You must multiply or divide by time to convert between them.

I never even, say, add a constant directly to position every frame; I always phrase it as velocity and multiply by Δt. It keeps the units consistent: time is always in seconds, not in tics.

Problem 2: Slopes

Ah, now we start to get off in the weeds.

A sort of pre-problem here was detecting whether we’re on a slope, which means detecting the ground. The codebase originally used a manual physics query of the area around the player’s feet to check for the ground, which seems to be somewhat common, but that can’t tell me the angle of the detected ground. (It’s also kind of error-prone, since “around the player’s feet” has to be specified by hand and may not stay correct through animations or changes in the hitbox.)

I replaced that with what I’d eventually settled on in LÖVE: detect the ground by detecting collisions, and looking at the normal of the collision. A normal is a vector that points straight out from a surface, so if you’re standing on the ground, the normal points straight up; if you’re on a 10° incline, the normal points 10° away from straight up.

Not all collisions are with the ground, of course, so I assumed something is ground if the normal pointed away from gravity. (I like this definition more than “points upwards”, because it avoids assuming anything about the direction of gravity, which leaves some interesting doors open for later on.) That’s easily detected by taking the dot product — if it’s negative, the collision was with the ground, and I now have the normal of the ground.

Actually doing this in practice was slightly tricky. With my LÖVE engine, I could cram this right into the middle of collision resolution. With Unity, not quite so much. I went through a couple iterations before I really grasped Unity’s execution order, which I guess I will have to briefly recap for this to make sense.

Unity essentially has two update cycles. It performs physics updates at fixed intervals for consistency, and updates everything else just before rendering. Within a single frame, Unity does as many fixed physics updates as it has spare time for (which might be zero, one, or more), then does a regular update, then renders. User code can implement either or both of Update, which runs during a regular update, and FixedUpdate, which runs just before Unity does a physics pass.

So my solution was:

  • At the very end of FixedUpdate, clear the actor’s “on ground” flag and ground normal.

  • During OnCollisionEnter2D and OnCollisionStay2D (which are called from within a physics pass), if there’s a collision that looks like it’s with the ground, set the “on ground” flag and ground normal. (If there are multiple ground collisions, well, good luck figuring out the best way to resolve that! At the moment I’m just taking the first and hoping for the best.)

That means there’s a brief window between the end of FixedUpdate and Unity’s physics pass during which a grounded actor might mistakenly believe it’s not on the ground, which is a bit of a shame, but there are very few good reasons for anything to be happening in that window.

Okay! Now we can do slopes.

Just kidding! First we have to do sliding.

When I first looked at this code, it didn’t apply gravity while the player was on the ground. I think I may have had some problems with detecting the ground as result, since the player was no longer pushing down against it? Either way, it seemed like a silly special case, so I made gravity always apply.

Lo! I was a fool. The player could no longer move.

Why? Because MovePosition does exactly what it promises. If the player collides with something, they’ll stop moving. Applying gravity means that the player is trying to move diagonally downwards into the ground, and so MovePosition stops them immediately.

Hence, sliding. I don’t want the player to actually try to move into the ground. I want them to move the unblocked part of that movement. For flat ground, that means the horizontal part, which is pretty much the same as discarding gravity. For sloped ground, it’s a bit more complicated!

Okay but actually it’s less complicated than you’d think. It can be done with some cross products fairly easily, but Unity makes it even easier with a couple casts. There’s a Vector3.ProjectOnPlane function that projects an arbitrary vector on a plane given by its normal — exactly the thing I want! So I apply that to the attempted movement before passing it along to MovePosition. I do the same thing with the current velocity, to prevent the player from accelerating infinitely downwards while standing on flat ground.

One other thing: I don’t actually use the detected ground normal for this. The player might be touching two ground surfaces at the same time, and I’d want to project on both of them. Instead, I use the player body’s GetContacts method, which returns contact points (and normals!) for everything the player is currently touching. I believe those contact points are tracked by the physics engine anyway, so asking for them doesn’t require any actual physics work.

(Looking at the code I have, I notice that I still only perform the slide for surfaces facing upwards — but I’d want to slide against sloped ceilings, too. Why did I do this? Maybe I should remove that.)

(Also, I’m pretty sure projecting a vector on a plane is non-commutative, which raises the question of which order the projections should happen in and what difference it makes. I don’t have a good answer.)

(I note that my LÖVE setup does something slightly different: it just tries whatever the movement ought to be, and if there’s a collision, then it projects — and tries again with the remaining movement. But I can’t ask Unity to do multiple moves in one physics update, alas.)

Okay! Now, slopes. But actually, with the above work done, slopes are most of the way there already.

One obvious problem is that the player tries to move horizontally even when on a slope, and the easy fix is to change their movement from speed * Vector2.right to speed * new Vector2(ground.y, -ground.x) while on the ground. That’s the ground normal rotated a quarter-turn clockwise, so for flat ground it still points to the right, and in general it points rightwards along the ground. (Note that it assumes the ground normal is a unit vector, but as far as I’m aware, that’s true for all the normals Unity gives you.)

Another issue is that if the player stands motionless on a slope, gravity will cause them to slowly slide down it — because the movement from gravity will be projected onto the slope, and unlike flat ground, the result is no longer zero. For conscious actors only, I counter this by adding the opposite factor to the player’s velocity as part of adding in their walking speed. This matches how the real world works, to some extent: when you’re standing on a hill, you’re exerting some small amount of effort just to stay in place.

(Note that slope resistance is not the same as friction. Okay, yes, in the real world, virtually all resistance to movement happens as a result of friction, but bracing yourself against the ground isn’t the same as being passively resisted.)

From here there are a lot of things you can do, depending on how you think slopes should be handled. You could make the player unable to walk up slopes that are too steep. You could make walking down a slope faster than walking up it. You could make jumping go along the ground normal, rather than straight up. You could raise the player’s max allowed speed while running downhill. Whatever you want, really. Armed with a normal and awareness of dot products, you can do whatever you want.

But first you might want to fix a few aggravating side effects.

Problem 3: Ground adherence

I don’t know if there’s a better name for this. I rarely even see anyone talk about it, which surprises me; it seems like it should be a very common problem.

The problem is: if the player runs up a slope which then abruptly changes to flat ground, their momentum will carry them into the air. For very fast players going off the top of very steep slopes, this makes sense, but it becomes visible even for relatively gentle slopes. It was a mild nightmare in the original release of our game Lunar Depot 38, which has very “rough” ground made up of lots of shallow slopes — so the player is very frequently slightly off the ground, which meant they couldn’t jump, for seemingly no reason. (I even had code to fix this, but I disabled it because of a silly visual side effect that I never got around to fixing.)

Anyway! The reason this is a problem is that game protagonists are generally not boxes sliding around — they have legs. We don’t go flying off the top of real-world hilltops because we put our foot down until it touches the ground.

Simulating this footfall is surprisingly fiddly to get right, especially with someone else’s physics engine. It’s made somewhat easier by Cast, which casts the entire hitbox — no matter what shape it is — in a particular direction, as if it had moved, and tells you all the hypothetical collisions in order.

So I cast the player in the direction of gravity by some distance. If the cast hits something solid with a ground-like collision normal, then the player must be close to the ground, and I move them down to touch it (and set that ground as the new ground normal).

There are some wrinkles.

Wrinkle 1: I only want to do this if the player is off the ground now, but was on the ground last frame, and is not deliberately moving upwards. That latter condition means I want to skip this logic if the player jumps, for example, but also if the player is thrust upwards by a spring or abducted by a UFO or whatever. As long as external code goes through some interface and doesn’t mess with the player’s velocity directly, that shouldn’t be too hard to track.

Wrinkle 2: When does this logic run? It needs to happen after the player moves, which means after a Unity physics pass… but there’s no callback for that point in time. I ended up running it at the beginning of FixedUpdate and the beginning of Update — since I definitely want to do it before rendering happens! That means it’ll sometimes happen twice between physics updates. (I could carefully juggle a flag to skip the second run, but I… didn’t do that. Yet?)

Wrinkle 3: I can’t move the player with MovePosition! Remember, MovePosition schedules a movement, it doesn’t actually perform one; that means if it’s called twice before the physics pass, the first call is effectively ignored. I can’t easily combine the drop with the player’s regular movement, for various fiddly reasons. I ended up doing it “by hand” using transform.Translate, which I think was the “old way” to do manual movement before MovePosition existed. I’m not totally sure if it activates triggers? For that matter, I’m not sure it even notices collisions — but since I did a full-body Cast, there shouldn’t be any anyway.

Wrinkle 4: What, exactly, is “some distance”? I’ve yet to find a satisfying answer for this. It seems like it ought to be based on the player’s current speed and the slope of the ground they’re moving along, but every time I’ve done that math, I’ve gotten totally ludicrous answers that sometimes exceed the size of a tile. But maybe that’s not wrong? Play around, I guess, and think about when the effect should “break” and the player should go flying off the top of a hill.

Wrinkle 5: It’s possible that the player will launch off a slope, hit something, and then be adhered to the ground where they wouldn’t have hit it. I don’t much like this edge case, but I don’t see a way around it either.

This problem is surprisingly awkward for how simple it sounds, and the solution isn’t entirely satisfying. Oh, well; the results are much nicer than the solution. As an added bonus, this also fixes occasional problems with running down a hill and becoming detached from the ground due to precision issues or whathaveyou.

Problem 4: One-way platforms

Ah, what a nightmare.

It took me ages just to figure out how to define one-way platforms. Only block when the player is moving downwards? Nope. Only block when the player is above the platform? Nuh-uh.

Well, okay, yes, those approaches might work for convex players and flat platforms. But what about… sloped, one-way platforms? There’s no reason you shouldn’t be able to have those. If Super Mario World can do it, surely Unity can do it almost 30 years later.

The trick is, again, to look at the collision normal. If it faces away from gravity, the player is hitting a ground-like surface, so the platform should block them. Otherwise (or if the player overlaps the platform), it shouldn’t.

Here’s the catch: Unity doesn’t have conditional collision. I can’t decide, on the fly, whether a collision should block or not. In fact, I think that by the time I get a callback like OnCollisionEnter2D, the physics pass is already over.

I could go the other way and use triggers (which are non-blocking), but then I have the opposite problem: I can’t stop the player on the fly. I could move them back to where they hit the trigger, but I envision all kinds of problems as a result. What if they were moving fast enough to activate something on the other side of the platform? What if something else moved to where I’m trying to shove them back to in the meantime? How does this interact with ground detection and listing contacts, which would rightly ignore a trigger as non-blocking?

I beat my head against this for a while, but the inability to respond to collision conditionally was a huge roadblock. It’s all the more infuriating a problem, because Unity ships with a one-way platform modifier thing. Unfortunately, it seems to have been implemented by someone who has never played a platformer. It’s literally one-way — the player is only allowed to move straight upwards through it, not in from the sides. It also tries to block the player if they’re moving downwards while inside the platform, which invokes clumsy rejection behavior. And this all seems to be built into the physics engine itself somehow, so I can’t simply copy whatever they did.

Eventually, I settled on the following. After calculating attempted movement (including sliding), just at the end of FixedUpdate, I do a Cast along the movement vector. I’m not thrilled about having to duplicate the physics engine’s own work, but I do filter to only things on a “one-way platform” physics layer, which should at least help. For each object the cast hits, I use Physics2D.IgnoreCollision to either ignore or un-ignore the collision between the player and the platform, depending on whether the collision was ground-like or not.

(A lot of people suggested turning off collision between layers, but that can’t possibly work — the player might be standing on one platform while inside another, and anyway, this should work for all actors!)

Again, wrinkles! But fewer this time. Actually, maybe just one: handling the case where the player already overlaps the platform. I can’t just check for that with e.g. OverlapCollider, because that doesn’t distinguish between overlapping and merely touching.

I came up with a fairly simple fix: if I was going to un-ignore the collision (i.e. make the platform block), and the cast distance is reported as zero (either already touching or overlapping), I simply do nothing instead. If I’m standing on the platform, I must have already set it blocking when I was approaching it from the top anyway; if I’m overlapping it, I must have already set it non-blocking to get here in the first place.

I can imagine a few cases where this might go wrong. Moving platforms, especially, are going to cause some interesting issues. But this is the best I can do with what I know, and it seems to work well enough so far.

Oh, and our player can deliberately drop down through platforms, which was easy enough to implement; I just decide the platform is always passable while some button is held down.

Problem 5: Pushers and carriers

I haven’t gotten to this yet! Oh boy, can’t wait. I implemented it in LÖVE, but my way was hilariously invasive; I’m hoping that having a physics engine that supports a handwaved “this pushes that” will help. Of course, you also have to worry about sticking to platforms, for which the recommended solution is apparently to parent the cargo to the platform, which sounds goofy to me? I guess I’ll find out when I throw myself at it later.

Overall result

I ended up with a fairly pleasant-feeling system that supports slopes and one-way platforms and whatnot, with all the same pieces as I came up with for LÖVE. The code somehow ended up as less of a mess, too, but it probably helps that I’ve been down this rabbit hole once before and kinda knew what I was aiming for this time.

Animation of a character running smoothly along the top of an irregular dinosaur skeleton

Sorry that I don’t have a big block of code for you to copy-paste into your project. I don’t think there are nearly enough narrative discussions of these fundamentals, though, so hopefully this is useful to someone. If not, well, look forward to ✨ my book, that I am writing ✨!

My Blogging

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2017/10/my_blogging.html

Blog regulars will notice that I haven’t been posting as much lately as I have in the past. There are two reasons. One, it feels harder to find things to write about. So often it’s the same stories over and over. I don’t like repeating myself. Two, I am busy writing a book. The title is still: Click Here to Kill Everybody: Peril and Promise in a Hyper-Connected World. The book is a year late, and as a very different table of contents than it had in 2016. I have been writing steadily since mid-August. The book is due to the publisher at the end of March 2018, and will be published in the beginning of September.

This is the current table of contents:

  • Introduction: Everything is Becoming a Computer
  • Part 1: The Trends
    • 1. Capitalism Continues to Drive the Internet
    • 2. Customer/User Control is Next
    • 3. Government Surveillance and Control is Also Increasing
    • 4. Cybercrime is More Profitable Than Ever
    • 5. Cyberwar is the New Normal
    • 6. Algorithms, Automation, and Autonomy Bring New Dangers
    • 7. What We Know About Computer Security
    • 8. Agile is Failing as a Security Paradigm
    • 9. Authentication and Identification are Getting Harder
    • 10. Risks are Becoming Catastrophic
  • Part 2: The Solutions
    • 11. We Need to Regulate the Internet of Things
    • 12. We Need to Defend Critical Infrastructure
    • 13. We Need to Prioritize Defense Over Offence
    • 14. We Need to Make Smarter Decisions About Connecting
    • 15. What’s Likely to Happen, and What We Can Do in Response
    • 16. Where Policy Can Go Wrong
  • Conclusion: Technology and Policy, Together

So that’s what’s been happening.

Predict Billboard Top 10 Hits Using RStudio, H2O and Amazon Athena

Post Syndicated from Gopal Wunnava original https://aws.amazon.com/blogs/big-data/predict-billboard-top-10-hits-using-rstudio-h2o-and-amazon-athena/

Success in the popular music industry is typically measured in terms of the number of Top 10 hits artists have to their credit. The music industry is a highly competitive multi-billion dollar business, and record labels incur various costs in exchange for a percentage of the profits from sales and concert tickets.

Predicting the success of an artist’s release in the popular music industry can be difficult. One release may be extremely popular, resulting in widespread play on TV, radio and social media, while another single may turn out quite unpopular, and therefore unprofitable. Record labels need to be selective in their decision making, and predictive analytics can help them with decision making around the type of songs and artists they need to promote.

In this walkthrough, you leverage H2O.ai, Amazon Athena, and RStudio to make predictions on whether a song might make it to the Top 10 Billboard charts. You explore the GLM, GBM, and deep learning modeling techniques using H2O’s rapid, distributed and easy-to-use open source parallel processing engine. RStudio is a popular IDE, licensed either commercially or under AGPLv3, for working with R. This is ideal if you don’t want to connect to a server via SSH and use code editors such as vi to do analytics. RStudio is available in a desktop version, or a server version that allows you to access R via a web browser. RStudio’s Notebooks feature is used to demonstrate the execution of code and output. In addition, this post showcases how you can leverage Athena for query and interactive analysis during the modeling phase. A working knowledge of statistics and machine learning would be helpful to interpret the analysis being performed in this post.

Walkthrough

Your goal is to predict whether a song will make it to the Top 10 Billboard charts. For this purpose, you will be using multiple modeling techniques―namely GLM, GBM and deep learning―and choose the model that is the best fit.

This solution involves the following steps:

  • Install and configure RStudio with Athena
  • Log in to RStudio
  • Install R packages
  • Connect to Athena
  • Create a dataset
  • Create models

Install and configure RStudio with Athena

Use the following AWS CloudFormation stack to install, configure, and connect RStudio on an Amazon EC2 instance with Athena.

Launching this stack creates all required resources and prerequisites:

  • Amazon EC2 instance with Amazon Linux (minimum size of t2.large is recommended)
  • Provisioning of the EC2 instance in an existing VPC and public subnet
  • Installation of Java 8
  • Assignment of an IAM role to the EC2 instance with the required permissions for accessing Athena and Amazon S3
  • Security group allowing access to the RStudio and SSH ports from the internet (I recommend restricting access to these ports)
  • S3 staging bucket required for Athena (referenced within RStudio as ATHENABUCKET)
  • RStudio username and password
  • Setup logs in Amazon CloudWatch Logs (if needed for additional troubleshooting)
  • Amazon EC2 Systems Manager agent, which makes it easy to manage and patch

All AWS resources are created in the US-East-1 Region. To avoid cross-region data transfer fees, launch the CloudFormation stack in the same region. To check the availability of Athena in other regions, see Region Table.

Log in to RStudio

The instance security group has been automatically configured to allow incoming connections on the RStudio port 8787 from any source internet address. You can edit the security group to restrict source IP access. If you have trouble connecting, ensure that port 8787 isn’t blocked by subnet network ACLS or by your outgoing proxy/firewall.

  1. In the CloudFormation stack, choose Outputs, Value, and then open the RStudio URL. You might need to wait for a few minutes until the instance has been launched.
  2. Log in to RStudio with the and password you provided during setup.

Install R packages

Next, install the required R packages from the RStudio console. You can download the R notebook file containing just the code.

#install pacman – a handy package manager for managing installs
if("pacman" %in% rownames(installed.packages()) == FALSE)
{install.packages("pacman")}  
library(pacman)
p_load(h2o,rJava,RJDBC,awsjavasdk)
h2o.init(nthreads = -1)
##  Connection successful!
## 
## R is connected to the H2O cluster: 
##     H2O cluster uptime:         2 hours 42 minutes 
##     H2O cluster version:        3.10.4.6 
##     H2O cluster version age:    4 months and 4 days !!! 
##     H2O cluster name:           H2O_started_from_R_rstudio_hjx881 
##     H2O cluster total nodes:    1 
##     H2O cluster total memory:   3.30 GB 
##     H2O cluster total cores:    4 
##     H2O cluster allowed cores:  4 
##     H2O cluster healthy:        TRUE 
##     H2O Connection ip:          localhost 
##     H2O Connection port:        54321 
##     H2O Connection proxy:       NA 
##     H2O Internal Security:      FALSE 
##     R Version:                  R version 3.3.3 (2017-03-06)
## Warning in h2o.clusterInfo(): 
## Your H2O cluster version is too old (4 months and 4 days)!
## Please download and install the latest version from http://h2o.ai/download/
#install aws sdk if not present (pre-requisite for using Athena with an IAM role)
if (!aws_sdk_present()) {
  install_aws_sdk()
}

load_sdk()
## NULL

Connect to Athena

Next, establish a connection to Athena from RStudio, using an IAM role associated with your EC2 instance. Use ATHENABUCKET to specify the S3 staging directory.

URL <- 'https://s3.amazonaws.com/athena-downloads/drivers/AthenaJDBC41-1.0.1.jar'
fil <- basename(URL)
#download the file into current working directory
if (!file.exists(fil)) download.file(URL, fil)
#verify that the file has been downloaded successfully
list.files()
## [1] "AthenaJDBC41-1.0.1.jar"
drv <- JDBC(driverClass="com.amazonaws.athena.jdbc.AthenaDriver", fil, identifier.quote="'")

con <- jdbcConnection <- dbConnect(drv, 'jdbc:awsathena://athena.us-east-1.amazonaws.com:443/',
                                   s3_staging_dir=Sys.getenv("ATHENABUCKET"),
                                   aws_credentials_provider_class="com.amazonaws.auth.DefaultAWSCredentialsProviderChain")

Verify the connection. The results returned depend on your specific Athena setup.

con
## <JDBCConnection>
dbListTables(con)
##  [1] "gdelt"               "wikistats"           "elb_logs_raw_native"
##  [4] "twitter"             "twitter2"            "usermovieratings"   
##  [7] "eventcodes"          "events"              "billboard"          
## [10] "billboardtop10"      "elb_logs"            "gdelthist"          
## [13] "gdeltmaster"         "twitter"             "twitter3"

Create a dataset

For this analysis, you use a sample dataset combining information from Billboard and Wikipedia with Echo Nest data in the Million Songs Dataset. Upload this dataset into your own S3 bucket. The table below provides a description of the fields used in this dataset.

Field Description
year Year that song was released
songtitle Title of the song
artistname Name of the song artist
songid Unique identifier for the song
artistid Unique identifier for the song artist
timesignature Variable estimating the time signature of the song
timesignature_confidence Confidence in the estimate for the timesignature
loudness Continuous variable indicating the average amplitude of the audio in decibels
tempo Variable indicating the estimated beats per minute of the song
tempo_confidence Confidence in the estimate for tempo
key Variable with twelve levels indicating the estimated key of the song (C, C#, B)
key_confidence Confidence in the estimate for key
energy Variable that represents the overall acoustic energy of the song, using a mix of features such as loudness
pitch Continuous variable that indicates the pitch of the song
timbre_0_min thru timbre_11_min Variables that indicate the minimum values over all segments for each of the twelve values in the timbre vector
timbre_0_max thru timbre_11_max Variables that indicate the maximum values over all segments for each of the twelve values in the timbre vector
top10 Indicator for whether or not the song made it to the Top 10 of the Billboard charts (1 if it was in the top 10, and 0 if not)

Create an Athena table based on the dataset

In the Athena console, select the default database, sampled, or create a new database.

Run the following create table statement.

create external table if not exists billboard
(
year int,
songtitle string,
artistname string,
songID string,
artistID string,
timesignature int,
timesignature_confidence double,
loudness double,
tempo double,
tempo_confidence double,
key int,
key_confidence double,
energy double,
pitch double,
timbre_0_min double,
timbre_0_max double,
timbre_1_min double,
timbre_1_max double,
timbre_2_min double,
timbre_2_max double,
timbre_3_min double,
timbre_3_max double,
timbre_4_min double,
timbre_4_max double,
timbre_5_min double,
timbre_5_max double,
timbre_6_min double,
timbre_6_max double,
timbre_7_min double,
timbre_7_max double,
timbre_8_min double,
timbre_8_max double,
timbre_9_min double,
timbre_9_max double,
timbre_10_min double,
timbre_10_max double,
timbre_11_min double,
timbre_11_max double,
Top10 int
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 's3://aws-bigdata-blog/artifacts/predict-billboard/data'
;

Inspect the table definition for the ‘billboard’ table that you have created. If you chose a database other than sampledb, replace that value with your choice.

dbGetQuery(con, "show create table sampledb.billboard")
##                                      createtab_stmt
## 1       CREATE EXTERNAL TABLE `sampledb.billboard`(
## 2                                       `year` int,
## 3                               `songtitle` string,
## 4                              `artistname` string,
## 5                                  `songid` string,
## 6                                `artistid` string,
## 7                              `timesignature` int,
## 8                `timesignature_confidence` double,
## 9                                `loudness` double,
## 10                                  `tempo` double,
## 11                       `tempo_confidence` double,
## 12                                       `key` int,
## 13                         `key_confidence` double,
## 14                                 `energy` double,
## 15                                  `pitch` double,
## 16                           `timbre_0_min` double,
## 17                           `timbre_0_max` double,
## 18                           `timbre_1_min` double,
## 19                           `timbre_1_max` double,
## 20                           `timbre_2_min` double,
## 21                           `timbre_2_max` double,
## 22                           `timbre_3_min` double,
## 23                           `timbre_3_max` double,
## 24                           `timbre_4_min` double,
## 25                           `timbre_4_max` double,
## 26                           `timbre_5_min` double,
## 27                           `timbre_5_max` double,
## 28                           `timbre_6_min` double,
## 29                           `timbre_6_max` double,
## 30                           `timbre_7_min` double,
## 31                           `timbre_7_max` double,
## 32                           `timbre_8_min` double,
## 33                           `timbre_8_max` double,
## 34                           `timbre_9_min` double,
## 35                           `timbre_9_max` double,
## 36                          `timbre_10_min` double,
## 37                          `timbre_10_max` double,
## 38                          `timbre_11_min` double,
## 39                          `timbre_11_max` double,
## 40                                     `top10` int)
## 41                             ROW FORMAT DELIMITED 
## 42                         FIELDS TERMINATED BY ',' 
## 43                            STORED AS INPUTFORMAT 
## 44       'org.apache.hadoop.mapred.TextInputFormat' 
## 45                                     OUTPUTFORMAT 
## 46  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
## 47                                        LOCATION
## 48    's3://aws-bigdata-blog/artifacts/predict-billboard/data'
## 49                                  TBLPROPERTIES (
## 50            'transient_lastDdlTime'='1505484133')

Run a sample query

Next, run a sample query to obtain a list of all songs from Janet Jackson that made it to the Billboard Top 10 charts.

dbGetQuery(con, " SELECT songtitle,artistname,top10   FROM sampledb.billboard WHERE lower(artistname) =     'janet jackson' AND top10 = 1")
##                       songtitle    artistname top10
## 1                       Runaway Janet Jackson     1
## 2               Because Of Love Janet Jackson     1
## 3                         Again Janet Jackson     1
## 4                            If Janet Jackson     1
## 5  Love Will Never Do (Without You) Janet Jackson 1
## 6                     Black Cat Janet Jackson     1
## 7               Come Back To Me Janet Jackson     1
## 8                       Alright Janet Jackson     1
## 9                      Escapade Janet Jackson     1
## 10                Rhythm Nation Janet Jackson     1

Determine how many songs in this dataset are specifically from the year 2010.

dbGetQuery(con, " SELECT count(*)   FROM sampledb.billboard WHERE year = 2010")
##   _col0
## 1   373

The sample dataset provides certain song properties of interest that can be analyzed to gauge the impact to the song’s overall popularity. Look at one such property, timesignature, and determine the value that is the most frequent among songs in the database. Timesignature is a measure of the number of beats and the type of note involved.

Running the query directly may result in an error, as shown in the commented lines below. This error is a result of trying to retrieve a large result set over a JDBC connection, which can cause out-of-memory issues at the client level. To address this, reduce the fetch size and run again.

#t<-dbGetQuery(con, " SELECT timesignature FROM sampledb.billboard")
#Note:  Running the preceding query results in the following error: 
#Error in .jcall(rp, "I", "fetch", stride, block): java.sql.SQLException: The requested #fetchSize is more than the allowed value in Athena. Please reduce the fetchSize and try #again. Refer to the Athena documentation for valid fetchSize values.
# Use the dbSendQuery function, reduce the fetch size, and run again
r <- dbSendQuery(con, " SELECT timesignature     FROM sampledb.billboard")
dftimesignature<- fetch(r, n=-1, block=100)
dbClearResult(r)
## [1] TRUE
table(dftimesignature)
## dftimesignature
##    0    1    3    4    5    7 
##   10  143  503 6787  112   19
nrow(dftimesignature)
## [1] 7574

From the results, observe that 6787 songs have a timesignature of 4.

Next, determine the song with the highest tempo.

dbGetQuery(con, " SELECT songtitle,artistname,tempo   FROM sampledb.billboard WHERE tempo = (SELECT max(tempo) FROM sampledb.billboard) ")
##                   songtitle      artistname   tempo
## 1 Wanna Be Startin' Somethin' Michael Jackson 244.307

Create the training dataset

Your model needs to be trained such that it can learn and make accurate predictions. Split the data into training and test datasets, and create the training dataset first.  This dataset contains all observations from the year 2009 and earlier. You may face the same JDBC connection issue pointed out earlier, so this query uses a fetch size.

#BillboardTrain <- dbGetQuery(con, "SELECT * FROM sampledb.billboard WHERE year <= 2009")
#Running the preceding query results in the following error:-
#Error in .verify.JDBC.result(r, "Unable to retrieve JDBC result set for ", : Unable to retrieve #JDBC result set for SELECT * FROM sampledb.billboard WHERE year <= 2009 (Internal error)
#Follow the same approach as before to address this issue.

r <- dbSendQuery(con, "SELECT * FROM sampledb.billboard WHERE year <= 2009")
BillboardTrain <- fetch(r, n=-1, block=100)
dbClearResult(r)
## [1] TRUE
BillboardTrain[1:2,c(1:3,6:10)]
##   year           songtitle artistname timesignature
## 1 2009 The Awkward Goodbye    Athlete             3
## 2 2009        Rubik's Cube    Athlete             3
##   timesignature_confidence loudness   tempo tempo_confidence
## 1                    0.732   -6.320  89.614   0.652
## 2                    0.906   -9.541 117.742   0.542
nrow(BillboardTrain)
## [1] 7201

Create the test dataset

BillboardTest <- dbGetQuery(con, "SELECT * FROM sampledb.billboard where year = 2010")
BillboardTest[1:2,c(1:3,11:15)]
##   year              songtitle        artistname key
## 1 2010 This Is the House That Doubt Built A Day to Remember  11
## 2 2010        Sticks & Bricks A Day to Remember  10
##   key_confidence    energy pitch timbre_0_min
## 1          0.453 0.9666556 0.024        0.002
## 2          0.469 0.9847095 0.025        0.000
nrow(BillboardTest)
## [1] 373

Convert the training and test datasets into H2O dataframes

train.h2o <- as.h2o(BillboardTrain)
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |=================================================================| 100%
test.h2o <- as.h2o(BillboardTest)
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |=================================================================| 100%

Inspect the column names in your H2O dataframes.

colnames(train.h2o)
##  [1] "year"                     "songtitle"               
##  [3] "artistname"               "songid"                  
##  [5] "artistid"                 "timesignature"           
##  [7] "timesignature_confidence" "loudness"                
##  [9] "tempo"                    "tempo_confidence"        
## [11] "key"                      "key_confidence"          
## [13] "energy"                   "pitch"                   
## [15] "timbre_0_min"             "timbre_0_max"            
## [17] "timbre_1_min"             "timbre_1_max"            
## [19] "timbre_2_min"             "timbre_2_max"            
## [21] "timbre_3_min"             "timbre_3_max"            
## [23] "timbre_4_min"             "timbre_4_max"            
## [25] "timbre_5_min"             "timbre_5_max"            
## [27] "timbre_6_min"             "timbre_6_max"            
## [29] "timbre_7_min"             "timbre_7_max"            
## [31] "timbre_8_min"             "timbre_8_max"            
## [33] "timbre_9_min"             "timbre_9_max"            
## [35] "timbre_10_min"            "timbre_10_max"           
## [37] "timbre_11_min"            "timbre_11_max"           
## [39] "top10"

Create models

You need to designate the independent and dependent variables prior to applying your modeling algorithms. Because you’re trying to predict the ‘top10’ field, this would be your dependent variable and everything else would be independent.

Create your first model using GLM. Because GLM works best with numeric data, you create your model by dropping non-numeric variables. You only use the variables in the dataset that describe the numerical attributes of the song in the logistic regression model. You won’t use these variables:  “year”, “songtitle”, “artistname”, “songid”, or “artistid”.

y.dep <- 39
x.indep <- c(6:38)
x.indep
##  [1]  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28
## [24] 29 30 31 32 33 34 35 36 37 38

Create Model 1: All numeric variables

Create Model 1 with the training dataset, using GLM as the modeling algorithm and H2O’s built-in h2o.glm function.

modelh1 <- h2o.glm( y = y.dep, x = x.indep, training_frame = train.h2o, family = "binomial")
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |=====                                                            |   8%
  |                                                                       
  |=================================================================| 100%

Measure the performance of Model 1, using H2O’s built-in performance function.

h2o.performance(model=modelh1,newdata=test.h2o)
## H2OBinomialMetrics: glm
## 
## MSE:  0.09924684
## RMSE:  0.3150347
## LogLoss:  0.3220267
## Mean Per-Class Error:  0.2380168
## AUC:  0.8431394
## Gini:  0.6862787
## R^2:  0.254663
## Null Deviance:  326.0801
## Residual Deviance:  240.2319
## AIC:  308.2319
## 
## Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:
##          0   1    Error     Rate
## 0      255  59 0.187898  =59/314
## 1       17  42 0.288136   =17/59
## Totals 272 101 0.203753  =76/373
## 
## Maximum Metrics: Maximum metrics at their respective thresholds
##                         metric threshold    value idx
## 1                       max f1  0.192772 0.525000 100
## 2                       max f2  0.124912 0.650510 155
## 3                 max f0point5  0.416258 0.612903  23
## 4                 max accuracy  0.416258 0.879357  23
## 5                max precision  0.813396 1.000000   0
## 6                   max recall  0.037579 1.000000 282
## 7              max specificity  0.813396 1.000000   0
## 8             max absolute_mcc  0.416258 0.455251  23
## 9   max min_per_class_accuracy  0.161402 0.738854 125
## 10 max mean_per_class_accuracy  0.124912 0.765006 155
## 
## Gains/Lift Table: Extract with `h2o.gainsLift(<model>, <data>)` or ` 
h2o.auc(h2o.performance(modelh1,test.h2o)) 
## [1] 0.8431394

The AUC metric provides insight into how well the classifier is able to separate the two classes. In this case, the value of 0.8431394 indicates that the classification is good. (A value of 0.5 indicates a worthless test, while a value of 1.0 indicates a perfect test.)

Next, inspect the coefficients of the variables in the dataset.

dfmodelh1 <- as.data.frame(h2o.varimp(modelh1))
dfmodelh1
##                       names coefficients sign
## 1              timbre_0_max  1.290938663  NEG
## 2                  loudness  1.262941934  POS
## 3                     pitch  0.616995941  NEG
## 4              timbre_1_min  0.422323735  POS
## 5              timbre_6_min  0.349016024  NEG
## 6                    energy  0.348092062  NEG
## 7             timbre_11_min  0.307331997  NEG
## 8              timbre_3_max  0.302225619  NEG
## 9             timbre_11_max  0.243632060  POS
## 10             timbre_4_min  0.224233951  POS
## 11             timbre_4_max  0.204134342  POS
## 12             timbre_5_min  0.199149324  NEG
## 13             timbre_0_min  0.195147119  POS
## 14 timesignature_confidence  0.179973904  POS
## 15         tempo_confidence  0.144242598  POS
## 16            timbre_10_max  0.137644568  POS
## 17             timbre_7_min  0.126995955  NEG
## 18            timbre_10_min  0.123851179  POS
## 19             timbre_7_max  0.100031481  NEG
## 20             timbre_2_min  0.096127636  NEG
## 21           key_confidence  0.083115820  POS
## 22             timbre_6_max  0.073712419  POS
## 23            timesignature  0.067241917  POS
## 24             timbre_8_min  0.061301881  POS
## 25             timbre_8_max  0.060041698  POS
## 26                      key  0.056158445  POS
## 27             timbre_3_min  0.050825116  POS
## 28             timbre_9_max  0.033733561  POS
## 29             timbre_2_max  0.030939072  POS
## 30             timbre_9_min  0.020708113  POS
## 31             timbre_1_max  0.014228818  NEG
## 32                    tempo  0.008199861  POS
## 33             timbre_5_max  0.004837870  POS
## 34                                    NA <NA>

Typically, songs with heavier instrumentation tend to be louder (have higher values in the variable “loudness”) and more energetic (have higher values in the variable “energy”). This knowledge is helpful for interpreting the modeling results.

You can make the following observations from the results:

  • The coefficient estimates for the confidence values associated with the time signature, key, and tempo variables are positive. This suggests that higher confidence leads to a higher predicted probability of a Top 10 hit.
  • The coefficient estimate for loudness is positive, meaning that mainstream listeners prefer louder songs with heavier instrumentation.
  • The coefficient estimate for energy is negative, meaning that mainstream listeners prefer songs that are less energetic, which are those songs with light instrumentation.

These coefficients lead to contradictory conclusions for Model 1. This could be due to multicollinearity issues. Inspect the correlation between the variables “loudness” and “energy” in the training set.

cor(train.h2o$loudness,train.h2o$energy)
## [1] 0.7399067

This number indicates that these two variables are highly correlated, and Model 1 does indeed suffer from multicollinearity. Typically, you associate a value of -1.0 to -0.5 or 1.0 to 0.5 to indicate strong correlation, and a value of 0.1 to 0.1 to indicate weak correlation. To avoid this correlation issue, omit one of these two variables and re-create the models.

You build two variations of the original model:

  • Model 2, in which you keep “energy” and omit “loudness”
  • Model 3, in which you keep “loudness” and omit “energy”

You compare these two models and choose the model with a better fit for this use case.

Create Model 2: Keep energy and omit loudness

colnames(train.h2o)
##  [1] "year"                     "songtitle"               
##  [3] "artistname"               "songid"                  
##  [5] "artistid"                 "timesignature"           
##  [7] "timesignature_confidence" "loudness"                
##  [9] "tempo"                    "tempo_confidence"        
## [11] "key"                      "key_confidence"          
## [13] "energy"                   "pitch"                   
## [15] "timbre_0_min"             "timbre_0_max"            
## [17] "timbre_1_min"             "timbre_1_max"            
## [19] "timbre_2_min"             "timbre_2_max"            
## [21] "timbre_3_min"             "timbre_3_max"            
## [23] "timbre_4_min"             "timbre_4_max"            
## [25] "timbre_5_min"             "timbre_5_max"            
## [27] "timbre_6_min"             "timbre_6_max"            
## [29] "timbre_7_min"             "timbre_7_max"            
## [31] "timbre_8_min"             "timbre_8_max"            
## [33] "timbre_9_min"             "timbre_9_max"            
## [35] "timbre_10_min"            "timbre_10_max"           
## [37] "timbre_11_min"            "timbre_11_max"           
## [39] "top10"
y.dep <- 39
x.indep <- c(6:7,9:38)
x.indep
##  [1]  6  7  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29
## [24] 30 31 32 33 34 35 36 37 38
modelh2 <- h2o.glm( y = y.dep, x = x.indep, training_frame = train.h2o, family = "binomial")
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |=======                                                          |  10%
  |                                                                       
  |=================================================================| 100%

Measure the performance of Model 2.

h2o.performance(model=modelh2,newdata=test.h2o)
## H2OBinomialMetrics: glm
## 
## MSE:  0.09922606
## RMSE:  0.3150017
## LogLoss:  0.3228213
## Mean Per-Class Error:  0.2490554
## AUC:  0.8431933
## Gini:  0.6863867
## R^2:  0.2548191
## Null Deviance:  326.0801
## Residual Deviance:  240.8247
## AIC:  306.8247
## 
## Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:
##          0  1    Error     Rate
## 0      280 34 0.108280  =34/314
## 1       23 36 0.389831   =23/59
## Totals 303 70 0.152815  =57/373
## 
## Maximum Metrics: Maximum metrics at their respective thresholds
##                         metric threshold    value idx
## 1                       max f1  0.254391 0.558140  69
## 2                       max f2  0.113031 0.647208 157
## 3                 max f0point5  0.413999 0.596026  22
## 4                 max accuracy  0.446250 0.876676  18
## 5                max precision  0.811739 1.000000   0
## 6                   max recall  0.037682 1.000000 283
## 7              max specificity  0.811739 1.000000   0
## 8             max absolute_mcc  0.254391 0.469060  69
## 9   max min_per_class_accuracy  0.141051 0.716561 131
## 10 max mean_per_class_accuracy  0.113031 0.761821 157
## 
## Gains/Lift Table: Extract with `h2o.gainsLift(<model>, <data>)` or `h2o.gainsLift(<model>, valid=<T/F>, xval=<T/F>)`
dfmodelh2 <- as.data.frame(h2o.varimp(modelh2))
dfmodelh2
##                       names coefficients sign
## 1                     pitch  0.700331511  NEG
## 2              timbre_1_min  0.510270513  POS
## 3              timbre_0_max  0.402059546  NEG
## 4              timbre_6_min  0.333316236  NEG
## 5             timbre_11_min  0.331647383  NEG
## 6              timbre_3_max  0.252425901  NEG
## 7             timbre_11_max  0.227500308  POS
## 8              timbre_4_max  0.210663865  POS
## 9              timbre_0_min  0.208516163  POS
## 10             timbre_5_min  0.202748055  NEG
## 11             timbre_4_min  0.197246582  POS
## 12            timbre_10_max  0.172729619  POS
## 13         tempo_confidence  0.167523934  POS
## 14 timesignature_confidence  0.167398830  POS
## 15             timbre_7_min  0.142450727  NEG
## 16             timbre_8_max  0.093377516  POS
## 17            timbre_10_min  0.090333426  POS
## 18            timesignature  0.085851625  POS
## 19             timbre_7_max  0.083948442  NEG
## 20           key_confidence  0.079657073  POS
## 21             timbre_6_max  0.076426046  POS
## 22             timbre_2_min  0.071957831  NEG
## 23             timbre_9_max  0.071393189  POS
## 24             timbre_8_min  0.070225578  POS
## 25                      key  0.061394702  POS
## 26             timbre_3_min  0.048384697  POS
## 27             timbre_1_max  0.044721121  NEG
## 28                   energy  0.039698433  POS
## 29             timbre_5_max  0.039469064  POS
## 30             timbre_2_max  0.018461133  POS
## 31                    tempo  0.013279926  POS
## 32             timbre_9_min  0.005282143  NEG
## 33                                    NA <NA>

h2o.auc(h2o.performance(modelh2,test.h2o)) 
## [1] 0.8431933

You can make the following observations:

  • The AUC metric is 0.8431933.
  • Inspecting the coefficient of the variable energy, Model 2 suggests that songs with high energy levels tend to be more popular. This is as per expectation.
  • As H2O orders variables by significance, the variable energy is not significant in this model.

You can conclude that Model 2 is not ideal for this use , as energy is not significant.

CreateModel 3: Keep loudness but omit energy

colnames(train.h2o)
##  [1] "year"                     "songtitle"               
##  [3] "artistname"               "songid"                  
##  [5] "artistid"                 "timesignature"           
##  [7] "timesignature_confidence" "loudness"                
##  [9] "tempo"                    "tempo_confidence"        
## [11] "key"                      "key_confidence"          
## [13] "energy"                   "pitch"                   
## [15] "timbre_0_min"             "timbre_0_max"            
## [17] "timbre_1_min"             "timbre_1_max"            
## [19] "timbre_2_min"             "timbre_2_max"            
## [21] "timbre_3_min"             "timbre_3_max"            
## [23] "timbre_4_min"             "timbre_4_max"            
## [25] "timbre_5_min"             "timbre_5_max"            
## [27] "timbre_6_min"             "timbre_6_max"            
## [29] "timbre_7_min"             "timbre_7_max"            
## [31] "timbre_8_min"             "timbre_8_max"            
## [33] "timbre_9_min"             "timbre_9_max"            
## [35] "timbre_10_min"            "timbre_10_max"           
## [37] "timbre_11_min"            "timbre_11_max"           
## [39] "top10"
y.dep <- 39
x.indep <- c(6:12,14:38)
x.indep
##  [1]  6  7  8  9 10 11 12 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29
## [24] 30 31 32 33 34 35 36 37 38
modelh3 <- h2o.glm( y = y.dep, x = x.indep, training_frame = train.h2o, family = "binomial")
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |========                                                         |  12%
  |                                                                       
  |=================================================================| 100%
perfh3<-h2o.performance(model=modelh3,newdata=test.h2o)
perfh3
## H2OBinomialMetrics: glm
## 
## MSE:  0.0978859
## RMSE:  0.3128672
## LogLoss:  0.3178367
## Mean Per-Class Error:  0.264925
## AUC:  0.8492389
## Gini:  0.6984778
## R^2:  0.2648836
## Null Deviance:  326.0801
## Residual Deviance:  237.1062
## AIC:  303.1062
## 
## Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:
##          0  1    Error     Rate
## 0      286 28 0.089172  =28/314
## 1       26 33 0.440678   =26/59
## Totals 312 61 0.144772  =54/373
## 
## Maximum Metrics: Maximum metrics at their respective thresholds
##                         metric threshold    value idx
## 1                       max f1  0.273799 0.550000  60
## 2                       max f2  0.125503 0.663265 155
## 3                 max f0point5  0.435479 0.628931  24
## 4                 max accuracy  0.435479 0.882038  24
## 5                max precision  0.821606 1.000000   0
## 6                   max recall  0.038328 1.000000 280
## 7              max specificity  0.821606 1.000000   0
## 8             max absolute_mcc  0.435479 0.471426  24
## 9   max min_per_class_accuracy  0.173693 0.745763 120
## 10 max mean_per_class_accuracy  0.125503 0.775073 155
## 
## Gains/Lift Table: Extract with `h2o.gainsLift(<model>, <data>)` or `h2o.gainsLift(<model>, valid=<T/F>, xval=<T/F>)`
dfmodelh3 <- as.data.frame(h2o.varimp(modelh3))
dfmodelh3
##                       names coefficients sign
## 1              timbre_0_max 1.216621e+00  NEG
## 2                  loudness 9.780973e-01  POS
## 3                     pitch 7.249788e-01  NEG
## 4              timbre_1_min 3.891197e-01  POS
## 5              timbre_6_min 3.689193e-01  NEG
## 6             timbre_11_min 3.086673e-01  NEG
## 7              timbre_3_max 3.025593e-01  NEG
## 8             timbre_11_max 2.459081e-01  POS
## 9              timbre_4_min 2.379749e-01  POS
## 10             timbre_4_max 2.157627e-01  POS
## 11             timbre_0_min 1.859531e-01  POS
## 12             timbre_5_min 1.846128e-01  NEG
## 13 timesignature_confidence 1.729658e-01  POS
## 14             timbre_7_min 1.431871e-01  NEG
## 15            timbre_10_max 1.366703e-01  POS
## 16            timbre_10_min 1.215954e-01  POS
## 17         tempo_confidence 1.183698e-01  POS
## 18             timbre_2_min 1.019149e-01  NEG
## 19           key_confidence 9.109701e-02  POS
## 20             timbre_7_max 8.987908e-02  NEG
## 21             timbre_6_max 6.935132e-02  POS
## 22             timbre_8_max 6.878241e-02  POS
## 23            timesignature 6.120105e-02  POS
## 24                      key 5.814805e-02  POS
## 25             timbre_8_min 5.759228e-02  POS
## 26             timbre_1_max 2.930285e-02  NEG
## 27             timbre_9_max 2.843755e-02  POS
## 28             timbre_3_min 2.380245e-02  POS
## 29             timbre_2_max 1.917035e-02  POS
## 30             timbre_5_max 1.715813e-02  POS
## 31                    tempo 1.364418e-02  NEG
## 32             timbre_9_min 8.463143e-05  NEG
## 33                                    NA <NA>
h2o.sensitivity(perfh3,0.5)
## Warning in h2o.find_row_by_threshold(object, t): Could not find exact
## threshold: 0.5 for this set of metrics; using closest threshold found:
## 0.501855569251422. Run `h2o.predict` and apply your desired threshold on a
## probability column.
## [[1]]
## [1] 0.2033898
h2o.auc(perfh3)
## [1] 0.8492389

You can make the following observations:

  • The AUC metric is 0.8492389.
  • From the confusion matrix, the model correctly predicts that 33 songs will be top 10 hits (true positives). However, it has 26 false positives (songs that the model predicted would be Top 10 hits, but ended up not being Top 10 hits).
  • Loudness has a positive coefficient estimate, meaning that this model predicts that songs with heavier instrumentation tend to be more popular. This is the same conclusion from Model 2.
  • Loudness is significant in this model.

Overall, Model 3 predicts a higher number of top 10 hits with an accuracy rate that is acceptable. To choose the best fit for production runs, record labels should consider the following factors:

  • Desired model accuracy at a given threshold
  • Number of correct predictions for top10 hits
  • Tolerable number of false positives or false negatives

Next, make predictions using Model 3 on the test dataset.

predict.regh <- h2o.predict(modelh3, test.h2o)
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |=================================================================| 100%
print(predict.regh)
##   predict        p0          p1
## 1       0 0.9654739 0.034526052
## 2       0 0.9654748 0.034525236
## 3       0 0.9635547 0.036445318
## 4       0 0.9343579 0.065642149
## 5       0 0.9978334 0.002166601
## 6       0 0.9779949 0.022005078
## 
## [373 rows x 3 columns]
predict.regh$predict
##   predict
## 1       0
## 2       0
## 3       0
## 4       0
## 5       0
## 6       0
## 
## [373 rows x 1 column]
dpr<-as.data.frame(predict.regh)
#Rename the predicted column 
colnames(dpr)[colnames(dpr) == 'predict'] <- 'predict_top10'
table(dpr$predict_top10)
## 
##   0   1 
## 312  61

The first set of output results specifies the probabilities associated with each predicted observation.  For example, observation 1 is 96.54739% likely to not be a Top 10 hit, and 3.4526052% likely to be a Top 10 hit (predict=1 indicates Top 10 hit and predict=0 indicates not a Top 10 hit).  The second set of results list the actual predictions made.  From the third set of results, this model predicts that 61 songs will be top 10 hits.

Compute the baseline accuracy, by assuming that the baseline predicts the most frequent outcome, which is that most songs are not Top 10 hits.

table(BillboardTest$top10)
## 
##   0   1 
## 314  59

Now observe that the baseline model would get 314 observations correct, and 59 wrong, for an accuracy of 314/(314+59) = 0.8418231.

It seems that Model 3, with an accuracy of 0.8552, provides you with a small improvement over the baseline model. But is this model useful for record labels?

View the two models from an investment perspective:

  • A production company is interested in investing in songs that are more likely to make it to the Top 10. The company’s objective is to minimize the risk of financial losses attributed to investing in songs that end up unpopular.
  • How many songs does Model 3 correctly predict as a Top 10 hit in 2010? Looking at the confusion matrix, you see that it predicts 33 top 10 hits correctly at an optimal threshold, which is more than half the number
  • It will be more useful to the record label if you can provide the production company with a list of songs that are highly likely to end up in the Top 10.
  • The baseline model is not useful, as it simply does not label any song as a hit.

Considering the three models built so far, you can conclude that Model 3 proves to be the best investment choice for the record label.

GBM model

H2O provides you with the ability to explore other learning models, such as GBM and deep learning. Explore building a model using the GBM technique, using the built-in h2o.gbm function.

Before you do this, you need to convert the target variable to a factor for multinomial classification techniques.

train.h2o$top10=as.factor(train.h2o$top10)
gbm.modelh <- h2o.gbm(y=y.dep, x=x.indep, training_frame = train.h2o, ntrees = 500, max_depth = 4, learn_rate = 0.01, seed = 1122,distribution="multinomial")
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |===                                                              |   5%
  |                                                                       
  |=====                                                            |   7%
  |                                                                       
  |======                                                           |   9%
  |                                                                       
  |=======                                                          |  10%
  |                                                                       
  |======================                                           |  33%
  |                                                                       
  |=====================================                            |  56%
  |                                                                       
  |====================================================             |  79%
  |                                                                       
  |================================================================ |  98%
  |                                                                       
  |=================================================================| 100%
perf.gbmh<-h2o.performance(gbm.modelh,test.h2o)
perf.gbmh
## H2OBinomialMetrics: gbm
## 
## MSE:  0.09860778
## RMSE:  0.3140188
## LogLoss:  0.3206876
## Mean Per-Class Error:  0.2120263
## AUC:  0.8630573
## Gini:  0.7261146
## 
## Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:
##          0  1    Error     Rate
## 0      266 48 0.152866  =48/314
## 1       16 43 0.271186   =16/59
## Totals 282 91 0.171582  =64/373
## 
## Maximum Metrics: Maximum metrics at their respective thresholds
##                       metric threshold    value idx
## 1                     max f1  0.189757 0.573333  90
## 2                     max f2  0.130895 0.693717 145
## 3               max f0point5  0.327346 0.598802  26
## 4               max accuracy  0.442757 0.876676  14
## 5              max precision  0.802184 1.000000   0
## 6                 max recall  0.049990 1.000000 284
## 7            max specificity  0.802184 1.000000   0
## 8           max absolute_mcc  0.169135 0.496486 104
## 9 max min_per_class_accuracy  0.169135 0.796610 104
## 10 max mean_per_class_accuracy  0.169135 0.805948 104
## 
## Gains/Lift Table: Extract with `h2o.gainsLift(<model>, <data>)` or `
h2o.sensitivity(perf.gbmh,0.5)
## Warning in h2o.find_row_by_threshold(object, t): Could not find exact
## threshold: 0.5 for this set of metrics; using closest threshold found:
## 0.501205344484314. Run `h2o.predict` and apply your desired threshold on a
## probability column.
## [[1]]
## [1] 0.1355932
h2o.auc(perf.gbmh)
## [1] 0.8630573

This model correctly predicts 43 top 10 hits, which is 10 more than the number predicted by Model 3. Moreover, the AUC metric is higher than the one obtained from Model 3.

As seen above, H2O’s API provides the ability to obtain key statistical measures required to analyze the models easily, using several built-in functions. The record label can experiment with different parameters to arrive at the model that predicts the maximum number of Top 10 hits at the desired level of accuracy and threshold.

H2O also allows you to experiment with deep learning models. Deep learning models have the ability to learn features implicitly, but can be more expensive computationally.

Now, create a deep learning model with the h2o.deeplearning function, using the same training and test datasets created before. The time taken to run this model depends on the type of EC2 instance chosen for this purpose.  For models that require more computation, consider using accelerated computing instances such as the P2 instance type.

system.time(
  dlearning.modelh <- h2o.deeplearning(y = y.dep,
                                      x = x.indep,
                                      training_frame = train.h2o,
                                      epoch = 250,
                                      hidden = c(250,250),
                                      activation = "Rectifier",
                                      seed = 1122,
                                      distribution="multinomial"
  )
)
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |===                                                              |   4%
  |                                                                       
  |=====                                                            |   8%
  |                                                                       
  |========                                                         |  12%
  |                                                                       
  |==========                                                       |  16%
  |                                                                       
  |=============                                                    |  20%
  |                                                                       
  |================                                                 |  24%
  |                                                                       
  |==================                                               |  28%
  |                                                                       
  |=====================                                            |  32%
  |                                                                       
  |=======================                                          |  36%
  |                                                                       
  |==========================                                       |  40%
  |                                                                       
  |=============================                                    |  44%
  |                                                                       
  |===============================                                  |  48%
  |                                                                       
  |==================================                               |  52%
  |                                                                       
  |====================================                             |  56%
  |                                                                       
  |=======================================                          |  60%
  |                                                                       
  |==========================================                       |  64%
  |                                                                       
  |============================================                     |  68%
  |                                                                       
  |===============================================                  |  72%
  |                                                                       
  |=================================================                |  76%
  |                                                                       
  |====================================================             |  80%
  |                                                                       
  |=======================================================          |  84%
  |                                                                       
  |=========================================================        |  88%
  |                                                                       
  |============================================================     |  92%
  |                                                                       
  |==============================================================   |  96%
  |                                                                       
  |=================================================================| 100%
##    user  system elapsed 
##   1.216   0.020 166.508
perf.dl<-h2o.performance(model=dlearning.modelh,newdata=test.h2o)
perf.dl
## H2OBinomialMetrics: deeplearning
## 
## MSE:  0.1678359
## RMSE:  0.4096778
## LogLoss:  1.86509
## Mean Per-Class Error:  0.3433013
## AUC:  0.7568822
## Gini:  0.5137644
## 
## Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:
##          0  1    Error     Rate
## 0      290 24 0.076433  =24/314
## 1       36 23 0.610169   =36/59
## Totals 326 47 0.160858  =60/373
## 
## Maximum Metrics: Maximum metrics at their respective thresholds
##                       metric threshold    value idx
## 1                     max f1  0.826267 0.433962  46
## 2                     max f2  0.000000 0.588235 239
## 3               max f0point5  0.999929 0.511811  16
## 4               max accuracy  0.999999 0.865952  10
## 5              max precision  1.000000 1.000000   0
## 6                 max recall  0.000000 1.000000 326
## 7            max specificity  1.000000 1.000000   0
## 8           max absolute_mcc  0.999929 0.363219  16
## 9 max min_per_class_accuracy  0.000004 0.662420 145
## 10 max mean_per_class_accuracy  0.000000 0.685334 224
## 
## Gains/Lift Table: Extract with `h2o.gainsLift(<model>, <data>)` or `h2o.gainsLift(<model>, valid=<T/F>, xval=<T/F>)`
h2o.sensitivity(perf.dl,0.5)
## Warning in h2o.find_row_by_threshold(object, t): Could not find exact
## threshold: 0.5 for this set of metrics; using closest threshold found:
## 0.496293348880151. Run `h2o.predict` and apply your desired threshold on a
## probability column.
## [[1]]
## [1] 0.3898305
h2o.auc(perf.dl)
## [1] 0.7568822

The AUC metric for this model is 0.7568822, which is less than what you got from the earlier models. I recommend further experimentation using different hyper parameters, such as the learning rate, epoch or the number of hidden layers.

H2O’s built-in functions provide many key statistical measures that can help measure model performance. Here are some of these key terms.

Metric Description
Sensitivity Measures the proportion of positives that have been correctly identified. It is also called the true positive rate, or recall.
Specificity Measures the proportion of negatives that have been correctly identified. It is also called the true negative rate.
Threshold Cutoff point that maximizes specificity and sensitivity. While the model may not provide the highest prediction at this point, it would not be biased towards positives or negatives.
Precision The fraction of the documents retrieved that are relevant to the information needed, for example, how many of the positively classified are relevant
AUC

Provides insight into how well the classifier is able to separate the two classes. The implicit goal is to deal with situations where the sample distribution is highly skewed, with a tendency to overfit to a single class.

0.90 – 1 = excellent (A)

0.8 – 0.9 = good (B)

0.7 – 0.8 = fair (C)

.6 – 0.7 = poor (D)

0.5 – 0.5 = fail (F)

Here’s a summary of the metrics generated from H2O’s built-in functions for the three models that produced useful results.

Metric Model 3 GBM Model Deep Learning Model

Accuracy

(max)

0.882038

(t=0.435479)

0.876676

(t=0.442757)

0.865952

(t=0.999999)

Precision

(max)

1.0

(t=0.821606)

1.0

(t=0802184)

1.0

(t=1.0)

Recall

(max)

1.0 1.0

1.0

(t=0)

Specificity

(max)

1.0 1.0

1.0

(t=1)

Sensitivity

 

0.2033898 0.1355932

0.3898305

(t=0.5)

AUC 0.8492389 0.8630573 0.756882

Note: ‘t’ denotes threshold.

Your options at this point could be narrowed down to Model 3 and the GBM model, based on the AUC and accuracy metrics observed earlier.  If the slightly lower accuracy of the GBM model is deemed acceptable, the record label can choose to go to production with the GBM model, as it can predict a higher number of Top 10 hits.  The AUC metric for the GBM model is also higher than that of Model 3.

Record labels can experiment with different learning techniques and parameters before arriving at a model that proves to be the best fit for their business. Because deep learning models can be computationally expensive, record labels can choose more powerful EC2 instances on AWS to run their experiments faster.

Conclusion

In this post, I showed how the popular music industry can use analytics to predict the type of songs that make the Top 10 Billboard charts. By running H2O’s scalable machine learning platform on AWS, data scientists can easily experiment with multiple modeling techniques and interactively query the data using Amazon Athena, without having to manage the underlying infrastructure. This helps record labels make critical decisions on the type of artists and songs to promote in a timely fashion, thereby increasing sales and revenue.

If you have questions or suggestions, please comment below.


Additional Reading

Learn how to build and explore a simple geospita simple GEOINT application using SparkR.


About the Authors

gopalGopal Wunnava is a Partner Solution Architect with the AWS GSI Team. He works with partners and customers on big data engagements, and is passionate about building analytical solutions that drive business capabilities and decision making. In his spare time, he loves all things sports and movies related and is fond of old classics like Asterix, Obelix comics and Hitchcock movies.

 

 

Bob Strahan, a Senior Consultant with AWS Professional Services, contributed to this post.

 

 

timeShift(GrafanaBuzz, 1w) Issue 17

Post Syndicated from Blogs on Grafana Labs Blog original https://grafana.com/blog/2017/10/13/timeshiftgrafanabuzz-1w-issue-17/

It’s been a busy week here at Grafana Labs. While we’ve been working on GrafanaCon EU preparations here at the NYC office, the Stockholm office has been diligently working to release Grafana 4.6-beta-1. We’re really excited about this latest release and look forward to your feedback on the new features.


Latest Release

Grafana 4.6-beta-1 is now available! Grafana v4.6 brings many enhancements to Annotations, Cloudwatch and Prometheus. It also adds support for Postgres as a metric and table data source!

To see more details on what’s in the newest version, please see the release notes.

Download Grafana 4.6.0-beta-1 Now


From the Blogosphere

Using Kafka and Grafana to Monitor Meteorological Conditions: Oliver was looking for a way to track historical mountain conditions around the UK, but only had available data for the last 24 hours. It seemed like a perfect job for Kafka. This post discusses how to get going with Kafka very easily, store the data in Graphite and visualize the data in Grafana.

Web Interfaces for your Syslog Server – An Overview: System administrators often prefer to use the command line, but complex queries can be completed much faster with logs indexed in a database and a web interface. This article provides a run-down of various GUI-based tools available for your syslog server.

JEE Performance with JMeter, Prometheus and Grafana. Complete Project from Scratch: This comprehensive article walks you through the steps of monitoring JEE application performance from scratch. We start with making implementation decisions, then how to collect data, visualization and dashboarding configuration, and conclude with alerting. Buckle up; it’s a long article, with a ton of information.


Early Bird Tickets Now Available

Early bird tickets are going fast, so take advantage of the discounted price before they’re gone! We will be announcing the first block of speakers in the coming week.

There’s still time to submit a talk. We’ll accept submissions through the end of October. We’re accepting technical and non-technical talks of all sizes. Submit a CFP.

Get Your Early Bird Ticket Now


Grafana Plugins

This week we add the Prometheus Alertmanager Data Source to our growing list of plugins, lots of updates to the GLPI Data source, and have a urgent bugfix for the WorldMap Panel. To update plugins from on-prem Grafana, use the Grafana-cli tool, or with 1 click if you are using Hosted Grafana.

NEW PLUGIN

Prometheus Alertmanager Data Source – This new data source lets you show data from the Prometheus Alertmanager in Grafana. The Alertmanager handles alerts sent by client applications such as the Prometheus server. With this data source, you can show data in Table form or as a SingleStat.

Install Now

UPDATED PLUGIN

WorldMap Panel – A new version with an urgent bugfix for Elasticsearch users:

  • A fix for Geohash maps after a breaking change in Grafana 4.5.0.
  • Last Geohash as center for the map – it centers the map on the last geohash position received. Useful for real time tracking (with auto refresh on in Grafana).

Update

UPDATED PLUGIN

GLPI App – Lots of fixes in the new version:

  • Compatibility with GLPI 9.2
  • Autofill the Timerange field based on the query
  • When adding new query, add by default a ticket query instead of undefined
  • Correct values in hover tooltip
  • Can have element count by hour of the day with the panel histogram

Update


Contributions of the week:

Each week we highlight some of the important contributions from our amazing open source community. Thank you for helping make Grafana better!


Grafana Labs is Hiring!

We are passionate about open source software and thrive on tackling complex challenges to build the future. We ship code from every corner of the globe and love working with the community. If this sounds exciting, you’re in luck – WE’RE HIRING!

Check out our Open Positions


New Annotation Function

In addition to being able to add annotations easily in the graph panel, you can also create ranges as shown above. Give 4.6.0-beta-1 a try and give us your feedback.

We Need Your Help!

Do you have a graph that you love because the data is beautiful or because the graph provides interesting information? Please get in touch. Tweet or send us an email with a screenshot, and we’ll tell you about this fun experiment.

Tell Me More


What do you think?

We want to keep these articles interesting and relevant, so please tell us how we’re doing. Submit a comment on this article below, or post something at our community forum. Help us make these weekly roundups better!

Follow us on Twitter, like us on Facebook, and join the Grafana Labs community.

Popcorn Time Creator Readies BitTorrent & Blockchain-Powered Video Platform

Post Syndicated from Andy original https://torrentfreak.com/popcorn-time-creator-readies-bittorrent-blockchain-powered-youtube-competitor-171012/

Without a doubt, YouTube is one of the most important websites available on the Internet today.

Its massive archive of videos brings pleasure to millions on a daily basis but its centralized nature means that owner Google always exercises control.

Over the years, people have looked to decentralize the YouTube concept and the latest project hoping to shake up the market has a particularly interesting player onboard.

Until 2015, only insiders knew that Argentinian designer Federico Abad was actually ‘Sebastian’, the shadowy figure behind notorious content sharing platform Popcorn Time.

Now he’s part of the team behind Flixxo, a BitTorrent and blockchain-powered startup hoping to wrestle a share of the video market from YouTube. Here’s how the team, which features blockchain startup RSK Labs, hope things will play out.

The Flixxo network will have no centralized storage of data, eliminating the need for expensive hosting along with associated costs. Instead, transfers will take place between peers using BitTorrent, meaning video content will be stored on the machines of Flixxo users. In practice, the content will be downloaded and uploaded in much the same way as users do on The Pirate Bay or indeed Abad’s baby, Popcorn Time.

However, there’s a twist to the system that envisions content creators, content consumers, and network participants (seeders) making revenue from their efforts.

At the heart of the Flixxo system are digital tokens (think virtual currency), called Flixx. These Flixx ‘coins’, which will go on sale in 12 days, can be used to buy access to content. Creators can also opt to pay consumers when those people help to distribute their content to others.

“Free from structural costs, producers can share the earnings from their content with the network that supports them,” the team explains.

“This way you get paid for helping us improve Flixxo, and you earn credits (in the form of digital tokens called Flixx) for watching higher quality content. Having no intermediaries means that the price you pay for watching the content that you actually want to watch is lower and fairer.”

The Flixxo team

In addition to earning tokens from helping to distribute content, people in the Flixxo ecosystem can also earn currency by watching sponsored content, i.e advertisements. While in a traditional system adverts are often considered a nuisance, Flixx tokens have real value, with a promise that users will be able to trade their Flixx not only for videos, but also for tangible and semi-tangible goods.

“Use your Flixx to reward the producers you follow, encouraging them to create more awesome content. Or keep your Flixx in your wallet and use them to buy a movie ticket, a pair of shoes from an online retailer, a chest of coins in your favourite game or even convert them to old-fashioned cash or up-and-coming digital assets, like Bitcoin,” the team explains.

The Flixxo team have big plans. After foundation in early 2016, the second quarter of 2017 saw the completion of a functional alpha release. In a little under two weeks, the project will begin its token generation event, with new offices in Los Angeles planned for the first half of 2018 alongside a premiere of the Flixxo platform.

“A total of 1,000,000,000 (one billion) Flixx tokens will be issued. A maximum of 300,000,000 (three hundred million) tokens will be sold. Some of these tokens (not more than 33% or 100,000,000 Flixx) may be sold with anticipation of the token allocation event to strategic investors,” Flixxo states.

Like all content platforms, Flixxo will live or die by the quality of the content it provides and whether, at least in the first instance, it can persuade people to part with their hard-earned cash. Only time will tell whether its content will be worth a premium over readily accessible YouTube content but with much-reduced costs, it may tempt creators seeking a bigger piece of the pie.

“Flixxo will also educate its community, teaching its users that in this new internet era value can be held and transferred online without intermediaries, a value that can be earned back by participating in a community, by contributing, being rewarded for every single social interaction,” the team explains.

Of course, the elephant in the room is what will happen when people begin sharing copyrighted content via Flixxo. Certainly, the fact that Popcorn Time’s founder is a key player and rival streaming platform Stremio is listed as a partner means that things could get a bit spicy later on.

Nevertheless, the team suggests that piracy and spam content distribution will be limited by mechanisms already built into the system.

“[A]uthors have to time-block tokens in a smart contract (set as a warranty) in order to upload content. This contract will also handle and block their earnings for a certain period of time, so that in the case of a dispute the unfair-uploader may lose those tokens,” they explain.

That being said, Flixxo also says that “there is no way” for third parties to censor content “which means that anyone has the chance of making any piece of media available on the network.” However, Flixxo says it will develop tools for filtering what it describes as “inappropriate content.”

At this point, things start to become a little unclear. On the one hand Flixxo says it could become a “revolutionary tool for uncensorable and untraceable media” yet on the other it says that it’s necessary to ensure that adult content, for example, isn’t seen by kids.

“We know there is a thin line between filtering or curating content and censorship, and it is a fact that we have an open network for everyone to upload any content. However, Flixxo as a platform will apply certain filtering based on clear rules – there should be a behavior-code for uploaders in order to offer the right content to the right user,” Flixxo explains.

To this end, Flixxo says it will deploy a centralized curation function, carried out by 101 delegates elected by the community, which will become progressively decentralized over time.

“This curation will have a cost, paid in Flixx, and will be collected from the warranty blocked by the content uploaders,” they add.

There can be little doubt that if Flixxo begins ‘curating’ unsuitable content, copyright holders will call on it to do the same for their content too. And, if the platform really takes off, 101 curators probably won’t scratch the surface. There’s also the not inconsiderable issue of what might happen to curators’ judgment when they’re incentivized to block curate content.

Finally, for those sick of “not available in your region” messages, there’s good and bad news. Flixxo insists there will be no geo-blocking of content on its part but individual creators will still have that feature available to them, should they choose.

The Flixx whitepaper can be downloaded here (pdf)

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

Introducing Gluon: a new library for machine learning from AWS and Microsoft

Post Syndicated from Ana Visneski original https://aws.amazon.com/blogs/aws/introducing-gluon-a-new-library-for-machine-learning-from-aws-and-microsoft/

Post by Dr. Matt Wood

Today, AWS and Microsoft announced Gluon, a new open source deep learning interface which allows developers to more easily and quickly build machine learning models, without compromising performance.

Gluon Logo

Gluon provides a clear, concise API for defining machine learning models using a collection of pre-built, optimized neural network components. Developers who are new to machine learning will find this interface more familiar to traditional code, since machine learning models can be defined and manipulated just like any other data structure. More seasoned data scientists and researchers will value the ability to build prototypes quickly and utilize dynamic neural network graphs for entirely new model architectures, all without sacrificing training speed.

Gluon is available in Apache MXNet today, a forthcoming Microsoft Cognitive Toolkit release, and in more frameworks over time.

Neural Networks vs Developers
Machine learning with neural networks (including ‘deep learning’) has three main components: data for training; a neural network model, and an algorithm which trains the neural network. You can think of the neural network in a similar way to a directed graph; it has a series of inputs (which represent the data), which connect to a series of outputs (the prediction), through a series of connected layers and weights. During training, the algorithm adjusts the weights in the network based on the error in the network output. This is the process by which the network learns; it is a memory and compute intensive process which can take days.

Deep learning frameworks such as Caffe2, Cognitive Toolkit, TensorFlow, and Apache MXNet are, in part, an answer to the question ‘how can we speed this process up? Just like query optimizers in databases, the more a training engine knows about the network and the algorithm, the more optimizations it can make to the training process (for example, it can infer what needs to be re-computed on the graph based on what else has changed, and skip the unaffected weights to speed things up). These frameworks also provide parallelization to distribute the computation process, and reduce the overall training time.

However, in order to achieve these optimizations, most frameworks require the developer to do some extra work: specifically, by providing a formal definition of the network graph, up-front, and then ‘freezing’ the graph, and just adjusting the weights.

The network definition, which can be large and complex with millions of connections, usually has to be constructed by hand. Not only are deep learning networks unwieldy, but they can be difficult to debug and it’s hard to re-use the code between projects.

The result of this complexity can be difficult for beginners and is a time-consuming task for more experienced researchers. At AWS, we’ve been experimenting with some ideas in MXNet around new, flexible, more approachable ways to define and train neural networks. Microsoft is also a contributor to the open source MXNet project, and were interested in some of these same ideas. Based on this, we got talking, and found we had a similar vision: to use these techniques to reduce the complexity of machine learning, making it accessible to more developers.

Enter Gluon: dynamic graphs, rapid iteration, scalable training
Gluon introduces four key innovations.

  1. Friendly API: Gluon networks can be defined using a simple, clear, concise code – this is easier for developers to learn, and much easier to understand than some of the more arcane and formal ways of defining networks and their associated weighted scoring functions.
  2. Dynamic networks: the network definition in Gluon is dynamic: it can bend and flex just like any other data structure. This is in contrast to the more common, formal, symbolic definition of a network which the deep learning framework has to effectively carve into stone in order to be able to effectively optimizing computation during training. Dynamic networks are easier to manage, and with Gluon, developers can easily ‘hybridize’ between these fast symbolic representations and the more friendly, dynamic ‘imperative’ definitions of the network and algorithms.
  3. The algorithm can define the network: the model and the training algorithm are brought much closer together. Instead of separate definitions, the algorithm can adjust the network dynamically during definition and training. Not only does this mean that developers can use standard programming loops, and conditionals to create these networks, but researchers can now define even more sophisticated algorithms and models which were not possible before. They are all easier to create, change, and debug.
  4. High performance operators for training: which makes it possible to have a friendly, concise API and dynamic graphs, without sacrificing training speed. This is a huge step forward in machine learning. Some frameworks bring a friendly API or dynamic graphs to deep learning, but these previous methods all incur a cost in terms of training speed. As with other areas of software, abstraction can slow down computation since it needs to be negotiated and interpreted at run time. Gluon can efficiently blend together a concise API with the formal definition under the hood, without the developer having to know about the specific details or to accommodate the compiler optimizations manually.

The team here at AWS, and our collaborators at Microsoft, couldn’t be more excited to bring these improvements to developers through Gluon. We’re already seeing quite a bit of excitement from developers and researchers alike.

Getting started with Gluon
Gluon is available today in Apache MXNet, with support coming for the Microsoft Cognitive Toolkit in a future release. We’re also publishing the front-end interface and the low-level API specifications so it can be included in other frameworks in the fullness of time.

You can get started with Gluon today. Fire up the AWS Deep Learning AMI with a single click and jump into one of 50 fully worked, notebook examples. If you’re a contributor to a machine learning framework, check out the interface specs on GitHub.

-Dr. Matt Wood

"Responsible encryption" fallacies

Post Syndicated from Robert Graham original http://blog.erratasec.com/2017/10/responsible-encryption-fallacies.html

Deputy Attorney General Rod Rosenstein gave a speech recently calling for “Responsible Encryption” (aka. “Crypto Backdoors”). It’s full of dangerous ideas that need to be debunked.

The importance of law enforcement

The first third of the speech talks about the importance of law enforcement, as if it’s the only thing standing between us and chaos. It cites the 2016 Mirai attacks as an example of the chaos that will only get worse without stricter law enforcement.

But the Mira case demonstrated the opposite, how law enforcement is not needed. They made no arrests in the case. A year later, they still haven’t a clue who did it.

Conversely, we technologists have fixed the major infrastructure issues. Specifically, those affected by the DNS outage have moved to multiple DNS providers, including a high-capacity DNS provider like Google and Amazon who can handle such large attacks easily.

In other words, we the people fixed the major Mirai problem, and law-enforcement didn’t.

Moreover, instead being a solution to cyber threats, law enforcement has become a threat itself. The DNC didn’t have the FBI investigate the attacks from Russia likely because they didn’t want the FBI reading all their files, finding wrongdoing by the DNC. It’s not that they did anything actually wrong, but it’s more like that famous quote from Richelieu “Give me six words written by the most honest of men and I’ll find something to hang him by”. Give all your internal emails over to the FBI and I’m certain they’ll find something to hang you by, if they want.
Or consider the case of Andrew Auernheimer. He found AT&T’s website made public user accounts of the first iPad, so he copied some down and posted them to a news site. AT&T had denied the problem, so making the problem public was the only way to force them to fix it. Such access to the website was legal, because AT&T had made the data public. However, prosecutors disagreed. In order to protect the powerful, they twisted and perverted the law to put Auernheimer in jail.

It’s not that law enforcement is bad, it’s that it’s not the unalloyed good Rosenstein imagines. When law enforcement becomes the thing Rosenstein describes, it means we live in a police state.

Where law enforcement can’t go

Rosenstein repeats the frequent claim in the encryption debate:

Our society has never had a system where evidence of criminal wrongdoing was totally impervious to detection

Of course our society has places “impervious to detection”, protected by both legal and natural barriers.

An example of a legal barrier is how spouses can’t be forced to testify against each other. This barrier is impervious.

A better example, though, is how so much of government, intelligence, the military, and law enforcement itself is impervious. If prosecutors could gather evidence everywhere, then why isn’t Rosenstein prosecuting those guilty of CIA torture?

Oh, you say, government is a special exception. If that were the case, then why did Rosenstein dedicate a precious third of his speech discussing the “rule of law” and how it applies to everyone, “protecting people from abuse by the government”. It obviously doesn’t, there’s one rule of government and a different rule for the people, and the rule for government means there’s lots of places law enforcement can’t go to gather evidence.

Likewise, the crypto backdoor Rosenstein is demanding for citizens doesn’t apply to the President, Congress, the NSA, the Army, or Rosenstein himself.

Then there are the natural barriers. The police can’t read your mind. They can only get the evidence that is there, like partial fingerprints, which are far less reliable than full fingerprints. They can’t go backwards in time.

I mention this because encryption is a natural barrier. It’s their job to overcome this barrier if they can, to crack crypto and so forth. It’s not our job to do it for them.

It’s like the camera that increasingly comes with TVs for video conferencing, or the microphone on Alexa-style devices that are always recording. This suddenly creates evidence that the police want our help in gathering, such as having the camera turned on all the time, recording to disk, in case the police later gets a warrant, to peer backward in time what happened in our living rooms. The “nothing is impervious” argument applies here as well. And it’s equally bogus here. By not helping police by not recording our activities, we aren’t somehow breaking some long standing tradit

And this is the scary part. It’s not that we are breaking some ancient tradition that there’s no place the police can’t go (with a warrant). Instead, crypto backdoors breaking the tradition that never before have I been forced to help them eavesdrop on me, even before I’m a suspect, even before any crime has been committed. Sure, laws like CALEA force the phone companies to help the police against wrongdoers — but here Rosenstein is insisting I help the police against myself.

Balance between privacy and public safety

Rosenstein repeats the frequent claim that encryption upsets the balance between privacy/safety:

Warrant-proof encryption defeats the constitutional balance by elevating privacy above public safety.

This is laughable, because technology has swung the balance alarmingly in favor of law enforcement. Far from “Going Dark” as his side claims, the problem we are confronted with is “Going Light”, where the police state monitors our every action.

You are surrounded by recording devices. If you walk down the street in town, outdoor surveillance cameras feed police facial recognition systems. If you drive, automated license plate readers can track your route. If you make a phone call or use a credit card, the police get a record of the transaction. If you stay in a hotel, they demand your ID, for law enforcement purposes.

And that’s their stuff, which is nothing compared to your stuff. You are never far from a recording device you own, such as your mobile phone, TV, Alexa/Siri/OkGoogle device, laptop. Modern cars from the last few years increasingly have always-on cell connections and data recorders that record your every action (and location).

Even if you hike out into the country, when you get back, the FBI can subpoena your GPS device to track down your hidden weapon’s cache, or grab the photos from your camera.

And this is all offline. So much of what we do is now online. Of the photographs you own, fewer than 1% are printed out, the rest are on your computer or backed up to the cloud.

Your phone is also a GPS recorder of your exact position all the time, which if the government wins the Carpenter case, they police can grab without a warrant. Tagging all citizens with a recording device of their position is not “balance” but the premise for a novel more dystopic than 1984.

If suspected of a crime, which would you rather the police searched? Your person, houses, papers, and physical effects? Or your mobile phone, computer, email, and online/cloud accounts?

The balance of privacy and safety has swung so far in favor of law enforcement that rather than debating whether they should have crypto backdoors, we should be debating how to add more privacy protections.

“But it’s not conclusive”

Rosenstein defends the “going light” (“Golden Age of Surveillance”) by pointing out it’s not always enough for conviction. Nothing gives a conviction better than a person’s own words admitting to the crime that were captured by surveillance. This other data, while copious, often fails to convince a jury beyond a reasonable doubt.
This is nonsense. Police got along well enough before the digital age, before such widespread messaging. They solved terrorist and child abduction cases just fine in the 1980s. Sure, somebody’s GPS location isn’t by itself enough — until you go there and find all the buried bodies, which leads to a conviction. “Going dark” imagines that somehow, the evidence they’ve been gathering for centuries is going away. It isn’t. It’s still here, and matches up with even more digital evidence.
Conversely, a person’s own words are not as conclusive as you think. There’s always missing context. We quickly get back to the Richelieu “six words” problem, where captured communications are twisted to convict people, with defense lawyers trying to untwist them.

Rosenstein’s claim may be true, that a lot of criminals will go free because the other electronic data isn’t convincing enough. But I’d need to see that claim backed up with hard studies, not thrown out for emotional impact.

Terrorists and child molesters

You can always tell the lack of seriousness of law enforcement when they bring up terrorists and child molesters.
To be fair, sometimes we do need to talk about terrorists. There are things unique to terrorism where me may need to give government explicit powers to address those unique concerns. For example, the NSA buys mobile phone 0day exploits in order to hack terrorist leaders in tribal areas. This is a good thing.
But when terrorists use encryption the same way everyone else does, then it’s not a unique reason to sacrifice our freedoms to give the police extra powers. Either it’s a good idea for all crimes or no crimes — there’s nothing particular about terrorism that makes it an exceptional crime. Dead people are dead. Any rational view of the problem relegates terrorism to be a minor problem. More citizens have died since September 8, 2001 from their own furniture than from terrorism. According to studies, the hot water from the tap is more of a threat to you than terrorists.
Yes, government should do what they can to protect us from terrorists, but no, it’s not so bad of a threat that requires the imposition of a military/police state. When people use terrorism to justify their actions, it’s because they trying to form a military/police state.
A similar argument works with child porn. Here’s the thing: the pervs aren’t exchanging child porn using the services Rosenstein wants to backdoor, like Apple’s Facetime or Facebook’s WhatsApp. Instead, they are exchanging child porn using custom services they build themselves.
Again, I’m (mostly) on the side of the FBI. I support their idea of buying 0day exploits in order to hack the web browsers of visitors to the secret “PlayPen” site. This is something that’s narrow to this problem and doesn’t endanger the innocent. On the other hand, their calls for crypto backdoors endangers the innocent while doing effectively nothing to address child porn.
Terrorists and child molesters are a clichéd, non-serious excuse to appeal to our emotions to give up our rights. We should not give in to such emotions.

Definition of “backdoor”

Rosenstein claims that we shouldn’t call backdoors “backdoors”:

No one calls any of those functions [like key recovery] a “back door.”  In fact, those capabilities are marketed and sought out by many users.

He’s partly right in that we rarely refer to PGP’s key escrow feature as a “backdoor”.

But that’s because the term “backdoor” refers less to how it’s done and more to who is doing it. If I set up a recovery password with Apple, I’m the one doing it to myself, so we don’t call it a backdoor. If it’s the police, spies, hackers, or criminals, then we call it a “backdoor” — even it’s identical technology.

Wikipedia uses the key escrow feature of the 1990s Clipper Chip as a prime example of what everyone means by “backdoor“. By “no one”, Rosenstein is including Wikipedia, which is obviously incorrect.

Though in truth, it’s not going to be the same technology. The needs of law enforcement are different than my personal key escrow/backup needs. In particular, there are unsolvable problems, such as a backdoor that works for the “legitimate” law enforcement in the United States but not for the “illegitimate” police states like Russia and China.

I feel for Rosenstein, because the term “backdoor” does have a pejorative connotation, which can be considered unfair. But that’s like saying the word “murder” is a pejorative term for killing people, or “torture” is a pejorative term for torture. The bad connotation exists because we don’t like government surveillance. I mean, honestly calling this feature “government surveillance feature” is likewise pejorative, and likewise exactly what it is that we are talking about.

Providers

Rosenstein focuses his arguments on “providers”, like Snapchat or Apple. But this isn’t the question.

The question is whether a “provider” like Telegram, a Russian company beyond US law, provides this feature. Or, by extension, whether individuals should be free to install whatever software they want, regardless of provider.

Telegram is a Russian company that provides end-to-end encryption. Anybody can download their software in order to communicate so that American law enforcement can’t eavesdrop. They aren’t going to put in a backdoor for the U.S. If we succeed in putting backdoors in Apple and WhatsApp, all this means is that criminals are going to install Telegram.

If the, for some reason, the US is able to convince all such providers (including Telegram) to install a backdoor, then it still doesn’t solve the problem, as uses can just build their own end-to-end encryption app that has no provider. It’s like email: some use the major providers like GMail, others setup their own email server.

Ultimately, this means that any law mandating “crypto backdoors” is going to target users not providers. Rosenstein tries to make a comparison with what plain-old telephone companies have to do under old laws like CALEA, but that’s not what’s happening here. Instead, for such rules to have any effect, they have to punish users for what they install, not providers.

This continues the argument I made above. Government backdoors is not something that forces Internet services to eavesdrop on us — it forces us to help the government spy on ourselves.
Rosenstein tries to address this by pointing out that it’s still a win if major providers like Apple and Facetime are forced to add backdoors, because they are the most popular, and some terrorists/criminals won’t move to alternate platforms. This is false. People with good intentions, who are unfairly targeted by a police state, the ones where police abuse is rampant, are the ones who use the backdoored products. Those with bad intentions, who know they are guilty, will move to the safe products. Indeed, Telegram is already popular among terrorists because they believe American services are already all backdoored. 
Rosenstein is essentially demanding the innocent get backdoored while the guilty don’t. This seems backwards. This is backwards.

Apple is morally weak

The reason I’m writing this post is because Rosenstein makes a few claims that cannot be ignored. One of them is how he describes Apple’s response to government insistence on weakening encryption doing the opposite, strengthening encryption. He reasons this happens because:

Of course they [Apple] do. They are in the business of selling products and making money. 

We [the DoJ] use a different measure of success. We are in the business of preventing crime and saving lives. 

He swells in importance. His condescending tone ennobles himself while debasing others. But this isn’t how things work. He’s not some white knight above the peasantry, protecting us. He’s a beat cop, a civil servant, who serves us.

A better phrasing would have been:

They are in the business of giving customers what they want.

We are in the business of giving voters what they want.

Both sides are doing the same, giving people what they want. Yes, voters want safety, but they also want privacy. Rosenstein imagines that he’s free to ignore our demands for privacy as long has he’s fulfilling his duty to protect us. He has explicitly rejected what people want, “we use a different measure of success”. He imagines it’s his job to tell us where the balance between privacy and safety lies. That’s not his job, that’s our job. We, the people (and our representatives), make that decision, and it’s his job is to do what he’s told. His measure of success is how well he fulfills our wishes, not how well he satisfies his imagined criteria.

That’s why those of us on this side of the debate doubt the good intentions of those like Rosenstein. He criticizes Apple for wanting to protect our rights/freedoms, and declare they measure success differently.

They are willing to be vile

Rosenstein makes this argument:

Companies are willing to make accommodations when required by the government. Recent media reports suggest that a major American technology company developed a tool to suppress online posts in certain geographic areas in order to embrace a foreign government’s censorship policies. 

Let me translate this for you:

Companies are willing to acquiesce to vile requests made by police-states. Therefore, they should acquiesce to our vile police-state requests.

It’s Rosenstein who is admitting here is that his requests are those of a police-state.

Constitutional Rights

Rosenstein says:

There is no constitutional right to sell warrant-proof encryption.

Maybe. It’s something the courts will have to decide. There are many 1st, 2nd, 3rd, 4th, and 5th Amendment issues here.
The reason we have the Bill of Rights is because of the abuses of the British Government. For example, they quartered troops in our homes, as a way of punishing us, and as a way of forcing us to help in our own oppression. The troops weren’t there to defend us against the French, but to defend us against ourselves, to shoot us if we got out of line.

And that’s what crypto backdoors do. We are forced to be agents of our own oppression. The principles enumerated by Rosenstein apply to a wide range of even additional surveillance. With little change to his speech, it can equally argue why the constant TV video surveillance from 1984 should be made law.

Let’s go back and look at Apple. It is not some base company exploiting consumers for profit. Apple doesn’t have guns, they cannot make people buy their product. If Apple doesn’t provide customers what they want, then customers vote with their feet, and go buy an Android phone. Apple isn’t providing encryption/security in order to make a profit — it’s giving customers what they want in order to stay in business.
Conversely, if we citizens don’t like what the government does, tough luck, they’ve got the guns to enforce their edicts. We can’t easily vote with our feet and walk to another country. A “democracy” is far less democratic than capitalism. Apple is a minority, selling phones to 45% of the population, and that’s fine, the minority get the phones they want. In a Democracy, where citizens vote on the issue, those 45% are screwed, as the 55% impose their will unwanted onto the remainder.

That’s why we have the Bill of Rights, to protect the 49% against abuse by the 51%. Regardless whether the Supreme Court agrees the current Constitution, it is the sort right that might exist regardless of what the Constitution says. 

Obliged to speak the truth

Here is the another part of his speech that I feel cannot be ignored. We have to discuss this:

Those of us who swear to protect the rule of law have a different motivation.  We are obliged to speak the truth.

The truth is that “going dark” threatens to disable law enforcement and enable criminals and terrorists to operate with impunity.

This is not true. Sure, he’s obliged to say the absolute truth, in court. He’s also obliged to be truthful in general about facts in his personal life, such as not lying on his tax return (the sort of thing that can get lawyers disbarred).

But he’s not obliged to tell his spouse his honest opinion whether that new outfit makes them look fat. Likewise, Rosenstein knows his opinion on public policy doesn’t fall into this category. He can say with impunity that either global warming doesn’t exist, or that it’ll cause a biblical deluge within 5 years. Both are factually untrue, but it’s not going to get him fired.

And this particular claim is also exaggerated bunk. While everyone agrees encryption makes law enforcement’s job harder than with backdoors, nobody honestly believes it can “disable” law enforcement. While everyone agrees that encryption helps terrorists, nobody believes it can enable them to act with “impunity”.

I feel bad here. It’s a terrible thing to question your opponent’s character this way. But Rosenstein made this unavoidable when he clearly, with no ambiguity, put his integrity as Deputy Attorney General on the line behind the statement that “going dark threatens to disable law enforcement and enable criminals and terrorists to operate with impunity”. I feel it’s a bald face lie, but you don’t need to take my word for it. Read his own words yourself and judge his integrity.

Conclusion

Rosenstein’s speech includes repeated references to ideas like “oath”, “honor”, and “duty”. It reminds me of Col. Jessup’s speech in the movie “A Few Good Men”.

If you’ll recall, it was rousing speech, “you want me on that wall” and “you use words like honor as a punchline”. Of course, since he was violating his oath and sending two privates to death row in order to avoid being held accountable, it was Jessup himself who was crapping on the concepts of “honor”, “oath”, and “duty”.

And so is Rosenstein. He imagines himself on that wall, doing albeit terrible things, justified by his duty to protect citizens. He imagines that it’s he who is honorable, while the rest of us not, even has he utters bald faced lies to further his own power and authority.

We activists oppose crypto backdoors not because we lack honor, or because we are criminals, or because we support terrorists and child molesters. It’s because we value privacy and government officials who get corrupted by power. It’s not that we fear Trump becoming a dictator, it’s that we fear bureaucrats at Rosenstein’s level becoming drunk on authority — which Rosenstein demonstrably has. His speech is a long train of corrupt ideas pursuing the same object of despotism — a despotism we oppose.

In other words, we oppose crypto backdoors because it’s not a tool of law enforcement, but a tool of despotism.

How to Automatically Revert and Receive Notifications About Changes to Your Amazon VPC Security Groups

Post Syndicated from Rob Barnes original https://aws.amazon.com/blogs/security/how-to-automatically-revert-and-receive-notifications-about-changes-to-your-amazon-vpc-security-groups/

In a previous AWS Security Blog post, Jeff Levine showed how you can monitor changes to your Amazon EC2 security groups. The methods he describes in that post are examples of detective controls, which can help you determine when changes are made to security controls on your AWS resources.

In this post, I take that approach a step further by introducing an example of a responsive control, which you can use to automatically respond to a detected security event by applying a chosen security mitigation. I demonstrate a solution that continuously monitors changes made to an Amazon VPC security group, and if a new ingress rule (the same as an inbound rule) is added to that security group, the solution removes the rule and then sends you a notification after the changes have been automatically reverted.

The scenario

Let’s say you want to reduce your infrastructure complexity by replacing your Secure Shell (SSH) bastion hosts with Amazon EC2 Systems Manager (SSM). SSM allows you to run commands on your hosts remotely, removing the need to manage bastion hosts or rely on SSH to execute commands. To support this objective, you must prevent your staff members from opening SSH ports to your web server’s Amazon VPC security group. If one of your staff members does modify the VPC security group to allow SSH access, you want the change to be automatically reverted and then receive a notification that the change to the security group was automatically reverted. If you are not yet familiar with security groups, see Security Groups for Your VPC before reading the rest of this post.

Solution overview

This solution begins with a directive control to mandate that no web server should be accessible using SSH. The directive control is enforced using a preventive control, which is implemented using a security group rule that prevents ingress from port 22 (typically used for SSH). The detective control is a “listener” that identifies any changes made to your security group. Finally, the responsive control reverts changes made to the security group and then sends a notification of this security mitigation.

The detective control, in this case, is an Amazon CloudWatch event that detects changes to your security group and triggers the responsive control, which in this case is an AWS Lambda function. I use AWS CloudFormation to simplify the deployment.

The following diagram shows the architecture of this solution.

Solution architecture diagram

Here is how the process works:

  1. Someone on your staff adds a new ingress rule to your security group.
  2. A CloudWatch event that continually monitors changes to your security groups detects the new ingress rule and invokes a designated Lambda function (with Lambda, you can run code without provisioning or managing servers).
  3. The Lambda function evaluates the event to determine whether you are monitoring this security group and reverts the new security group ingress rule.
  4. Finally, the Lambda function sends you an email to let you know what the change was, who made it, and that the change was reverted.

Deploy the solution by using CloudFormation

In this section, you will click the Launch Stack button shown below to launch the CloudFormation stack and deploy the solution.

Prerequisites

  • You must have AWS CloudTrail already enabled in the AWS Region where you will be deploying the solution. CloudTrail lets you log, continuously monitor, and retain events related to API calls across your AWS infrastructure. See Getting Started with CloudTrail for more information.
  • You must have a default VPC in the region in which you will be deploying the solution. AWS accounts have one default VPC per AWS Region. If you’ve deleted your VPC, see Creating a Default VPC to recreate it.

Resources that this solution creates

When you launch the CloudFormation stack, it creates the following resources:

  • A sample VPC security group in your default VPC, which is used as the target for reverting ingress rule changes.
  • A CloudWatch event rule that monitors changes to your AWS infrastructure.
  • A Lambda function that reverts changes to the security group and sends you email notifications.
  • A permission that allows CloudWatch to invoke your Lambda function.
  • An AWS Identity and Access Management (IAM) role with limited privileges that the Lambda function assumes when it is executed.
  • An Amazon SNS topic to which the Lambda function publishes notifications.

Launch the CloudFormation stack

The link in this section uses the us-east-1 Region (the US East [N. Virginia] Region). Change the region if you want to use this solution in a different region. See Selecting a Region for more information about changing the region.

To deploy the solution, click the following Launch Stack button to launch the stack. After you click the button, you must sign in to the AWS Management Console if you have not already done so.

Click this "Launch Stack" button

Then:

  1. Choose Next to proceed to the Specify Details page.
  2. On the Specify Details page, type your email address in the Send notifications to box. This is the email address to which change notifications will be sent. (After the stack is launched, you will receive a confirmation email that you must accept before you can receive notifications.)
  3. Choose Next until you get to the Review page, and then choose the I acknowledge that AWS CloudFormation might create IAM resources check box. This confirms that you are aware that the CloudFormation template includes an IAM resource.
  4. Choose Create. CloudFormation displays the stack status, CREATE_COMPLETE, when the stack has launched completely, which should take less than two minutes.Screenshot showing that the stack has launched completely

Testing the solution

  1. Check your email for the SNS confirmation email. You must confirm this subscription to receive future notification emails. If you don’t confirm the subscription, your security group ingress rules still will be automatically reverted, but you will not receive notification emails.
  2. Navigate to the EC2 console and choose Security Groups in the navigation pane.
  3. Choose the security group created by CloudFormation. Its name is Web Server Security Group.
  4. Choose the Inbound tab in the bottom pane of the page. Note that only one rule allows HTTPS ingress on port 443 from 0.0.0.0/0 (from anywhere).Screenshot showing the "Inbound" tab in the bottom pane of the page
  1. Choose Edit to display the Edit inbound rules dialog box (again, an inbound rule and an ingress rule are the same thing).
  2. Choose Add Rule.
  3. Choose SSH from the Type drop-down list.
  4. Choose My IP from the Source drop-down list. Your IP address is populated for you. By adding this rule, you are simulating one of your staff members violating your organization’s policy (in this blog post’s hypothetical example) against allowing SSH access to your EC2 servers. You are testing the solution created when you launched the CloudFormation stack in the previous section. The solution should remove this newly created SSH rule automatically.
    Screenshot of editing inbound rules
  5. Choose Save.

Adding this rule creates an EC2 AuthorizeSecurityGroupIngress service event, which triggers the Lambda function created in the CloudFormation stack. After a few moments, choose the refresh button ( The "refresh" icon ) to see that the new SSH ingress rule that you just created has been removed by the solution you deployed earlier with the CloudFormation stack. If the rule is still there, wait a few more moments and choose the refresh button again.

Screenshot of refreshing the page to see that the SSH ingress rule has been removed

You should also receive an email to notify you that the ingress rule was added and subsequently reverted.

Screenshot of the notification email

Cleaning up

If you want to remove the resources created by this CloudFormation stack, you can delete the CloudFormation stack:

  1. Navigate to the CloudFormation console.
  2. Choose the stack that you created earlier.
  3. Choose the Actions drop-down list.
  4. Choose Delete Stack, and then choose Yes, Delete.
  5. CloudFormation will display a status of DELETE_IN_PROGRESS while it deletes the resources created with the stack. After a few moments, the stack should no longer appear in the list of completed stacks.
    Screenshot of stack "DELETE_IN_PROGRESS"

Other applications of this solution

I have shown one way to use multiple AWS services to help continuously ensure that your security controls haven’t deviated from your security baseline. However, you also could use the CIS Amazon Web Services Foundations Benchmarks, for example, to establish a governance baseline across your AWS accounts and then use the principles in this blog post to automatically mitigate changes to that baseline.

To scale this solution, you can create a framework that uses resource tags to identify particular resources for monitoring. You also can use a consolidated monitoring approach by using cross-account event delivery. See Sending and Receiving Events Between AWS Accounts for more information. You also can extend the principle of automatic mitigation to detect and revert changes to other resources such as IAM policies and Amazon S3 bucket policies.

Summary

In this blog post, I demonstrated how you can automatically revert changes to a VPC security group and have a notification sent about the changes. You can use this solution in your own AWS accounts to enforce your security requirements continuously.

If you have comments about this blog post or other ideas for ways to use this solution, submit a comment in the “Comments” section below. If you have implementation questions, start a new thread in the EC2 forum or contact AWS Support.

– Rob

Kim Dotcom Plots Hollywood Execs’ Downfall in Wake of Weinstein Scandal

Post Syndicated from Andy original https://torrentfreak.com/kim-dotcom-plots-hollywood-execs-downfall-in-wake-of-weinstein-scandal-171011/

It has been nothing short of a disastrous week for movie mogul Harvey Weinstein.

Accused of sexual abuse and harassment by a string of actresses, the latest including Angelina Jolie and Gwyneth Paltrow, the 65-year-old is having his life taken apart.

This week, the influential producer was fired by his own The Weinstein Company, which is now seeking to change its name. And yesterday, following allegations of rape made in The New Yorker magazine, his wife, designer Georgina Chapman, announced she was leaving the Miramax co-founder.

“My heart breaks for all the women who have suffered tremendous pain because of these unforgivable actions,” the 41-year-old told People magazine.

As the scandal continues and more victims come forward, there are signs of a general emboldening of women in Hollywood, some of whom are publicly speaking out about their own experiences. If that continues to gain momentum – and the opportunity is certainly there – one man with his own experiences of Hollywood’s wrath wants to play a prominent role.

“Just the beginning. Sexual abuse and slavery by the Hollywood elites is as common as dirt. Tsunami,” Kim Dotcom wrote on Twitter.

Dotcom initially suggested that via a website, victims of Hollywood abuse could share their stories anonymously, shining light on a topic that is often shrouded in fear and secrecy. But soon the idea was growing legs.

“Looking for a Los Angeles law firm willing to represent hundreds of sexual abuse victims of Hollywood elites, pro-bono. I’ll find funding,” he said.

Within hours, Dotcom announced that he’d found lawyers in the US who are willing to help victims, for free.

“I had talks with Hollywood lawyers. Found a big law firm willing to represent sexual abuse victims, for free. Next, the website,” he teased.

It’s not hard to see why Dotcom is making this battle his own. Aside from any empathy he feels towards victims on a personal level, he sees his family as kindred spirits, people who have also felt the wrath of Hollywood executives.

That being said, the Megaupload founder is extremely clear that framing this as revenge or a personal vendetta would be not only wrong, but also disrespectful to the victims of abuse.

“I want to help victims because I’m a victim,” he told TorrentFreak.

“I’m an abuse victim of Hollywood, not sexual abuse, but certainly abuse of power. It’s time to shine some light on those Hollywood elites who think they are above the law and untouchable.”

Dotcom told NZ Herald that people like Harvey Weinstein rub shoulders with the great and the good, hoping to influence decision-makers for their own personal gain. It’s something Dotcom, his family, and his colleagues have felt the effects of.

“They dine with presidents, donate millions to powerful politicians and buy favors like tax breaks and new copyright legislation, even the Megaupload raid. They think they can destroy lives and businesses with impunity. They think they can get away with anything. But they can’t. We’ll teach them,” he warned.

The Megaupload founder says he has both “the motive and the resources” to help victims and he’s promising to do that with proven skills. Ironically, many of these have been honed as a direct result of Hollywood’s attack on Megaupload and Dotcom’s relentless drive to bounce back with new sites like Mega and his latest K.im / Bitcache project.

“I’m an experienced fundraiser. A high traffic crowdfunding campaign for this cause can raise millions. The costs won’t be an issue,” Dotcom informs TF. “There seems to be an appetite for these cases because defendants usually settle quickly. I have calls with LA firms today and tomorrow.

“Just the beginning. Watch me,” he concludes.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.