Tag Archives: ISO

Kroah-Hartman: Meltdown and Spectre Linux Kernel Status – Update

Post Syndicated from corbet original https://lwn.net/Articles/744803/rss

Here’s a
brief update from Greg Kroah-Hartman
on the kernel’s handling of the
Meltdown and Spectre vulnerabilities. “This shows that my kernel is
properly mitigating the Meltdown problem by implementing PTI (Page Table
Isolation), and that my system is still vulnerable to the Spectre variant
1, but is trying really hard to resolve the variant 2, but is not quite
there (because I did not build my kernel with a compiler to properly
support the retpoline feature).

UK Government Teaches 7-Year-Olds That Piracy is Stealing

Post Syndicated from Ernesto original https://torrentfreak.com/uk-government-teaches-7-year-olds-that-piracy-is-stealing-180118/

In 2014, Mike Weatherley, the UK Government’s top IP advisor at the time, offered a recommendation that copyright education should be added to the school curriculum, starting with the youngest kids in primary school.

New generations should learn copyright moral and ethics, the idea was, and a few months later the first version of the new “Cracking Ideas” curriculum was made public.

In the years that followed new course material was added, published by the UK’s Intellectual Property Office (IPO) with support from the local copyright industry. The teaching material is aimed at a variety of ages, including those who have just started primary school.

Part of the education features a fictitious cartoon band called Nancy and the Meerkats. With help from their manager, they learn key copyright insights and this week several new videos were published, BBC points out.

The videos try to explain concepts including copyright, trademarks, and how people can protect the things they’ve created. Interestingly, the videos themselves use names of existing musicians, with puns such as Ed Shealing, Justin Beaver, and the evil Kitty Perry. Even Nancy and the Meerkats appears to be a play on the classic 1970s cartoon series Josie and the Pussycats, featuring a pop band of the same name.

The play on Ed Sheeran’s name is interesting, to say the least. While he’s one of the most popular artists today, he also mentioned in the past that file-sharing made his career.

“…illegal fire sharing was what made me. It was students in England going to university, sharing my songs with each other,” Sheeran said in an interview with CBS last year.

But that didn’t stop the IPO from using his likeness for their anti-file-sharing campaign. According to Catherine Davies of IPO’s education outreach department, knowledge about key intellectual property issues is a “life skill” nowadays.

“In today’s digital environment, even very young people are IP consumers, accessing online digital content independently and regularly,” she tells the BBC. “A basic understanding of IP and a respect for others’ IP rights is therefore a key life skill.”

While we doubt that these concepts will appeal to the average five-year-old, the course material does it best to simplify complex copyright issues. Perhaps that’s also where the danger lies.

The program is in part backed by copyright-reliant industries, who have a different view on the matter than many others. For example, a previously published video of Nancy and the Meerkats deals with the topic of file-sharing.

After the Meerkats found out that people were downloading their tracks from pirate sites and became outraged, their manager Big Joe explained that file-sharing is just the same as stealing a CD from a physical store.

“In a way, all those people who downloaded free copies are doing the same thing as walking out of the shop with a CD and forgetting to go the till,” he says.

“What these sites are doing is sometimes called piracy. It not only affects music but also videos, books, and movies.If someone owns the copyright to something, well, it is stealing. Simple as that,” Big Joe adds.

The Pirates of the Internet!

While we won’t go into the copying vs. stealing debate, it’s interesting that there is no mention of more liberal copyright licenses. There are thousands of artists who freely share their work after all, by adopting Creative Commons licenses for example. Downloading these tracks is certainly not stealing.

Jim Killock, director of the Open Rights Group, notes that the campaign is a bit extreme at points.

“Infringing copyright is a bad thing, but it is not the same as physical theft. Many children will guess that making a copy is not the same as making off with the local store’s chocolate bars,” he says.

“Children aren’t born bureaucrats, and they are surrounded by stupid rules made by stupid adults. Presumably, the IPO doesn’t want children to conclude that copyright is just another one, so they should be a bit more careful with how they explain things.”

Killock also stresses that children copy a lot of things in school, which would normally violate copyright. However, thanks to the educational exceptions they’re not getting in trouble. The IPO could pay more attention to these going forward.

Perhaps Nancy and the Meerkats could decide to release a free to share track in a future episode, for example, and encourage kids to use it for their own remixes, or other creative projects. Creativity and copyright are not all about restrictions, after all.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN discounts, offers and coupons

[$] Monitoring with Prometheus 2.0

Post Syndicated from corbet original https://lwn.net/Articles/744410/rss

Prometheus is a monitoring tool
built from scratch by SoundCloud in 2012. It works by pulling metrics from
monitored services and storing them in a time series database (TSDB). It
has a powerful query language to inspect that database, create alerts, and
plot basic graphs. Those graphs can then be used to detect anomalies or
trends for (possibly automated) resource provisioning. Prometheus also has
extensive service discovery features and supports high availability
configurations.

That’s what the brochure says, anyway; let’s see how it works in the hands
of an old grumpy system administrator. I’ll be drawing comparisons
with Munin and Nagios frequently because those are the tools I have
used for over a decade in monitoring Unix clusters.

Article from a Former Chinese PLA General on Cyber Sovereignty

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2018/01/article_from_a_.html

Interesting article by Major General Hao Yeli, Chinese People’s Liberation Army (ret.), a senior advisor at the China International Institute for Strategic Society, Vice President of China Institute for Innovation and Development Strategy, and the Chair of the Guanchao Cyber Forum.

Against the background of globalization and the internet era, the emerging cyber sovereignty concept calls for breaking through the limitations of physical space and avoiding misunderstandings based on perceptions of binary opposition. Reinforcing a cyberspace community with a common destiny, it reconciles the tension between exclusivity and transferability, leading to a comprehensive perspective. China insists on its cyber sovereignty, meanwhile, it transfers segments of its cyber sovereignty reasonably. China rightly attaches importance to its national security, meanwhile, it promotes international cooperation and open development.

China has never been opposed to multi-party governance when appropriate, but rejects the denial of government’s proper role and responsibilities with respect to major issues. The multilateral and multiparty models are complementary rather than exclusive. Governments and multi-stakeholders can play different leading roles at the different levels of cyberspace.

In the internet era, the law of the jungle should give way to solidarity and shared responsibilities. Restricted connections should give way to openness and sharing. Intolerance should be replaced by understanding. And unilateral values should yield to respect for differences while recognizing the importance of diversity.

Analyzing the Linux boot process (opensource.com)

Post Syndicated from corbet original https://lwn.net/Articles/744528/rss

Alison Chaiken looks
in detail at how the kernel boots
on opensource.com.
Besides starting buggy spyware, what function does early boot
firmware serve? The job of a bootloader is to make available to a newly
powered processor the resources it needs to run a general-purpose operating
system like Linux. At power-on, there not only is no virtual memory, but no
DRAM until its controller is brought up.

Now Open – Third AWS Availability Zone in London

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/now-open-third-aws-availability-zone-in-london/

We expand AWS by picking a geographic area (which we call a Region) and then building multiple, isolated Availability Zones in that area. Each Availability Zone (AZ) has multiple Internet connections and power connections to multiple grids.

Today I am happy to announce that we are opening our 50th AWS Availability Zone, with the addition of a third AZ to the EU (London) Region. This will give you additional flexibility to architect highly scalable, fault-tolerant applications that run across multiple AZs in the UK.

Since launching the EU (London) Region, we have seen an ever-growing set of customers, particularly in the public sector and in regulated industries, use AWS for new and innovative applications. Here are a couple of examples, courtesy of my AWS colleagues in the UK:

Enterprise – Some of the UK’s most respected enterprises are using AWS to transform their businesses, including BBC, BT, Deloitte, and Travis Perkins. Travis Perkins is one of the largest suppliers of building materials in the UK and is implementing the biggest systems and business change in its history, including an all-in migration of its data centers to AWS.

Startups – Cross-border payments company Currencycloud has migrated its entire payments production, and demo platform to AWS resulting in a 30% saving on their infrastructure costs. Clearscore, with plans to disrupting the credit score industry, has also chosen to host their entire platform on AWS. UnderwriteMe is using the EU (London) Region to offer an underwriting platform to their customers as a managed service.

Public Sector -The Met Office chose AWS to support the Met Office Weather App, available for iPhone and Android phones. Since the Met Office Weather App went live in January 2016, it has attracted more than half a million users. Using AWS, the Met Office has been able to increase agility, speed, and scalability while reducing costs. The Driver and Vehicle Licensing Agency (DVLA) is using the EU (London) Region for services such as the Strategic Card Payments platform, which helps the agency achieve PCI DSS compliance.

The AWS EU (London) Region has achieved Public Services Network (PSN) assurance, which provides UK Public Sector customers with an assured infrastructure on which to build UK Public Sector services. In conjunction with AWS’s Standardized Architecture for UK-OFFICIAL, PSN assurance enables UK Public Sector organizations to move their UK-OFFICIAL classified data to the EU (London) Region in a controlled and risk-managed manner.

For a complete list of AWS Regions and Services, visit the AWS Global Infrastructure page. As always, pricing for services in the Region can be found on the detail pages; visit our Cloud Products page to get started.

Jeff;

Tickbox Clearly Promotes and Facilitates Piracy, Hollywood Tells Court

Post Syndicated from Ernesto original https://torrentfreak.com/tickbox-clearly-promotes-and-facilitates-piracy-hollywood-tells-court-180115/

The rising popularity of piracy streaming boxes has turned into Hollywood’s main piracy concern in recent months.

While the hardware and media players such as Kodi are not a problem, sellers who ship devices with unauthorized add-ons turn them into fully-fledged piracy machines.

According to the Alliance for Creativity and Entertainment (ACE), an anti-piracy partnership comprised of Hollywood studios, Netflix, Amazon, and more than two dozen other companies, Tickbox TV is one of these bad actors.

Last year, ACE filed a lawsuit against the Georgia-based company, which sells set-top boxes that allow users to stream a variety of popular media. The Tickbox devices use the Kodi media player and comes with instructions on how to add various add-ons.

According to ACE, these devices are nothing more than pirate tools, allowing buyers to stream copyright-infringing content. The coalition, therefore, asked the court for a permanent injunction to remove all infringing add-ons from previously sold devices.

Tickbox maintained its innocence, however. The company informed the court that its box is a simple computer like any other, which is perfectly legal.

According to Tickbox, they don’t have anything to do with the infringing “Themes” that users can select on their device. These themes feature several addons that link to infringing content.

This explanation doesn’t sit well with the movie companies, which submitted a reply to the court late last week. They claim that Tickbox is deliberately downplaying their own role, as they are the ones who decided to make these themes accessible through their boxes.

“TickBox falsely claims that the presence of these ‘Themes’ on TickBox devices ‘have nothing to do with Defendant’,” ACE’s reply reads.

“To the contrary, TickBox intentionally chooses which ‘Themes’ to include on its ‘Select your Theme’ menu for the TickBox TV interface, and TickBox pushes out automatic software updates to its customers’ TickBox TV devices.”

The movie companies also dispute Tickbox’s argument that they don’t induce copyright infringement because their device is “simply a small computer” that has many legitimate uses.

This liability question isn’t about whether Tickbox stores any infringing material or runs pirate streams through their servers, they counter. It’s about the intended use and how Tickbox promotes its product.

“TickBox’s liability arises based on its advertising and promoting TickBox TV as a tool for infringing use, and from designing and including software on the device that encourages access to infringing streams from third-party sources.”

ACE notes that, unlike Tickbox claims, the current case shows a lot of parallels with previous landmark cases including Grokster and Fung [isoHunt].

The isoHunt website didn’t store and infringing material, nor was it crucial in the torrent piracy ecosystem. However, it was liable because the operator willingly facilitated copyright infringing activity. This is what Tickbox does too, according to ACE.

“TickBox ‘competes’ with legitimate services by telling customers that they can access the same content available from legitimate distributors ‘ABSOLUTELY FREE’ and that customers therefore ‘will find that you no longer need those subscriptions’.”

The movie companies therefore ask the court to issue the requested injunction. They want all existing devices to be impounded and Tickbox should, through an update, remove infringing addons from already sold devices.

Tickbox argued that this would require them to “hack into” their customers’ boxes and delete content. ACE, however, says that this is a simple update and nothing different from what the company has done in the past.

“The proposed injunction would merely obligate TickBox to make good on its halfhearted and ineffective efforts to do what it claims to have already done: remove Kodi builds with illicit addons from TickBox TV,” ACE writes.

“As demonstrated by TickBox’s own, repeated software updates since the filing of Plaintiffs’ Complaint, TickBox has the means and ability to easily and remotely change what options users see and can access on their TickBox TVs.”

After having heard the arguments from both sides, it’s now up to the California federal court to decide who’s right.

The current case should set an important precedent. In addition to Tickbox, ACE also filed a similar lawsuit against Dragon Box. Clearly, the coalition is determined to get these alleged pirate devices off the market.

A copy of ACE’s reply is available here (pdf).

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN discounts, offers and coupons

Pirate Streaming on Facebook is a Seriously Risky Business

Post Syndicated from Andy original https://torrentfreak.com/pirate-streaming-on-facebook-is-a-seriously-risky-business-180114/

For more than a year the British public has been warned about the supposed dangers of Kodi piracy.

Dozens of headlines have claimed consequences ranging from system-destroying malware to prison sentences. Fortunately, most of them can be filed under “tabloid nonsense.”

That being said, there is an extremely important issue that deserves much closer attention, particularly given a shift in the UK legal climate during 2017. We’re talking about live streaming copyrighted content on Facebook, which is both incredibly easy and frighteningly risky.

This week it was revealed that 34-year-old Craig Foster from the UK had been given an ultimatum from Sky to pay a £5,000 settlement fee. The media giant discovered that he’d live-streamed the Anthony Joshua v Wladimir Klitschko fight on Facebook and wanted compensation to make a potential court case disappear.

While it may seem initially odd to use the word, Foster was lucky.

Under last year’s Digital Economy Act, he could’ve been jailed for up to ten years for distributing copyright-infringing content to the public, if he had “reason to believe that communicating the work to the public [would] cause loss to the owner of the copyright, or [would] expose the owner of the copyright to a risk of loss.”

Clearly, as a purchaser of the £19.95 pay-per-view himself, he would’ve appreciated that the event costs money. With that in mind, a court would likely find that he would have been aware that Sky would have been exposed to a “risk of loss”. Sky claim that 4,250 people watched the stream but the way the law is written, no specific level of loss is required for a breach of the law.

But it’s not just the threat of a jail sentence that’s the problem. People streaming live sports on Facebook are sitting ducks.

In Foster’s case, the fight he streamed was watermarked, which means that Sky put a tracking code into it which identified him personally as the buyer of the event. When he (or his friend, as Foster claims) streamed it on Facebook, it was trivial for Sky to capture the watermark and track it back to his Sky account.

Equally, it would be simplicity itself to see that the name on the Sky account had exactly the same name and details as Foster’s Facebook account. So, to most observers, it would appear that not only had Foster purchased the event, but he was also streaming it to Facebook illegally.

It’s important to keep something else in mind. No cooperation between Sky and Facebook would’ve been necessary to obtain Foster’s details. Take the amount of information most people share on Facebook, combine that with the information Sky already had, and the company’s anti-piracy team would have had a very easy job.

Now compare this situation with an upload of the same stream to a torrent site.

While the video capture would still contain Foster’s watermark, which would indicate the source, to prove he also distributed the video Sky would’ve needed to get inside a torrent swarm. From there they would need to capture the IP address of the initial seeder and take the case to court, to force an ISP to hand over that person’s details.

Presuming they were the same person, Sky would have a case, with a broadly similar level of evidence to that presented in the current matter. However, it would’ve taken them months to get their man and cost large sums of money to get there. It’s very unlikely that £5,000 would cover the costs, meaning a much, much bigger bill for the culprit.

Or, confident that Foster was behind the leak based on the watermark alone, Sky could’ve gone straight to the police. That never ends well.

The bottom line is that while live-streaming on Facebook is simplicity itself, people who do it casually from their own account (especially with watermarked content) are asking for trouble.

Nailing Foster was the piracy equivalent of shooting fish in a barrel but the worrying part is that he probably never gave his (or his friend’s…) alleged infringement a second thought. With a click or two, the fight was live and he was staring down the barrel of a potential jail sentence, had Sky not gone the civil route.

It’s scary stuff and not enough is being done to warn people of the consequences. Forget the scare stories attempting to deter people from watching fights or movies on Kodi, thoughtlessly streaming them to the public on social media is the real danger.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN discounts, offers and coupons

AWS Glue Now Supports Scala Scripts

Post Syndicated from Mehul Shah original https://aws.amazon.com/blogs/big-data/aws-glue-now-supports-scala-scripts/

We are excited to announce AWS Glue support for running ETL (extract, transform, and load) scripts in Scala. Scala lovers can rejoice because they now have one more powerful tool in their arsenal. Scala is the native language for Apache Spark, the underlying engine that AWS Glue offers for performing data transformations.

Beyond its elegant language features, writing Scala scripts for AWS Glue has two main advantages over writing scripts in Python. First, Scala is faster for custom transformations that do a lot of heavy lifting because there is no need to shovel data between Python and Apache Spark’s Scala runtime (that is, the Java virtual machine, or JVM). You can build your own transformations or invoke functions in third-party libraries. Second, it’s simpler to call functions in external Java class libraries from Scala because Scala is designed to be Java-compatible. It compiles to the same bytecode, and its data structures don’t need to be converted.

To illustrate these benefits, we walk through an example that analyzes a recent sample of the GitHub public timeline available from the GitHub archive. This site is an archive of public requests to the GitHub service, recording more than 35 event types ranging from commits and forks to issues and comments.

This post shows how to build an example Scala script that identifies highly negative issues in the timeline. It pulls out issue events in the timeline sample, analyzes their titles using the sentiment prediction functions from the Stanford CoreNLP libraries, and surfaces the most negative issues.

Getting started

Before we start writing scripts, we use AWS Glue crawlers to get a sense of the data—its structure and characteristics. We also set up a development endpoint and attach an Apache Zeppelin notebook, so we can interactively explore the data and author the script.

Crawl the data

The dataset used in this example was downloaded from the GitHub archive website into our sample dataset bucket in Amazon S3, and copied to the following locations:

s3://aws-glue-datasets-<region>/examples/scala-blog/githubarchive/data/

Choose the best folder by replacing <region> with the region that you’re working in, for example, us-east-1. Crawl this folder, and put the results into a database named githubarchive in the AWS Glue Data Catalog, as described in the AWS Glue Developer Guide. This folder contains 12 hours of the timeline from January 22, 2017, and is organized hierarchically (that is, partitioned) by year, month, and day.

When finished, use the AWS Glue console to navigate to the table named data in the githubarchive database. Notice that this data has eight top-level columns, which are common to each event type, and three partition columns that correspond to year, month, and day.

Choose the payload column, and you will notice that it has a complex schema—one that reflects the union of the payloads of event types that appear in the crawled data. Also note that the schema that crawlers generate is a subset of the true schema because they sample only a subset of the data.

Set up the library, development endpoint, and notebook

Next, you need to download and set up the libraries that estimate the sentiment in a snippet of text. The Stanford CoreNLP libraries contain a number of human language processing tools, including sentiment prediction.

Download the Stanford CoreNLP libraries. Unzip the .zip file, and you’ll see a directory full of jar files. For this example, the following jars are required:

  • stanford-corenlp-3.8.0.jar
  • stanford-corenlp-3.8.0-models.jar
  • ejml-0.23.jar

Upload these files to an Amazon S3 path that is accessible to AWS Glue so that it can load these libraries when needed. For this example, they are in s3://glue-sample-other/corenlp/.

Development endpoints are static Spark-based environments that can serve as the backend for data exploration. You can attach notebooks to these endpoints to interactively send commands and explore and analyze your data. These endpoints have the same configuration as that of AWS Glue’s job execution system. So, commands and scripts that work there also work the same when registered and run as jobs in AWS Glue.

To set up an endpoint and a Zeppelin notebook to work with that endpoint, follow the instructions in the AWS Glue Developer Guide. When you are creating an endpoint, be sure to specify the locations of the previously mentioned jars in the Dependent jars path as a comma-separated list. Otherwise, the libraries will not be loaded.

After you set up the notebook server, go to the Zeppelin notebook by choosing Dev Endpoints in the left navigation pane on the AWS Glue console. Choose the endpoint that you created. Next, choose the Notebook Server URL, which takes you to the Zeppelin server. Log in using the notebook user name and password that you specified when creating the notebook. Finally, create a new note to try out this example.

Each notebook is a collection of paragraphs, and each paragraph contains a sequence of commands and the output for that command. Moreover, each notebook includes a number of interpreters. If you set up the Zeppelin server using the console, the (Python-based) pyspark and (Scala-based) spark interpreters are already connected to your new development endpoint, with pyspark as the default. Therefore, throughout this example, you need to prepend %spark at the top of your paragraphs. In this example, we omit these for brevity.

Working with the data

In this section, we use AWS Glue extensions to Spark to work with the dataset. We look at the actual schema of the data and filter out the interesting event types for our analysis.

Start with some boilerplate code to import libraries that you need:

%spark

import com.amazonaws.services.glue.DynamicRecord
import com.amazonaws.services.glue.GlueContext
import com.amazonaws.services.glue.util.GlueArgParser
import com.amazonaws.services.glue.util.Job
import com.amazonaws.services.glue.util.JsonOptions
import com.amazonaws.services.glue.types._
import org.apache.spark.SparkContext

Then, create the Spark and AWS Glue contexts needed for working with the data:

@transient val spark: SparkContext = SparkContext.getOrCreate()
val glueContext: GlueContext = new GlueContext(spark)

You need the transient decorator on the SparkContext when working in Zeppelin; otherwise, you will run into a serialization error when executing commands.

Dynamic frames

This section shows how to create a dynamic frame that contains the GitHub records in the table that you crawled earlier. A dynamic frame is the basic data structure in AWS Glue scripts. It is like an Apache Spark data frame, except that it is designed and optimized for data cleaning and transformation workloads. A dynamic frame is well-suited for representing semi-structured datasets like the GitHub timeline.

A dynamic frame is a collection of dynamic records. In Spark lingo, it is an RDD (resilient distributed dataset) of DynamicRecords. A dynamic record is a self-describing record. Each record encodes its columns and types, so every record can have a schema that is unique from all others in the dynamic frame. This is convenient and often more efficient for datasets like the GitHub timeline, where payloads can vary drastically from one event type to another.

The following creates a dynamic frame, github_events, from your table:

val github_events = glueContext
                    .getCatalogSource(database = "githubarchive", tableName = "data")
                    .getDynamicFrame()

The getCatalogSource() method returns a DataSource, which represents a particular table in the Data Catalog. The getDynamicFrame() method returns a dynamic frame from the source.

Recall that the crawler created a schema from only a sample of the data. You can scan the entire dataset, count the rows, and print the complete schema as follows:

github_events.count
github_events.printSchema()

The result looks like the following:

The data has 414,826 records. As before, notice that there are eight top-level columns, and three partition columns. If you scroll down, you’ll also notice that the payload is the most complex column.

Run functions and filter records

This section describes how you can create your own functions and invoke them seamlessly to filter records. Unlike filtering with Python lambdas, Scala scripts do not need to convert records from one language representation to another, thereby reducing overhead and running much faster.

Let’s create a function that picks only the IssuesEvents from the GitHub timeline. These events are generated whenever someone posts an issue for a particular repository. Each GitHub event record has a field, “type”, that indicates the kind of event it is. The issueFilter() function returns true for records that are IssuesEvents.

def issueFilter(rec: DynamicRecord): Boolean = { 
    rec.getField("type").exists(_ == "IssuesEvent") 
}

Note that the getField() method returns an Option[Any] type, so you first need to check that it exists before checking the type.

You pass this function to the filter transformation, which applies the function on each record and returns a dynamic frame of those records that pass.

val issue_events =  github_events.filter(issueFilter)

Now, let’s look at the size and schema of issue_events.

issue_events.count
issue_events.printSchema()

It’s much smaller (14,063 records), and the payload schema is less complex, reflecting only the schema for issues. Keep a few essential columns for your analysis, and drop the rest using the ApplyMapping() transform:

val issue_titles = issue_events.applyMapping(Seq(("id", "string", "id", "string"),
                                                 ("actor.login", "string", "actor", "string"), 
                                                 ("repo.name", "string", "repo", "string"),
                                                 ("payload.action", "string", "action", "string"),
                                                 ("payload.issue.title", "string", "title", "string")))
issue_titles.show()

The ApplyMapping() transform is quite handy for renaming columns, casting types, and restructuring records. The preceding code snippet tells the transform to select the fields (or columns) that are enumerated in the left half of the tuples and map them to the fields and types in the right half.

Estimating sentiment using Stanford CoreNLP

To focus on the most pressing issues, you might want to isolate the records with the most negative sentiments. The Stanford CoreNLP libraries are Java-based and offer sentiment-prediction functions. Accessing these functions through Python is possible, but quite cumbersome. It requires creating Python surrogate classes and objects for those found on the Java side. Instead, with Scala support, you can use those classes and objects directly and invoke their methods. Let’s see how.

First, import the libraries needed for the analysis:

import java.util.Properties
import edu.stanford.nlp.ling.CoreAnnotations
import edu.stanford.nlp.neural.rnn.RNNCoreAnnotations
import edu.stanford.nlp.pipeline.{Annotation, StanfordCoreNLP}
import edu.stanford.nlp.sentiment.SentimentCoreAnnotations
import scala.collection.convert.wrapAll._

The Stanford CoreNLP libraries have a main driver that orchestrates all of their analysis. The driver setup is heavyweight, setting up threads and data structures that are shared across analyses. Apache Spark runs on a cluster with a main driver process and a collection of backend executor processes that do most of the heavy sifting of the data.

The Stanford CoreNLP shared objects are not serializable, so they cannot be distributed easily across a cluster. Instead, you need to initialize them once for every backend executor process that might need them. Here is how to accomplish that:

val props = new Properties()
props.setProperty("annotators", "tokenize, ssplit, parse, sentiment")
props.setProperty("parse.maxlen", "70")

object myNLP {
    lazy val coreNLP = new StanfordCoreNLP(props)
}

The properties tell the libraries which annotators to execute and how many words to process. The preceding code creates an object, myNLP, with a field coreNLP that is lazily evaluated. This field is initialized only when it is needed, and only once. So, when the backend executors start processing the records, each executor initializes the driver for the Stanford CoreNLP libraries only one time.

Next is a function that estimates the sentiment of a text string. It first calls Stanford CoreNLP to annotate the text. Then, it pulls out the sentences and takes the average sentiment across all the sentences. The sentiment is a double, from 0.0 as the most negative to 4.0 as the most positive.

def estimatedSentiment(text: String): Double = {
    if ((text == null) || (!text.nonEmpty)) { return Double.NaN }
    val annotations = myNLP.coreNLP.process(text)
    val sentences = annotations.get(classOf[CoreAnnotations.SentencesAnnotation])
    sentences.foldLeft(0.0)( (csum, x) => { 
        csum + RNNCoreAnnotations.getPredictedClass(x.get(classOf[SentimentCoreAnnotations.SentimentAnnotatedTree])) 
    }) / sentences.length
}

Now, let’s estimate the sentiment of the issue titles and add that computed field as part of the records. You can accomplish this with the map() method on dynamic frames:

val issue_sentiments = issue_titles.map((rec: DynamicRecord) => { 
    val mbody = rec.getField("title")
    mbody match {
        case Some(mval: String) => { 
            rec.addField("sentiment", ScalarNode(estimatedSentiment(mval)))
            rec }
        case _ => rec
    }
})

The map() method applies the user-provided function on every record. The function takes a DynamicRecord as an argument and returns a DynamicRecord. The code above computes the sentiment, adds it in a top-level field, sentiment, to the record, and returns the record.

Count the records with sentiment and show the schema. This takes a few minutes because Spark must initialize the library and run the sentiment analysis, which can be involved.

issue_sentiments.count
issue_sentiments.printSchema()

Notice that all records were processed (14,063), and the sentiment value was added to the schema.

Finally, let’s pick out the titles that have the lowest sentiment (less than 1.5). Count them and print out a sample to see what some of the titles look like.

val pressing_issues = issue_sentiments.filter(_.getField("sentiment").exists(_.asInstanceOf[Double] < 1.5))
pressing_issues.count
pressing_issues.show(10)

Next, write them all to a file so that you can handle them later. (You’ll need to replace the output path with your own.)

glueContext.getSinkWithFormat(connectionType = "s3", 
                              options = JsonOptions("""{"path": "s3://<bucket>/out/path/"}"""), 
                              format = "json")
            .writeDynamicFrame(pressing_issues)

Take a look in the output path, and you can see the output files.

Putting it all together

Now, let’s create a job from the preceding interactive session. The following script combines all the commands from earlier. It processes the GitHub archive files and writes out the highly negative issues:

import com.amazonaws.services.glue.DynamicRecord
import com.amazonaws.services.glue.GlueContext
import com.amazonaws.services.glue.util.GlueArgParser
import com.amazonaws.services.glue.util.Job
import com.amazonaws.services.glue.util.JsonOptions
import com.amazonaws.services.glue.types._
import org.apache.spark.SparkContext
import java.util.Properties
import edu.stanford.nlp.ling.CoreAnnotations
import edu.stanford.nlp.neural.rnn.RNNCoreAnnotations
import edu.stanford.nlp.pipeline.{Annotation, StanfordCoreNLP}
import edu.stanford.nlp.sentiment.SentimentCoreAnnotations
import scala.collection.convert.wrapAll._

object GlueApp {

    object myNLP {
        val props = new Properties()
        props.setProperty("annotators", "tokenize, ssplit, parse, sentiment")
        props.setProperty("parse.maxlen", "70")

        lazy val coreNLP = new StanfordCoreNLP(props)
    }

    def estimatedSentiment(text: String): Double = {
        if ((text == null) || (!text.nonEmpty)) { return Double.NaN }
        val annotations = myNLP.coreNLP.process(text)
        val sentences = annotations.get(classOf[CoreAnnotations.SentencesAnnotation])
        sentences.foldLeft(0.0)( (csum, x) => { 
            csum + RNNCoreAnnotations.getPredictedClass(x.get(classOf[SentimentCoreAnnotations.SentimentAnnotatedTree])) 
        }) / sentences.length
    }

    def main(sysArgs: Array[String]) {
        val spark: SparkContext = SparkContext.getOrCreate()
        val glueContext: GlueContext = new GlueContext(spark)

        val dbname = "githubarchive"
        val tblname = "data"
        val outpath = "s3://<bucket>/out/path/"

        val github_events = glueContext
                            .getCatalogSource(database = dbname, tableName = tblname)
                            .getDynamicFrame()

        val issue_events =  github_events.filter((rec: DynamicRecord) => {
            rec.getField("type").exists(_ == "IssuesEvent")
        })

        val issue_titles = issue_events.applyMapping(Seq(("id", "string", "id", "string"),
                                                         ("actor.login", "string", "actor", "string"), 
                                                         ("repo.name", "string", "repo", "string"),
                                                         ("payload.action", "string", "action", "string"),
                                                         ("payload.issue.title", "string", "title", "string")))

        val issue_sentiments = issue_titles.map((rec: DynamicRecord) => { 
            val mbody = rec.getField("title")
            mbody match {
                case Some(mval: String) => { 
                    rec.addField("sentiment", ScalarNode(estimatedSentiment(mval)))
                    rec }
                case _ => rec
            }
        })

        val pressing_issues = issue_sentiments.filter(_.getField("sentiment").exists(_.asInstanceOf[Double] < 1.5))

        glueContext.getSinkWithFormat(connectionType = "s3", 
                              options = JsonOptions(s"""{"path": "$outpath"}"""), 
                              format = "json")
                    .writeDynamicFrame(pressing_issues)
    }
}

Notice that the script is enclosed in a top-level object called GlueApp, which serves as the script’s entry point for the job. (You’ll need to replace the output path with your own.) Upload the script to an Amazon S3 location so that AWS Glue can load it when needed.

To create the job, open the AWS Glue console. Choose Jobs in the left navigation pane, and then choose Add job. Create a name for the job, and specify a role with permissions to access the data. Choose An existing script that you provide, and choose Scala as the language.

For the Scala class name, type GlueApp to indicate the script’s entry point. Specify the Amazon S3 location of the script.

Choose Script libraries and job parameters. In the Dependent jars path field, enter the Amazon S3 locations of the Stanford CoreNLP libraries from earlier as a comma-separated list (without spaces). Then choose Next.

No connections are needed for this job, so choose Next again. Review the job properties, and choose Finish. Finally, choose Run job to execute the job.

You can simply edit the script’s input table and output path to run this job on whatever GitHub timeline datasets that you might have.

Conclusion

In this post, we showed how to write AWS Glue ETL scripts in Scala via notebooks and how to run them as jobs. Scala has the advantage that it is the native language for the Spark runtime. With Scala, it is easier to call Scala or Java functions and third-party libraries for analyses. Moreover, data processing is faster in Scala because there’s no need to convert records from one language runtime to another.

You can find more example of Scala scripts in our GitHub examples repository: https://github.com/awslabs/aws-glue-samples. We encourage you to experiment with Scala scripts and let us know about any interesting ETL flows that you want to share.

Happy Glue-ing!

 


Additional Reading

If you found this post useful, be sure to check out Simplify Querying Nested JSON with the AWS Glue Relationalize Transform and Genomic Analysis with Hail on Amazon EMR and Amazon Athena.

 


About the Authors

Mehul Shah is a senior software manager for AWS Glue. His passion is leveraging the cloud to build smarter, more efficient, and easier to use data systems. He has three girls, and, therefore, he has no spare time.

 

 

 

Ben Sowell is a software development engineer at AWS Glue.

 

 

 

 
Vinay Vivili is a software development engineer for AWS Glue.

 

 

 

Fix Your Crawler

Post Syndicated from Bozho original https://techblog.bozho.net/fix-your-crawler/

Every now and then I open the admin panel of my blog hosting and ban a few IPs (after I’ve tried messaging their abuse email, if I find one). It is always IPs that are generating tons of requests (and traffic) – most likely running some home-made crawler. In some cases the IPs belong to an actual service that captures and provides content, in other cases it’s just a scraper for unknown reasons.

I don’t want to ban IPs, especially because that same IP may be reassigned to a legitimate user (or network) in the future. But they are increasing my hosting usage, which in turn leads to the hosting provider suggesting an upgrade in the plan. And this is not about me, I’m just an example – tons of requests to millions of sites are … useless.

My advice (and plea) is this – please fix your crawlers. Or scrapers. Or whatever you prefer to call that thing that programmatically goes on websites and gets their content.

How? First, reuse an existing crawler. No need to make something new (unless there’s a very specific use-case). A good intro and comparison can be seen here.

Second, make your crawler “polite” (the “politeness” property in the article above). Here’s a good overview on how to be polite, including respect for robots.txt. Existing implementations most likely have politeness options, but you may have to configure them.

Here I’d suggest another option – set a dynamic crawl rate per website that depends on how often the content is updated. My blog updates 3 times a month – no need to crawl it more than once or twice a day. TechCrunch updates many times a day; it’s probably a good idea to crawl it way more often. I don’t have a formula, but you can come up with one that ends up crawling different sites with periods between 2 minutes and 1 day.

Third, don’t “scrape” the content if a better protocol is supported. Many content websites have RSS – use that instead of the HTML of the page. If not, make use of sitemaps. If the WebSub protocol gains traction, you can avoid the crawling/scraping entirely and get notified on new content.

Finally, make sure your crawler/scraper is identifiable by the UserAgent. You can supply your service name or web address in it to make it easier for website owners to find you and complain in case you’ve misconfigured something.

I guess it makes sense to see if using a service like import.io, ScrapingHub, WrapAPI or GetData makes sense for your usecase, instead of reinventing the wheel.

No matter what your use case or approach is, please make sure you don’t put unnecessary pressure on others’ websites.

The post Fix Your Crawler appeared first on Bozho's tech blog.

Connect Veeam to the B2 Cloud: Episode 1 — Using Synology

Post Syndicated from Roderick Bauer original https://www.backblaze.com/blog/backing-up-veeam-cloud-connect-synology-b2/

Veeam Cloud Connect to Backblaze B2

Veeam is well-known for its easy-to-use software for backing up virtual machines from VMware and Microsoft.

Users of Veeam and Backblaze B2 Cloud Storage have asked for a way to back up a Veeam repository to B2. Backblaze’s B2 is an ideal solution for backing up Veeam’s backup repository due to B2’s combination of low-cost and high availability compared to other cloud solutions such as Microsoft Azure.

This is the first in a series of posts on the topic of backing up Veeam to B2. Future posts will cover other methods.

In this post we provide a step-by-step tutorial on how to configure a Synology NAS as a Veeam backup repository, and in turn use Synology’s CloudSync software to back up that repository to the B2 Cloud.

Our guest contributor, Rhys Hammond, is well qualified to author this tutorial. Rhys is a Senior System Engineer for Data#3 in Australia specializing in Veeam and VMware solutions. He is a VMware vExpert and a member of the Veeam Vanguard program.

Rhy’s tutorial is outlined as follows:

Veeam and Backblaze B2 — Introduction

Introduction

Background on B2 and Veeam, and a discussion of various ways to back up a Veeam backup repository to the cloud.

Phase 1 — Create the Backblaze B2 Bucket

How to create the B2 Bucket that will be the destination for mirroring our Veeam backup repository.

Phase 2 — Install and Configure Synology CloudSync

Get CloudSync ready to perform the backup to B2.

Phase 3 — Configure Veeam Backup Repository

Create a new Veeam backup repository in preparation for upload to B2.

Phase 4 — Create the Veeam Backup Job

Configure the Veeam backup job, with two possible scenarios, primary target and secondary backup target.

Phase 5 — Testing and Tuning

Making sure it all works.

Summary

Some thoughts on the process, other options, and tips.

You can read the full tutorial on Rhy’s website by following the link below. To be sure to receive notice of future posts in this series on Veeam, use the Join button at the top of the page.

Beta Testers Needed: Veeam/Starwind/B2

If you back up Veeam using Starwind VTL, we have a BETA program for you. Help us with the Starwind VTL to Backblaze B2 integration Beta and test whether you can automatically back up Veeam to Backblaze B2 via Starwind VTL. Motivated beta testers can email starwind@backblaze.com for details and how to get started.

The post Connect Veeam to the B2 Cloud: Episode 1 — Using Synology appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Wanted: Fixed Assets Accountant

Post Syndicated from Yev original https://www.backblaze.com/blog/wanted-fixed-assets-accountant/

As Backblaze continues to grow, we’re expanding our accounting team! We’re looking for a seasoned Fixed Asset Accountant to help us with fixed assets and equipment leases.

Job Duties:

  • Maintain and review fixed assets.
  • Record fixed asset acquisitions and dispositions.
  • Review and update the detailed schedule of fixed assets and accumulated depreciation.
  • Calculate depreciation for all fixed assets.
  • Investigate the potential obsolescence of fixed assets.
  • Coordinate with Operations team data center asset dispositions.
  • Conduct periodic physical inventory counts of fixed assets. Work with Operations team on cycle counts.
  • Reconcile the balance in the fixed asset subsidiary ledger to the summary-level account in the general ledger.
  • Track company expenditures for fixed assets in comparison to the capital budget and management authorizations.
  • Prepare audit schedules relating to fixed assets, and assist the auditors in their inquiries.
  • Recommend to management any updates to accounting policies related to fixed assets.
  • Manage equipment leases.
  • Engage and negotiate acquisition of new equipment lease lines.
  • Overall control of original lease documentation and maintenance of master lease files.
  • Facilitate and track routing and execution of various lease related: agreements — documents/forms/lease documents.
  • Establish and maintain proper controls to track expirations, renewal options, and all other critical dates.
  • Perform other duties and special projects as assigned.

Qualifications:

  • 5-6 years relevant accounting experience.
  • Knowledge of inventory and cycle counting preferred.
  • Quickbooks, Excel, Word experience desired.
  • Organized, with excellent attention to detail, meticulous, quick-learner.
  • Good interpersonal skills and a team player.
  • Flexibility and ability to adapt and wear different hats.

Backblaze Employees Have:

  • Good attitude and willingness to do whatever it takes to get the job done.
  • Strong desire to work for a small, fast-paced company.
  • Desire to learn and adapt to rapidly changing technologies and work environment.
  • Comfortable with well-behaved pets in the office.

This position is located in San Mateo, California. Regular attendance in the office is expected. Backblaze is an Equal Opportunity Employer and we offer competitive salary and benefits, including our no policy vacation policy.

If This Sounds Like You:
Send an email to [email protected] with:

  1. Fixed Asset Accountant in the subject line
  2. Your resume attached
  3. An overview of your relevant experience

The post Wanted: Fixed Assets Accountant appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Security updates for Monday

Post Syndicated from ris original https://lwn.net/Articles/743575/rss

Security updates have been issued by Arch Linux (linux-hardened, linux-lts, linux-zen, and mongodb), Debian (gdk-pixbuf, gifsicle, graphicsmagick, kernel, and poppler), Fedora (dracut, electron-cash, and firefox), Gentoo (backintime, binutils, chromium, emacs, libXcursor, miniupnpc, openssh, optipng, and webkit-gtk), Mageia (kernel, kernel-linus, kernel-tmb, openafs, and python-mistune), openSUSE (clamav-database, ImageMagick, kernel-firmware, nodejs4, and qemu), Red Hat (linux-firmware, ovirt-guest-agent-docker, qemu-kvm-rhev, redhat-virtualization-host, rhev-hypervisor7, rhvm-appliance, thunderbird, and vdsm), Scientific Linux (thunderbird), SUSE (kernel and qemu), and Ubuntu (firefox and poppler).

I am Beemo, a little living boy: Adventure Time prop build

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/adventure-time-bmo/

Bob Herzberg, BMO builder and blogger at BYOBMO.com, fills us in on the whys and hows and even the Pen Wards of creating interactive Adventure Time BMO props with the Raspberry Pi.

A Conversation With BMO

A conversation with BMO showing off some voice recognition capabilities. There is no interaction for BMO’s responses other than voice commands. There is a small microphone inside BMO (right behind the blue dot) and the voice commands are processed by Google voice API over WiFi.

Finding BMO

My first BMO began as a cosplay prop for my daughter. She and her friends are huge fans of Adventure Time and made their costumes for Princess Bubblegum, Marceline, and Finn. It was my job to come up with a BMO.

Raspberry Pi BMO Laura Herzberg Bob Herzberg

Bob as Banana Guard, daughter Laura as Princess Bubblegum, and son Steven as Finn

I wanted something electronic, and also interactive if possible. And it had to run on battery power. There was only one option that I found that would work: the Raspberry Pi.

Building a living little boy

BMO’s basic internals consist of the Raspberry Pi, an 8” HDMI monitor, and a USB battery pack. The body is made from laser-cut MDF wood, which I sanded, sealed, and painted. I added 3D-printed arms and legs along with some vinyl lettering to complete the look. There is also a small wireless keyboard that works as a remote control.

Adventure Time BMO prop
Adventure Time BMO prop
Adventure Time BMO prop
Adventure Time BMO prop

To make the front panel button function, I created a custom PCB, mounted laser-cut acrylic buttons on it, and connected it to the Pi’s IO header.

Inside BMO - Raspberry Pi BMO Laura Herzberg Bob Herzberg

Custom-made PCBs control BMO’s gaming buttons and USB input.

The USB jack is extended with another custom PCB, which gives BMO USB ports on the front panel. His battery life is an impressive 8 hours of continuous use.

The main brain game frame

Most of BMO’s personality comes from custom animations that my daughter created and that were then turned into MP4 video files. The animations are triggered by the remote keyboard. Some versions of BMO have an internal microphone, and the Google Voice API is used to translate the user’s voice and map it to an appropriate response, so it’s possible to have a conversation with BMO.

The final components of Raspberry Pi BMO Laura Herzberg Bob Herzberg

The Raspberry Pi Camera Module was also put to use. Some BMOs have a servo that can pop up a camera, called GoMO, which takes pictures. Although some people mistake it for ghost detecting equipment, BMO just likes taking nice pictures.

Who wants to play video games?

Playing games on BMO is as simple as loading one of the emulators supported by Raspbian.

BMO connected to SNES controllers - Raspberry Pi BMO Laura Herzberg Bob Herzberg

I’m partial to the Atari 800 emulator, since I used to write games for that platform when I was just starting to learn programming. The front-panel USB ports are used for connecting gamepads, or his front-panel buttons and D-Pad can be used.

Adventure time

BMO has been a lot of fun to bring to conventions. He makes it to ComicCon San Diego each year and has been as far away as DragonCon in Atlanta, where he finally got to meet the voice of BMO, Niki Yang.

BMO's back panel - Raspberry Pi BMO Laura Herzberg Bob Herzberg

BMO’s back panel, autographed by Niki Yang

One day, I received an email from the producer of Adventure Time, Kelly Crews, with a very special request. Kelly was looking for a birthday present for the show’s creator, Pendleton Ward. It was either luck or coincidence that I just was finishing up the latest version of BMO. Niki Yang added some custom greetings just for Pen.

BMO Wishes Pendleton Ward a Happy Birthday!

Happy birthday to Pendleton Ward, the creator of, well, you know what. We were asked to build Pen his very own BMO and with help from Niki Yang and the Adventure Time crew here is the result.

We added a few more items inside, including a 3D-printed heart, a medal, and a certificate which come from the famous Be More episode that explains BMO’s origins.

Back of Adventure Time BMO prop
Adventure Time BMO prop
Adventure Time BMO prop
Adventure Time BMO prop

BMO was quite a challenge to create. Fabricating the enclosure required several different techniques and materials. Fortunately, bringing him to life was quite simple once he had a Raspberry Pi inside!

Find out more

Be sure to follow Bob’s adventures with BMO at the Build Your Own BMO blog. And if you’ve built your own prop from television or film using a Raspberry Pi, be sure to share it with us in the comments below or on our social media channels.

 

All images c/o Bob and Laura Herzberg

The post I am Beemo, a little living boy: Adventure Time prop build appeared first on Raspberry Pi.

Pirate Bay Founder: Netflix and Spotify Are a Threat, No Solution

Post Syndicated from Ernesto original https://torrentfreak.com/pirate-bay-founder-netflix-and-spotify-are-a-threat-no-solution-180107/

Ten years ago the Internet was an entirely different place. Piracy was rampant, as it is today, but the people behind the largest torrent sites were more vocal then.

There was a battle going on for the right to freely share content online. This was very much a necessity at the time, as legal options were scarce, but for many it was also an idealistic battle.

As the spokesperson of The Pirate Bay, Peter Sunde was one of the leading voices at the time. He believed, and still does, that people should be able to share anything without restrictions. Period.

For Peter and three others associated with The Pirate Bay, this eventually resulted in jail sentences. They were not the only ones to feel the consequences. Over the past decade, dozens of torrent sites were shut down under legal pressure, forcing those operators that remain to go into hiding.

Today, ten years after we spoke to Peter about the future of torrent sites and file-sharing, we reach out to him again. A lot has changed, but how does The Pirate Bay’s co-founder look at things now?

“On the personal side, all is great, and I’m working on a TV-series about activism that will air next year. On top of that of course working on Njalla, Ipredator and other known projects,” Peter says.

“In general, I think that projects for me are still about the same thing as a decade ago, but just trying different approaches!”

While Peter stays true to his activist roots, fighting for privacy and freedom on the Internet, his outlook is not as positive as it once was.

He is proud that The Pirate Bay never caved and that they fought their cases to the end. The moral struggle was won, but he also realizes that the greater battle was lost.

“I’m proud and happy to be able to look myself in the mirror every morning with a feeling of doing right. A lot of corrupt people involved in our cases probably feel quite shitty. Well, if they have feelings,” Peter says.

The Pirate Bay’s former spokesperson doesn’t have any regrets really. The one thing that comes to mind, when we ask about things that he would have done differently, is to tell fellow Pirate Bay founder Anakata to encrypt his hard drive.

Brokep (Peter) and Anakata (Gottfrid)

Looking at the current media climate, Peter doesn’t think we are better off. On the contrary. While it might be easier in some counties to access content legally online, this also means that control is now firmly in the hands of a few major companies.

The Pirate Bay and others always encouraged free sharing for creators and consumers. This certainly hasn’t improved. Instead, media today is contained in large centralized silos.

“I’m surprised that people are so short-sighted. The ‘solution’ to file sharing was never centralizing content control back to a few entities – that was the struggle we were fighting for.

“Netflix, Spotify etc are not a solution but a loss. And it surprises me that the pirate movement is not trying to talk more about that,” he adds.

The Netflixes and Spotifies of this world are often portrayed as a solution to piracy. However, Peter sees things differently. He believes that these services put more control in the hands of powerful companies.

“The same companies we fought own these platforms. Either they own the shares in the companies, or they have deals with them which makes it impossible for these companies to not follow their rules.

“Artists can’t choose to be or not to be on Spotify in reality, because there’s nothing else in the end. If Spotify doesn’t follow the rules from these companies, they are fucked as well. The dependence is higher than ever.”

The first wave of mass Internet piracy well over a decade ago was a wake-up call to the entertainment industry. The immense popularity of torrent sites showed that people demanded something they weren’t offering.

In a way, these early pirate sites are the reason why Netflix and Spotify were able to do what they do. Literally, in the case of Spotify, which used pirated music to get the service going.

Peter doesn’t see them as the answer though. The only solution in his book is to redefine and legalize piracy.

“The solution to piracy is to re-define piracy. Make things available to everyone, without that being a crime,” Peter says.

In this regard, not much has changed in ten years. However, having witnessed this battle closer than anyone else, he also realizes that the winners are likely on the other end.

Piracy will decrease over time, but not the way Peter hopes it will.

“I think we’ll have less piracy because of the problems we see today. With net neutrality being infringed upon and more laws against individual liberties and access to culture, instead of actually benefiting people.

“The media industry will be happy to know that their lobbying efforts and bribes are paying off,” he concludes.

This is the second and final post in our torrent pioneers series. The first interview with isoHunt founder Gary Fung is available here.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN discounts, offers and coupons

Top 10 Most Popular Torrent Sites of 2018

Post Syndicated from Ernesto original https://torrentfreak.com/top-10-most-popular-torrent-sites-of-2018-180107/

Torrent sites have come and gone over past year. Now, at the start of 2018, we take a look to see what the most-used sites are in the current landscape.

The Pirate Bay remains the undisputed number one. The site has weathered a few storms over the years, but it looks like it will be able to celebrate its 15th anniversary, which is coming up in a few months.

The list also includes various newcomers including Idope and Zooqle. While many people are happy to see new torrent sites emerge, this often means that others have called it quits.

Last year’s runner-up Extratorrent, for example, has shut down and left a gaping hole behind. And it wasn’t the only site that went away. TorrentProject also disappeared without a trace and the same was true for isohunt.to.

The unofficial Torrentz reincarnation Torrentz2.eu, the highest newcomer last year, is somewhat of an unusual entry. A few weeks ago all links to externally hosted torrents were removed, as was the list of indexed pages.

We decided to include the site nonetheless, given its history and because it’s still possible to find hashes through the site. As Torrentz2’s future is uncertain, we added an extra site (10.1) as compensation.

Finally, RuTracker also deserves a mention. The torrent site generates enough traffic to warrant a listing, but we traditionally limit the list to sites that are targeted primarily at an English or international audience.

Below is the full list of the ten most-visited torrent sites at the start of the new year. The list is based on various traffic reports and we display the Alexa rank for each. In addition, we include last year’s ranking.

Most Popular Torrent Sites

1. The Pirate Bay

The Pirate Bay is the “king of torrents” once again and also the oldest site in this list. The past year has been relatively quiet for the notorious torrent site, which is currently operating from its original .org domain name.

Alexa Rank: 104/ Last year #1

2. RARBG

RARBG, which started out as a Bulgarian tracker, has captured the hearts and minds of many video pirates. The site was founded in 2008 and specializes in high quality video releases.

Alexa Rank: 298 / Last year #3

3. 1337x

1337x continues where it left off last year. The site gained a lot of traffic and, unlike some other sites in the list, has a dedicated group of uploaders that provide fresh content.

Alexa Rank: 321 / Last year #6

4. Torrentz2

Torrentz2 launched as a stand-in for the original Torrentz.eu site, which voluntarily closed its doors in 2016. At the time of writing, the site only lists torrent hashes and no longer any links to external torrent sites. While browser add-ons and plugins still make the site functional, its future is uncertain.

Alexa Rank: 349 / Last year #5

5. YTS.ag

YTS.ag is the unofficial successors of the defunct YTS or YIFY group. Not all other torrent sites were happy that the site hijacked the popuar brand and several are actively banning its releases.

Alexa Rank: 563 / Last year #4

6. EZTV.ag

The original TV-torrent distribution group EZTV shut down after a hostile takeover in 2015, with new owners claiming ownership of the brand. The new group currently operates from EZTV.ag and releases its own torrents. These releases are banned on some other torrent sites due to this controversial history.

Alexa Rank: 981 / Last year #7

7. LimeTorrents

Limetorrents has been an established torrent site for more than half a decade. The site’s operator also runs the torrent cache iTorrents, which is used by several other torrent search engines.

Alexa Rank: 2,433 / Last year #10

8. NYAA.si

NYAA.si is a popular resurrection of the anime torrent site NYAA, which shut down last year. Previously we left anime-oriented sites out of the list, but since we also include dedicated TV and movie sites, we decided that a mention is more than warranted.

Alexa Rank: 1,575 / Last year #NA

9. Torrents.me

Torrents.me is one of the torrent sites that enjoyed a meteoric rise in traffic this year. It’s a meta-search engine that links to torrent files and magnet links from other torrent sites.

Alexa Rank: 2,045 / Last year #NA

10. Zooqle

Zooqle, which boasts nearly three million verified torrents, has stayed under the radar for years but has still kept growing. The site made it into the top 10 for the first time this year.

Alexa Rank: 2,347 / Last year #NA

10.1 iDope

The special 10.1 mention goes to iDope. Launched in 2016, the site is a relative newcomer to the torrent scene. The torrent indexer has steadily increased its audience over the past year. With similar traffic numbers to Zooqle, a listing is therefore warranted.

Alexa Rank: 2,358 / Last year #NA

Disclaimer: Yes, we know that Alexa isn’t perfect, but it helps to compare sites that operate in a similar niche. We also used other traffic metrics to compile the top ten. Please keep in mind that many sites have mirrors or alternative domains, which are not taken into account here.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN discounts, offers and coupons

Torrent Pioneers: isoHunt’s Gary Fung, Ten Years Later

Post Syndicated from Ernesto original https://torrentfreak.com/torrent-pioneers-isohunts-gary-fung-ten-years-later-180106/

Ten years ago, November 2007 to be precise, we published an article featuring the four leading torrent site admins at the time.

Niek van der Maas of Mininova, Justin Bunnell of TorrentSpy, Pirate Bay’s Peter Sunde and isoHunt’s Gary Fung were all kind enough to share their vision of BitTorrent’s future.

This future is the present today, and although the predictions were not all spot-on, there are a few interesting observations to make.

For one, these four men were all known by name, despite the uncertain legal situation they were in. How different is that today, when the operators of most of the world’s largest torrent sites are unknown to the broader public.

Another thing that stands out is that none of these pioneers are still active in the torrent space today. Niek and Justin have their own advertising businesses, Peter is a serial entrepreneur involved in various startups, while Gary works on his own projects.

While they have all moved on, they also remain a part of Internet history, which is why we decided to reach out to them ten years on.

Gary Fung was the first to reply. Those who’ve been following torrent news for a while know that isoHunt was shut down in 2013. The shutdown was the result of a lawsuit and came with a $110 million settlement with the MPAA, on paper.

Today the Canadian entrepreneur has other things on his hands, which includes “leveling up” his now one-year-old daughter. While that can be a day job by itself, he is also finalizing a mobile search app which will be released in the near future.

“The key is speed, and I can measure its speedup of the whole mobile search experience to be 10-100x that of conventional mobile web browsers,” Gary tells us, noting that after years of development, it’s almost ready.

The new search app is not one dedicated to torrents, as isoHunt once was. However, looking back, Gary is proud of what he accomplished with isoHunt, despite the bitter end.

“It was a humbling experience, in more ways than one. I’m proud that I participated and championed the rise of P2P content distribution through isoHunt as a search gateway,” Gary tells us.

“But I was also humbled by the responsibility and power at play, as seen in the lawsuits from the media industry giants, as well as the even larger picture of what P2P technologies were bringing, and still bring today.”

Decentralization has always been a key feature of BitTorrent and Gary sees this coming back in new trends. This includes the massive attention for blockchain related projects such as Bitcoin.

“2017 was the year Bitcoin became mainstream in a big way, and it’s feeling like the Internet before 2000. Decentralization is by nature disruptive, and I can’t wait to see what decentralizing money, governance, organizations and all kinds of applications will bring in the next few years.

“dApps [decentralized apps] made possible by platforms like Ethereum are like generalized BitTorrent for all kinds of applications, with ones we haven’t even thought of yet,” Gary adds.

Not everything is positive in hindsight, of course. Gary tells us that if he had to do it all over again he would take legal issues and lawyers more seriously. Not doing so led to more trouble than he imagined.

As a former torrent site admin, he has thought about the piracy issue quite a bit over the years. And unlike some sites today, he was happy to look for possible solutions to stop piracy.

One solution Gary suggested to Hollywood in the past was a hash recognition system for infringing torrents. A system to automatically filter known infringing files and remove these from cooperating torrent sites could still work today, he thinks.

“ContentID for all files shared on BitTorrent, similar to YouTube. I’ve proposed this to Hollywood studios before, as a better solution to suing their customers and potential P2P technology partners, but it obviously fell on deaf ears.”

In any case, torrent sites and similar services will continue to play an important role in how the media industry evolves. These platforms are showing Hollywood what the public wants, Gary believes.

“It has and will continue to play a role in showing the industry what consumers truly want: frictionless, convenient distribution, without borders of country or bundles. Bundles as in cable channels, but also in any way unwanted content is forced onto consumers without choice.”

While torrents were dominant in the past, the future will be streaming mostly, isoHunt’s founder says. He said this ten years ago, and he believes that in another decade it will have completely replaced cable TV.

Whether piracy will still be relevant then depends on how content is offered. More fragmentation will lead to more piracy, while easier access will make it less relevant.

“The question then will be, will streaming platforms be fragmented and exclusive content bundled into a hundred pieces besides Netflix, or will consumer choice and convenience win out in a cross-platform way?

“A piracy increase or reduction will depend on how that plays out because nobody wants to worry about ten monthly subscriptions to ten different streaming services, much less a hundred,” Gary concludes.

Perhaps we should revisit this again next decade…


The second post in this series, with Peter Sunde, will be published this weekend. The other two pioneers did not respond or declined to take part.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN discounts, offers and coupons

[$] Addressing Meltdown and Spectre in the kernel

Post Syndicated from corbet original https://lwn.net/Articles/743265/rss

When the Meltdown and Spectre vulnerabilities were disclosed on
January 3, attention quickly turned to mitigations. There was already
a clear defense against Meltdown in the form of kernel page-table isolation (KPTI), but the
defenses
against the two Spectre variants had not been developed in public and still
do not exist in the mainline kernel. Initial versions of proposed
defenses have now been disclosed. The resulting picture shows what has
been done to fend off Spectre-based attacks in the near future, but the
situation remains chaotic, to put it lightly.