Tag Archives: downloads

A Million ‘Pirate’ Boxes Sold in the UK During The Last Two Years

Post Syndicated from Andy original https://torrentfreak.com/a-million-pirate-boxes-sold-in-the-uk-during-the-last-two-years-170919/

With the devices hitting the headlines on an almost weekly basis, it probably comes as no surprise that ‘pirate’ set-top boxes are quickly becoming public enemy number one with video rightsholders.

Typically loaded with the legal Kodi software but augmented with third-party addons, these often Android-based pieces of hardware drag piracy out of the realm of the computer savvy and into the living rooms of millions.

One of the countries reportedly most affected by this boom is the UK. The consumption of these devices among the general public is said to have reached epidemic proportions, and anecdotal evidence suggests that names like Kodi and Showbox are now household terms.

Today we have another report to digest, this time from the Federation Against Copyright Theft, or FACT as they’re often known. Titled ‘Cracking Down on Digital Piracy,’ the report provides a general overview of the piracy scene, tackling well-worn topics such as how release groups and site operators work, among others.

The report is produced by FACT after consultation with the Police Intellectual Property Crime Unit, Intellectual Property Office, Police Scotland, and anti-piracy outfit Entura International. It begins by noting that the vast majority of the British public aren’t involved in the consumption of infringing content.

“The most recent stats show that 75% of Brits who look at content online abide by the law and don’t download or stream it illegally – up from 70% in 2013. However, that still leaves 25% who do access material illegally,” the report reads.

The report quickly heads to the topic of ‘pirate’ set-top boxes, which is unsurprising, not least due to FACT’s current focus as a business entity.

While it often positions itself alongside government bodies (which no doubt boosts its status with the general public), FACT is a private limited company serving The Premier League, another company desperate to stamp out the use of infringing devices.

Nevertheless, it’s difficult to argue with some of the figures cited in the report.

“At a conservative estimate, we believe a million set-top boxes with software added to them to facilitate illegal downloads have been sold in the UK in the last couple of years,” the Intellectual Property Office reveals.

Interestingly, given a growing tech-savvy public, FACT’s report notes that ready-configured boxes are increasingly coming into the country.

“Historically, individuals and organized gangs have added illegal apps and add-ons onto the boxes once they have been imported, to allow illegal access to premium channels. However more recently, more boxes are coming into the UK complete with illegal access to copyrighted content via apps and add-ons already installed,” FACT notes.

“Boxes are often stored in ‘fulfillment houses’ along with other illegal electrical items and sold on social media. The boxes are either sold as one-off purchases, or with a monthly subscription to access paid-for channels.”

While FACT press releases regularly blur the lines when people are prosecuted for supplying set-top boxes in general, it’s important to note that there are essentially two kinds of products on offer to the public.

The first relies on Kodi-type devices which provide on-going free access to infringing content. The second involves premium IPTV subscriptions which are a whole different level of criminality. Separating the two when reading news reports can be extremely difficult, but it’s hugely important to recognize the difference when assessing the kinds of sentences set-top box suppliers are receiving in the UK.

Nevertheless, FACT correctly highlights that the supply of both kinds of product is on the increase, with various parties recognizing the commercial opportunities.

“A significant number of home-grown British criminals are now involved in this type of crime. Some of them import the boxes wholesale through entirely legal channels, and modify them with illegal software at home. Others work with sophisticated criminal networks across Europe to bring the boxes into the UK.

“They then sell these boxes online, for example through eBay or Facebook, sometimes managing to sell hundreds or thousands of boxes before being caught,” the company adds.

The report notes that in some cases the sale of infringing set-top boxes occurs through cottage industry, with suppliers often working on their own or with small groups of friends and family. Inevitably, perhaps, larger scale operations are reported to be part of networks with connections to other kinds of crime, such as dealing in drugs.

“In contrast to drugs, streaming devices provide a relatively steady and predictable revenue stream for these criminals – while still being lucrative, often generating hundreds of thousands of pounds a year, they are seen as a lower risk activity with less likelihood of leading to arrest or imprisonment,” FACT reports.

While there’s certainly the potential to earn large sums from ‘pirate’ boxes and premium IPTV services, operating on the “hundreds of thousands of pounds a year” scale in the UK would attract a lot of unwanted attention. That’s not to say that it isn’t already, however.

Noting that digital piracy has evolved hugely over the past three or four years, the report says that the cases investigated so far are just the “tip of the iceberg” and that many other cases are in the early stages and will only become known to the public in the months and years ahead.

Indeed, the Intellectual Property Office hints that some kind of large-scale enforcement action may be on the horizon.

“We have identified a significant criminal business model which we have discussed and shared with key law enforcement partners. I can’t go into detail on this, but as investigations take their course, you will see the scale,” an IPO spokesperson reveals.

While details are necessarily scarce, a source familiar with this area told TF that he would be very surprised if the targets aren’t the growing handful of commercial UK-based IPTV re-sellers who offer full subscription TV services for a few pounds per month.

“They’re brazen. Watch this space,” he said.

FACT’s full report, Cracking Down on Digital Piracy, can be downloaded here (pdf)

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

ShareBeast & AlbumJams Operator Pleads Guilty to Criminal Copyright Infringement

Post Syndicated from Andy original https://torrentfreak.com/sharebeast-albumjams-operator-pleads-guilty-to-criminal-copyright-infringement-170911/

In September 2015, U.S. authorities announced action against a pair of sites involved in music piracy.

ShareBeast.com and AlbumJams.com were allegedly responsible for the distribution of “a massive library” of popular albums and tracks. Both were accused of offering thousands of tracks before their official release dates.

The U.S. Department of Justice (DOJ) placed their now familiar seizure notice on both domains, with the RIAA claiming ShareBeast was the largest illegal file-sharing site operating in the United States. Indeed, the site’s IP addresses at the time indicated at least some hosting taking place in Illinois.

“This is a huge win for the music community and legitimate music services. Sharebeast operated with flagrant disregard for the rights of artists and labels while undermining the legal marketplace,” RIAA Chairman & CEO Cary Sherman commented at the time.

“Millions of users accessed songs from Sharebeast each month without one penny of compensation going to countless artists, songwriters, labels and others who created the music.”

Now, a full two years later, former Sharebeast operator Artur Sargsyan has pleaded guilty to one felony count of criminal copyright infringement, admitting to the unauthorized distribution and reproduction of over 1 billion copies of copyrighted works.

“Through Sharebeast and other related sites, this defendant profited by illegally distributing copyrighted music and albums on a massive scale,” said U.S. Attorney John Horn.

“The collective work of the FBI and our international law enforcement partners have shut down the Sharebeast websites and prevented further economic losses by scores of musicians and artists.”

The Department of Justice says that from 2012 to 2015, 29-year-old Sargsyan used ShareBeast as a pirate music repository, infringing works produced by Ariana Grande, Katy Perry, Beyonce, Kanye West, and Justin Bieber, among others. He linked to that content from Newjams.net and Albumjams.com, two other sites under his control.

The DOJ says that Sargsyan was informed at least 100 times that there was infringing content on ShareBeast but, despite the warnings, the content remained available. When those warnings produced no results, the FBI – assisted by law enforcement in the UK and the Netherlands – seized servers used by Sargsyan to distribute the material.

Brad Buckles, EVP, Anti-Piracy at the RIAA, welcomed the guilty plea.

“Sharebeast and its related sites represented the most popular network of infringing music sites operated out of the United States. The network was responsible for providing millions of downloads of popular music files including unauthorized pre-release albums and tracks. This illicit activity was a gut-punch to music creators who were paid nothing by the service,” Buckles said.

“We are incredibly grateful for the government’s commitment to protecting the rights of artists and labels. We especially thank the dedicated agents of the FBI who painstakingly unraveled this criminal enterprise, and U.S. Attorney John Horn and his team for their work and diligence in seeing this case to its successful conclusion.”

Sargsyan, of Glendale, California, will be sentenced December 4 before U.S. District Judge Timothy C. Batten.

‘Game of Thrones Season 7 Pirated Over a Billion Times’

Post Syndicated from Ernesto original https://torrentfreak.com/game-of-thrones-season-7-pirated-over-a-billion-times-170905/

The seventh season of Game of Thrones has brought tears and joy to HBO this summer.

It was the most-viewed season thus far, with record-breaking TV ratings. But on the other hand, HBO and Game of Thrones were plagued by hacks, leaks, and piracy, of course.

While it’s hard to measure piracy accurately, streaming in particular, piracy tracking outfit MUSO has just released some staggering numbers. According to the company, the latest season was pirated more than a billion times in total.

To put this into perspective, this means that on average each episode was pirated more than 140 million times, compared to 32 million views through legal channels.

The vast majority of the pirate ‘views’ came from streaming services (85%), followed by torrents (9%) and direct downloads (6%). Private torrent trackers are at the bottom with less than one percent.

Pirate sources

Andy Chatterley, MUSO’s CEO and Co-Founder, notes that the various leaks may have contributed to these high numbers. This is supported by the finding that the sixth episode, which leaked several days in advance, was pirated more than the season finale.

“It’s no secret that HBO has been plagued by security breaches throughout the latest season, which has seen some episodes leak before broadcast and added to unlicensed activity,” Chatterley says.

In addition, the data shows that despite a heavy focus on torrent traffic, unauthorized streaming is a much bigger problem for rightsholders.

“In addition to the scale of piracy when it comes to popular shows, these numbers demonstrate that unlicensed streaming can be a far more significant type of piracy than torrent downloads.”

Although the report shares precise numbers, it’s probably best to describe them as estimates.

The streaming data MUSO covers is sourced from SimilarWeb, which uses a sample of 200 million ‘devices’ to estimate website traffic. The sample data covers thousands of popular pirate sites and is extrapolated into the totals.

While more than a billion downloads are pretty significant, to say the least, MUSO is not even looking at the full pirate landscape.

For one, MUSO’s streaming data doesn’t include Chinese traffic, and China usually has a very active piracy community. As if that’s not enough, alternative pirate sources, such as fully-loaded Kodi boxes, are not included either.

It’s clear, and hardly a surprise, that Game of Thrones piracy overall is still very significant. The torrent numbers may not have grown in recent years, but streaming seems to be making up for it, probably adding a few dozen million extra.

Total Global Downloads and Streams by Episode

Episode one: 187,427,575
Episode two: 123,901,209
Episode three: 116,027,851
Episode four: 121,719,868
Episode five: 151,569,560
Episode six: 184,913,279
Episode seven (as of 3rd Sept): 143,393,804
All Episode Bundles – Season 7: 834,522
TOTAL (as of 3rd September) = 1,029,787,668
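
As a quick sanity check, the per-episode figures above can be added up. The following is only a sketch using the numbers exactly as listed; the variable names are made up for illustration. On these figures the episodes plus the bundles match the stated total, and the per-episode mean (bundles excluded) comes out at roughly 147 million, in line with the rounded 140 million average quoted earlier.

```shell
# Per-episode totals as listed above (episodes one to seven).
episodes="187427575 123901209 116027851 121719868 151569560 184913279 143393804"
bundles=834522   # the "All Episode Bundles" line

episode_sum=0
for n in $episodes; do
    episode_sum=$((episode_sum + n))
done

grand_total=$((episode_sum + bundles))
average=$((episode_sum / 7))   # integer division, bundles left out

echo "episodes only: $episode_sum"
echo "with bundles:  $grand_total"
echo "mean/episode:  $average"
```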

Total Breakdown By Type

Streaming: 84.66%
Torrent: 9.12%
Download: 5.59%
Private Torrent: 0.63%
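
The percentage breakdown can be turned back into rough counts. This is a sketch that assumes the shares apply to the overall total given in the episode list; `estimate` is a hypothetical helper, and the shares are written in hundredths of a percent so that plain integer arithmetic is enough.

```shell
total=1029787668   # TOTAL line from the per-episode list

# Shares in hundredths of a percent (84.66% becomes 8466).
streaming=8466
torrent=912
download=559
private_torrent=63

# The four shares should account for everything: 10000 hundredths = 100%.
share_sum=$((streaming + torrent + download + private_torrent))
echo "share sum (hundredths of a percent): $share_sum"

# Hypothetical helper: approximate count for one share, rounded down.
estimate() {
    echo $((total * $1 / 10000))
}

echo "streaming:       $(estimate "$streaming")"
echo "torrent:         $(estimate "$torrent")"
echo "download:        $(estimate "$download")"
echo "private torrent: $(estimate "$private_torrent")"
```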

Streaming Service iflix Buys Shows Based on Piracy Data

Post Syndicated from Ernesto original https://torrentfreak.com/streaming-service-iflix-buys-shows-based-on-piracy-data-170819/

When major movie and TV companies discuss piracy they often mention the massive losses incurred as a result of unauthorized downloads and streams.

However, this unofficial market also offers a valuable pool of often publicly available data on the media consumption habits of a relatively young generation.

Many believe that piracy is in part a market signal showing copyright holders what consumers want. This makes piracy statistics key business intelligence, which some companies have started to realize.

Netflix, for example, previously said that their offering is partly based on what shows do well on BitTorrent networks and other pirate sites. In addition, the streaming service also uses piracy to figure out how much they can charge in a country. They are not alone.

Other major entertainment companies also keep a close eye on piracy, using this data to their advantage. This includes the Asia-based streaming portal iflix, which recently secured $133 million in funding and claims to have over five million users.

Iflix co-founder Patrick Grove says that his company actively uses piracy numbers to determine what content they acquire. The data reveal what is popular locally, and help to give viewers the TV-shows and movies they’re most interested in.

“We looked at piracy data in every market,” Grove told CNBC’s Managing Asia. That work doesn’t stop at looking at a few torrent download numbers.

Representatives from the Asian company actually went out on the streets to buy pirated DVDs from street vendors. In addition, iflix also received help from local Internet providers which shared a variety of streaming data.

TorrentFreak reached out to the streaming service to get more details about their data gathering techniques. One of its main partners for measuring online piracy is the German company TECXIPIO, which is known to actively monitor BitTorrent traffic.

The company also maintains a close relationship with Internet providers that offer further insight, including streaming data, to determine which titles work best in each market.

While analyzing the different sets of data, the streaming service was surprised to see the diversity in different regions as well as the ever-changing consumer demand.

“Through looking at the Top 20 pirated DVDs in every market we are live in, we were surprised to find the amount of pirated K-drama content. In Ghana for example, the number one pirated title is K-drama series called ‘Legend of the Blue Sea’,” an iflix spokesperson told us.

Iflix believes that piracy data is superior to other market intelligence. Before rolling out its service in Saudi Arabia the company made a list of the 1,000 most popular shows and used that to its advantage.

While there is a lot of piracy in emerging markets, iflix doesn’t believe that people are unwilling to pay for entertainment. It just has to be available for a decent price, and that’s where they come in.

“We believe that people in emerging markets do not actively want to steal content, they do so because there is no better alternative,” the company informs us.

“As consumers become more connected, gaining access to information and cultural influences on a global scale, they want to be entertained at a world-class standard. We set out with the aim of offering an alternative that is better than piracy; by providing unlimited access to high-quality, world-class entertainment, all at the price of pirated DVD.”

There is no doubt that iflix is ambitious, and that it’s willing to employ some unusual tactics to grow its userbase. The company is quite optimistic about the future as well, judging from its co-founder’s prediction that it will welcome its billionth viewer in a few years.

Porn Producer Says He’ll Prove That AMC TV Exec is a BitTorrent Pirate

Post Syndicated from Andy original https://torrentfreak.com/porn-producer-says-hell-prove-that-amc-tv-exec-is-a-bittorrent-pirate-170818/

When people are found sharing copyrighted pornographic content online in the United States, there’s always a chance that an angry studio will attempt to track down the perpetrator in pursuit of a cash settlement.

That’s what adult studio Flava Works did recently, after finding its content being shared without permission on a number of gay-focused torrent sites. It’s now clear that their target was Marc Juris, President & General Manager of AMC-owned WE tv. Until this week, however, that information was secret.

As detailed in our report yesterday, Flava Works contacted Juris with an offer of around $97,000 to settle the case before trial. And, crucially, before Juris was publicly named in a lawsuit. If Juris decided not to pay, that amount would increase significantly, Flava Works CEO Phillip Bleicher told him at the time.

Not only did Juris not pay, he actually went on the offensive, filing a ‘John Doe’ complaint in a California district court which accused Flava Works of extortion and blackmail. It’s possible that Juris felt that this would cause Flava Works to back off but in fact, it had quite the opposite effect.

In a complaint filed this week in an Illinois district court, Flava Works named Juris and accused him of a broad range of copyright infringement offenses.

The complaint alleges that Juris was a signed-up member of Flava Works’ network of websites, from where he downloaded pornographic content as his subscription allowed. However, it’s claimed that Juris then uploaded this material elsewhere, in breach of copyright law.

“Defendant downloaded copyrighted videos of Flava Works as part of his paid memberships and, in violation of the terms and conditions of the paid sites, posted and distributed the aforesaid videos on other websites, including websites with peer to peer sharing and torrents technology,” the complaint reads.

“As a result of Defendant’ conduct, third parties were able to download the copyrighted videos, without permission of Flava Works.”

In addition to demanding injunctions against Juris, Flava Works asks the court for a judgment in its favor amounting to a cool $1.2m, more than twelve times the amount it was initially prepared to settle for. It’s a huge amount, but according to CEO Phillip Bleicher, it’s what his company is owed, despite Juris being a former customer.

“Juris was a member of various Flava Works websites at various times dating back to 2006. He is no longer a member and his login info has been blocked by us to prevent him from re-joining,” Bleicher informs TF.

“We allow full downloads, although each download a person performs, it tags the video with a hidden code that identifies who the user was that downloaded it and their IP info and date / time.”

We asked Bleicher how he can be sure that the content downloaded from Flava Works and re-uploaded elsewhere was actually uploaded by Juris. Fine details weren’t provided but he’s insistent that the company’s evidence holds up.

“We identified him directly, this was done by cross referencing all his IP logins with Flava Works, his email addresses he used and his usernames. We can confirm that he is/was a member of Gay-Torrents.org and Gayheaven.org. We also believe (we will find out in discovery) that he is a member of a Russian file sharing site called GayTorrent.Ru,” he says.

While the technicalities of who downloaded and shared what will be something for the court to decide, there are still Juris’ allegations that Bleicher used extortion-like practices to get him to settle and used his relative fame against him. Bleicher says that’s not how things played out.

“[Juris] hired an attorney and they agreed to settle out of court. But then we saw him still accessing the file sharing sites (one site shows a user’s last login) and we were waiting on the settlement agreement to be drafted up by his attorney,” he explains.

“When he kept pushing the date of when we would see an agreement back we gave him a final deadline and said that after this date we would sue [him] and with all lawsuits – we make a press release.”

Bleicher says at this point Juris replaced his legal team and hired lawyer Mark Geragos, who Bleicher says tried to “bully” him, warning him of potential criminal offenses.

“Your threats in the last couple months to ‘expose’ Mr. Juris knowing he is a high profile individual, i.e., today you threatened to issue a press release, to induce him into wiring you close to $100,000 is outright extortion and subject to criminal prosecution,” Geragos wrote.

“I suggest you direct your attention to various statutes which specifically criminalize your conduct in the various jurisdictions where you have threatened suit.”

Interestingly, Geragos then went on to suggest that the lawsuit may ultimately backfire, since going public might affect Flava Works’ reputation in the gay market.

“With respect to Mr. Juris, your actions have been nothing but extortion and we reject your attempts and will vigorously pursue all available remedies against you,” Geragos’ email reads.

“We intend to use the platform you have provided to raise awareness in the LGBTQ community of this new form of digital extortion that you promote.”

But Bleicher, it seems, is up for a fight.

“Marc knows what he did and enjoyed downloading our videos and sharing them and those of videos of other studios, but now he has been caught,” he told the lawyer.

“This is the kind of case I would like to take all the way to trial, win or lose. It shows people that want to steal our copyrighted videos that we aggressively protect our intellectual property.”

But to the tune of $1.2m? Apparently so.

“We could get up to $150,000 per infringement – we have solid proof of eight full videos – not to mention we have caught [Juris] downloading many other studios’ videos too – I think – but not sure – the number was over 75,” Bleicher told TF.
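
Bleicher’s figure follows from simple multiplication. The sketch below only reuses the numbers quoted in this article (the per-infringement maximum he cites, the eight videos, and the earlier settlement offer of around $97,000); the variable names are illustrative.

```shell
max_per_work=150000      # "up to $150,000 per infringement"
works=8                  # "eight full videos"
settlement_offer=97000   # the earlier offer of around $97,000

claim=$((max_per_work * works))

# Ratio of claim to settlement offer, scaled by 10 to keep one decimal in integer maths.
ratio_tenths=$((claim * 10 / settlement_offer))

echo "claim: \$$claim"
echo "claim vs. offer: $((ratio_tenths / 10)).$((ratio_tenths % 10))x"
```

That gives $1,200,000, a little over twelve times the settlement offer, which matches the amounts reported above.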

It’s quite rare for this kind of dispute to play out in public, especially considering Juris’ profile and occupation. Only time will tell if this will ultimately end in a settlement, but Bleicher and Juris seem determined at this stage to stand their ground and fight this out in court.

Complaint (pdf)

Raspbian Stretch has arrived for Raspberry Pi

Post Syndicated from Simon Long original https://www.raspberrypi.org/blog/raspbian-stretch/

It’s now just under two years since we released the Jessie version of Raspbian. Those of you who know that Debian run their releases on a two-year cycle will therefore have been wondering when we might be releasing the next version, codenamed Stretch. Well, wonder no longer – Raspbian Stretch is available for download today!

Debian releases are named after characters from Disney Pixar’s Toy Story trilogy. In case, like me, you were wondering: Stretch is a purple octopus from Toy Story 3. Hi, Stretch!

The differences between Jessie and Stretch are mostly under-the-hood optimisations, and you really shouldn’t notice any differences in day-to-day use of the desktop and applications. (If you’re really interested, the technical details are in the Debian release notes here.)

However, we’ve made a few small changes to our image that are worth mentioning.

New versions of applications

Version 3.0.1 of Sonic Pi is included – this includes a lot of new functionality in terms of input/output. See the Sonic Pi release notes for more details of exactly what has changed.

The Chromium web browser has been updated to version 60, the most recent stable release. This offers improved memory usage and more efficient code, so you may notice it running slightly faster than before. The visual appearance has also been changed very slightly.

Bluetooth audio

In Jessie, we used PulseAudio to provide support for audio over Bluetooth, but integrating this with the ALSA architecture used for other audio sources was clumsy. For Stretch, we are using the bluez-alsa package to make Bluetooth audio work with ALSA itself. PulseAudio is therefore no longer installed by default, and the volume plugin on the taskbar will no longer start and stop PulseAudio. From a user point of view, everything should still work exactly as before – the only change is that if you still wish to use PulseAudio for some other reason, you will need to install it yourself.

Better handling of other usernames

The default user account in Raspbian has always been called ‘pi’, and a lot of the desktop applications assume that this is the current user. This has been changed for Stretch, so now applications like Raspberry Pi Configuration no longer assume this to be the case. This means, for example, that the option to automatically log in as the ‘pi’ user will now automatically log in with the name of the current user instead.

One other change is how sudo is handled. By default, the ‘pi’ user is set up with passwordless sudo access. We are no longer assuming this to be the case, so now desktop applications which require sudo access will prompt for the password rather than simply failing to work if a user without passwordless sudo uses them.

Scratch 2 SenseHAT extension

In the last Jessie release, we added the offline version of Scratch 2. While Scratch 2 itself hasn’t changed for this release, we have added a new extension to allow the SenseHAT to be used with Scratch 2. Look under ‘More Blocks’ and choose ‘Add an Extension’ to load the extension.

This works with either a physical SenseHAT or with the SenseHAT emulator. If a SenseHAT is connected, the extension will control that in preference to the emulator.

Fix for Broadpwn exploit

A couple of months ago, a vulnerability was discovered in the firmware of the BCM43xx wireless chipset which is used on Pi 3 and Pi Zero W; this potentially allows an attacker to take over the chip and execute code on it. The Stretch release includes a patch that addresses this vulnerability.

There is also the usual set of minor bug fixes and UI improvements – I’ll leave you to spot those!

How to get Raspbian Stretch

As this is a major version upgrade, we recommend using a clean image; these are available from the Downloads page on our site as usual.

Upgrading an existing Jessie image is possible, but is not guaranteed to work in every circumstance. If you wish to try upgrading a Jessie image to Stretch, we strongly recommend taking a backup first – we can accept no responsibility for loss of data from a failed update.

To upgrade, first modify the files /etc/apt/sources.list and /etc/apt/sources.list.d/raspi.list. In both files, change every occurrence of the word ‘jessie’ to ‘stretch’. (Both files will require sudo to edit.)
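
If you would rather not edit the files by hand, the substitution can be done with sed. This is a minimal sketch, not part of the official instructions: `retarget_sources` is a hypothetical helper, the `.bak` backup suffix is our own addition, and the demo line uses a placeholder URL rather than a real mirror.

```shell
# Hypothetical helper: replace every 'jessie' with 'stretch' in one file,
# leaving the untouched original next to it as <file>.bak.
retarget_sources() {
    sed -i.bak 's/jessie/stretch/g' "$1"
}

# Dry run on a scratch file; the URL is a placeholder, not a real mirror.
demo=$(mktemp)
printf 'deb http://example.invalid/raspbian/ jessie main contrib non-free rpi\n' > "$demo"
retarget_sources "$demo"
cat "$demo"

# On the Pi itself the same substitution needs root, for example:
#   sudo sed -i.bak 's/jessie/stretch/g' /etc/apt/sources.list /etc/apt/sources.list.d/raspi.list
```

Trying it on a scratch copy first makes it easy to confirm that only the release name changes before touching the real files.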

Then open a terminal window and execute

sudo apt-get update
sudo apt-get -y dist-upgrade

Answer ‘yes’ to any prompts. There may also be a point at which the install pauses while a page of information is shown on the screen – hold the ‘space’ key to scroll through all of this and then hit ‘q’ to continue.

Finally, if you are not using PulseAudio for anything other than Bluetooth audio, remove it from the image by entering

sudo apt-get -y purge 'pulseaudio*'

The post Raspbian Stretch has arrived for Raspberry Pi appeared first on Raspberry Pi.

Spinrilla Refuses to Share Its Source Code With the RIAA

Post Syndicated from Ernesto original https://torrentfreak.com/spinrilla-refuses-to-share-its-source-code-with-the-riaa-170815/

Earlier this year, a group of well-known labels targeted Spinrilla, a popular hip-hop mixtape site and accompanying app with millions of users.

The coalition of record labels, including Sony Music, Warner Bros. Records, and Universal Music Group, filed a lawsuit accusing the service of copyright infringement.

Both sides have started the discovery process and recently asked the court to rule on several unresolved matters. The parties begin with their statements of facts, clearly from opposite angles.

The RIAA remains confident that the mixtape site is ripping off music creators and wants its operators to be held accountable.

“Since Spinrilla launched, Defendants have facilitated millions of unauthorized downloads and streams of thousands of Plaintiffs’ sound recordings without Plaintiffs’ permission,” RIAA writes, complaining about “rampant” infringement on the site.

However, Spinrilla itself believes that the claims are overblown. The company points out that the RIAA’s complaint only lists a tiny fraction of all the songs uploaded by its users. These somehow slipped through its Audible Magic anti-piracy filter.

Where the RIAA paints a picture of rampant copyright infringement, the mixtape site stresses that the record labels are complaining about only a tiny fraction of all the tracks it ever published.

“From 2013 to the present, Spinrilla users have uploaded about 1 million songs to Spinrilla’s servers and Spinrilla published about 850,000 of those. Plaintiffs are complaining that 210 of those songs are owned by them and published on Spinrilla without permission,” Spinrilla’s lawyers write.

“That means that Plaintiffs make no claim to 99.9998% of the songs on Spinrilla. Plaintiffs’ shouting of ‘rampant infringement on Spinrilla’, an accusation that Spinrilla was designed to allow easy and open access to infringing material, and assertion that ‘Defendants have facilitated millions of unauthorized downloads’ of those 210 songs is untrue – it is nothing more than a wish and a dream.”
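
Taking the counts in the filing at face value, the disputed share is easy to recompute. The sketch below is an illustration using only the 210 and 850,000 figures quoted above; the variable names are our own. On these numbers the disputed songs come to roughly 0.025 percent of what Spinrilla published, so the 99.9998% in the quote looks off by about two decimal places, although the broader point that the disputed songs are a small fraction still stands.

```shell
published=850000   # songs Spinrilla says it published
disputed=210       # songs named in the complaint

# Disputed share as a percentage with four decimals (LC_ALL=C keeps the decimal point).
share=$(LC_ALL=C awk -v d="$disputed" -v p="$published" \
    'BEGIN { printf "%.4f", d / p * 100 }')

# Everything else, i.e. songs the plaintiffs make no claim to.
unclaimed=$(LC_ALL=C awk -v d="$disputed" -v p="$published" \
    'BEGIN { printf "%.4f", (p - d) / p * 100 }')

echo "disputed:  ${share}%"
echo "unclaimed: ${unclaimed}%"
```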

The company reiterates that it’s a platform for independent musicians and that it doesn’t want to feature the Eminems and Biebers of this world, especially not without permission.

As for the discovery process, there are still several outstanding issues they need the Court’s advice on. Spinrilla has thus far produced 12,000 pages of documents and answered all RIAA interrogatories, but refuses to hand over certain information, including its source code.

According to Spinrilla, there is no reason for the RIAA to have access to its “crown jewel.”

“The source code is the crown jewel of any software based business, including Spinrilla. Even worse, Plaintiffs want an ‘executable’ version of Spinrilla’s source code, which would literally enable them to replicate Spinrilla’s entire website. Any Plaintiff could, in hours, delete all references to ‘Spinrilla,’ add its own brand and launch Spinrilla’s exact website.

“If we sued YouTube for hosting 210 infringing videos, would I be entitled to the source code for YouTube? There is simply no justification for Spinrilla sharing its source code with Plaintiffs,” Spinrilla adds.

The RIAA, on the other hand, argues that the source code will provide insight into several critical issues, including Spinrilla’s knowledge about infringing activity and its ability to terminate repeat copyright infringers.

In addition to the source code, the RIAA has also requested detailed information about the site’s users, including their download and streaming history. This request is too broad, argues the mixtape site, which has offered to provide information on the uploaders of the 210 infringing tracks instead.

It’s clear that the RIAA and Spinrilla disagree on various fronts and it will be up to the court to decide what information must be handed over. So far, however, the language used clearly shows that both parties are far from reaching some kind of compromise.

The first joint discovery statement is available in full here (pdf).

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

72-Year-Old Man Accused of ‘Pirating’ Over a Thousand Torrents

Post Syndicated from Ernesto original https://torrentfreak.com/72-year-old-man-accused-of-pirating-over-a-thousand-torrents-170810/

In recent years, file-sharers around the world have been pressured to pay significant settlement fees, or face legal repercussions.

These so-called ‘copyright trolling’ efforts are a common occurrence in the United States too, where hundreds of thousands of people have been targeted in recent years.

While a significant number of defendants are indeed guilty, there are also many who are wrongfully accused. Third parties may have connected to their Wi-Fi, for example, which isn’t a rarity.

In Hawaii, a recent target of a copyright trolling expedition claims to be innocent, and he’s taken his case to the local press. The 72-year-old John J. Harding doesn’t fit the typical profile of a prolific pirate, but that’s exactly what a movie company has accused him of being.

In June, Harding received a letter from local attorney Kerry Culpepper, who works for the rightsholders of movies such as ‘Mechanic: Resurrection’ and ‘Once Upon a Time in Venice.’

The letter accused the 72-year-old of downloading a movie and also listed over 1,000 other downloads that were tied to his IP-address. Harding was understandably shocked by the threat and says he never downloads anything.

“I’ve never illegally downloaded anything … or even legally! I use my computer for email, games, news and that’s about it,” Harding told HawaiiNewsNow.

“I know definitely that I’m not guilty and my wife is not guilty. So what’s going on? Did somebody hack us? Is somebody out there actively hacking us? How they do that and go about doing that, I have no idea,” Harding added.

As is common in these cases, the copyright holder asked the Hawaii Federal Court for a subpoena, which ordered the associated Internet provider to hand over the personal details of the alleged infringers. The attorney then went on to send out settlement requests to the exposed users.

Harding received a letter offering an easy $3,900 settlement, which would increase to $4,900 if he failed to respond before August 7th. However, the elderly man wasn’t keen on taking the deal, describing the pay-up-or-else demand as “absolutely absurd.”

The attorney reiterated to the local newspaper that these are not idle threats. People risk $150,000 per illegal download, he stressed. That said, mistakes happen and people who feel that they are wrongfully accused should contact his office.

Culpepper explained it further with an analogy while adding a new dimension to the ‘you wouldn’t steal a car’ meme in the process.

“This is similar to a car stolen. If your car was stolen and your car hit someone or did some damage, initially the victim would look to see who was the owner of the car. You would probably tell them, someone stole my car. That time, that person would try to find the person who stole your car,” he said.

The attorney says that they are not trying to bankrupt people. Their goal is to deter piracy. There are cases where they’ve accepted lower settlements or even a mere apology, he notes.

How the 72-year-old will respond is unknown, but judging from his tone he may be looking for an apology himself. Going to the press was probably a smart move, as rightsholders generally don’t like the PR that comes with this kind of story.

These cases are by no means unique though. While browsing through the court dockets of Culpepper’s recent cases we quickly stumbled upon a similar denial. This one comes from a Honolulu woman who’s accused of pirating ‘Mechanic: Resurrection.’

“I have never downloaded the movie they are referencing and when I do download movies I use legal services such as Amazon, and Apple TV,” she wrote to the court, urging it to keep her personal information private.

“I do have frequent guests at our house often using the Internet. In the future I will request that nobody uses any file sharing on our Internet connection,” the letter added.

Unfortunately for her, the letter includes her full name and address, which means that she has effectively exposed herself. This likely means that she will soon receive a settlement request in the mail, just like Harding did, if she hasn’t already.

Backblaze Cloud Backup 5.0: The Rapid Access Release

Post Syndicated from Yev original https://www.backblaze.com/blog/cloud-backup-5-0-rapid-access/

Announcing Backblaze Cloud Backup 5.0: the Rapid Access Release. We’ve been at the backup game for a long time now, and we continue to focus on providing the best unlimited backup service on the planet. A lot of the features in this release have come from listening to our customers about how they want to use their data. “Rapid Access” quickly became the theme because, well, we’re all acquiring more and more data and want to access it in a myriad of ways.

This release brings a lot of new functionality to Backblaze Computer Backup: faster backups, accelerated file browsing, image preview, individual file download (without creating a “restore”), and file sharing. To top it all off, we’ve refreshed the user interface on our client app. We hope you like it!

Speeding Things Up

New code + new hardware + elbow grease = things are going to move much faster.

Faster Backups

We’ve doubled the number of threads available for backup on both Mac and PC. This gives our service the ability to intelligently detect the right settings for you (based on your computer, capacity, and bandwidth). As always, you can manually set the number of threads — keep in mind that if you have a slow internet connection, adding threads might have the opposite effect and slow you down. On its default settings, our client app will now automatically evaluate what’s best given your environment. We’ve internally tested our service backing up at over 100 Mbps, which means if you have a fast-enough internet connection, you could back up 50 GB in just one hour.
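As a rough check on the throughput figure above, the sketch below converts a link speed into gigabytes per hour. It assumes decimal units (1 GB = 1,000 MB), a fully saturated link, and no protocol overhead; the function names are illustrative and not part of the Backblaze client.

```python
def gb_per_hour(link_mbps: float) -> float:
    """Gigabytes transferred in one hour at a sustained rate of link_mbps megabits/s."""
    megabytes_per_second = link_mbps / 8      # 8 bits per byte
    return megabytes_per_second * 3600 / 1000  # seconds per hour, MB per GB


def mbps_needed(data_gb: float, hours: float) -> float:
    """Sustained megabits/s required to move data_gb gigabytes in the given hours."""
    return data_gb * 1000 * 8 / (hours * 3600)


if __name__ == "__main__":
    print(gb_per_hour(100))               # 45.0
    print(round(mbps_needed(50, 1), 1))   # 111.1
```

Under these assumptions, exactly 100 Mbps moves about 45 GB per hour, and 50 GB in one hour needs roughly 111 Mbps, which is consistent with the "over 100 Mbps" wording.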

Faster Browsing

We’ve introduced a number of enhancements that increase file browsing speed by 3x. Hidden files are no longer displayed by default, but you can still show them with one click on the restore page. This gives the restore interface a cleaner look, and helps you navigate backup history if you need to roll back time.

Faster Restore Preparation

We take pride in providing a variety of ways for consumers to get their data back. When something has happened to your computer, getting your files back quickly is critical. Both web download restores and Restore by Mail will now be much faster. In some cases up to 10x faster!

Preview — Access — Share

Our system has received a number of enhancements — all intended to give you more access to your data.

Image Preview

If you have a lot of photos, this one’s for you. When you go to the restore page you’ll now be able to click on each individual file that we have backed up, and if it’s an image you’ll see a preview of that file. We hope this helps people figure out which pictures they want to download (this especially helps people with a lot of photos named something along the lines of: 2017-04-20-9783-41241.jpg). Now you can just click on the picture to preview it.

Access

Once you’ve clicked on a file (30MB and smaller), you’ll be able to individually download that file directly in your browser. You’ll no longer need to wait for a single-file restore to be built and zipped up; you’ll be able to download it quickly and easily. This was a highly requested feature and we’re stoked to get it implemented.

Share

We’re leveraging Backblaze B2 Cloud Storage and giving folks the ability to publicly share their files. In order to use this feature, you’ll need to enable Backblaze B2 on your account (if you haven’t already, there’s a simple wizard that will pop up the first time you try to share a file). Files can be shared anywhere in the world via URL. All B2 accounts have 10GB/month of storage and 1GB/day of downloads (equivalent to sharing an iPhone photo 1,000 times per month) for free. You can increase those limits in your B2 Settings. Keep in mind that any file you share will be accessible to anybody with the link. Learn more about File Sharing.

For now, we’ve limited the Preview/Access/Share functionality to files 30MB and smaller, but larger files will be supported in the coming weeks!

Other Goodies

In addition to adding two-factor verification (2FV) via TOTP, we’ve also been hard at work on the client. In version 5.0 we’ve touched up the user interface to make it a bit more lively, and we’ve also made the client IPv6 compatible.
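For readers unfamiliar with TOTP (time-based one-time passwords), here is a minimal sketch of the generic algorithm from RFC 6238, using HMAC-SHA1, a 30-second time step, and 6 digits by default. It is not Backblaze's implementation, which the post does not describe; the function name and parameters are illustrative.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def totp(secret: bytes, at: Optional[float] = None, step: int = 30, digits: int = 6) -> str:
    """Return the RFC 6238 one-time code for `secret` at Unix time `at`."""
    now = time.time() if at is None else at
    counter = int(now // step)  # number of whole time steps since the epoch
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation (RFC 4226)
    value = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10 ** digits).zfill(digits)


if __name__ == "__main__":
    # Test vector from RFC 6238, Appendix B (SHA-1, T = 59 s, 8 digits).
    print(totp(b"12345678901234567890", at=59, digits=8))  # 94287082
```

The server and the authenticator app share the secret and compute the same code independently, so only the short-lived code ever has to be typed in.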

Backblaze 5.0 Available: August 10, 2017

We will slowly be auto-updating all users in the coming weeks. To update now:

This version is now the default download on www.backblaze.com.

We hope you enjoy Backblaze Cloud Backup v5.0!

The post Backblaze Cloud Backup 5.0: The Rapid Access Release appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

RIAA’s Piracy Claims are Misleading and Inaccurate, ISP Says

Post Syndicated from Ernesto original https://torrentfreak.com/riaas-piracy-claims-are-misleading-and-inaccurate-isp-says-170807/

For more than a decade, copyright holders have been sending ISPs takedown notices to alert them that their subscribers are sharing copyrighted material.

Under US law, providers have to terminate the accounts of repeat infringers “in appropriate circumstances” and increasingly they are being held to this standard.

Earlier this year several major record labels, represented by the RIAA, filed a lawsuit in a Texas District Court, accusing ISP Grande Communications of failing to take action against its pirating subscribers.

The ISP is not happy with the claims and was quick to submit a motion to dismiss the lawsuit. One of the arguments is that the RIAA’s evidence is insufficient.

In its original motion, Grande doesn’t deny receiving millions of takedown notices from piracy tracking company Rightscorp. However, it believes that these notices are flawed as Rightscorp is incapable of monitoring actual copyright infringements.

The RIAA disagreed and pointed out that their evidence is sufficient. They stressed that Rightscorp is able to monitor actual downloads, as opposed to simply checking if a subscriber is offering certain infringing content.

In a response from Grande, late last week, the ISP argues that this isn’t good enough to build a case. While Rightscorp may be able to track the actual infringing downloads to which the RIAA labels hold the copyrights, there is no such evidence provided in the present case, the ISP notes.

“Importantly, Plaintiffs do not allege that Rightscorp has ever recorded an instance of a Grande subscriber actually distributing even one of Plaintiffs’ copyrighted works. Plaintiffs certainly have not alleged any concrete facts regarding such an act,” Grande’s legal team writes (pdf).

According to the ISP, the RIAA’s evidence merely shows that Rightscorp sent notices of alleged infringements on behalf of other copyright holders, who are not involved in the lawsuit.

“Instead, Plaintiffs generally allege that Rightscorp has sent notices regarding ‘various copyrighted works,’ encompassing all of the notices sent by Rightscorp on behalf of entities other than Plaintiffs.”

While the RIAA argues that this circumstantial evidence is sufficient, the ISP believes that there are grounds to have the entire case dismissed.

The record labels can’t hold Grande liable for secondary copyright infringement, without providing concrete evidence that their works were actively distributed by Grande subscribers, the company claims.

“Plaintiffs cannot allege direct infringement without alleging concrete facts which show that a Grande subscriber actually infringed one of Plaintiffs’ copyrights,” Grande’s lawyers note.

“For this reason, it is incredibly misleading for Plaintiffs to repeatedly refer to Grande having received ‘millions’ of notices of alleged infringement, as if those notices all pertained to Plaintiffs’ asserted copyrights.”

The “misleading” copyright infringement evidence argument is only one part of the ISP’s defense. The company also notes that it has no control over what its subscribers do, nor does it control the BitTorrent clients that were allegedly used to download content.

If the court ruled otherwise, Grande and other ISPs would essentially be forced to become an “unpaid enforcement agent of the recording industry,” the company’s lawyers note.

The RIAA, however, sees things quite differently.

The music industry group believes that Grande failed to take proper action in response to repeat infringers and should pay damages to compensate the labels. This claim is very similar to the one BMG brought against Cox, where the latter was eventually ordered to pay $25 million.

ESET Tries to Scare People Away From Using Torrents

Post Syndicated from Andy original https://torrentfreak.com/eset-tries-to-scare-people-away-from-using-torrents-170805/

Any company in the security game can be expected to play up threats among its customer base in order to get sales.

Sellers of CCTV equipment, for example, would have us believe that criminals don’t want to be photographed and will often go elsewhere in the face of that. Car alarm companies warn us that since X thousand cars are stolen every minute, an expensive Immobilizer is an anti-theft must.

Of course, they’re absolutely right to point these things out. People want to know about these offline risks since they affect our quality of life. The same can be said of those that occur in the online world too.

We ARE all at risk of horrible malware that will trash our computers and steal our banking information so we should all be running adequate protection. That being said, how many times do our anti-virus programs actually trap a piece of nasty-ware in a year? Once? Twice? Ten times? Almost never?

The truth is we all need to be informed but it should be done in a measured way. That’s why an article just published by security firm ESET on the subject of torrents strikes a couple of bad chords, particularly with people who like torrents. It’s titled “Why you should view torrents as a threat” and predictably proceeds to outline why.

“Despite their popularity among users, torrents are very risky ‘business’,” it begins.

“Apart from the obvious legal trouble you could face for violating the copyright of musicians, filmmakers or software developers, there are security issues linked to downloading them that could put you or your computer in the crosshairs of the black hats.”

Aside from the use of the phrase “very risky” (‘some risk’ is a better description), there’s probably very little to complain about in this opening shot. However, things soon go downhill.

“Merely downloading the newest version of BitTorrent clients – software necessary for any user who wants to download or seed files from this ‘ecosystem’ – could infect your machine and irreversibly damage your files,” ESET writes.

Following that scary statement, some readers will have already vowed never to use a torrent again and moved on without reading any more, but the details are really important.

To support its claim, ESET points to two incidents in 2016 (which to its great credit the company actually discovered) which involved the Transmission torrent client. Both involved deliberate third-party infection and in the latter hackers attacked Transmission’s servers and embedded malware in its OSX client before distribution to the public.

No doubt these were both miserable incidents (to which the Transmission team quickly responded) but to characterize this as a torrent client problem seems somewhat unfair.

People intent on spreading viruses and malware do not discriminate and will happily infect ANY piece of computer software they can. Sadly, many non-technical people reading the ESET post won’t read beyond the claim that installing torrent clients can “infect your machine and irreversibly damage your files.”

That’s a huge disservice to the hundreds of millions of torrent client installations that have taken place over a decade and a half and were absolutely trouble free. On a similar basis, we could argue that installing Windows is the main initial problem for people getting viruses from the Internet. It’s true but it’s also not the full picture.

Finally, the piece goes on to detail other incidents over the years where torrents have been found to contain malware. The several cases highlighted by ESET are both real and pretty unpleasant for victims but the important thing to note here is torrent users are no different to any other online user, no matter how they use the Internet.

People who download files from the Internet, from ALL untrusted sources, are putting themselves at risk of getting a virus or other malware. Whether that content is obtained from a website or a P2P network, the risks are ever-present and only a foolish person would do so without decent security software (such as ESET’s) protecting them.

The take home point here is to be aware of security risks and put them into perspective. It’s hard to put a percentage on these things but of the hundreds of millions of torrent and torrent client downloads that have taken place since their inception 15 years ago, the overwhelming majority have been absolutely fine.

Security situations do arise and we need to be aware of them, but presenting things in a way that spreads unnecessary concern in a particular sector isn’t necessary to sell products.

The AV-TEST Institute registers around 390,000 new malicious programs every day that don’t involve torrents, plenty for any anti-virus firm to deal with.

Turbocharge your Apache Hive queries on Amazon EMR using LLAP

Post Syndicated from Jigar Mistry original https://aws.amazon.com/blogs/big-data/turbocharge-your-apache-hive-queries-on-amazon-emr-using-llap/

Apache Hive is one of the most popular tools for analyzing large datasets stored in a Hadoop cluster using SQL. Data analysts and scientists use Hive to query, summarize, explore, and analyze big data.

With the introduction of Hive LLAP (Low Latency Analytical Processing), the notion of Hive being just a batch processing tool has changed. LLAP uses long-lived daemons with intelligent in-memory caching to circumvent batch-oriented latency and provide sub-second query response times.

This post provides an overview of Hive LLAP, including its architecture and common use cases for boosting query performance. You will learn how to install and configure Hive LLAP on an Amazon EMR cluster and run queries on LLAP daemons.

What is Hive LLAP?

Hive LLAP, introduced in Apache Hive 2.0, provides very fast processing of queries. It uses persistent daemons that are deployed on a Hadoop YARN cluster using Apache Slider. These daemons are long-running and provide functionality such as I/O with DataNode, in-memory caching, query processing, and fine-grained access control. And since the daemons are always running in the cluster, Hive avoids the substantial overhead of launching new YARN containers for every new Hive session, and with it the long startup times.

When Hive is configured in hybrid execution mode, small and short queries execute directly on LLAP daemons. Heavy lifting (like large shuffles in the reduce stage) is performed in YARN containers that belong to the application. Resources (CPU, memory, etc.) are obtained in a traditional fashion using YARN. After the resources are obtained, the execution engine can decide which resources are to be allocated to LLAP, or it can launch Apache Tez processors in separate YARN containers. You can also configure Hive to run all the processing workloads on LLAP daemons for querying small datasets at lightning fast speeds.

LLAP daemons are launched under YARN management to ensure that the nodes don’t get overloaded with the compute resources of these daemons. You can use scheduling queues to make sure that there is enough compute capacity for other YARN applications to run.

Why use Hive LLAP?

With many options available in the market (Presto, Spark SQL, etc.) for doing interactive SQL over data that is stored in Amazon S3 and HDFS, there are several reasons why using Hive and LLAP might be a good choice:

  • For those who are heavily invested in the Hive ecosystem and have external BI tools that connect to Hive over JDBC/ODBC connections, LLAP plugs in to their existing architecture without a steep learning curve.
  • It’s compatible with existing Hive SQL and other Hive tools, like HiveServer2, and JDBC drivers for Hive.
  • It has native support for security features with authentication and authorization (SQL standards-based authorization) using HiveServer2.
  • LLAP daemons are aware of the columns and records that are being processed, which enables you to enforce fine-grained access control.
  • It can use Hive’s vectorization capabilities to speed up queries, and Hive has better support for Parquet file format when vectorization is enabled.
  • It can take advantage of a number of Hive optimizations like merging multiple small files for query results, automatically determining the number of reducers for joins and groupbys, etc.
  • It’s optional and modular so it can be turned on or off depending on the compute and resource requirements of the cluster. This lets you run other YARN applications concurrently without reserving a cluster specifically for LLAP.

How do you install Hive LLAP in Amazon EMR?

To install and configure LLAP on an EMR cluster, use the following bootstrap action (BA):

s3://aws-bigdata-blog/artifacts/Turbocharge_Apache_Hive_on_EMR/configure-Hive-LLAP.sh

This BA downloads and installs Apache Slider on the cluster and configures LLAP so that it works with EMR Hive. For LLAP to work, the EMR cluster must have Hive, Tez, and Apache Zookeeper installed.

You can pass the following arguments to the BA.

Argument      Definition                           Default value
--instances   Number of instances of LLAP daemon   Number of core/task nodes of the cluster
--cache       Cache size per instance              20% of physical memory of the node
--executors   Number of executors per instance     Number of CPU cores of the node
--iothreads   Number of IO threads per instance    Number of CPU cores of the node
--size        Container size per instance          50% of physical memory of the node
--xmx         Working memory size                  50% of container size
--log-level   Log levels for the LLAP instance     INFO
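To make the defaults in the table concrete, the following sketch derives them from a node's memory and core count and shows one way overrides could be attached to the bootstrap action as an Args list. The helper names are hypothetical, and expressing sizes in MB is an assumption; the post does not say which units or value formats the script expects, so check the script before relying on this.

```python
import json

BA_PATH = ("s3://aws-bigdata-blog/artifacts/"
           "Turbocharge_Apache_Hive_on_EMR/configure-Hive-LLAP.sh")


def llap_defaults(node_memory_mb: int, node_cores: int, worker_nodes: int) -> dict:
    """Apply the default rules from the table (sizes in MB are an assumption)."""
    container_size = node_memory_mb // 2           # --size: 50% of physical memory
    return {
        "instances": worker_nodes,                 # one daemon per core/task node
        "cache": node_memory_mb // 5,              # 20% of physical memory
        "executors": node_cores,
        "iothreads": node_cores,
        "size": container_size,
        "xmx": container_size // 2,                # 50% of container size
        "log-level": "INFO",
    }


def bootstrap_action(overrides: dict) -> str:
    """Build a --bootstrap-actions JSON value that passes overrides as Args."""
    args = []
    for name, value in overrides.items():
        args += [f"--{name}", str(value)]
    return json.dumps([{"Path": BA_PATH, "Name": "Custom action", "Args": args}])


if __name__ == "__main__":
    # Illustrative node: 16 GiB of memory, 4 cores, 3 worker nodes.
    print(llap_defaults(16384, 4, 3))
    print(bootstrap_action({"instances": 2, "log-level": "DEBUG"}))
```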

LLAP example

This section describes how you can try the faster Hive queries with LLAP using the TPC-DS testbench for Hive on Amazon EMR.

Use the following AWS command line interface (AWS CLI) command to launch a 1+3 nodes m4.xlarge EMR 5.6.0 cluster with the bootstrap action to install LLAP:

aws emr create-cluster --release-label emr-5.6.0 \
--applications Name=Hadoop Name=Hive Name=Hue Name=ZooKeeper Name=Tez \
--bootstrap-actions '[{"Path":"s3://aws-bigdata-blog/artifacts/Turbocharge_Apache_Hive_on_EMR/configure-Hive-LLAP.sh","Name":"Custom action"}]' \
--ec2-attributes '{"KeyName":"<YOUR-KEY-PAIR>","InstanceProfile":"EMR_EC2_DefaultRole","SubnetId":"subnet-xxxxxxxx","EmrManagedSlaveSecurityGroup":"sg-xxxxxxxx","EmrManagedMasterSecurityGroup":"sg-xxxxxxxx"}' \
--service-role EMR_DefaultRole \
--enable-debugging \
--log-uri 's3n://<YOUR-BUCKET>/' --name 'test-hive-llap' \
--instance-groups '[{"InstanceCount":1,"EbsConfiguration":{"EbsBlockDeviceConfigs":[{"VolumeSpecification":{"SizeInGB":32,"VolumeType":"gp2"},"VolumesPerInstance":1}],"EbsOptimized":true},"InstanceGroupType":"MASTER","InstanceType":"m4.xlarge","Name":"Master - 1"},{"InstanceCount":3,"EbsConfiguration":{"EbsBlockDeviceConfigs":[{"VolumeSpecification":{"SizeInGB":32,"VolumeType":"gp2"},"VolumesPerInstance":1}],"EbsOptimized":true},"InstanceGroupType":"CORE","InstanceType":"m4.xlarge","Name":"Core - 2"}]' \
--region us-east-1

After the cluster is launched, log in to the master node using SSH, and do the following:

  1. Open the hive-tpcds folder:
    cd /home/hadoop/hive-tpcds/
  2. Start Hive CLI using the testbench configuration, create the required tables, and run the sample query:

    hive -i testbench.settings
    hive> source create_tables.sql;
    hive> source query55.sql;

    This sample query runs on a 40 GB dataset that is stored on Amazon S3. The dataset is generated using the data generation tool in the TPC-DS testbench for Hive. In LLAP mode, the query finished in about 47 seconds.
  3. To compare this to the execution time without LLAP, run the same workload using only Tez containers:
    hive> set hive.llap.execution.mode=none;
    hive> source query55.sql;


    This query finished in about 80 seconds.

The query takes about 1.7 times as long (80 seconds versus 47) when using only YARN containers as it does when running on LLAP daemons. And with every rerun of the query, you notice that the execution time decreases substantially by virtue of in-memory caching by LLAP daemons.
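The comparison can be checked with a couple of lines of arithmetic. The sketch below uses the two approximate timings reported in this post (about 47 seconds with LLAP, about 80 seconds without); the function names are illustrative.

```python
def speedup(baseline_seconds: float, improved_seconds: float) -> float:
    """How many times faster the improved run is than the baseline."""
    return baseline_seconds / improved_seconds


def time_saved_fraction(baseline_seconds: float, improved_seconds: float) -> float:
    """Fraction of the baseline run time that the improved run saves."""
    return (baseline_seconds - improved_seconds) / baseline_seconds


if __name__ == "__main__":
    print(round(speedup(80, 47), 2))              # 1.7
    print(round(time_saved_fraction(80, 47), 4))  # 0.4125
```

In other words, the LLAP run saved roughly 41% of the wall-clock time on this single query; results on other queries and datasets will differ.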

Conclusion

In this post, I introduced Hive LLAP as a way to boost Hive query performance. I discussed its architecture and described several use cases for the component. I showed how you can install and configure Hive LLAP on an Amazon EMR cluster and how you can run queries on LLAP daemons.

If you have questions about using Hive LLAP on Amazon EMR or would like to share your use cases, please leave a comment below.


Additional Reading

Learn how to automatically partition Hive external tables with AWS.


About the Author

Jigar Mistry is a Hadoop Systems Engineer with Amazon Web Services. He works with customers to provide them architectural guidance and technical support for processing large datasets in the cloud using open-source applications. In his spare time, he enjoys going camping and exploring different restaurants in the Seattle area.

Transparency in Cloud Storage Costs

Post Syndicated from Roderick Bauer original https://www.backblaze.com/blog/transparency-in-cloud-storage-costs/

Backblaze’s mission is to make cloud storage that’s affordable and astonishingly easy to use. Backblaze B2 embodies that mission for those looking for an object storage solution.

Another Backblaze core value is being transparent, from releasing our Storage Pod designs to detailing our cloud storage cost of goods sold. We are an open book in the Cloud Storage industry. So it makes sense that opaque pricing policies that require mind-numbing calculations are a no-no for us. Our approach to pricing is to be transparent, straightforward, and predictable.

For Backblaze B2, this means that no matter how much data you have, the cost for B2 is $0.005/GB per month for data storage and $0.02/GB to download data. There are no costs to upload. We also throw in 10GB of storage and 1GB of downloads for free every month.

Cloud Storage Price Comparison

The storage industry does not share our view of making pricing transparent, or affordable. In an effort to help everyone, we’ve made a Cloud Storage Pricing Calculator, where anyone can enter in their specific use case and get pricing back for B2, S3, Azure, and GCS. We’ve also included the calculator below for those interested in trying it out.

B2 Cost Calculator

Backblaze provides this calculator as an estimate. Inputs: initial upload (GB); monthly upload, monthly delete, and monthly download (GB); and period of time (months). Outputs: storage cost for the initial month, data added each month, data deleted each month, net data, monthly download cost, and total cost for the period, shown for Backblaze B2 alongside Amazon S3, Microsoft Azure, and Google Cloud.

* Figures are not exact and do not include the following: Free first 10 GB of storage, free 1 GB of daily downloads, or $.004/10,000 class B Transactions and $.004/1,000 Class C Transactions.

Sample storage scenarios:

Scenario 1

You have data you wish to archive, and will be adding more each month, but you don’t expect that you will be downloading or deleting any data.

Initial upload: 10,000GB
Monthly upload: 1,000GB

For twelve months, your costs would be:

Backblaze B2 $990.00
Amazon S3 $4,158.00 +420%
Microsoft Azure $4,356.00 +440%
Google Cloud $5,148.00 +520%
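The Scenario 1 figures can be reproduced with a short calculation. The sketch below assumes flat per-GB rates taken from the pricing summary later in this post (which marks some competitor rates with a "+", so real bills may be higher), bills each month on the data held after that month's upload, and ignores free allowances and transaction fees. The function and table names are illustrative, not the calculator's actual code.

```python
# (storage $/GB/month, download $/GB); flat rates are an assumption.
RATES = {
    "Backblaze B2": (0.005, 0.02),
    "Amazon S3": (0.021, 0.05),
    "Microsoft Azure": (0.022, 0.05),
    "Google Cloud": (0.026, 0.08),
}


def total_cost(provider: str, initial_gb: float, monthly_upload_gb: float,
               months: int, monthly_delete_gb: float = 0,
               monthly_download_gb: float = 0) -> float:
    """Estimated total cost over `months`, rounded to cents."""
    storage_rate, download_rate = RATES[provider]
    stored = initial_gb
    cost = 0.0
    for _ in range(months):
        stored += monthly_upload_gb - monthly_delete_gb  # net change this month
        cost += stored * storage_rate                    # storage for the month
        cost += monthly_download_gb * download_rate      # downloads for the month
    return round(cost, 2)


if __name__ == "__main__":
    # Scenario 1: 10,000 GB initially, 1,000 GB added monthly, 12 months.
    for name in RATES:
        print(name, total_cost(name, 10_000, 1_000, 12))
```

With these assumptions the loop accumulates 198,000 GB-months of storage, which gives $990 for B2 and $4,158, $4,356, and $5,148 for the other three providers, matching the table above.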

 

Scenario 2

You wish to store data, and will be actively changing that data with uploads, downloads, and deletions.

Initial upload: 10,000GB
Monthly upload: 2,000GB
Monthly deletion: 1,000GB
Monthly download: 500GB

Your costs for 12 months would be:

Backblaze B2 $1,110.00
Amazon S3 $4,458.00 +402%
Microsoft Azure $4,656.00 +419%
Google Cloud $5,628.00 +507%

We invite you to compare our cost estimates against the competition. Here are the links to our competitors’ pricing calculators.

B2 Cloud Storage Pricing Summary

Provider          Storage ($/GB/Month)   Download ($/GB)
Backblaze B2      $0.005                 $0.02
Amazon S3         $0.021 (+420%)         $0.05+ (+250%)
Microsoft Azure   $0.022+ (+440%)        $0.05+ (+250%)
Google Cloud      $0.026 (+520%)         $0.08+ (+400%)

The Details


STORAGE
$0.005/GB/Month
How much data you have stored with Backblaze. This is calculated once a day based on the average storage of the previous 24 hours.
The first 10 GB of storage is free.

DOWNLOAD
$0.02/GB
Charged when you download files and charged when you create a Snapshot. Charged for any portion of a GB. The first 1 GB of data downloaded each day is free.

TRANSACTIONS
Class “A” transactions – Free
Class “B” transactions – $0.004 per 10,000 with 2,500 free per day.
Class “C” transactions – $0.004 per 1,000 with 2,500 free per day.
View Transactions by API Call

DATA BY MAIL
Mail us your data on a B2 Fireball – $550
Backblaze will mail your data to you by FedEx:
• USB Flash Drive – up to 110 GB – $89
• USB Hard Drive – up to 3.5TB of data – $189

PRODUCT SUPPORT
All B2 active account owners can contact Backblaze support at help.backblaze.com where they will also find a free-to-use knowledge base of B2 advice, guides, and more. In addition, a B2 user can pay to upgrade their support plan to include phone service, 24×7 support and more.

EVERYTHING ELSE
Free
Unlike other services, you won’t be nickeled and dimed with upload fees, file deletion charges, minimum file size requirements, and more. Everything you can possibly pay Backblaze is listed above.

 

Visit our B2 Cloud Storage Pricing web page for more details.


The post Transparency in Cloud Storage Costs appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

A new twist on data backup: CloudNAS

Post Syndicated from Andy Klein original https://www.backblaze.com/blog/cloudnas-backup/

Morro CacheDrive

There are many ways for SMBs, professionals, and advanced users to back up their data. The process can be as simple as copying files to a flash drive or an external drive, or as sophisticated as using a Synology or QNAP NAS device as your primary storage device and syncing the files to a cloud storage service such as Backblaze B2.

A recent entry into the backup arena is Morro Data and their CloudNAS solution, where files are stored in the cloud, cached locally as needed, and synced globally among the other CloudNAS systems in a given organization. There are three components to the solution:

  • A Morro CacheDrive — This resides on your internal network like a NAS device and stores from 1 to 8 TB of data depending on the model
  • The CloudNAS service — This software runs on the Morro CacheDrive to keep track of and manage the data
  • Backblaze B2 Cloud Storage — Where the data is stored in the cloud

The Morro CacheDrive is installed on your local network and looks like a network share. On Windows, the share can be mounted as a letter device, M:, for example. On the Mac, the device is mounted as a Shared device (Databank in the example below).

CloudNAS software dashboard

In either case, the device works like a folder/directory, typically on your desktop. You then either drag-and-drop or save a file to the folder/directory. This places the file on the CacheDrive. Once there, the file is automatically backed up to the cloud. In the case of the CloudNAS solution, that cloud is Backblaze B2.

All that sounds pretty straightforward, but what makes the CloudNAS solution unique is that it effectively gives you unlimited storage space. For example, you can access 5 TB of data from a 1 TB CacheDrive. Confused? Let me explain. All 5 TB of the data is stored in B2, having been uploaded to B2 each time you stored data on the CacheDrive. The 1 TB CacheDrive keeps (caches) the most recent or most often used files locally. When you need a file not currently stored on the CacheDrive, the CloudNAS software automatically downloads the file from the B2 cloud to the CacheDrive and makes it available to use as desired.
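
The caching idea can be illustrated with a toy least-recently-used (LRU) cache. This is only a sketch of the general technique, not Morro's implementation: the class name, the `fetch_from_cloud` callable, and the choice of LRU as the eviction policy are all assumptions made for illustration.

```python
from collections import OrderedDict


class LruFileCache:
    """Toy local cache in front of cloud storage; sizes are in GB."""

    def __init__(self, capacity_gb, fetch_from_cloud):
        self.capacity_gb = capacity_gb
        # Hypothetical callable: file name -> file size in GB.
        self.fetch_from_cloud = fetch_from_cloud
        # Maps file name -> size, ordered from least to most recently used.
        self.files = OrderedDict()

    def open(self, name):
        if name in self.files:
            self.files.move_to_end(name)  # mark as most recently used
            return "hit"
        # Not cached locally: download it from the cloud copy.
        self.files[name] = self.fetch_from_cloud(name)
        # Evict least recently used files until the cache fits again.
        # Eviction loses nothing because the cloud holds every file.
        while sum(self.files.values()) > self.capacity_gb and len(self.files) > 1:
            self.files.popitem(last=False)
        return "miss"
```

With a 1,000 GB cache and files of 600, 300, and 400 GB, opening the third file evicts whichever of the first two was used least recently, while all three remain available from the cloud.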

Things to know about the CloudNAS solution

  • Sharing Systems: Multiple users can mount the same CacheDrive with each being able to update and share the files.
  • Synced Systems: If you have two or more CloudNAS systems on your network, they will keep the B2 directory of files synced between all of the systems. Everyone on the network sees the same file list.
  • Unlimited Data: Regardless of the size of the CacheDrive device you purchase, you will not run out of space as Backblaze B2 will contain all of your data. That said, you should choose the size of your CacheDrive that fits your operational environment.
  • Network Speed: Files are initially stored on the CacheDrive, then copied to B2. Local network connections are typically much faster than internet connections. This means your files are saved to the CacheDrive quickly and then transferred to B2 as time allows at the speed of your internet connection, all without slowing you down. This should be interesting to those of you who have slower internet connections.
  • Access: The files stored using the CloudNAS solution can be accessed through the shared folder/directory on your desktop as well as through a web-based Team Portal.

Getting Started

To start, you purchase a Morro CacheDrive. The price starts at $499.00 for a unit with 1 TB of cache storage. Next you choose a CloudNAS subscription. This starts at $10/month for the Standard plan, and lets you manage up to 10 TB of data. Finally, you connect Backblaze B2 to the Morro system to finish the set-up process. You pay Backblaze each month for the data you store in and download from B2 while using the Morro solution.

The CloudNAS solution is certainly a different approach to storing your data. You get the ability to store a nearly unlimited amount of data without having to upgrade your hardware as you go, and all of your data is readily available with just a few clicks. For users who need to store terabytes of data that needs to be available anytime, the CloudNAS solution is worth a look.

The post A new twist on data backup: CloudNAS appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

BitTorrent Users Form The World’s Largest Criminal Enterprise, Lawyer Says

Post Syndicated from Andy original https://torrentfreak.com/bittorrent-users-form-the-worlds-largest-criminal-enterprise-lawyer-says-170731/

As the sharing of copyrighted material on the Internet continues, so do the waves of lawsuits which claim compensation for alleged damage caused.

Run by so-called ‘copyright trolls’, these legal efforts are often painted as the only way for rightsholders to send a tough message to deter infringement. In reality, however, these schemes are often the basis for a separate revenue stream, one in which file-sharers are forced to pay large cash sums to make supposed jury trials disappear.

Courts around the United States are becoming familiar with these ‘settlement factories’ and sometimes choose to make life more difficult for the trolls. With this potential for friction, the language deployed in lawsuits is often amped up to paint copyright holders as fighting for their very existence. Meanwhile, alleged infringers are described as hardened criminals intent on wreaking havoc on the entertainment industries.

While this polarization is nothing new, a court filing spotted by the troll-fighters over at Fight Copyright Trolls sees the demonization of file-sharers amped up to eleven – and then some.

The case, which is being heard in a district court in Nevada, features LHF Productions, the outfit behind action movie London Has Fallen. It targets five people who allegedly shared the work using BitTorrent and failed to respond to the company’s requests to settle.

“[N]one of the Defendants referenced herein have made any effort to answer or otherwise respond to the Plaintiff’s allegations. In light of the Defendants’ apparent failure to take any action with respect to the present lawsuit, the Plaintiff is left with no choice but to seek a default judgment,” the motion reads.

In the absence of any defense, LHF Productions asks the court to grant default judgments of $15,000 per defendant, which amounts to $75,000 overall, a decent sum for what amounts to five downloads. LHF Productions notes that it could’ve demanded $150,000 from each individual but feels that a more modest sum would be sufficient to “deter future infringement.”

However, when reading the description of the defendants provided by LHF, one could be forgiven for thinking that they’re actually heinous criminals hell-bent on worldwide destruction.

“The Defendants are participants in a global piracy ring composed of one hundred fifty million members – a ring that threatens to tear down fundamental structures of intellectual property,” the lawsuit reads.

While there are indeed 150 million users of BitTorrent, this characterization that they’re all involved in a single “piracy ring” is both misleading and inaccurate.

BitTorrent swarms are separate entities, so the correct way of describing the defendants would be limited to their action for the movie London Has Fallen. Instead, they’re painted as being involved in a global conspiracy with more members than the populations of the United Kingdom, Canada, and Spain combined.

It seems that the introduction of more drama into these infringement lawsuits is becoming necessary as more courts become wise to the activities of trolls, some of which, such as the now-defunct Prenda Law, have been branded criminal themselves.

Perhaps with this in mind, LHF Productions tries to convince the court that far from being small-time file-sharers, people downloading their movie online are actually part of something extremely big, a crime wave so huge that nothing like it has ever been witnessed.

“While the actions of each individual participant may seem innocuous, their collective action amounts to one of the largest criminal enterprises ever seen on earth,” LHF says of the defendants.

“[I]f this pervasive culture of piracy is allowed to continue undeterred, it threatens to undo centuries of intellectual property law and unravel a core pillar of our economy. After all, the right to intellectual property was something so fundamental, so essential, to our nation’s founding, that our founding father’s found it necessary to include in the first article of the Constitution.”

If the apocalyptic scenario painted by LHF in its lawsuit (pdf) is to be believed, recouping a mere $15,000 from each defendant begins to sound like a bargain. Certainly, the movie outfit will be hoping the judge sees it that way too.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

Time-lapse Visualizes Game of Thrones Piracy Around The Globe

Post Syndicated from Ernesto original https://torrentfreak.com/time-lapse-visualizes-game-of-thrones-piracy-around-the-globe-17-730/

Game of Thrones has been the most pirated TV-show online for years, and this isn’t expected to change anytime soon.

While most of today’s piracy takes place through streaming services, BitTorrent traffic remains significant as well. The show’s episodes are generally downloaded millions of times each, by people from all over the world.

In recent years there have been several attempts to quantify this piracy bonanza. While MILLIONS of downloads make for a good headline, there are some other trends worth looking at as well.

TorrentFreak spoke to Abigail De Kosnik, an Associate Professor at the University of California, Berkeley. Together with computer scientist and artist Benjamin De Kosnik, she runs the BitTorrent-oriented research project “alpha60.”

The goal of alpha60 is to quantify and map BitTorrent activity around various media titles, to make this “shadow economy” visible to media scholars and the general public. Over the past two weeks, they’ve taken a close look at Game of Thrones downloads.

Their tracking software collected swarm data from 72 torrents that were released shortly after the first episode premiered. Before being anonymized, the collected IP-addresses were first translated to geographical locations, to reveal various traffic patterns.

The results, summarized in a white paper, reveal that during the first five days, alpha60 registered an estimated 1.77 million downloads. Of particular interest is the five-day time-lapse of the worldwide swarm activity.

Five-day Game of Thrones piracy timelapse

The time-lapse shows that download patterns vary depending on the time of the day. There is a lot of activity in Asia, but cities such as Athens, Toronto, and São Paulo also pop up regularly.

When looking at the absolute numbers, Seoul comes out on top as the Game of Thrones download capital of the world, followed by Athens, São Paulo, Guangzhou, Mumbai, and Bangalore.

Perhaps more interesting is the view of the number of downloads relative to the population, or the “over-pirating” cities, as alpha60 calls them. Here, Dallas comes out on top, ahead of Brisbane, Chicago, Riyadh, Seattle, and Perth.

Of course, VPNs may skew the results somewhat, but overall the data should give a pretty accurate impression of the download traffic around the globe.

Below are the complete top tens of most active cities, both in absolute numbers and relative to the population. Further insights and additional information are available in the full whitepaper, which can be accessed here.

Note: The download totals reported by alpha60 are significantly lower than the MUSO figures that came out last week. Alpha60 stresses, however, that their methods and data are accurate. MUSO, for its part, has made some dubious claims in the past.

Most downloads (absolute)

1 Seoul, Rep. of Korea
2 Athens, Greece
3 São Paulo, Brazil
4 Guangzhou, China
5 Mumbai, India
6 Bangalore, India
7 Shanghai, China
8 Riyadh, Saudi Arabia
9 Delhi, India
10 Beijing, China

Most downloads (relative)

1 Dallas, USA
2 Brisbane, Australia
3 Chicago, USA
4 Riyadh, Saudi Arabia
5 Seattle, USA
6 Perth, Australia
7 Phoenix, USA
8 Toronto, Canada
9 Athens, Greece
10 Guangzhou, China

AWS Hot Startups – July 2017

Post Syndicated from Tina Barr original https://aws.amazon.com/blogs/aws/aws-hot-startups-july-2017/

Welcome back to another month of Hot Startups! Every day, startups are creating innovative and exciting businesses, applications, and products around the world. Each month we feature a handful of startups doing cool things using AWS.

July is all about learning! These companies are focused on providing access to tools and resources to expand knowledge and skills in different ways.

This month’s startups:

  • CodeHS – provides fun and accessible computer science curriculum for middle and high schools.
  • Insight – offers intensive fellowships to grow technical talent in Data Science.
  • iTranslate – enables people to read, write, and speak in over 90 languages, anywhere in the world.

CodeHS (San Francisco, CA)

In 2012, Stanford students Zach Galant and Jeremy Keeshin were computer science majors and TAs for introductory classes when they noticed a trend among their peers. Many wished that they had been exposed to computer science earlier in life. In their senior year, Zach and Jeremy launched CodeHS to give middle and high schools the opportunity to provide a fun, accessible computer science education to students everywhere. CodeHS is a web-based curriculum pathway complete with teacher resources, lesson plans, and professional development opportunities. The curriculum is supplemented with time-saving teacher tools to help with lesson planning, grading and reviewing student code, and managing their classroom.

CodeHS aspires to empower all students to meaningfully impact the future, and believes that coding is becoming a new foundational skill, along with reading and writing, that allows students to further explore any interest or area of study. At the time CodeHS was founded in 2012, only 10% of high schools in America offered a computer science course. Zach and Jeremy set out to change that by providing a solution that made it easy for schools and districts to get started. With CodeHS, thousands of teachers have been trained and are teaching hundreds of thousands of students all over the world. To use CodeHS, all that’s needed is the internet and a web browser. Students can write and run their code online, and teachers can immediately see what the students are working on and how they are doing.

Amazon EC2, Amazon RDS, Amazon ElastiCache, Amazon CloudFront, and Amazon S3 make it possible for CodeHS to scale their site to meet the needs of schools all over the world. CodeHS also relies on AWS to compile and run student code in the browser, which is extremely important when teaching server-side languages like Java, which powers the AP course. Since usage rises and falls based on school schedules, Amazon CloudWatch and ELBs are used to easily scale up when students are running code so they have a seamless experience.

Be sure to visit the CodeHS website, and to learn more about bringing computer science to your school, click here!

Insight (Palo Alto, CA)

Insight was founded in 2012 to create a new educational model, optimize hiring for data teams, and facilitate successful career transitions among data professionals. Over the last 5 years, Insight has kept ahead of market trends and launched a series of professional training fellowships including Data Science, Health Data Science, Data Engineering, and Artificial Intelligence. Finding individuals with the right skill set, background, and culture fit is a challenge for big companies and startups alike, and Insight is focused on developing top talent through intensive 7-week fellowships. To date, Insight has over 1,000 alumni at over 350 companies including Amazon, Google, Netflix, Twitter, and The New York Times.

The Data Engineering team at Insight is well-versed in the current ecosystem of open source tools and technologies and provides mentorship on the best practices in this space. The technical teams are continually working with external groups in a variety of data advisory and mentorship capacities, but the majority of Insight partners participate in professional sessions. Companies visit the Insight office to speak with fellows in an informal setting and provide details on the type of work they are doing and how their teams are growing. These sessions have proved invaluable as fellows experience a significantly better interview process and companies yield engaged and enthusiastic new team members.

An important aspect of Insight’s fellowships is the opportunity for hands-on work, focusing on everything from building big-data pipelines to contributing novel features to industry-standard open source efforts. Insight provides free AWS resources for all fellows to use, in addition to mentorships from the Data Engineering team. Fellows regularly utilize Amazon S3, Amazon EC2, Amazon Kinesis, Amazon EMR, AWS Lambda, Amazon Redshift, and Amazon RDS, among other services. The experience with AWS gives fellows a solid skill set as they transition into the industry. Fellowships are currently being offered in Boston, New York, Seattle, and the Bay Area.

Check out the Insight blog for more information on trends in data infrastructure, artificial intelligence, and cutting-edge data products.

 

iTranslate (Austria)

When the App Store was introduced in 2008, the founders of iTranslate saw an opportunity to be part of something big. The group of four fully believed that the iPhone and apps were going to change the world, and together they brainstormed ideas for their own app. The combination of translation and mobile devices seemed a natural fit, and by 2009 iTranslate was born. iTranslate’s mission is to enable travelers, students, business professionals, employers, and medical staff to read, write, and speak in all languages, anywhere in the world. The app allows users to translate text, voice, websites and more into nearly 100 languages on various platforms. Today, iTranslate is the leading player for conversational translation and dictionary apps, with more than 60 million downloads and 6 million monthly active users.

iTranslate is breaking language barriers through disruptive technology and innovation, enabling people to translate in real time. The app has a variety of features designed to optimize productivity including offline translation, website and voice translation, and language auto detection. iTranslate also recently launched the world’s first ear translation device in collaboration with Bragi, a company focused on smart earphones. The Dash Pro allows people to communicate freely, while having a personal translator right in their ear.

iTranslate started using Amazon Polly soon after it was announced. CEO Alexander Marktl said, “As the leading translation and dictionary app, it is our mission at iTranslate to provide our users with the best possible tools to read, write, and speak in all languages across the globe. Amazon Polly provides us with the ability to efficiently produce and use high quality, natural sounding synthesized speech.” The stable and simple-to-use API, low latency, and free caching allow iTranslate to scale as they continue adding features to their app. Customers also enjoy the option to change speech rate and change between male and female voices. To assure quality, speed, and reliability of their products, iTranslate also uses Amazon EC2, Amazon S3, and Amazon Route 53.

To get started with iTranslate, visit their website here.

—–

Thanks for reading!

-Tina

Run Common Data Science Packages on Anaconda and Oozie with Amazon EMR

Post Syndicated from John Ohle original https://aws.amazon.com/blogs/big-data/run-common-data-science-packages-on-anaconda-and-oozie-with-amazon-emr/

In the world of data science, users must often sacrifice cluster set-up time to allow for complex usability scenarios. Amazon EMR allows data scientists to spin up complex cluster configurations easily, and to be up and running with complex queries in a matter of minutes.

Data scientists often use scheduling applications such as Oozie to run jobs overnight. However, Oozie can be difficult to configure when you are trying to use popular Python packages (such as “pandas,” “numpy,” and “statsmodels”), which are not included by default.

One such popular platform that contains these types of packages (and more) is Anaconda. This post focuses on setting up an Anaconda platform on EMR, with an intent to use its packages with Oozie. I describe how to run jobs using a popular open source scheduler like Oozie.

Walkthrough

For this post, you walk through the following tasks:

  • Create an EMR cluster.
  • Download Anaconda on your master node.
  • Configure Oozie.
  • Test the steps.

Create an EMR cluster

Spin up an Amazon EMR cluster using the console or the AWS CLI. Use the latest release, and include Apache Hadoop, Apache Spark, Apache Hive, and Oozie.

To create a three-node cluster in the us-east-1 region, issue an AWS CLI command such as the following. It is shown first split across lines for readability, and then as a single line. If you use the split version, each backslash must be the last character on its line, with no trailing spaces.

aws emr create-cluster \
 --release-label emr-5.7.0 \
 --name '<YOUR-CLUSTER-NAME>' \
 --applications Name=Hadoop Name=Oozie Name=Spark Name=Hive \
 --ec2-attributes '{"KeyName":"<YOUR-KEY-PAIR>","SubnetId":"<YOUR-SUBNET-ID>","EmrManagedSlaveSecurityGroup":"<YOUR-EMR-SLAVE-SECURITY-GROUP>","EmrManagedMasterSecurityGroup":"<YOUR-EMR-MASTER-SECURITY-GROUP>"}' \
 --use-default-roles \
 --instance-groups '[{"InstanceCount":1,"InstanceGroupType":"MASTER","InstanceType":"<YOUR-INSTANCE-TYPE>","Name":"Master - 1"},{"InstanceCount":<YOUR-CORE-INSTANCE-COUNT>,"InstanceGroupType":"CORE","InstanceType":"<YOUR-INSTANCE-TYPE>","Name":"Core - 2"}]'

One-line version for reference:

aws emr create-cluster --release-label emr-5.7.0 --name '<YOUR-CLUSTER-NAME>' --applications Name=Hadoop Name=Oozie Name=Spark Name=Hive --ec2-attributes '{"KeyName":"<YOUR-KEY-PAIR>","SubnetId":"<YOUR-SUBNET-ID>","EmrManagedSlaveSecurityGroup":"<YOUR-EMR-SLAVE-SECURITY-GROUP>","EmrManagedMasterSecurityGroup":"<YOUR-EMR-MASTER-SECURITY-GROUP>"}' --use-default-roles --instance-groups '[{"InstanceCount":1,"InstanceGroupType":"MASTER","InstanceType":"<YOUR-INSTANCE-TYPE>","Name":"Master - 1"},{"InstanceCount":<YOUR-CORE-INSTANCE-COUNT>,"InstanceGroupType":"CORE","InstanceType":"<YOUR-INSTANCE-TYPE>","Name":"Core - 2"}]'
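
If you script cluster creation, the long JSON arguments are easier to get right when they are generated rather than typed. The following is a minimal sketch, assuming Python 3 is available where you run it; the function name and its parameters are hypothetical, and it only assembles the same command shown above as an argument list.

```python
import json


def build_create_cluster_args(name, key_pair, subnet_id, slave_sg, master_sg,
                              instance_type, core_count, release="emr-5.7.0"):
    """Assemble the `aws emr create-cluster` command as an argument list."""
    ec2_attributes = {
        "KeyName": key_pair,
        "SubnetId": subnet_id,
        "EmrManagedSlaveSecurityGroup": slave_sg,
        "EmrManagedMasterSecurityGroup": master_sg,
    }
    instance_groups = [
        {"InstanceCount": 1, "InstanceGroupType": "MASTER",
         "InstanceType": instance_type, "Name": "Master - 1"},
        {"InstanceCount": core_count, "InstanceGroupType": "CORE",
         "InstanceType": instance_type, "Name": "Core - 2"},
    ]
    # json.dumps takes care of quoting, so no manual escaping is needed.
    return [
        "aws", "emr", "create-cluster",
        "--release-label", release,
        "--name", name,
        "--applications", "Name=Hadoop", "Name=Oozie", "Name=Spark", "Name=Hive",
        "--ec2-attributes", json.dumps(ec2_attributes),
        "--use-default-roles",
        "--instance-groups", json.dumps(instance_groups),
    ]
```

The returned list can be passed to `subprocess.run(args, check=True)`, which avoids shell quoting and line-continuation issues altogether.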

Download Anaconda

SSH into your EMR master node instance and download the official Anaconda installer:

wget https://repo.continuum.io/archive/Anaconda2-4.4.0-Linux-x86_64.sh

At the time of publication, Anaconda 4.4 is the most current version available. For the download link location for the latest Python 2.7 version (Python 3.6 may encounter issues), see https://www.continuum.io/downloads.  Open the context (right-click) menu for the Python 2.7 download link, choose Copy Link Location, and use this value in the previous wget command.

This post used the Anaconda 4.4 installation. If you have a later version, it is shown in the downloaded filename: “Anaconda2-<version number>-Linux-x86_64.sh”.

Run this downloaded script and follow the on-screen installer prompts.

chmod u+x Anaconda2-4.4.0-Linux-x86_64.sh
./Anaconda2-4.4.0-Linux-x86_64.sh

For an installation directory, select somewhere with enough space on your cluster, such as “/mnt/anaconda/”.

The process should take approximately 1–2 minutes to install. When prompted if you “wish the installer to prepend the Anaconda2 install location”, select the default option of [no].

After you are done, export the PATH to include this new Anaconda installation:

export PATH=/mnt/anaconda/bin:$PATH

Zip up the Anaconda installation:

cd /mnt/anaconda/
zip -r anaconda.zip .

The zip process may take 4–5 minutes to complete.

(Optional) Upload this anaconda.zip file to your S3 bucket for easier inclusion into future EMR clusters. This removes the need to repeat the previous steps for future EMR clusters.

Configure Oozie

Next, you configure Oozie to use Pyspark and the Anaconda platform.

Get the location of your Oozie sharelib folder. Issue the following command and take note of the “sharelibDirNew” value:

oozie admin -sharelibupdate

For this post, this value is “hdfs://ip-192-168-4-200.us-east-1.compute.internal:8020/user/oozie/share/lib/lib_20170616133136”.

Copy the required Pyspark files into Oozie’s sharelib location. The following files are required for Oozie to be able to run Pyspark commands:

  • pyspark.zip
  • py4j-0.10.4-src.zip

These are located on the EMR master instance in the location “/usr/lib/spark/python/lib/”, and must be put into the Oozie sharelib spark directory. This location is the value of the sharelibDirNew parameter value (shown above) with “/spark/” appended, that is, “hdfs://ip-192-168-4-200.us-east-1.compute.internal:8020/user/oozie/share/lib/lib_20170616133136/spark/”.

To do this, issue the following commands:

hdfs dfs -put /usr/lib/spark/python/lib/py4j-0.10.4-src.zip hdfs://ip-192-168-4-200.us-east-1.compute.internal:8020/user/oozie/share/lib/lib_20170616133136/spark/
hdfs dfs -put /usr/lib/spark/python/lib/pyspark.zip hdfs://ip-192-168-4-200.us-east-1.compute.internal:8020/user/oozie/share/lib/lib_20170616133136/spark/
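
Because the sharelib directory name changes with every sharelib update, it can help to generate these commands from the “sharelibDirNew” value instead of editing them by hand. This is only a sketch: the function name is hypothetical, and the py4j file name is the one used in this post, which may differ on other EMR releases.

```python
import posixpath

PYSPARK_LIB_DIR = "/usr/lib/spark/python/lib"
# File names as used in this post; check your own cluster for the exact names.
REQUIRED_ZIPS = ["py4j-0.10.4-src.zip", "pyspark.zip"]


def sharelib_put_commands(sharelib_dir_new):
    """Return the hdfs put commands for a given sharelibDirNew value."""
    # The Spark sharelib directory is sharelibDirNew with "/spark/" appended.
    spark_dir = sharelib_dir_new.rstrip("/") + "/spark/"
    return [
        "hdfs dfs -put {} {}".format(posixpath.join(PYSPARK_LIB_DIR, name), spark_dir)
        for name in REQUIRED_ZIPS
    ]
```

Calling it with the value noted earlier produces the same two commands shown above.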

After you’re done, Oozie can use Pyspark in its processes.

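Pass the anaconda.zip file into HDFS as follows. The first command creates the target directory if it does not already exist:

hdfs dfs -mkdir -p /tmp/myLocation
hdfs dfs -put /mnt/anaconda/anaconda.zip /tmp/myLocation/anaconda.zip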

(Optional) Verify that it was transferred successfully with the following command:

hdfs dfs -ls /tmp/myLocation/

On your master node, execute the following command:

export PYSPARK_PYTHON=/mnt/anaconda/bin/python

Set the PYSPARK_PYTHON environment variable on the executor nodes. Put the following configurations in your “spark-opts” values in your Oozie workflow.xml file:

--conf spark.executorEnv.PYSPARK_PYTHON=./anaconda_remote/bin/python
--conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=./anaconda_remote/bin/python

This is referenced from the Oozie job in the following line in your workflow.xml file, also included as part of your “spark-opts”:

--archives hdfs:///tmp/myLocation/anaconda.zip#anaconda_remote

Your Oozie workflow.xml file should now look something like the following:

<workflow-app name="spark-wf" xmlns="uri:oozie:workflow:0.5">
    <start to="start_spark" />
    <action name="start_spark">
        <spark xmlns="uri:oozie:spark-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <prepare>
                <delete path="/tmp/test/spark_oozie_test_out3"/>
            </prepare>
            <master>yarn-cluster</master>
            <mode>cluster</mode>
            <name>SparkJob</name>
            <class>clear</class>
            <jar>hdfs:///user/oozie/apps/myPysparkProgram.py</jar>
            <spark-opts>--queue default
                --conf spark.ui.view.acls=*
                --executor-memory 2G --num-executors 2 --executor-cores 2 --driver-memory 3g
                --conf spark.executorEnv.PYSPARK_PYTHON=./anaconda_remote/bin/python
                --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=./anaconda_remote/bin/python
                --archives hdfs:///tmp/myLocation/anaconda.zip#anaconda_remote
            </spark-opts>
        </spark>
        <ok to="end"/>
        <error to="kill"/>
    </action>
    <kill name="kill">
        <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
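
An easy mistake in this file is for the alias after the “#” in --archives to drift out of step with the paths given for PYSPARK_PYTHON. The following sketch checks for that before the workflow is uploaded. It assumes Python 3 and a workflow with a single Spark action that has a spark-opts element; the function name is hypothetical and this is not part of Oozie.

```python
import xml.etree.ElementTree as ET

# Namespace of the Spark action used in the workflow above.
NS = {"spark": "uri:oozie:spark-action:0.1"}


def archive_alias_mismatches(workflow_xml):
    """Return PYSPARK_PYTHON paths that do not start with an --archives alias."""
    root = ET.fromstring(workflow_xml)
    opts = root.find(".//spark:spark-opts", NS).text.split()
    aliases = set()
    python_paths = []
    # Walk the options as (flag, value) pairs.
    for flag, value in zip(opts, opts[1:]):
        if flag == "--archives":
            for item in value.split(","):
                if "#" in item:
                    aliases.add(item.rsplit("#", 1)[1])
        elif flag == "--conf" and "PYSPARK_PYTHON=" in value:
            python_paths.append(value.split("=", 1)[1])

    def uses_alias(path):
        parts = path.split("/")
        return len(parts) > 1 and parts[0] == "." and parts[1] in aliases

    return [path for path in python_paths if not uses_alias(path)]
```

An empty list means every PYSPARK_PYTHON path points into an archive that is actually shipped with the job.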

Test steps

To test this out, you can use the following job.properties and myPysparkProgram.py file, along with the following steps:

job.properties

masterNode=ip-xxx-xxx-xxx-xxx.us-east-1.compute.internal
nameNode=hdfs://${masterNode}:8020
jobTracker=${masterNode}:8032
master=yarn
mode=cluster
queueName=default
oozie.libpath=${nameNode}/user/oozie/share/lib
oozie.use.system.libpath=true
oozie.wf.application.path=${nameNode}/user/oozie/apps/

Note: You can get your master node IP address (denoted as “ip-xxx-xxx-xxx-xxx” here) from the value for the sharelibDirNew parameter noted earlier.
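
As a small illustration of that note, the host name and port can be pulled out of the sharelibDirNew value with the standard library. This is a sketch assuming Python 3; the function name is hypothetical.

```python
from urllib.parse import urlparse


def name_node_from_sharelib(sharelib_dir_new):
    """Return (masterNode, nameNode) derived from a sharelibDirNew value."""
    parsed = urlparse(sharelib_dir_new)
    master_node = parsed.hostname
    # Rebuild the nameNode URI from the host and port of the sharelib path.
    name_node = "hdfs://{}:{}".format(master_node, parsed.port)
    return master_node, name_node
```

The first element is the value to use for masterNode in job.properties, and the second shows what nameNode resolves to.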

myPysparkProgram.py

from pyspark import SparkContext, SparkConf
import numpy
import sys

conf = SparkConf().setAppName('myPysparkProgram')
sc = SparkContext(conf=conf)

rdd = sc.textFile("/user/hadoop/input.txt")

x = numpy.sum([3,4,5]) #total = 12

rdd = rdd.map(lambda line: line + " " + str(x))
rdd.saveAsTextFile("/user/hadoop/output")

Put the “myPysparkProgram.py” into the location mentioned between the “<jar>xxxxx</jar>” tags in your workflow.xml. In this example, the location is “hdfs:///user/oozie/apps/”. Use the following command to move the “myPysparkProgram.py” file to the correct location:

hdfs dfs -put myPysparkProgram.py /user/oozie/apps/

Put the above workflow.xml file into the “/user/oozie/apps/” location in hdfs:

hdfs dfs -put workflow.xml /user/oozie/apps/

Note: The job.properties file is run locally from the EMR master node.

Create a sample input.txt file with some data in it. For example:

input.txt

This is a sentence.
So is this.
This is also a sentence.

Put this file into hdfs:

hdfs dfs -put input.txt /user/hadoop/

Execute the job in Oozie with the following command. This creates an Oozie job ID.

oozie job -config job.properties -run

You can check the Oozie job state with the command:

oozie job -info <Oozie job ID>

When the job finishes successfully, the results are located at:

/user/hadoop/output/part-00000
/user/hadoop/output/part-00001

Run the following commands to view the output:

hdfs dfs -cat /user/hadoop/output/part-00000
hdfs dfs -cat /user/hadoop/output/part-00001

The output will be:

This is a sentence. 12
So is this. 12
This is also a sentence. 12

Summary

The myPysparkProgram.py has successfully imported the numpy library from the Anaconda platform and has produced some output with it. If you tried to run this using standard Python, the numpy import would fail with an error instead.

Now when your Python job runs in Oozie, any packages imported by your Pyspark script are loaded directly from the Anaconda platform. Simple!
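
Before scheduling a job, it can be worth confirming that a given interpreter can actually import the packages the script needs. The following sketch does that; the function name is hypothetical, and it is written to run under both the Python 2.7 that ships with Anaconda2 and Python 3.

```python
import importlib


def missing_packages(names):
    """Return the subset of `names` that the current interpreter cannot import."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Packages mentioned at the start of this post.
    print(missing_packages(["pandas", "numpy", "statsmodels"]))
```

Run it once with the Anaconda interpreter (for example, /mnt/anaconda/bin/python) and once with the system Python to compare what each can see.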

If you have questions or suggestions, please leave a comment below.


Additional Reading

Learn how to use Apache Oozie workflows to automate Apache Spark jobs on Amazon EMR.

 


About the Author

John Ohle is an AWS BigData Cloud Support Engineer II for the BigData team in Dublin. He works to provide advice and solutions to our customers on their Big Data projects and workflows on AWS. In his spare time, he likes to play music, learn, develop tools and write documentation to further help others – both colleagues and customers alike.

 

 

 

Defending anti-netneutrality arguments

Post Syndicated from Robert Graham original http://blog.erratasec.com/2017/07/defending-anti-netneutrality-arguments.html

Last week, activists proclaimed a “NetNeutrality Day”, trying to convince the FCC to regulate NetNeutrality. As a libertarian, I tweeted many reasons why NetNeutrality is stupid. NetNeutrality is exactly the sort of government regulation Libertarians hate most. Somebody tweeted the following challenge, which I thought I’d address here.

The links point to two separate cases.

  • the Comcast BitTorrent throttling case
  • a lawsuit against Time Warner for poor service
The tone of the tweet suggests that my anti-NetNeutrality stance cannot be defended in light of these cases. But of course this is wrong. The short answers are:

  • the Comcast BitTorrent throttling benefits customers
  • poor service has nothing to do with NetNeutrality

The long answers are below.

The Comcast BitTorrent Throttling

The presumption is that any sort of packet-filtering is automatically evil, and against the customer’s interests. That’s not true.
Take GoGoInflight’s internet service for airplanes. They block access to video sites like NetFlix. That’s because they often have as little as 1-mbps for the entire plane, which is enough to support many people checking email and browsing Facebook, but a single person trying to watch video will overload the internet connection for everyone. Therefore, their Internet service won’t work unless they filter video sites.
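
The capacity argument can be checked with simple arithmetic. This is a rough sketch only: the 1 Mbps link comes from the text, but the per-user figures for email and video are illustrative assumptions, not measurements.

```python
def fits_on_link(capacity_kbps, demands_kbps):
    """Return True if the summed demand fits within the link capacity."""
    return sum(demands_kbps) <= capacity_kbps


PLANE_LINK_KBPS = 1000    # the "1-mbps" figure from the text
EMAIL_USER_KBPS = 20      # assumed average for email and light browsing
VIDEO_STREAM_KBPS = 3000  # assumed rate for a single video stream

light_users = [EMAIL_USER_KBPS] * 40

print(fits_on_link(PLANE_LINK_KBPS, light_users))                        # True (800 <= 1000)
print(fits_on_link(PLANE_LINK_KBPS, light_users + [VIDEO_STREAM_KBPS]))  # False (3800 > 1000)
```

Under these assumed numbers, forty light users fit on the link, while one added video stream pushes demand to nearly four times capacity.
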
GoGoInflight breaks a lot of other NetNeutrality rules, such as providing free access to Amazon.com or promotion deals where users of a particular phone get free Internet access that everyone else pays for. All of this is allowed by the FCC, which lets GoGoInflight break NetNeutrality rules because doing so is clearly in the customer interest.
Comcast’s throttling of BitTorrent is likewise clearly in the customer interest. Until the FCC stopped them, BitTorrent users were allowed unlimited downloads. Afterwards, Comcast imposed a 300-gigabyte/month bandwidth cap.
Internet access is a series of tradeoffs. BitTorrent causes congestion during prime time (6pm to 10pm). Comcast has to solve it somehow — not solving it wasn’t an option. Their options were:
  1. Charge all customers more, so that the 99% not using BitTorrent subsidizes the 1% who do.
  2. Impose a bandwidth cap, preventing heavy BitTorrent usage.
  3. Throttle BitTorrent packets during prime-time hours when the network is congested.
Option 3 is clearly the best. BitTorrent downloads take hours, days, and sometimes weeks. BitTorrent users don’t mind throttling during prime-time congested hours. That’s preferable to the other option, bandwidth caps.
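
As a policy rule, time-based throttling is easy to state. The sketch below is purely illustrative and is not a description of what Comcast actually deployed: the function name, the protocol label, and the rate values are all hypothetical, and only the 6pm to 10pm window comes from the text.

```python
PRIME_TIME_START = 18  # 6pm, from the text
PRIME_TIME_END = 22    # 10pm, from the text


def rate_limit_kbps(protocol, hour, normal_kbps, throttled_kbps):
    """Return the rate allowed for a flow under prime-time throttling.

    Only BitTorrent traffic is slowed, and only during the congested window.
    All other traffic, and all traffic outside the window, is unaffected.
    """
    in_prime_time = PRIME_TIME_START <= hour < PRIME_TIME_END
    if protocol == "bittorrent" and in_prime_time:
        return throttled_kbps
    return normal_kbps
```

With this rule, a download that runs for hours or days loses speed only for four hours each evening, and there is no monthly limit on total volume.
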
I’m a BitTorrent user, and a heavy downloader (I scan the Internet on a regular basis from cloud machines, then download the results to home, which can often be 100-gigabytes in size for a single scan). I want prime-time BitTorrent throttling rather than bandwidth caps. The EFF/FCC’s action that prevented BitTorrent throttling forced me to move to Comcast Business Class which doesn’t have bandwidth caps, charging me $100 more a month. It’s why I don’t contribute to the EFF: if they had not agitated for this, taking such choices away from customers, I’d have $1200 more per year to donate to worthy causes.
Ask any user of BitTorrent which they prefer: 300gig monthly bandwidth cap or BitTorrent throttling during prime-time congested hours (6pm to 10pm). The FCC’s action did not help Comcast’s customers, it hurt them. Packet-filtering would’ve been a good thing, not a bad thing.

The Time-Warner Case
First of all, no matter how you define the case, it has nothing to do with NetNeutrality. NetNeutrality is about filtering packets, giving some priority over others. This case is about providing slow service for everyone.
Secondly, it’s not true. Time Warner provided the same access speeds as everyone else. Just because they promise 10mbps download speeds doesn’t mean you get 10mbps to NetFlix. That’s not how the Internet works — that’s not how any of this works.
To prove this, look at NetFlix’s connection speed graphs. They show Time Warner Cable is average for the industry. It had the same congestion problems most ISPs had in 2014, and it has the same inability to provide more than 3mbps during prime-time (6pm-10pm) that all ISPs have today.

The YouTube video quality diagnostic pages show Time Warner Cable to be similar to other providers around the country. They also show the prime-time bump between 6pm and 10pm.
Congestion is an essential part of the Internet design. When an ISP like Time Warner promises you 10mbps bandwidth, that’s only “best effort”. There’s no way they can promise a 10mbps stream to everybody on the Internet, especially not to a site like NetFlix that gets overloaded during prime-time.
Indeed, it’s the defining feature of the Internet compared to the old “telecommunications” network. The old phone system guaranteed you a steady 64-kbps stream between any two points in the phone network, but it cost a lot of money. Today’s Internet provides a free multi-megabit stream for free video calls (Skype, Facetime) around the world, but with the occasional dropped packets because of congestion.
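
The difference between a promised rate and a best-effort rate can be sketched with a simple equal-share model. This is a deliberate simplification with assumed numbers: real networks do not divide capacity perfectly evenly, and the 300 Mbps shared capacity and the user counts are hypothetical.

```python
def best_effort_kbps(promised_kbps, shared_capacity_kbps, active_users):
    """Estimate per-user throughput on a shared link under an equal-share model.

    Each user gets at most the promised rate, and less once the shared
    capacity divided among active users falls below that rate.
    """
    if active_users <= 0:
        raise ValueError("active_users must be positive")
    fair_share = shared_capacity_kbps / active_users
    return min(promised_kbps, fair_share)


PROMISED_KBPS = 10_000          # the 10mbps plan from the text
SHARED_CAPACITY_KBPS = 300_000  # assumed capacity of the shared bottleneck

print(best_effort_kbps(PROMISED_KBPS, SHARED_CAPACITY_KBPS, 20))   # 10000 (off-peak)
print(best_effort_kbps(PROMISED_KBPS, SHARED_CAPACITY_KBPS, 100))  # 3000.0 (prime time)
```

In this model the same 10mbps plan delivers the full rate off-peak and about 3mbps once enough users are active, without any packet being treated differently from any other.
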
Whatever lawsuit money-hungry lawyers come up with isn’t about how an ISP like Time Warner works. It’s only about how they describe the technology. They work no differently than every other ISP, and no differently than anything possibly could.

Conclusion

The short answer to the above questions is this: Comcast’s BitTorrent throttling benefits customers, and the Time Warner issue has nothing to do with NetNeutrality at all.

The tweet demonstrates what NetNeutrality really means. It has nothing to do with the facts of any case, as shown by how frequently people point to ISP ills that have nothing actually to do with NetNeutrality. Instead, what NetNeutrality is really about is socialism. People are convinced corporations are evil and want the government to run the Internet. The Comcast/BitTorrent case is a prime example of why this is a bad idea: government definitions of what customers want are far different from what customers actually want.