Tag Archives: NSA

Belgium Wants to Blacklist Pirate Sites & Hijack Their Traffic

Post Syndicated from Andy original https://torrentfreak.com/belgium-wants-to-blacklist-pirate-sites-hijack-their-traffic-170924/

The thorny issue of how to deal with the online piracy phenomenon used to be focused on punishing site users. Over time, enforcement action progressed to the services themselves, until they became both too resilient and prevalent to tackle effectively.

In Europe in particular, there’s now a trend of isolating torrent, streaming, and hosting platforms from their users. This is mainly achieved by website blocking carried out by local ISPs following an appropriate court order.

While the UK is perhaps best known for this kind of action, Belgium was one of the early pioneers of the practice.

After filing a lawsuit in 2010, the Belgian Anti-Piracy Foundation (BAF) weathered an early defeat at the Antwerp Commercial Court to achieve success at the Court of Appeal. Since then, local ISPs have been forced to block The Pirate Bay.

There have since been several efforts to block more sites, but rightsholders have complained that the process is too costly, lengthy, and cumbersome. Now the government is stepping in to do something about it.

Local media reports that Deputy Prime Minister Kris Peeters has drafted new proposals to tackle online piracy. In his role as Minister of Economy and Employment, Peeters sees authorities urgently tackling pirate sites with a range of new measures.

For starters, he wants to create a new department, formed within the FPS Economy, to oversee the fight against online infringement. The department would be tasked with detecting pirate sites more quickly and rendering them inaccessible in Belgium, along with any associated mirror sites or proxies.

Peeters wants the new department to add all blocked sites to a national ‘pirate blacklist’. Interestingly, when Internet users try to access any of these sites, he wants them to be automatically diverted to legal sites where a fee will have to be paid for content.

While it’s not unusual to try and direct users away from pirate sites, for the most part Internet service providers have been somewhat reluctant to divert subscribers to commercial sites. Their assistance would be needed in this respect, so it will be interesting to see how negotiations pan out.

The Belgian Entertainment Association (BEA), which was formed nine years ago to represent the music, video, software and videogame industries, welcomed Peeters’ plans.

“It’s so important to close the doors to illegal download sites and to actively lead people to legal alternatives,” said chairman Olivier Maeterlinck.

“Surfers should not forget that the motives of illegal download sites are not always obvious. These sites also regularly try to exploit personal data.”

The current narrative that pirate sites are evil places is clearly gaining momentum among anti-piracy bodies, but there’s little sign that the public intends to boycott sites as a result. With that in mind, alternative legal action will still be required.

To that end, Peeters wants to streamline the system so that all piracy cases go through a single court, the Commercial Court of Brussels. This should reduce costs versus the existing model and there’s also the potential for more consistent rulings.

“It’s a good idea to have a clearer legal framework on this,” says Maeterlinck from BEA.

“There are plenty of legal platforms, streaming services like Spotify, for example, which are constantly developing and reaching an ever-increasing audience. Those businesses have a business model that ensures that the creators of certain media content are properly compensated. The rotten apples must be tackled, and those procedures should be less time-consuming.”

There’s little doubt that BEA could benefit from a little government assistance. Back in February, the group filed a lawsuit at the French commercial court in Brussels, asking ISPs to block subscriber access to several ‘pirate’ sites.

“Our action aims to block nine of the most popular streaming sites which offer copyright-protected content on a massive scale and without authorization,” Maeterlinck told TF at the time.

“In accordance with the principles established by the CJEU (UPC Telekabel and GS Media), BEA seeks a court order confirming the infringement and imposing site blocking measures on the ISPs, who are content providers as well.”

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

ISO Rejects NSA Encryption Algorithms

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2017/09/iso_rejects_nsa.html

The ISO has decided not to approve two NSA-designed block encryption algorithms: Speck and Simon. It’s because the NSA is not trusted to put security ahead of surveillance:

A number of them voiced their distrust in emails to one another, seen by Reuters, and in written comments that are part of the process. The suspicions stem largely from internal NSA documents disclosed by Snowden that showed the agency had previously plotted to manipulate standards and promote technology it could penetrate. Budget documents, for example, sought funding to “insert vulnerabilities into commercial encryption systems.”

More than a dozen of the experts involved in the approval process for Simon and Speck feared that if the NSA was able to crack the encryption techniques, it would gain a “back door” into coded transmissions, according to the interviews and emails and other documents seen by Reuters.

“I don’t trust the designers,” Israeli delegate Orr Dunkelman, a computer science professor at the University of Haifa, told Reuters, citing Snowden’s papers. “There are quite a lot of people in NSA who think their job is to subvert standards. My job is to secure standards.”

I don’t trust the NSA, either.

What the NSA Collects via 702

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2017/09/what_the_nsa_co.html

New York Times reporter Charlie Savage writes about some bad statistics we’re all using:

Among surveillance legal policy specialists, it is common to cite a set of statistics from an October 2011 opinion by Judge John Bates, then of the FISA Court, about the volume of internet communications the National Security Agency was collecting under the FISA Amendments Act (“Section 702”) warrantless surveillance program. In his opinion, declassified in August 2013, Judge Bates wrote that the NSA was collecting more than 250 million internet communications a year, of which 91 percent came from its Prism system (which collects stored e-mails from providers like Gmail) and 9 percent came from its upstream system (which collects transmitted messages from network operators like AT&T).

These numbers are wrong. This blog post will address, first, the widespread nature of this misunderstanding; second, how I came to FOIA certain documents trying to figure out whether the numbers really added up; third, what those documents show; and fourth, what I further learned in talking to an intelligence official. This is far too dense and weedy for a New York Times article, but should hopefully be of some interest to specialists.

Worth reading for the details.

Turtle, the earthbound crowdfunded rover

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/turtle-rover/

With ten days to go until the end of their crowdfunding campaign, the team behind the Turtle Rover are waiting eagerly for their project to become a reality for earthbound explorers across the globe.

Turtle Rover

Turtle is the product of the Mars Rover prototype engineers at Wroclaw University of Technology, Poland. Their waterproof land rover can be controlled via your tablet or smartphone, and allows you to explore hidden worlds too small or dangerous for humans. The team says this about their project:

NASA and ESA plan to send another rover to Mars in 2020. SpaceX wants to send one million people to Mars in the next 100 years. However, before anyone sends a rover to another planet, we designed Turtle — a robot to remind you about how beautiful the Earth is.

With a Raspberry Pi at its core, Turtle is an open-source, modular device to which you can attach new, interesting features such as extra cameras, lights, and a DSLR adapter. Depending on the level at which you back the Kickstarter, you might also receive a robotic arm as a reward for your support.

Turtle Rover Kickstarter Raspberry Pi

The Turtle can capture photos and video, and even live-stream video to your device. Moreover, its emergency stop button offers peace of mind whenever your explorations take your Turtle to cliff edges or other unsafe locations.

Constructed of aerospace-grade aluminium, plastics, and stainless steel, its robust form, watertight and dust-proof body, and 4-hour battery life make the Turtle a great tool for education and development, as well as a wonderful addition to recreational activities such as Airsoft.

Back the Turtle

If you want to join in the Turtle Rover revolution, you have ten days left to back the team on Kickstarter. Pledge €1497 for an unassembled kit (you’ll need your own Raspberry Pi, battery, and servos), or €1549 for a complete rover. The team plan to send your Turtle to you by June 2018 — so get ready to explore!

For more information on the build, including all crowdfunding rewards, check out their Kickstarter page. And if you’d like to follow their journey, be sure to follow them on Twitter.

Your Projects

Are you running a Raspberry Pi-based crowdfunding campaign? Or maybe you’ve got your idea, and you’re soon going to unleash it on the world? Whatever your plans, we’d love to see what you’re up to, so make sure to let us know via our social media channels or an email to [email protected]

 

The post Turtle, the earthbound crowdfunded rover appeared first on Raspberry Pi.

NSA Spied on Early File-Sharing Networks, Including BitTorrent

Post Syndicated from Andy original https://torrentfreak.com/nsa-spied-on-early-file-sharing-networks-including-bittorrent-170914/

In the early 2000s, when peer-to-peer (P2P) file-sharing was in its infancy, the majority of users had no idea that their activities could be monitored by outsiders. The reality was very different, however.

As few as they were, all of the major networks were completely open, with most operating a ‘shared folder’ type system that allowed any network participant to see exactly what another user was sharing. Nevertheless, with little to no oversight, file-sharing at least felt like a somewhat private affair.

As user volumes began to swell, software such as KaZaA (which utilized the FastTrack network) and eDonkey2000 (eD2k network) attracted attention from record labels, who were desperate to stop the unlicensed sharing of copyrighted content. The same held true for the BitTorrent networks that arrived on the scene a couple of years later.

Through the rise of lawsuits against consumers, the general public began to learn that their activities on P2P networks were not secret and they were being watched for some, if not all, of the time by copyright holders. Little did they know, however, that a much bigger player was also keeping a watchful eye.

According to a fascinating document just released by The Intercept as part of the Edward Snowden leaks, the National Security Agency (NSA) showed a keen interest in trying to penetrate early P2P networks.

Initially published by internal NSA news site SIDToday in June 2005, the document lays out the aims of a program called FAVA – File-Sharing Analysis and Vulnerability Assessment.

“One question that naturally arises after identifying file-sharing traffic is whether or not there is anything of intelligence value in this traffic,” the NSA document begins.

“By searching our collection databases, it is clear that many targets are using popular file sharing applications; but if they are merely sharing the latest release of their favorite pop star, this traffic is of dubious value (no offense to Britney Spears intended).”

Indeed, the vast majority of users of these early networks were only interested in sharing relatively small music files, which were somewhat easy to manage given the bandwidth limitations of the day. However, the NSA still wanted to know what was happening on a broader scale, so that meant decoding their somewhat limited encryption.

“As many of the applications, such as KaZaA for example, encrypt their traffic, we first had to decrypt the traffic before we could begin to parse the messages. We have developed the capability to decrypt and decode both KaZaA and eDonkey traffic to determine which files are being shared, and what queries are being performed,” the NSA document reveals.

Most progress appears to have been made against KaZaA, with the NSA revealing the use of tools to parse out registry entries on users’ hard drives. This information gave up users’ email addresses, country codes, user names, the location of their stored files, plus a list of recent searches.

This gave the NSA the ability to look deeper into user behavior, which revealed some P2P users going beyond searches for basic run-of-the-mill multimedia content.

“[We] have discovered that our targets are using P2P systems to search for and share files which are at the very least somewhat surprising — not simply harmless music and movie files. With more widespread adoption, these tools will allow us to regularly assimilate data which previously had been passed over; giving us a more complete picture of our targets and their activities,” the document adds.

Today, more than 12 years later, with KaZaA long dead and eDonkey barely alive, scanning early pirate activities might seem a distant act. However, there’s little doubt that similar programs remain active today. Even in 2005, the FAVA program had lofty ambitions, targeting other networks and protocols including DirectConnect, Freenet, Gnutella, Gnutella2, JoltID, MSN Messenger, Windows Messenger and……BitTorrent.

“If you have a target using any of these applications or using some other application which might fall into the P2P category, please contact us,” the NSA document urges staff. “We would be more than happy to help.”

Confirming the continued interest in BitTorrent, The Intercept has published a couple of further documents which deal with the protocol directly.

The first details an NSA program called GRIMPLATE, which aimed to study how Department of Defense employees were using BitTorrent and whether that constituted a risk.

The second relates to P2P research carried out by Britain’s GCHQ spy agency. It details DIRTY RAT, a web application which gave the government “the capability to identify users sharing/downloading files of interest on the eMule (Kademlia) and BitTorrent networks.”

The SIDToday document detailing the FAVA program can be viewed here

ShareBeast & AlbumJams Operator Pleads Guilty to Criminal Copyright Infringement

Post Syndicated from Andy original https://torrentfreak.com/sharebeast-albumjams-operator-pleads-guilty-to-criminal-copyright-infringement-170911/

In September 2015, U.S. authorities announced action against a pair of sites involved in music piracy.

ShareBeast.com and AlbumJams.com were allegedly responsible for the distribution of “a massive library” of popular albums and tracks. Both were accused of offering thousands of tracks before their official release dates.

The U.S. Department of Justice (DOJ) placed their now familiar seizure notice on both domains, with the RIAA claiming ShareBeast was the largest illegal file-sharing site operating in the United States. Indeed, the site’s IP addresses at the time indicated at least some hosting taking place in Illinois.

“This is a huge win for the music community and legitimate music services. Sharebeast operated with flagrant disregard for the rights of artists and labels while undermining the legal marketplace,” RIAA Chairman & CEO Cary Sherman commented at the time.

“Millions of users accessed songs from Sharebeast each month without one penny of compensation going to countless artists, songwriters, labels and others who created the music.”

Now, a full two years later, former Sharebeast operator Artur Sargsyan has pleaded guilty to one felony count of criminal copyright infringement, admitting to the unauthorized distribution and reproduction of over 1 billion copies of copyrighted works.

“Through Sharebeast and other related sites, this defendant profited by illegally distributing copyrighted music and albums on a massive scale,” said U.S. Attorney John Horn.

“The collective work of the FBI and our international law enforcement partners have shut down the Sharebeast websites and prevented further economic losses by scores of musicians and artists.”

The Department of Justice says that from 2012 to 2015, 29-year-old Sargsyan used ShareBeast as a pirate music repository, infringing works produced by Ariana Grande, Katy Perry, Beyonce, Kanye West, and Justin Bieber, among others. He linked to that content from Newjams.net and Albumjams.com, two other sites under his control.

The DoJ says that Sargsyan was informed at least 100 times that there was infringing content on ShareBeast but despite the warnings, the content remained available. When those warnings produced no results, the FBI – assisted by law enforcement in the UK and the Netherlands – seized servers used by Sargsyan to distribute the material.

Brad Buckles, EVP, Anti-Piracy at the RIAA, welcomed the guilty plea.

“Sharebeast and its related sites represented the most popular network of infringing music sites operated out of the United States. The network was responsible for providing millions of downloads of popular music files including unauthorized pre-release albums and tracks. This illicit activity was a gut-punch to music creators who were paid nothing by the service,” Buckles said.

“We are incredibly grateful for the government’s commitment to protecting the rights of artists and labels. We especially thank the dedicated agents of the FBI who painstakingly unraveled this criminal enterprise, and U.S. Attorney John Horn and his team for their work and diligence in seeing this case to its successful conclusion.”

Sargsyan, of Glendale, California, will be sentenced December 4 before U.S. District Judge Timothy C. Batten.

Demonoid Hopes to Return to Its Former Glory

Post Syndicated from Ernesto original https://torrentfreak.com/demonoid-hopes-to-return-to-its-former-glory-170910/

Demonoid has been around for well over a decade but the site is not really known for having a stable presence.

Quite the opposite, the torrent tracker has a ‘habit’ of going offline for weeks or even months on end, only to reappear as if nothing ever happened.

Earlier this year the site made another one of its trademark comebacks and it has been sailing relatively smoothly since then. Interestingly, the site is once again under the wings of a familiar face, its original founder Deimos.

Deimos decided to take the lead again after some internal struggles. “I gave control to the wrong guys while the problems started, but it’s time to control stuff again,” Deimos told us earlier.

Since the return a few months back, the site’s main focus has been on rebuilding the community and improving the site. Some may have already noticed the new logo, but more changes are coming, both on the front and backend.

“The backend development is going a bit slow, it’s a big change that will allow the server to run off a bunch of small servers all over the world,” Deimos informs TorrentFreak.

“For the frontend, we’re working on new features including a karma system, integrated forums, buddy list, etc. That part is faster to build once you have everything in the back working,” he adds.

Demonoid’s new logo

Deimos has been on and off the site a few times, but he and a few others most recently returned to get it back on track and increase its popularity. While the site has around eight million registered users, many of these have moved elsewhere in recent years.

“I want to see the community we had back. Don’t know if it’s possible but that’s my aim,” Deimos says, admitting that he may not stay on forever.

Many torrent sites have come and gone in recent years, but Demonoid is still here today. Looking back, the site has come a long way. What many people don’t know is that it was originally a place to share demo tapes of metal bands, hence the name DEMOnoid.

“It originally started as a modified PHP based forum that allowed posting of .torrent files. At some point, we started using a full torrent indexing script written in PHP that included a tracker, and started building the first version of the indexing site it is today,” Deimos says.

The site required users to have an invite to sign up, making it a semi-private tracker. This wasn’t done to encourage people to maintain a certain ratio, as some other trackers do, but mostly to keep unsavory characters away.

“The invitation system was implemented to keep spammers, trolls and the like out,” Deimos says. “Originally it was due to some very problematic people who happened to have a death metal band, back in the DEMOnoid days.

“We try to keep it open as often as possible but when we start to get these kinds of issues, we close it,” he adds.

In recent years, the site has had quite a few setbacks, but Deimos doesn’t want to dwell on these in public. Instead, he prefers to focus on the future. While torrent sites are no longer at the center of media distribution, there will always be a place for dedicated sharing communities.

Whether Demonoid will ever return to its former glory is a big unknown for now, but Deimos is sure to do his best.

ShadowBrokers Releases NSA UNITEDRAKE Manual

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2017/09/shadowbrokers_r.html

The ShadowBrokers released the manual for UNITEDRAKE, a sophisticated NSA Trojan that targets Windows machines:

Able to compromise Windows PCs running on XP, Windows Server 2003 and 2008, Vista, Windows 7 SP 1 and below, as well as Windows 8 and Windows Server 2012, the attack tool acts as a service to capture information.

UNITEDRAKE, described as a “fully extensible remote collection system designed for Windows targets,” also gives operators the opportunity to take complete control of a device.

The malware’s modules, including FOGGYBOTTOM and GROK, can perform tasks including listening in and monitoring communication, capturing keystrokes and both webcam and microphone usage, the impersonation of users, stealing diagnostics information and self-destructing once tasks are completed.

More news.

UNITEDRAKE was mentioned in several Snowden documents and also in the TAO catalog of implants.

And Kaspersky Labs has found evidence of these tools in the wild, associated with the Equation Group — generally assumed to be the NSA:

The capabilities of several tools in the catalog identified by the codenames UNITEDRAKE, STRAITBAZZARE, VALIDATOR and SLICKERVICAR appear to match the tools Kaspersky found. These codenames don’t appear in the components from the Equation Group, but Kaspersky did find “UR” in EquationDrug, suggesting a possible connection to UNITEDRAKE (United Rake). Kaspersky also found other codenames in the components that aren’t in the NSA catalog but share the same naming conventions. They include SKYHOOKCHOW, STEALTHFIGHTER, DRINKPARSLEY, STRAITACID, LUTEUSOBSTOS, STRAITSHOOTER, and DESERTWINTER.

ShadowBrokers has only released the UNITEDRAKE manual, not the tool itself. Presumably they’re trying to sell that.

Cloud Storage Doesn’t have to be Convoluted, Complex, or Confusing

Post Syndicated from Ahin Thomas original https://www.backblaze.com/blog/cloud-storage-pricing-comparison/

business man frustrated over cloud storage pricing

So why do many vendors make it so hard to get information about how much you’re storing and how much you’re being charged?

Cloud storage is fast becoming the central repository for mission critical information, irreplaceable memories, and in some cases entire corporate and personal histories. Given this responsibility, we believe cloud storage vendors have an obligation to be as transparent as possible in how they interact with their customers.

In that light we decided to challenge four cloud storage vendors and ask two simple questions:

  1. Can a customer understand how much data is stored?
  2. Can a customer understand the bill?

The detailed results are below, but if you wish to skip the details and the screen captures (TL;DR), we’ve summarized the results in the table below.

Summary of Cloud Storage Pricing Test

Our challenge was to upload 1 terabyte of data, store it for one month, and then download it.

Vendor | Visibility to Data Stored | Easy to Understand Bill | Cost
Backblaze B2 | Accurate, intuitive display of storage information. | Available on demand, and the site clearly defines what has and will be charged for. | $25
Microsoft Azure | Storage is being measured in KiB, but is billed by the GB. Even with a calculator, it is unclear how much storage we are using. | Available, but difficult to find. The nearly 30 day lag in billing creates business and accounting challenges. | $72
Amazon S3 | Incomplete. From the file browsing user interface, there is no reasonable way to understand how much data is being stored. | Available on demand. While there are some line items that seem unnecessary for our test, the bill is generally straight-forward to understand. | $71
Google Cloud Storage | Incomplete. From the file browsing user interface, there is no reasonable way to understand how much data is being stored. | Available, but provides descriptions in units that are not on the pricing table nor commonly used. | $100

Cloud Storage Test Details

For our tests, we chose Backblaze B2, Microsoft’s Azure, Amazon’s S3, and Google Cloud Storage. Our idea was simple: Upload 1 TB of data to the comparable service for each vendor, store it for 1 month, download that 1 TB, then document and share the results.

Let’s start with the most obvious observation, the cost charged by each vendor for the test:

Vendor | Cost
Backblaze B2 | $25
Microsoft Azure | $72
Amazon S3 | $71
Google Cloud Storage | $100

Later in this post, we’ll see if we can determine the different cost components (storage, downloading, transactions, etc.) for each vendor, but our first step is to see if we can determine how much data we stored. In some cases, the answer is not as obvious as it would seem.

Test 1: Can a Customer Understand How Much Data Is Stored?

At the core, a provider of a service ought to be able to tell a customer how much of the service he or she is using. In this case, one might assume that providers of Cloud Storage would be able to tell customers how much data is being stored at any given moment. It turns out, it’s not that simple.

Backblaze B2
Logging into a Backblaze B2 account, one is presented with a summary screen that displays all “buckets.” Each bucket displays key summary information, including data currently stored.

B2 Cloud Storage Buckets screenshot

Clicking into a given bucket, one can browse individual files. Each file displays its size, and multiple files can be selected to create a size summary.

B2 file tree screenshot

Summary: Accurate, intuitive display of storage information.

Microsoft Azure

Moving on to Microsoft’s Azure, things get a little more “exciting.” There was no area that we could find where one can determine the total amount of data, in GB, stored with Azure.

There’s an area entitled “usage,” but that wasn’t helpful.

Microsoft Azure cloud storage screenshot

We then moved on to “Overview,” but had a couple of challenges. The first issue was that we were presented with KiB (kibibyte) as a unit of measure. One GB (the unit of measure used in Azure’s pricing table) equates to roughly 976,563 KiB. It struck us as odd that usage would be summarized in a unit of measure different from the billing unit.

Microsoft Azure usage dashboard screenshot

Summary: Storage is being measured in KiB, but is billed by the GB. Even with a calculator, it is unclear how much storage we are using.

Amazon S3

Next we checked on the data we were storing in S3. We again ran into problems.

In the bucket overview, we were able to identify our buckets. However, we could not tell how much data was being stored.

Amazon S3 cloud storage buckets screenshot

Drilling into a bucket, the detail view does tell us file size. However, there was no method for summarizing the data stored within that bucket or for multiple files.

Amazon S3 cloud storage buckets usage screenshot

Summary: Incomplete. From the file browsing user interface, there is no reasonable way to understand how much data is being stored.

Google Cloud Storage (“GCS”)

GCS proved to have its own quirks, as well.

One can easily find the “bucket” summary, however, it does not provide information on data stored.

Google Cloud Storage Bucket screenshot

Clicking into the bucket, one can see files and the size of an individual file. However, no ability to see data total is provided.

Google Cloud Storage bucket files screenshot

Summary: Incomplete. From the file browsing user interface, there is no reasonable way to understand how much data is being stored.

Test 1 Conclusions

We knew how much storage we were uploading and, in many cases, the user will have some sense of the amount of data they are uploading. However, it strikes us as odd that many vendors won’t tell you how much data you have stored. Even stranger are the vendors that provide reporting in a unit of measure that is different from the units in their pricing table.

Test 2: Can a Customer Understand The Bill?

The cloud storage industry has done itself no favors with its tiered pricing that requires a calculator to figure out what’s going on. Setting that aside for a moment, one would presume that bills would be created in clear, auditable ways.

Backblaze

Inside of the Backblaze user interface, one finds a navigation link entitled “Billing.” Clicking on that, the user is presented with line items for previous bills, payments, and an estimate for the upcoming charges.

Backblaze B2 billing screenshot

One can expand any given row to see the line item transactions composing each bill.

Backblaze B2 billing details screenshot

Summary: Available on demand, and the site clearly defines what has and will be charged for.

Azure

Trying to understand the Azure billing proved to be a bit tricky.

On August 6th, we logged into the billing console and were presented with this screen.

Microsoft Azure billing screenshot

As you can see, on Aug 6th, billing for the period of May-June was not available for download. For the period ending June 26th, we were charged nearly a month later, on July 24th. Clicking into that row item does display line item information.

Microsoft Azure cloud storage billing details screenshot

Summary: Available, but difficult to find. The nearly 30 day lag in billing creates business and accounting challenges.

Amazon S3

Amazon presents a clean billing summary and enables users to “drill down” into line items.

Going to the billing area of AWS, one can survey various monthly bills and is presented with a clean summary of billing charges.

AWS billing screenshot

Expanding into the billing detail, Amazon articulates each line item charge. Within each line item, charges are broken out into sub-line items for the different tiers of pricing.

AWS billing details screenshot

Summary: Available on demand. While there are some line items that seem unnecessary for our test, the bill is generally straightforward to understand.

Google Cloud Storage (“GCS”)

This was an area where the GCS User Interface, which was otherwise relatively intuitive, became confusing.

Going to the Billing Overview page did not offer much in the way of an overview on charges.

Google Cloud Storage billing screenshot

Moving down to the “Transactions” section did provide line item detail on all the charges incurred. However, just as Azure introduces the concept of KiB, Google introduces the equally confusing gibibyte (GiB). While all of Google’s pricing tables are listed in terms of GB, the line items reference GiB. 1 GiB is 2^30 bytes, or about 1.07374 GB.

Google Cloud Storage billing details screenshot

Summary: Available, but provides descriptions in units that are not on the pricing table nor commonly used.
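To make the unit mismatch concrete, here is a minimal sketch of the conversion between GiB (2^30 bytes) and GB (10^9 bytes). The function names and the 500 GiB figure are illustrative assumptions and do not come from any vendor's bill.

```python
# Decimal gigabyte (GB) versus binary gibibyte (GiB).
BYTES_PER_GB = 10**9    # unit used in the pricing tables
BYTES_PER_GIB = 2**30   # unit used in the billing line items


def gib_to_gb(gib: float) -> float:
    """Convert a quantity reported in GiB to GB."""
    return gib * BYTES_PER_GIB / BYTES_PER_GB


def gb_to_gib(gb: float) -> float:
    """Convert a quantity reported in GB to GiB."""
    return gb * BYTES_PER_GB / BYTES_PER_GIB


if __name__ == "__main__":
    # Hypothetical usage figure, for illustration only.
    print(f"1 GiB   = {gib_to_gb(1):.5f} GB")
    print(f"500 GiB = {gib_to_gb(500):.2f} GB")
```

A customer comparing a line item in GiB against a price quoted per GB would need to apply this roughly 7% adjustment before multiplying by the rate.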

Test 2 Conclusions

Clearly, some vendors do a better job than others in making their pricing available and understandable. From a transparency standpoint, it’s difficult to justify why a vendor would have their pricing table in units of X, but then put units of Y in the user interface.

Transparency: The Backblaze Way

Transparency isn’t easy. At Backblaze, we believe in investing time and energy into presenting the most intuitive user interfaces that we can create. We take pride in our heritage in the consumer backup space — servicing consumers has taught us how to make things understandable and usable. We do our best to apply those lessons to everything we do.

This philosophy reflects our desire to make our products usable, but it’s also part of a larger ethos of being transparent with our customers. We are being trusted with precious data. We want to repay that trust with, among other things, transparency.

It’s that spirit that was behind the decision to publish our hard drive performance stats, to open source the infrastructure that is behind us having the lowest cost of storage in the industry, and also to open source our erasure coding (the math that drives a significant portion of our redundancy for your data).

Why? We believe it’s not just about good user interface, it’s about the relationship we want to build with our customers.

The post Cloud Storage Doesn’t have to be Convoluted, Complex, or Confusing appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Sci-Hub Faces $4,8 Million Piracy Damages and ISP Blocking

Post Syndicated from Ernesto original https://torrentfreak.com/sci-hub-faces-48-million-piracy-damages-and-isp-blocking-170905/

In June, a New York District Court handed down a default judgment against Sci-Hub.

The pirate site, operated by Alexandra Elbakyan, was ordered to pay $15 million in piracy damages to academic publisher Elsevier.

With the ink on this order barely dry, another publisher soon tagged on with a fresh complaint. The American Chemical Society (ACS), a leading source of academic publications in the field of chemistry, also accused Sci-Hub of mass copyright infringement.

Founded more than 140 years ago, the non-profit organization has around 157,000 members and researchers who publish tens of thousands of articles a year in its peer-reviewed journals. Because many of its works are available for free on Sci-Hub, ACS wants to be compensated.

Sci-Hub was made aware of the legal proceedings but did not appear in court. As a result, a default was entered against the site, and a few days ago ACS specified its demands, which include $4.8 million in piracy damages.

“Here, ACS seeks a judgment against Sci-Hub in the amount of $4,800,000—which is based on infringement of a representative sample of publications containing the ACS Copyrighted Works multiplied by the maximum statutory damages of $150,000 for each publication,” they write.

The publisher notes that the maximum statutory damages are only requested for 32 of its 9,000 registered works. This still adds up to a significant sum of money, of course, but that is needed as a deterrent, ACS claims.
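The arithmetic behind the request is easy to check. A quick sketch, using only the two figures quoted above (the variable names are ours):

```python
# Figures quoted from the filing: a sample of 32 works, each claimed
# at the maximum statutory damages amount.
WORKS_IN_SAMPLE = 32
MAX_STATUTORY_DAMAGES_USD = 150_000

total = WORKS_IN_SAMPLE * MAX_STATUTORY_DAMAGES_USD

if __name__ == "__main__":
    print(f"${total:,}")  # $4,800,000
```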

“Sci-Hub’s unabashed flouting of U.S. Copyright laws merits a strong deterrent. This Court has awarded a copyright holder maximum statutory damages where the defendant’s actions were ‘clearly willful’ and maximum damages were necessary to ‘deter similar actors in the future’,” they write.

Although the deterrent effect may sound plausible in most cases, another $4.8 million in debt is unlikely to worry Sci-Hub’s owner, as she can’t pay it off anyway. However, there’s also a broad injunction on the table that may be more of a concern.

The requested injunction prohibits Sci-Hub’s owner from continuing her work on the site. In addition, it bars a wide range of other service providers from assisting others to access it.

Specifically, it restrains “any Internet search engines, web hosting and Internet service providers, domain name registrars, and domain name registries, to cease facilitating access to any or all domain names and websites through which Defendant Sci-Hub engages in unlawful access to [ACS’s works].”

The above suggests that search engines may have to remove the site from their indexes while ISPs could be required to block their users’ access to the site as well, which goes quite far.

Since Sci-Hub is in default, ACS is likely to get what it wants. However, if the organization intends to enforce the order in full, it’s likely that some of these third-party services, including Internet providers, will have to spring into action.

While domain name registries are regularly ordered to suspend domains, search engine removals and ISP blocking are not common in the United States. It would, therefore, be no surprise if this case lingers a little while longer.

A copy of ACS’s proposed default judgment, obtained by TorrentFreak, is available here (pdf).

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

State of MAC address randomization

Post Syndicated from Robert Graham original http://blog.erratasec.com/2017/09/state-of-mac-address-randomization.html

tldr: I went to DragonCon, a conference of 85,000 people, to sniff WiFi packets and test how many phones now use MAC address randomization. Almost all iPhones nowadays do, but it seems only a third of Android phones do.

Ten years ago at BlackHat, we presented the “data seepage” problem, how the broadcasts from your devices allow you to be tracked. Among the things we highlighted was how WiFi probes looking to connect to access-points expose the unique hardware address burned into the phone, the MAC address. This hardware address is unique to your phone, shared by no other device in the world. Evildoers, such as the NSA or GRU, could install passive listening devices in airports and train-stations around the world in order to track your movements. This could be done with $25 devices sprinkled around a few thousand places — within the budget of not only a police state, but also the average hacker.

In 2014, with the release of iOS 8, Apple addressed this problem by randomizing the MAC address. Every time you restart your phone, it picks a new, random, hardware address for connecting to WiFi. This causes a few problems: every time you restart your iOS devices, your home network sees a completely new device, which can fill up your router’s connection table. Since that table usually has at least 100 entries, this shouldn’t be a problem for your home, but corporations and other owners of big networks saw their connection tables suddenly get big with iOS 8.
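As a rough illustration of what "picking a random hardware address" involves, here is a minimal sketch. It is not Apple's implementation, which isn't described here; the function name is hypothetical. It relies only on the standard convention that the first byte of a MAC address carries a "locally administered" bit (0x02) and a "multicast" bit (0x01).

```python
import secrets


def random_mac() -> str:
    """Return a random unicast, locally administered MAC address."""
    octets = bytearray(secrets.token_bytes(6))
    octets[0] |= 0x02  # set the locally administered bit
    octets[0] &= 0xFE  # clear the multicast bit so the address stays unicast
    return ":".join(f"{b:02x}" for b in octets)


if __name__ == "__main__":
    print(random_mac())
```

Setting the locally administered bit keeps a random address from colliding with a real manufacturer-assigned one.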

In 2015, Google added the feature to Android as well. However, even though most Android phones today support this feature in theory, it’s usually not enabled.

Recently, I went to DragonCon in order to test out how well this works. DragonCon is a huge sci-fi/fantasy conference in Atlanta in August, second to San Diego’s ComicCon in popularity. It’s spread across several neighboring hotels in the downtown area. A lot of the traffic funnels through the Marriott Marquis hotel, which has a large open area where, from above, you can see thousands of people at a time.

And, with a laptop, see their broadcast packets.

So I went up to a higher floor and set up my laptop to capture “probe” broadcasts coming from phones, in order to record the hardware MAC addresses. I’ve done this in years past, before address randomization, in order to record the popularity of iPhones. The first three bytes of an old-style, non-randomized address identify the manufacturer. This time, I should see a lot fewer manufacturer IDs, and mostly just random addresses instead.
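A minimal sketch of how the first three bytes can be pulled out of a captured address, and of one common way to spot a randomized address (the locally administered bit in the first byte). This may or may not match the method used for the counts in this post; the function names and sample addresses are hypothetical.

```python
def parse_mac(mac: str) -> bytes:
    """Parse 'aa:bb:cc:dd:ee:ff' (or '-' separated) into 6 raw bytes."""
    raw = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(raw) != 6:
        raise ValueError(f"not a 6-byte MAC address: {mac!r}")
    return raw


def manufacturer_prefix(mac: str) -> str:
    """First three bytes, which identify the maker of a non-randomized address."""
    return ":".join(f"{b:02x}" for b in parse_mac(mac)[:3])


def is_locally_administered(mac: str) -> bool:
    """True when the address was not assigned from a manufacturer's block."""
    return bool(parse_mac(mac)[0] & 0x02)


if __name__ == "__main__":
    for sample in ("00:11:22:33:44:55", "02:00:00:aa:bb:cc"):  # made-up addresses
        print(sample, manufacturer_prefix(sample), is_locally_administered(sample))
```

The prefix would then be looked up in the IEEE registry of manufacturer IDs to get a vendor name, a step omitted here.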

I recorded 9,095 unique probes over a couple of hours. I’m not sure exactly how long, because my laptop would go to sleep occasionally for lack of activity on the keyboard. I should probably set up a Raspberry Pi somewhere next year to get a more consistent result.

A quick summary of the results:

The 9,000 devices were split almost evenly between Apple and Android. Almost all of the Apple devices randomized their addresses. About a third of the Android devices randomized. (This assumes Android only randomizes the final 3 bytes of the address, and that Apple randomizes all 6 bytes — my assumption may be wrong).

A table of the major results is below. A little explanation:

  • The first item in the table is the number of phones that randomized the full 6 bytes of the MAC address. I’m guessing these are either mostly or all Apple iOS devices. They are nearly half of the total, or 4498 out of 9095 unique probes.
  • The second number is those that randomized the final 3 bytes of the MAC address, but left the first three bytes identifying themselves as Android devices. I’m guessing this represents all the Android devices that randomize. My guesses may be wrong, maybe some Androids randomize the full 6 bytes, which would get them counted in the first number.
  • The following numbers are phones from major Android manufacturers like Motorola, LG, HTC, Huawei, OnePlus, ZTE. Remember: the first 3 bytes of an un-randomized address identifies who made it. There are roughly 2500 of these devices.
  • There is a count for 309 Apple devices. These are either older devices running something before iOS 8, devices that have the feature turned off (some corporations demand this), or MacBooks rather than phones.
  • The vendor of the access-points that Marriott uses is “Ruckus”. They have a lot of access-points in the hotel.
  • The “TCT mobile” entry is actually BlackBerry. Apparently, BlackBerry stopped making phones and instead just licenses the software/brand to other hardware makers. If you buy a BlackBerry from the phone store, it’s likely going to be a TCT phone instead.
  • I’m assuming the “Amazon” devices are Kindle ebooks.
  • Lastly, I’d like to point out the two records for “Ford”. I was still capturing while walking out of the building, so I think I picked up a few cars driving by.

(random)  4498
(Android)  1562
Samsung  646
Motorola  579
Murata  505
LG  412
Apple  309
HTC-phone  226
Huawei  66
Ruckus  60
OnePlus Tec  40
ZTE  23
TCT mobile  20
Amazon Tech  19
Nintendo  17
Intel  14
Microsoft  9
-hp-  8
BLU Product  8
Kyocera  8
AsusTek  6
Yulong Comp  6
Lite-On  4
Sony Mobile  4
Z-COM, INC.  4
ARRIS Group  2
AzureWave  2
Barnes&Nobl  2
Canon  2
Ford Motor  2
Foxconn  2
Google, Inc  2
Motorola (W  2
Sonos, Inc.  2
SparkLAN Co  2
Wi2Wi, Inc  2
Xiaomi Comm  2
Alps Electr  1
Askey  1
BlackBerry  1
Chi Mei Com  1
Clover Netw  1
CNet Techno  1
eSSys Co.,L  1
GoPro  1
InPro Comm  1
JJPlus Corp  1
Private  1
Quanta  1
Raspberry P  1
Roku, Inc.  1
Sonim Techn  1
Texas Instr  1
TP-LINK TEC  1
Vizio, Inc  1
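The headline shares can be re-derived from the table. This is a sketch under two assumptions that are labeled in the code: that every fully random address is an Apple device, and that the eight vendor rows listed are the non-randomizing Android phones (they sum to the "roughly 2500" mentioned above). Neither assumption is confirmed.

```python
# Counts copied from the table above.
TOTAL_UNIQUE = 9095
counts = {
    "(random)": 4498,
    "(Android)": 1562,
    "Samsung": 646,
    "Motorola": 579,
    "Murata": 505,
    "LG": 412,
    "Apple": 309,
    "HTC-phone": 226,
    "Huawei": 66,
    "OnePlus Tec": 40,
    "ZTE": 23,
}

# Assumption: these rows are Android phones that did NOT randomize.
ANDROID_VENDORS = ["Samsung", "Motorola", "Murata", "LG",
                   "HTC-phone", "Huawei", "OnePlus Tec", "ZTE"]

android_fixed = sum(counts[v] for v in ANDROID_VENDORS)
android_share = counts["(Android)"] / (counts["(Android)"] + android_fixed)

# Assumption: every fully random (6-byte) address is an Apple device.
apple_share = counts["(random)"] / (counts["(random)"] + counts["Apple"])
random_share = counts["(random)"] / TOTAL_UNIQUE

if __name__ == "__main__":
    print(f"non-randomized Android devices: {android_fixed}")
    print(f"Android devices randomizing:    {android_share:.1%}")
    print(f"Apple devices randomizing:      {apple_share:.1%}")
    print(f"fully random share of probes:   {random_share:.1%}")
```

Under those assumptions the figures come out at about 94% for Apple and about 38% for Android, in line with "almost all" and "about a third".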

Russian Hacking Tools Codenamed WhiteBear Exposed

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2017/09/russian_hacking.html

Kaspersky Labs exposed a highly sophisticated set of hacking tools from Russia called WhiteBear.

From February to September 2016, WhiteBear activity was narrowly focused on embassies and consular operations around the world. All of these early WhiteBear targets were related to embassies and diplomatic/foreign affair organizations. Continued WhiteBear activity later shifted to include defense-related organizations into June 2017. When compared to WhiteAtlas infections, WhiteBear deployments are relatively rare and represent a departure from the broader Skipper Turla target set. Additionally, a comparison of the WhiteAtlas framework to WhiteBear components indicates that the malware is the product of separate development efforts. WhiteBear infections appear to be preceded by a condensed spearphishing dropper, lack Firefox extension installer payloads, and contain several new components signed with a new code signing digital certificate, unlike WhiteAtlas incidents and modules.

The exact delivery vector for WhiteBear components is unknown to us, although we have very strong suspicion the group spearphished targets with malicious pdf files. The decoy pdf document above was likely stolen from a target or partner. And, although WhiteBear components have been consistently identified on a subset of systems previously targeted with the WhiteAtlas framework, and maintain components within the same filepaths and can maintain identical filenames, we were unable to firmly tie delivery to any specific WhiteAtlas component. WhiteBear focused on various embassies and diplomatic entities around the world in early 2016 — tellingly, attempts were made to drop and display decoy pdf’s with full diplomatic headers and content alongside executable droppers on target systems.

One of the clever things the tool does is use hijacked satellite connections for command and control, helping it evade detection by broad surveillance capabilities like what the NSA uses. We’ve seen Russian attack tools that do this before. More details are in the Kaspersky blog post.

Given all the trouble Kaspersky is having because of its association with Russia, it’s interesting to speculate on this disclosure. Either they are independent, and have burned a valuable Russian hacking toolset. Or the Russians decided that the toolset was already burned — maybe the NSA knows all about it and has neutered it somehow — and allowed Kaspersky to publish. Or maybe it’s something in between. That’s the problem with this kind of speculation: without any facts, your theories just amplify whatever opinion you had previously.

Oddly, there hasn’t been much press about this. I have only found one story.

EDITED TO ADD: A colleague pointed out to me that Kaspersky announcements like this often get ignored by the press. There was very little written about ProjectSauron, for example.

EDITED TO ADD: The text I originally wrote said that Kaspersky released the attack tools, like what Shadow Brokers is doing. They did not. They just exposed the existence of them. Apologies for that error; it was sloppy wording.

Kim Dotcom Wants K.im to Trigger a “Copyright Revolution”

Post Syndicated from Ernesto original https://torrentfreak.com/kim-dotcom-wants-k-im-to-trigger-a-copyright-revolution-170831/

For many people, Kim Dotcom is synonymous with Megaupload, the file-sharing giant that was taken down by the U.S. Government in early 2012.

While Megaupload is no more, the New Zealand Internet entrepreneur is working on a new file-sharing site. Initially dubbed Megaupload 2, the new service will be called K.im, and it will be quite different from its predecessor.

This week Dotcom, who’s officially the chief “evangelist” of the service, showed a demo to a few thousand people revealing more about what it’s going to offer.

K.im is not a central hosting service, quite the contrary. It will allow users to upload content and distribute it to dozens of other services, including Dropbox, Google, Reddit, Storj, and even torrent sites.

The files are distributed across the Internet where they can be accessed freely. However, there is a catch. The uploaders set a price for each download and people who want a copy can only unlock it through the K.im app or browser addon, after they’ve paid.

Pick your price

K.im, paired with Bitcache, is basically a micropayment solution. It allows creators to charge the public for everything they upload. Every download is tied to a Bitcoin transaction, turning files into their own “stores.”

Kim Dotcom tells TorrentFreak that he sees the service as a copyright revolution. It should be a win-win solution for independent creators, rightsholders, and people who are used to pirating stuff.

“I’m working for both sides. For the copyright holders and also for the people who want to pay for content but have been geo-blocked and then are forced to download for free,” Dotcom says.

Like any other site that allows user uploaded content, K.im can also be used by pirates who want to charge a small fee for spreading infringing content. This is something Dotcom is aware of, but he has a solution in mind.

Much like YouTube, which allows rightsholders to “monetize” videos that use their work, K.im will provide an option to claim pirated content. Rightsholders can then change the price and all revenue will go to them.

So, if someone uploads a pirated copy of the Game of Thrones season finale through K.im, HBO can claim that file, charge an appropriate fee, and profit from it. The uploader, meanwhile, maintains his privacy.

“It is the holy grail of copyright enforcement. It is my gift to Hollywood, the movie studios, and everyone else,” Dotcom says.

Dotcom believes that piracy is in large part caused by an availability problem. People can often not find the content they’re looking for so it’s K.im’s goal to distribute files as widely as possible. This includes several torrent sites, which are currently featured in the demo.

Torrent uploads?

Interestingly, it will be hard to upload content to sites such as YTS, EZTV, KickassTorrents, and RARBG, as they’ve been shut down or don’t allow user uploads. However, Dotcom stresses that the names are just examples, and that they are still working on partnering with various sites.

Whether torrent sites will be eager to cooperate has yet to be seen. It’s possible that the encrypted files, which can’t be opened without paying, will be seen as “spam” by traditional torrent sites.

Also, from a user perspective, one has to wonder how many people are willing to pay for something if they set out to pirate it. After all, there will always be plenty of free options for those who refuse to or can’t pay.

Dotcom, however, is convinced that K.im can create a “copyright revolution.” He stresses that site owners and uploaders can greatly benefit from it as they receive affiliate fees, even after a pirated file is claimed by a rightsholder.

In addition, he says it will revolutionize copyright enforcement, as copyright holders can monetize the work of pirates. That is, if they are willing to work with the service.

“Rightsholders can turn piracy traffic into revenue and users can access the content on any platform. Since every file is a store, it doesn’t matter where it ends up,” Dotcom says.

Dotcom does have a very valid point here. Many people have simply grown used to pirating because it’s much more convenient than using a dozen different services. In Dotcom’s vision, people can just use one site to access everything.

The ideas don’t stop at sharing files either. In the future, Dotcom also wants to use the micropayment option to let YouTubers and media organizations accept payments from the public, the BBC notes.

There’s still a long way to go before K.im and Bitcache go public though. The expected launch date is not final yet, but the services are expected to go live in mid-to-late 2018.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

The NSA’s 2014 Media Engagement and Outreach Plan

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2017/08/the_nsas_2014_m.html

Interesting post-Snowden reading, just declassified.

(U) External Communication will address at least one of “fresh look” narratives:

  1. (U) NSA does not access everything.
  2. (U) NSA does not collect indiscriminately on U.S. Persons and foreign nationals.
  3. (U) NSA does not weaken encryption.
  4. (U) NSA has value to the nation.

There’s lots more.

Piracy Fines For Dutch Pirates, Starting This Autumn

Post Syndicated from Andy original https://torrentfreak.com/piracy-fines-for-dutch-pirates-starting-this-autumn-170828/

In 2014, the European Court of Justice ruled that the “piracy levy”, used in the Netherlands to compensate rightsholders for illicit downloading, was unlawful. In the immediate aftermath, downloading from unauthorized sources was banned.

Three years on and illegal downloading is still considered by rightsholders to be a problem that needs to be brought under control. This means that BitTorrent users are the number one target since their activities also involve uploading, something that most courts consider to be a relatively serious offense.

With that in mind, Dutch film distributor Dutch Filmworks (DFW) is preparing a wave of anti-piracy activity that looks set to mimic the copyright-trolling activities of similar outfits all over the world.

A recent application to the Dutch Data Protection Authority (Autoriteit Persoonsgegevens), revealed that DFW wishes to combat “the unlawful dissemination of copyright protected works” by monitoring the activities of BitTorrent users.

“DFW intends to collect data from people who exchange files over the Internet through BitTorrent networks. The data processing consists of capturing proof of exchange of files via IP addresses for the purpose of researching involvement of these users in the distribution or reproduction of copyrighted works,” it reads.

People who are monitored sharing DFW titles (the company says it intends to track people sharing dozens of releases) will get a letter with an offer to settle in advance of being taken to court. Speaking with NOS, DFW CEO Willem Pruijsserts now reveals that the campaign will begin in the autumn.

“[The letter] will propose a fee,” he says. “If someone does not agree [to pay], the organization can start a lawsuit.”

Quite how much DFW will ask for is not yet clear, but Pruijsserts says the Dutch model will be more reasonable than similar schemes underway in other regions.

“In Germany, this costs between €800 and €1,000, although we find this a bit excessive. But of course it has to be a deterrent, so it will be more than a tenner or two,” he said.

In comments to RTL Z, Pruijsserts confirmed ‘fines’ of at least hundreds of euros.

According to documents filed with the Dutch data protection authority, DFW will employ an external German-based tracking company to monitor alleged pirates which will “automatically participate in swarms in which works from DFW are being shared.” The company has been named by RTL Z as German company Excipion, which could be linked to the monitoring outfit Tecxipio, which began as Excipio.

In conversation with NOS, Pruijsserts said that “hundreds of thousands” of people watched films like Mechanic: Resurrection without paying. This particular movie is notable for appearing in many piracy cases in the United States. It is one of the titles pursued relentlessly by lawyers acting in concert with notorious copyright-trolling outfit Guardaley.

Perhaps the most crucial element moving forward is whether DFW will be able to get ISPs to cooperate in handing over the personal details of allegedly infringing subscribers. Thus far, ISPs Ziggo and KPN have indicated they won’t do so without a court order, so further legal action will be required for DFW to progress.

When DFW’s application for discovery is heard by the court, it will be interesting to see how far the ISPs dig into the anti-piracy scheme. Finding out more about Guardaley, if the company is indeed involved, would be an intriguing approach, especially given the outfit’s tendency to scurry away (1,2) when coming under intense scrutiny.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

Entire Kim Dotcom Spying Operation Was Illegal, High Court Rules

Post Syndicated from Andy original https://torrentfreak.com/entire-kim-dotcom-spying-operation-was-illegal-high-court-rules-170825/

In the months that preceded the January 2012 raid on file-storage site Megaupload, authorities in New Zealand used the Government Communications Security Bureau (GCSB) spy agency to monitor Kim and Mona Dotcom, plus Megaupload co-defendant Bram van der Kolk.

When this fact was revealed it developed into a crisis. The GCSB was forbidden by law from conducting surveillance on its own citizens or permanent residents in the country, which led to former Prime Minister John Key later apologizing for the error.

With Dotcom determined to uncover the truth, the entrepreneur launched legal action in pursuit of the information illegally obtained by GCSB and to obtain compensation. In July, the High Court determined that Dotcom wouldn’t get access to the information, but it also revealed that the spying went on much longer than previously admitted, a fact later confirmed by the police.

This raised the specter that not only did the GCSB continue to spy on Dotcom after it knew it was acting illegally, but that an earlier affidavit from a GCSB staff member was suspect.

With the saga continuing to drag on, revelations published in New Zealand this morning indicate that not only was the spying on Dotcom illegal, the entire spying operation – which included his Megaupload co-defendants – was too.

The reports are based on documents released by lawyer Peter Spring, who is acting for Bram van der Kolk and Mathias Ortmann. Spring says that the High Court decision, which dates back to December but has only just been made available, shows that “the whole surveillance operation fell outside the authorization of the GCSB legislation as it was at the relevant time”.

Since Dotcom is a permanent resident of New Zealand, it’s long been established that the GCSB acted illegally when it spied on him. As foreigners, however, Megaupload co-defendants Finn Batato and Mathias Ortmann were previously considered valid surveillance targets.

It now transpires that the GCSB wasn’t prepared to mount a defense or reveal its methods concerning their surveillance, something which boosted the case against it.

“The circumstances of the interceptions of Messrs Ortmann and Batato’s communications are Top Secret and it has not proved possible to plead to the allegations the plaintiffs have made without revealing information which would jeopardize the national security of New Zealand,” the Court documents read.

“As a result the GCSB is deemed to have admitted the allegations in the statement of claim which relate to the manner in which the interceptions were effected.”

Speaking with RadioNZ, Grant Illingworth, a lawyer representing Ortmann and van der Kolk, said the decision calls the entire GCSB operation into doubt.

“The GCSB has now admitted that the unlawfulness was not just dependent upon residency issues, it went further. The reason it went further was because it didn’t have authorization to carry out the kind of surveillance that it was carrying out under the legislation, as it was at that time,” Illingworth said.

In comments to NZHerald, Illingworth added that the decision meant that the damages case for Ortmann and van der Kolk had come to an end. He refused to respond to questions of whether damages had been paid or a settlement reached.

He did indicate, however, that there could be implications for the battle underway to have Dotcom, Batato, Ortmann and van der Kolk extradited to the United States.

“If there was illegality in the arrest and search phase and that illegality has not previously been made known in the extradition context then it could be relevant to the extradition,” Illingworth said.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

From Data Lake to Data Warehouse: Enhancing Customer 360 with Amazon Redshift Spectrum

Post Syndicated from Dylan Tong original https://aws.amazon.com/blogs/big-data/from-data-lake-to-data-warehouse-enhancing-customer-360-with-amazon-redshift-spectrum/

Achieving a 360°-view of your customer has become increasingly challenging as companies embrace omni-channel strategies, engaging customers across websites, mobile, call centers, social media, physical sites, and beyond. The promise of a web where online and physical worlds blend makes understanding your customers more challenging, but also more important. Businesses that are successful in this medium have a significant competitive advantage.

The big data challenge requires the management of data at high velocity and volume. Many customers have identified Amazon S3 as a great data lake solution that removes the complexities of managing a highly durable, fault tolerant data lake infrastructure at scale and economically.

AWS data services substantially lessen the heavy lifting of adopting technologies, allowing you to spend more time on what matters most—gaining a better understanding of customers to elevate your business. In this post, I show how a recent Amazon Redshift innovation, Redshift Spectrum, can enhance a customer 360 initiative.

Customer 360 solution

A successful customer 360 view benefits from using a variety of technologies to deliver different forms of insights. These could range from real-time analysis of streaming data from wearable devices and mobile interactions to historical analysis that requires interactive, on demand queries on billions of transactions. In some cases, insights can only be inferred through AI via deep learning. Finally, the value of your customer data and insights can’t be fully realized until it is operationalized at scale—readily accessible by fleets of applications. Companies are leveraging AWS for the breadth of services that cover these domains, to drive their data strategy.

A number of AWS customers stream data from various sources into a S3 data lake through Amazon Kinesis. They use Kinesis and technologies in the Hadoop ecosystem like Spark running on Amazon EMR to enrich this data. High-value data is loaded into an Amazon Redshift data warehouse, which allows users to analyze and interact with data through a choice of client tools. Redshift Spectrum expands on this analytics platform by enabling Amazon Redshift to blend and analyze data beyond the data warehouse and across a data lake.

The following diagram illustrates the workflow for such a solution.

This solution delivers value by:

  • Reducing complexity and time to value for deeper insights. For instance, an existing data model in Amazon Redshift may provide insights across dimensions such as customer, geography, time, and product on metrics from sales and financial systems. Down the road, you may gain access to streaming data sources like customer-care call logs and website activity that you want to blend in with the sales data on the same dimensions to understand how web and call center experiences may be correlated with sales performance. Redshift Spectrum can join these dimensions in Amazon Redshift with data in S3 to allow you to quickly gain new insights, and avoid the slow and more expensive alternative of fully integrating these sources with your data warehouse.
  • Providing an additional avenue for optimizing costs and performance. In cases like call logs and clickstream data where volumes could be many TBs to PBs, storing the data exclusively in S3 yields significant cost savings. Interactive analysis on massive datasets may now be economically viable in cases where data was previously analyzed periodically through static reports generated by inexpensive batch processes. In some cases, you can improve the user experience while simultaneously lowering costs. Spectrum is powered by a large-scale infrastructure external to your Amazon Redshift cluster, and excels at scanning and aggregating large volumes of data. For instance, your analysts may be performing data discovery on customer interactions across millions of consumers over years of data across various channels. On this large dataset, certain queries could be slow if you didn’t have a large Amazon Redshift cluster. Alternatively, you could use Redshift Spectrum to achieve a better user experience with a smaller cluster.
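To give a feel for what blending looks like in practice, here is a minimal sketch that holds two SQL statements as Python strings, ready to hand to whichever SQL client you use with Amazon Redshift. Every schema, database, table, column, and role name is a hypothetical placeholder, and the external table is assumed to be defined already in the data catalog. Nothing here is taken from the PoC itself.

```python
# All identifiers and the IAM role ARN below are hypothetical placeholders.

# Registers an external schema so tables over S3 data can be queried.
CREATE_EXTERNAL_SCHEMA = """
CREATE EXTERNAL SCHEMA clickstream_ext
FROM DATA CATALOG
DATABASE 'clickstream_db'
IAM_ROLE 'arn:aws:iam::123456789012:role/ExampleSpectrumRole'
CREATE EXTERNAL DATABASE IF NOT EXISTS;
"""

# Joins clickstream data left in S3 with a dimension table in Redshift.
BLEND_QUERY = """
SELECT c.customer_segment,
       COUNT(*) AS page_views
FROM clickstream_ext.page_views AS v   -- external table, data stays in S3
JOIN public.customer AS c              -- dimension stored in Amazon Redshift
  ON v.customer_id = c.customer_id
GROUP BY c.customer_segment
ORDER BY page_views DESC;
"""


def statements():
    """Return the statements in the order they would be executed."""
    return [CREATE_EXTERNAL_SCHEMA.strip(), BLEND_QUERY.strip()]


if __name__ == "__main__":
    for sql in statements():
        print(sql, end="\n\n")
```

The point of the sketch is the shape of the join: the large fact data is referenced through the external schema, while the dimension table lives in the cluster.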

Proof of concept walkthrough

To make evaluation easier for you, I’ve conducted a Redshift Spectrum proof-of-concept (PoC) for the customer 360 use case. For those who want to replicate the PoC, the instructions, AWS CloudFormation templates, and public data sets are available in the GitHub repository.

The remainder of this post is a journey through the project, observing best practices in action, and learning how you can achieve business value. The walkthrough involves:

  • An analysis of performance data from the PoC environment involving queries that demonstrate blending and analysis of data across Amazon Redshift and S3. Observe that great results are achievable at scale.
  • Guidance by example on query tuning, design, and data preparation to illustrate the optimization process. This includes tuning a query that combines clickstream data in S3 with customer and time dimensions in Amazon Redshift, and aggregates ~1.9 B out of 3.7 B+ records in under 10 seconds with a small cluster!
  • Guidance and measurements to help you decide between two options: accessing and analyzing data exclusively in Amazon Redshift, or using Redshift Spectrum to access data left in S3.

Stream ingestion and enrichment

The focus of this post isn’t stream ingestion and enrichment on Kinesis and EMR, but be mindful of performance best practices on S3 to ensure good streaming and query performance:

  • Use random object keys: The data files provided for this project are prefixed with SHA-256 hashes to prevent hot partitions. This is important for ensuring optimal request rates, both for PUT requests from the incoming stream and for queries from large Amazon Redshift clusters that could send a large number of parallel GET requests.
  • Micro-batch your data stream: S3 isn’t optimized for small random write workloads. Your datasets should be micro-batched into large files. For instance, the “parquet-1” dataset provided batches >7 million records per file. The optimal file size for Redshift Spectrum is usually in the 100 MB to 1 GB range.
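
As a rough illustration of the hash-prefix idea, here is a minimal Python sketch. The helper name hashed_key, the 8-character prefix length, and the example file names are assumptions made for illustration; the actual key layout used by the PoC data files may differ.

```python
import hashlib


def hashed_key(natural_key, prefix_len=8):
    """Prepend a short SHA-256 digest so object names spread across the key space.

    hashed_key and prefix_len are hypothetical; they are not part of the PoC.
    """
    digest = hashlib.sha256(natural_key.encode("utf-8")).hexdigest()
    return f"{digest[:prefix_len]}-{natural_key}"


# Hypothetical micro-batch file names, one large file per batch.
keys = [hashed_key(f"uservisits-batch-{i:05d}.parquet") for i in range(4)]
for key in keys:
    print(key)
```

Because the prefix is derived from the name itself, the same file always maps to the same key, while consecutive batch numbers no longer share a common leading prefix.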

If you have an edge case that may pose scalability challenges, AWS would love to hear about it. For further guidance, talk to your solutions architect.

Environment

The project consists of the following environment:

  • Amazon Redshift cluster: 4 X dc1.large
  • Data:
    • Time and customer dimension tables are stored on all Amazon Redshift nodes (ALL distribution style):
      • The data originates from the DWDATE and CUSTOMER tables in the Star Schema Benchmark
      • The customer table contains attributes for 3 million customers.
      • The time data is at the day-level granularity, and spans 7 years, from the start of 1992 to the end of 1998.
    • The clickstream data is stored in an S3 bucket, and serves as a fact table.
      • Various copies of this dataset in CSV and Parquet format have been provided, for reasons to be discussed later.
      • The data is a modified version of the uservisits dataset from AMPLab’s Big Data Benchmark, which was generated by Intel’s Hadoop benchmark tools.
      • Changes were minimal, so that existing test harnesses for this test can be adapted:
        • Increased the 751,754,869-row dataset 5X to 3,758,774,345 rows.
        • Added surrogate keys to support joins with customer and time dimensions. These keys were distributed evenly across the entire dataset to represent user visits from six customers over seven years.
        • Values for the visitDate column were replaced to align with the 7-year timeframe, and the added time surrogate key.

Queries across the data lake and data warehouse 

Imagine a scenario where a business analyst plans to analyze clickstream metrics like ad revenue over time and by customer, market segment and more. The example below is a query that achieves this effect: 

The query part highlighted in red retrieves clickstream data in S3, and joins the data with the time and customer dimension tables in Amazon Redshift through the part highlighted in blue. The query returns the total ad revenue for three customers over the last three months, along with info on their respective market segment.

Unfortunately, this query takes around three minutes to run, and doesn’t enable the interactive experience that you want. However, there are a number of performance optimizations that you can implement to achieve the desired performance.

Performance analysis

Two key utilities provide visibility into Redshift Spectrum:

  • EXPLAIN
    Provides the query execution plan, which includes information about which processing is pushed down to Redshift Spectrum. Steps in the plan that include the prefix S3 are executed on Redshift Spectrum. For instance, the plan for the previous query has the step “S3 Seq Scan clickstream.uservisits_csv10”, indicating that Redshift Spectrum performs a scan on S3 as part of the query execution.
  • SVL_S3QUERY_SUMMARY
    Statistics for Redshift Spectrum queries are stored in this table. While the execution plan presents cost estimates, this table stores actual statistics for past query runs.

You can get the statistics of your last query by inspecting the SVL_S3QUERY_SUMMARY table with the condition (query = pg_last_query_id()). Inspecting the previous query reveals that the entire dataset of nearly 3.8 billion rows was scanned to retrieve fewer than 66.3 million rows. Improving scan selectivity in your query could yield substantial performance improvements.
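
To put that selectivity in numbers, here is a small back-of-the-envelope calculation in Python. It uses the exact row counts quoted elsewhere in this post (3,758,774,345 rows in the dataset and 66,270,117 rows needed by the query); the variable names are only illustrative.

```python
rows_scanned = 3_758_774_345  # full clickstream dataset (see Environment)
rows_needed = 66_270_117      # rows the query actually has to retrieve

selectivity = rows_needed / rows_scanned
wasted = 1 - selectivity

print(f"needed: {selectivity:.2%} of scanned rows")
print(f"scanned but discarded: {wasted:.2%}")
```

Fewer than 2 in every 100 scanned rows contribute to the result, which is why narrowing the scan is the first optimization to try.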

Partitioning

Partitioning is a key means to improving scan efficiency. In your environment, the data and tables have already been organized, and configured to support partitions. For more information, see the PoC project setup instructions. The clickstream table was defined as:

CREATE EXTERNAL TABLE clickstream.uservisits_csv10
…
PARTITIONED BY(customer int4, visitYearMonth int4)

The entire 3.8 billion-row dataset is organized as a collection of large files where each file contains data exclusive to a particular customer and month in a year. This allows you to partition your data into logical subsets by customer and year/month. With partitions, the query engine can target a subset of files:

  • Only for specific customers
  • Only data for specific months
  • A combination of specific customers and year/months
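
The effect of partition pruning can be sketched with a toy model in Python. This is not how Redshift Spectrum is implemented; it only counts which (customer, year/month) partitions a filter would leave in play. The customer IDs 1 to 6 and the month filter 199810 and later are assumptions for illustration.

```python
from itertools import product

# Six customers over seven years of months: 6 * 84 = 504 partitions.
customers = range(1, 7)  # assumed IDs 1..6
year_months = [y * 100 + m for y in range(1992, 1999) for m in range(1, 13)]
partitions = list(product(customers, year_months))


def prune(parts, customer_ok=lambda c: True, month_ok=lambda ym: True):
    """Keep only the partitions whose keys satisfy both predicates."""
    return [(c, ym) for c, ym in parts if customer_ok(c) and month_ok(ym)]


by_customer = prune(partitions, customer_ok=lambda c: c <= 3)
by_both = prune(
    partitions,
    customer_ok=lambda c: c <= 3,
    month_ok=lambda ym: ym >= 199810,  # assumed "last three months"
)

print(len(partitions), len(by_customer), len(by_both))
```

Filtering on the customer key alone halves the candidate partitions, while filtering on both keys leaves 9 of 504, under the assumptions above.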

You can use partitions in your queries. Rather than joining your customer data on the surrogate customer key (that is, c.c_custkey = uv.custKey), use the partition key “customer”:

SELECT c.c_name, c.c_mktsegment, t.prettyMonthYear, SUM(uv.adRevenue)
…
ON c.c_custkey = uv.customer
…
ORDER BY c.c_name, c.c_mktsegment, uv.yearMonthKey  ASC

This query should run approximately twice as fast as the previous query. If you look at the statistics for this query in SVL_S3QUERY_SUMMARY, you see that only half the dataset was scanned. This is expected because your query is on three out of six customers on an evenly distributed dataset. However, the scan is still inefficient, and you can benefit from using your year/month partition key as well:

SELECT c.c_name, c.c_mktsegment, t.prettyMonthYear, SUM(uv.adRevenue)
…
ON c.c_custkey = uv.customer
…
ON uv.visitYearMonth = t.d_yearmonthnum
…
ORDER BY c.c_name, c.c_mktsegment, uv.visitYearMonth ASC

All joins between the tables are now using partitions. Upon reviewing the statistics for this query, you should observe that Redshift Spectrum scans and returns the exact number of rows, 66,270,117. If you run this query a few times, you should see execution time in the range of 8 seconds, which is a 22.5X improvement on your original query!

Predicate pushdown and storage optimizations 

Previously, I mentioned that Redshift Spectrum performs processing through large-scale infrastructure external to your Amazon Redshift cluster. It is optimized for performing large scans and aggregations on S3. In fact, Redshift Spectrum may even out-perform a medium size Amazon Redshift cluster on these types of workloads with the proper optimizations. There are two important variables to consider for optimizing large scans and aggregations:

  • File size and count. As a general rule, use files 100 MB-1 GB in size, as Redshift Spectrum and S3 are optimized for reading objects of this size. However, the number of files a query operates on is directly correlated with the parallelism the query can achieve. There is an inverse relationship between file size and count: the bigger the files, the fewer files there are for the same dataset. Consequently, there is a trade-off between optimizing for object read performance and the amount of parallelism achievable on a particular query. Large files are best for large scans, as the query likely operates on a sufficiently large number of files. For queries that are more selective and operate on fewer files, you may find that smaller files allow for more parallelism.
  • Data format. Redshift Spectrum supports various data formats. Columnar formats like Parquet can sometimes lead to substantial performance benefits by providing compression and more efficient I/O for certain workloads. Generally, format types like Parquet should be used for query workloads involving large scans and high attribute selectivity. Again, there are trade-offs, as formats like Parquet require more compute power to process than plaintext. For queries on smaller subsets of data, the I/O efficiency benefit of Parquet is diminished. At some point, Parquet may perform the same as or slower than plaintext. Latency, compression rates, and the trade-off between user experience and cost should drive your decision.
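
The inverse relationship between file size and file count is easy to see with a little arithmetic. The sketch below assumes a hypothetical 120 GB dataset; the figure is not taken from the PoC and is only there to make the numbers concrete.

```python
import math

MB = 1024 ** 2
GB = 1024 ** 3


def file_count(dataset_bytes, target_file_bytes):
    """Number of files needed to hold the dataset at a given file size."""
    return math.ceil(dataset_bytes / target_file_bytes)


dataset = 120 * GB  # hypothetical dataset size
for size_mb in (64, 128, 256, 512, 1024):
    print(f"{size_mb:>5} MB files -> {file_count(dataset, size_mb * MB)} files")
```

Doubling the file size halves the number of files a full scan can spread across, so a selective query that touches only a small slice of the data may end up with very few files to read in parallel.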

To help illustrate how Redshift Spectrum performs on these large aggregation workloads, run a basic query that aggregates the entire ~3.7 billion record dataset on Redshift Spectrum, and compare that with running the query exclusively on Amazon Redshift:

SELECT uv.custKey, COUNT(uv.custKey)
FROM <your clickstream table> as uv
GROUP BY uv.custKey
ORDER BY uv.custKey ASC

For the Amazon Redshift test case, the clickstream data is loaded and distributed evenly across all nodes (even distribution style) with the optimal column compression encodings prescribed by Amazon Redshift’s ANALYZE COMPRESSION command.

The Redshift Spectrum test case uses a Parquet data format with each file containing all the data for a particular customer in a month. This results in files mostly in the range of 220-280 MB, which is, in effect, the largest file size for this partitioning scheme. If you run tests with the other datasets provided, you see that this data format and size are optimal and out-perform others by ~60X.

Performance differences will vary depending on the scenario. The important takeaway is to understand the testing strategy and the workload characteristics where Redshift Spectrum is likely to yield performance benefits. 

The following chart compares the query execution time for the two scenarios. The results indicate that you would have to pay for 12 X DC1.Large nodes to get performance comparable to using a small Amazon Redshift cluster that leverages Redshift Spectrum. 

Chart showing simple aggregation on ~3.7 billion records

So you’ve validated that Spectrum excels at performing large aggregations. Could you benefit by pushing more work down to Redshift Spectrum in your original query? It turns out that you can, by making the following modification:

The clickstream data is stored at a day-level granularity for each customer while your query rolls up the data to the month level per customer. In the earlier query that uses the year/month partition key, you optimized the query so that it only scans and retrieves the data required, but the day-level data is still sent back to your Amazon Redshift cluster for joining and aggregation. The query shown here pushes aggregation work down to Redshift Spectrum as indicated by the query plan:

In this query, Redshift Spectrum aggregates the clickstream data to the month level before it is returned to the Amazon Redshift cluster and joined with the dimension tables. This query should complete in about 4 seconds, which is roughly twice as fast as only using the partition key. The speed increase is evident upon reviewing the SVL_S3QUERY_SUMMARY table:

  • Bytes scanned is 21.6X less because of the Parquet data format.
  • Only 90 records are returned to the Amazon Redshift cluster as a result of the push-down, instead of ~66.3 million, leading to substantially less join overhead, and about 530 MB less data sent back to your cluster.
  • No adverse change in average parallelism.
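
The idea behind pushing the aggregation down can be sketched in plain Python: roll the day-level fact rows up to the month level first, then join the much smaller result with the dimension data. The rows, customer names and market segments below are made up for illustration, and the code is a toy model rather than a description of how Redshift Spectrum executes the query.

```python
from collections import defaultdict

# Toy day-level fact rows: (customer, visit_year_month, ad_revenue).
fact_rows = [
    (1, 199810, 1.50), (1, 199810, 2.25), (1, 199811, 0.75),
    (2, 199810, 3.00), (2, 199812, 1.00), (2, 199812, 4.00),
]

# Toy customer dimension: key -> (name, market segment).
customer_dim = {
    1: ("Customer#1", "BUILDING"),
    2: ("Customer#2", "MACHINERY"),
}


def aggregate_before_join(rows):
    """Sum revenue per (customer, month) so fewer rows reach the join."""
    totals = defaultdict(float)
    for customer, year_month, revenue in rows:
        totals[(customer, year_month)] += revenue
    return dict(totals)


monthly = aggregate_before_join(fact_rows)
result = [
    (*customer_dim[c], ym, total) for (c, ym), total in sorted(monthly.items())
]

print(f"{len(fact_rows)} fact rows reduced to {len(monthly)} before the join")
for row in result:
    print(row)
```

The join now handles one row per customer and month instead of one row per visit, which is the same reason the pushed-down query sends so few records back to the cluster.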

Assessing the value of Amazon Redshift vs. Redshift Spectrum

At this point, you might be asking yourself, why would I ever not use Redshift Spectrum? Well, you still get additional value for your money by loading data into Amazon Redshift, and querying in Amazon Redshift vs. querying S3.

In fact, it turns out that the last version of our query runs even faster when executed exclusively in native Amazon Redshift, as shown in the following chart:

Chart comparing Amazon Redshift vs. Redshift Spectrum with pushdown aggregation over 3 months of data

As a general rule, queries that aren’t dominated by I/O and which involve multiple joins are better optimized in native Amazon Redshift. For instance, the performance difference between running the partition key query entirely in Amazon Redshift versus with Redshift Spectrum is twice as large as that of the pushdown aggregation query, partly because the former case benefits more from better join performance.

Furthermore, the variability in latency in native Amazon Redshift is lower. For use cases where you have tight performance SLAs on queries, you may want to consider using Amazon Redshift exclusively to support those queries.

On the other hand, when you perform large scans, you could benefit from the best of both worlds: higher performance at lower cost. For instance, imagine that you wanted to enable your business analysts to interactively discover insights across a vast amount of historical data. In the example below, the pushdown aggregation query is modified to analyze seven years of data instead of three months:

SELECT c.c_name, c.c_mktsegment, t.prettyMonthYear, uv.totalRevenue
…
WHERE customer <= 3 and visitYearMonth >= 199201
… 
FROM dwdate WHERE d_yearmonthnum >= 199201) as t
…
ORDER BY c.c_name, c.c_mktsegment, uv.visitYearMonth ASC

This query requires scanning and aggregating nearly 1.9 billion records. As shown in the chart below, Redshift Spectrum substantially speeds up this query. A large Amazon Redshift cluster would have to be provisioned to support this use case. With the aid of Redshift Spectrum, you could use an existing small cluster, keep a single copy of your data in S3, and benefit from economical, durable storage while only paying for what you use via the pay-per-query pricing model.

Chart comparing Amazon Redshift vs. Redshift Spectrum with pushdown aggregation over 7 years of data

Summary

Redshift Spectrum lowers the time to value for deeper insights on customer data queries spanning the data lake and data warehouse. It can enable interactive analysis on datasets in cases that weren’t economically practical or technically feasible before.

There are cases where you can get the best of both worlds from Redshift Spectrum: higher performance at lower cost. However, there are still latency-sensitive use cases where you may want native Amazon Redshift performance. For more best practice tips, see the 10 Best Practices for Amazon Redshift post.

Please visit the Amazon Redshift Spectrum PoC Environment GitHub page. If you have questions or suggestions, please comment below.

 


Additional Reading

Learn more about how Amazon Redshift Spectrum extends data warehousing out to exabytes – no loading required.


About the Author

Dylan Tong is an Enterprise Solutions Architect at AWS. He works with customers to help drive their success on the AWS platform through thought leadership and guidance on designing well architected solutions. He has spent most of his career building on his expertise in data management and analytics by working for leaders and innovators in the space.

 

 

Sean’s DIY Bitcoin Lottery with a Raspberry Pi

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/seans-diy-bitcoin-lottery/

After several explorations into the world of 3D printing, and fresh off the back of his $5 fidget spinner crowd funding campaign, Sean Hodgins brings us his latest project: a DIY Bitcoin Lottery!

DIY Bitcoin Lottery with a Raspberry Pi

Build your own lottery! Thingiverse Files: https://www.thingiverse.com/thing:2494568 Pi How-to: http://www.idlehandsproject.com/raspberry-pi-bitcoin-lottery/ Instructables: https://www.instructables.com/id/DIY-Bitcoin-Lottery-With-Raspberry-Pi/ Send me bitcoins if you want!

What is Bitcoin mining?

According to the internet, Bitcoin mining is:

[A] record-keeping service. Miners keep the blockchain consistent, complete, and unalterable by repeatedly verifying and collecting newly broadcast transactions into a new group of transactions called a block. Each block contains a cryptographic hash of the previous block, using the SHA-256 hashing algorithm, which links it to the previous block, thus giving the blockchain its name.
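
For the curious, the "each block contains a hash of the previous block" part can be sketched in a few lines of Python. This is a toy model with made-up transactions and a simplified block layout, not the real Bitcoin block format.

```python
import hashlib

GENESIS_PREV = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(prev_hash, transactions):
    """SHA-256 over the previous block's hash plus this block's transactions."""
    payload = prev_hash + "|" + ",".join(transactions)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_chain(batches):
    """Link each batch of transactions to the hash of the block before it."""
    chain, prev = [], GENESIS_PREV
    for txs in batches:
        current = block_hash(prev, txs)
        chain.append({"prev": prev, "txs": txs, "hash": current})
        prev = current
    return chain


def is_consistent(chain):
    """Recompute every hash and check each block points at its predecessor."""
    prev = GENESIS_PREV
    for block in chain:
        if block["prev"] != prev:
            return False
        if block_hash(block["prev"], block["txs"]) != block["hash"]:
            return False
        prev = block["hash"]
    return True


chain = build_chain([["a->b:1"], ["b->c:2", "c->a:1"], ["a->c:3"]])
print(is_consistent(chain))       # True
chain[1]["txs"][0] = "b->c:200"   # tamper with an old transaction
print(is_consistent(chain))       # False
```

Changing anything in an earlier block changes its hash, which no longer matches what the later blocks recorded, and that is what makes the chain hard to alter quietly.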

If that makes no sense to you, welcome to the club. So here’s a handy video which explains it better.

What is Bitcoin Mining?

For more information: https://www.bitcoinmining.com and https://www.weusecoins.com What is Bitcoin Mining? Have you ever wondered how Bitcoin is generated? This short video is an animated introduction to Bitcoin Mining. Credits: Voice – Chris Rice (www.ricevoice.com) Motion Graphics – Fabian Rühle (www.fabianruehle.de) Music/Sound Design – Christian Barth (www.akkord-arbeiter.de) Andrew Mottl (www.andrewmottl.com)

Okay, now I get it.

I swear.

Sean’s Bitcoin Lottery

As a retired Bitcoin miner, Sean understands how the system works and what is required for mining. And since news sources report that Bitcoin is currently valued at around $4000, Sean decided to use a Raspberry Pi to bring to life an idea he’d been thinking about for a little while.

Sean Hodgins Raspberry Pi Bitcoin Lottery

He fitted the Raspberry Pi into a 3D-printed body, together with a small fan, a strip of NeoPixels, and a Block Eruptor ASIC which is the dedicated mining hardware. The Pi runs a Python script compatible with CGMiner, a mining software that needs far more explanation than I can offer in this short blog post.

The NeoPixels take the first 6 characters of the 64-character-long number of the current block, and interpret it as a hex colour code. In this way, the block’s data is converted into colour, which, when you think about it, is kind of beautiful.
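
Here is a minimal Python sketch of that conversion. It is not Sean's actual code: the function name hash_to_rgb and the example string are made up for illustration.

```python
def hash_to_rgb(block_hash):
    """Read the first 6 hex characters as a colour code and return (R, G, B)."""
    hex_colour = block_hash[:6]
    return tuple(int(hex_colour[i:i + 2], 16) for i in (0, 2, 4))


# Hypothetical 64-character hex string, not a real block.
example_hash = "1f8a3c" + "0" * 58

print(hash_to_rgb(example_hash))  # (31, 138, 60)
```

Each pair of hex characters becomes one colour channel between 0 and 255, which is the format LED strips like NeoPixels typically expect.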

The device moves on to trying to solve a new block every 20 minutes. When it does, the NeoPixel LEDs play a flashing ‘Win’ or ‘Lose’ animation to let you know whether you were the one to solve the previous block.

Lottery results

Sean has done the maths to calculate the power consumption of the device. He says that the annual cost of running his Bitcoin Lottery is roughly what you would pay for two lottery scratch cards. Now, the odds of solving a block are much lower than those of buying a winning scratch card. However, since the mining device moves on to a new block every 20 minutes, the odds of being a winner with Bitcoin using Sean’s build are actually better than those of winning the lottery.
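
The shape of that argument can be sketched in Python. The number of attempts per year follows from the 20-minute figure above, but the per-attempt probability below is purely a placeholder, not a real estimate of the device's odds.

```python
# One attempt every 20 minutes, all year round.
attempts_per_year = (60 // 20) * 24 * 365


def chance_of_at_least_one_win(p_per_attempt, attempts):
    """Probability of winning at least once over independent attempts."""
    return 1 - (1 - p_per_attempt) ** attempts


p = 1e-9  # hypothetical chance of solving any single block

print(attempts_per_year)
print(chance_of_at_least_one_win(p, attempts_per_year))
```

Even a tiny chance per attempt adds up when there are tens of thousands of attempts a year, which is the point of comparing the device with a couple of scratch cards.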

MATHS!

But even if you don’t win, Sean’s project is a fun experiment in Bitcoin mining and creating colour through code. And if you want to make your own, you can download the 3D files here, find the code here, and view the step-by-step guide here on Instructables.

Good luck and happy mining!

The post Sean’s DIY Bitcoin Lottery with a Raspberry Pi appeared first on Raspberry Pi.

Pirate Bay Founders Ordered to Pay Music Labels $477,000

Post Syndicated from Andy original https://torrentfreak.com/pirate-bay-founders-ordered-to-pay-music-labels-477000-170823/

In November 2011, the International Federation of the Phonographic Industry (IFPI), with support from Finnish anti-piracy group Copyright Information and Anti-Piracy Center (CIAPC), filed a lawsuit in the Helsinki District Court against The Pirate Bay.

IFPI, which represents the world’s major labels, demanded that the site’s operators stop facilitating the unauthorized distribution of music and pay compensation to IFPI and CIAPC-affiliated rightsholders for the damages caused through their website.

Progress in the case has been somewhat glacial but this morning, almost six years after the complaint was first filed, a decision was handed down.

Fredrik Neij and Gottfrid Svartholm, two founder members of the site, were ordered by the District Court to cease and desist from the illegal operations of The Pirate Bay. They were also ordered to jointly and severally pay compensation to IFPI record labels to the tune of 405,000 euros ($477,000).

The Court was reportedly unable to contact Neij (aka TiAMO) or Svartholm (aka Anakata) in connection with the case. With no response received from the defendants by the deadline, the Court heard the case in their absence, handing a default judgment to the plaintiffs.

Last year a similar verdict was handed down by the Helsinki District Court to Pirate Bay co-founder Peter Sunde.

Sony Music Entertainment Finland, Universal Music, Warner Music, and EMI Finland sued Sunde, claiming that the music of 60 of their artists had been shared illegally through The Pirate Bay.

Sunde was also found liable in his absence and ordered to pay the major labels around 350,000 euros ($412,000) in damages and 55,000 euros ($65,000) in costs. He later announced plans to sue the labels for defamation.

“I’m a public person in Finland and they’re calling me a criminal when they KNOW I’m not involved in what they’re suing me for,” Sunde told TorrentFreak at the time. “It’s defamation.”

Fredrik Neij, Gottfrid Svartholm, and Peter Sunde all owe large sums of money to copyright holders following decisions relating to The Pirate Bay dating back at least eight years. In all cases, the plaintiffs have recovered nothing so the latest judgment only seems likely to add to the growing list of unpaid bills.

Meanwhile, The Pirate Bay sails on, seemingly oblivious to the news.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

MPAA Wins Movie Piracy Case in China After Failed Anti-Piracy Deal

Post Syndicated from Andy original https://torrentfreak.com/mpaa-wins-movie-piracy-case-in-china-after-failed-anti-piracy-deal-170822/

As one of China’s top 10 Internet companies, Xunlei is a massive operation with hundreds of millions of monthly users.

Among other file-sharing ventures, Xunlei operates ‘Thunder’, the world’s most popular torrent client. This and other almost inevitable copyright-related issues put the company on the radar of the MPAA.

With Xunlei pursuing an IPO in the United States in 2014, relationships with the MPAA began to thaw, resulting in the breakthrough signing of a Content Protection Agreement (CPA) requiring Xunlei to protect MPAA studio content including movies and TV shows.

But in October 2014, with things clearly not going to plan, the MPAA reported Xunlei to the U.S. government, complaining of rampant piracy on the service. In January 2015, the MPAA stepped up a gear and sued Xunlei for copyright infringement.

“For too long we have witnessed valuable creative content being taken and monetized without the permission of the copyright owner. That has to stop and stop now,” said MPAA Asia-Pacific chief Mike Ellis.

Now, more than two-and-a-half years later, the case has come to a close. Yesterday, the Shenzhen Nanshan District People’s Court found Xunlei Networking Technologies Co. guilty of copyright infringement.

The Court found that Xunlei made 28 movie titles (belonging to companies including Paramount Pictures, Sony Pictures, 20th Century Fox, Universal Pictures, Disney and Warner Bros.) available to the public via its platforms without proper authorization, “in serious violation” of the movie group’s rights.

Xunlei was ordered to cease and desist and told to pay compensation of 1.4 million yuan ($210,368) plus the MPA’s litigation costs of $24,400. In its original complaint, the MPA demanded a public apology from Xunlei but it’s unclear whether that forms part of the ruling. The outcome was welcomed by the MPA.

“We are heartened that the court in Shenzhen has found in favor of strong copyright,” said MPAA Asia-Pacific chief Mike Ellis.

“The legitimate Chinese film and television industry has worked hard to provide audiences with a wide range of legal options for their audio-visual entertainment — a marketplace that has flourished because of the rights afforded to copyright owners under the law.”

How the MPAA and Xunlei move ahead from here is unclear. This case has taken more than two-and-a-half years to come to a conclusion so further litigation seems somewhat unlikely, if not unwieldy. Then there’s the question of the anti-piracy agreement signed in 2014 and whether that is still on the table.

As previously revealed, the agreement not only compelled Xunlei to use pre-emptive content filtering technology but also required the platform to terminate the accounts of people who attempt to infringe copyright in any way.

“[The] filter will identify each and every instance of a user attempting to infringe a studio work, by uploading or downloading,” an internal MPAA document revealed.

All that being said, the document also contained advice for the MPAA not to sue Xunlei, so at this point anything could happen.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.