Tag Archives: research

Security and Human Behavior (SHB 2017)

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2017/05/security_and_hu_6.html

I’m at Cambridge University for the tenth Workshop on Security and Human Behavior.

SHB is a small invitational gathering of people studying various aspects of the human side of security, organized each year by Ross Anderson, Alessandro Acquisti, and myself. The 50 or so people in the room include psychologists, economists, computer security researchers, sociologists, political scientists, neuroscientists, designers, lawyers, philosophers, anthropologists, business school professors, and a smattering of others. It’s not just an interdisciplinary event; most of the people here are individually interdisciplinary.

The goal is maximum interaction and discussion. We do that by putting everyone on panels. There are eight six-person panels over the course of the two days. Everyone gets to talk for ten minutes about their work, and then there’s half an hour of questions and discussion. We also have lunches, dinners, and receptions — all designed so people from different disciplines talk to each other.

It’s the most intellectually stimulating conference of my year, and influences my thinking about security in many different ways.

This year’s schedule is here. This page lists the participants and includes links to some of their work. As he does every year, Ross Anderson is liveblogging the talks.

Here are my posts on the first, second, third, fourth, fifth, sixth, seventh, eighth, and ninth SHB workshops. Follow those links to find summaries, papers, and occasionally audio recordings of the various workshops.

I don’t think any of us imagined that this conference would be around this long.

Ransomware and the Internet of Things

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2017/05/ransomware_and_.html

As devastating as the latest widespread ransomware attacks have been, it’s a problem with a solution. If your copy of Windows is relatively current and you’ve kept it updated, your laptop is immune. It’s only older, unpatched systems that are vulnerable.

Patching is how the computer industry maintains security in the face of rampant Internet insecurity. Microsoft, Apple and Google have teams of engineers who quickly write, test and distribute these patches, updates to the code that fix vulnerabilities in software. Most people have set up their computers and phones to automatically apply these patches, and the whole thing works seamlessly. It isn’t a perfect system, but it’s the best we have.

But it is a system that’s going to fail in the “Internet of things”: everyday devices like smart speakers, household appliances, toys, lighting systems, even cars, that are connected to the web. Many of the embedded networked systems in these devices that will pervade our lives don’t have engineering teams on hand to write patches and may well last far longer than the companies that are supposed to keep the software safe from criminals. Some of them don’t even have the ability to be patched.

Fast forward five to 10 years, and the world is going to be filled with literally tens of billions of devices that hackers can attack. We’re going to see ransomware against our cars. Our digital video recorders and web cameras will be taken over by botnets. The data that these devices collect about us will be stolen and used to commit fraud. And we’re not going to be able to secure these devices.

Like every other instance of product safety, this problem will never be solved without considerable government involvement.

For years, I have been calling for more regulation to improve security in the face of this market failure. In the short term, the government can mandate that these devices have more secure default configurations and the ability to be patched. It can issue best-practice regulations for critical software and make software manufacturers liable for vulnerabilities. It’ll be expensive, but it will go a long way toward improved security.

But it won’t be enough to focus only on the devices, because these things are going to be around and on the Internet much longer than the two to three years we use our phones and computers before we upgrade them. I expect to keep my car for 15 years, and my refrigerator for at least 20 years. Cities will expect the networks they’re putting in place to last at least that long. I don’t want to replace my digital thermostat ever again. Nor, if I ever need one, do I want a surgeon to have to go back in to replace my computerized heart defibrillator just to fix a software bug.

No amount of regulation can force companies to maintain old products, and it certainly can’t prevent companies from going out of business. The future will contain billions of orphaned devices connected to the web that simply have no engineers able to patch them.

Imagine this: The company that made your Internet-enabled door lock is long out of business. You have no way to secure yourself against the ransomware attack on that lock. Your only option, other than paying, and paying again when it’s reinfected, is to throw it away and buy a new one.

Ultimately, we will also need the network to block these attacks before they get to the devices, but there again the market will not fix the problem on its own. We need additional government intervention to mandate these sorts of solutions.

None of this is welcome news to a government that prides itself on minimal intervention and maximal market forces, but national security is often an exception to this rule. Last week’s cyberattacks have laid bare some fundamental vulnerabilities in our computer infrastructure and serve as a harbinger. There’s a lot of good research into robust solutions, but the economic incentives are all misaligned. As politically untenable as it is, we need government to step in to create the market forces that will get us out of this mess.

This essay previously appeared in the New York Times. Yes, I know I’m repeating myself.

Hacking Fingerprint Readers with Master Prints

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2017/05/hacking_fingerp.html

There’s interesting research on using a set of “master” digital fingerprints to fool biometric readers. The work is theoretical at the moment, but the researchers estimate that these master prints could open about two-thirds of iPhones.

Definitely something to keep watching.

Research paper (behind a paywall).

No, ExtraTorrent Has Not Been Resurrected

Post Syndicated from Ernesto original https://torrentfreak.com/no-extratorrent-has-not-been-resurected-170524/

Last week the torrent community entered a state of shock when another major torrent site closed its doors.

Having served torrents to the masses for over a decade, ExtraTorrent decided to throw in the towel, without providing any details or an apparent motive.

The only strong message sent out by ExtraTorrent’s operator was to “stay away from fake ExtraTorrent websites and clones.”

Fast forward a few days and the first copycats have indeed appeared online. While this was expected, it’s always disappointing to see “news” sites, including the likes of Forbes and The Inquirer, giving them exposure without doing thorough research.

“We are a group of uploaders and admins from ExtraTorrent. As you know, SAM from ExtraTorrent pulled the plug yesterday and took all data offline under pressure from authorities. We were in deep shock and have been working hard to get it back online with all previous data,” the email, sent out to several news outlets, read.

What followed was a flurry of ‘ExtraTorrent is back’ articles and, thanks to those, a lot of people now think that ExtraTorrent.cd is a true resurrection operated by the site’s former staffers and fans.

However, aside from its appearance, the site has absolutely nothing to do with ET.

The site is an imposter operated by the same people who also launched Kickass.cd when KAT went offline last summer. In fact, the content on both sites doesn’t come from the defunct sites they try to replace, but from The Pirate Bay.

Yes indeed, ExtraTorrent.cd is nothing more than a Pirate Bay mirror with an ExtraTorrent skin.

There are several signs clearly showing that the torrents come from The Pirate Bay. Easiest to spot, perhaps, is a comparison of search results, which are identical on both sites.

Chaparall search on ExtraTorrent.cd

The ExtraTorrent “resurrection” even lists TPB’s oldest active torrent from March 2004, which was apparently uploaded long before the original ExtraTorrent was launched.

Chaparall search on TPB

TorrentFreak is in touch with proper ex-staffers of ExtraTorrent who agree that the site is indeed a copycat. Some ex-staffers are considering the launch of a new ET version, just like the KAT admins did in the past, but if that happens, it will take a lot more time.

“At the moment we are all figuring out how to go about getting it back up and running in a proper fashion, but as you can imagine there a lot of obstacles and arguments, lol,” ex-ET admin Soup informed us.

So, for now, there is no real resurrection. ExtraTorrent.cd sells itself as much more than it is, just as its operators did with Kickass.cd. While the site doesn’t have any malicious intent, aside from luring old ET members under false pretenses, people have the right to know what it really is.


Malicious Subtitles Threaten Kodi, VLC and Popcorn Time Users, Researchers Warn

Post Syndicated from Ernesto original https://torrentfreak.com/malicious-subtitles-threaten-kodi-vlc-and-popcorn-time-users-researchers-warn-170523/

Online streaming is booming, and applications such as Kodi, Popcorn Time and VLC have millions of daily users.

Some of these users watch pirated videos, often in combination with subtitles provided by third-party repositories.

While most subtitle makers do no harm, it appears that those with malicious intent can exploit these popular streaming applications to penetrate the devices and systems of their users.

Researchers from Check Point, who uncovered the problem, describe the subtitle ‘attack vector’ as the most widespread, easily accessed and zero-resistance vulnerability that has been reported in recent years.

“By conducting attacks through subtitles, hackers can take complete control over any device running them. From this point on, the attacker can do whatever he wants with the victim’s machine, whether it is a PC, a smart TV, or a mobile device,” they write.

“The potential damage the attacker can inflict is endless, ranging anywhere from stealing sensitive information, installing ransomware, mass Denial of Service attacks, and much more.”

In a demonstration video using Popcorn Time, the researchers show how easy it is to compromise the system of a potential victim.

A demo of the subtitles vulnerability

XBMC Foundation project lead Martijn Kaijser informs TorrentFreak that the Kodi team is aware of the situation and will address it soon. “We will release 17.2 which will have the fix this week,” he told us.

VideoLAN, which develops VLC, has addressed the issue as well and doesn’t believe it is still exploitable.

“The VLC bug is not exploitable. The first big issue was fixed in 2.2.5. There are 2 other small issues, that will be fixed in 2.2.6,” VideoLAN informed us.

The team behind PopcornTime.sh applied a fix several months ago after the researchers approached them, TorrentFreak is informed. The Popcorn Time team trusts its subtitle provider OpenSubtitles, but says that it now sanitizes malicious subtitle files, including those added by users.

The same applies to the Butter project, which is closely related to Popcorn Time. Butter was not contacted by Check Point but their fix is visible in a GitHub commit from February.

“None of the Butter Project developers were contacted by the research group. We’d love to have them talk to us if our code is still vulnerable. To the extent of our research it is not, but we’d like the ‘responsible disclosure’ terms to actually mean something,” The Butter project informs TorrentFreak.

Finally, another fork, Popcorn-Time.to, also informed us that it is not affected by the reported vulnerability.

The Check Point researchers expect that other applications may also be affected. They do not disclose any technical details at this point, nor do they state which of the applications successfully addressed the vulnerability.

“Some of the issues were already fixed, while others are still under investigation. To allow the developers more time to address the vulnerabilities, we’ve decided not to publish any further technical details at this point,” the researchers state.

More updates will be added if more information becomes available. For now, however, people who regularly use subtitle files should remain vigilant.


The Future of Ransomware

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2017/05/the_future_of_r.html

Ransomware isn’t new, but it’s increasingly popular and profitable.

The concept is simple: Your computer gets infected with a virus that encrypts your files until you pay a ransom. It’s extortion taken to its networked extreme. The criminals provide step-by-step instructions on how to pay, sometimes even offering a help line for victims unsure how to buy bitcoin. The price is designed to be cheap enough for people to pay instead of giving up: a few hundred dollars in many cases. Those who design these systems know their market, and it’s a profitable one.

The ransomware that has affected systems in more than 150 countries recently, WannaCry, made press headlines last week, but it doesn’t seem to be more virulent or more expensive than other ransomware. This one has a particularly interesting pedigree: It’s based on a vulnerability developed by the National Security Agency that can be used against many versions of the Windows operating system. The NSA’s code was, in turn, stolen by an unknown hacker group called Shadow Brokers, widely believed by the security community to be the Russians, in 2014 and released to the public in April.

Microsoft patched the vulnerability a month earlier, presumably after being alerted by the NSA that the leak was imminent. But the vulnerability affected older versions of Windows that Microsoft no longer supports, and there are still many people and organizations that don’t regularly patch their systems. This allowed whoever wrote WannaCry (it could be anyone from a lone individual to an organized crime syndicate) to use it to infect computers and extort users.

The lessons for users are obvious: Keep your system patches up to date and regularly back up your data. This isn’t just good advice to defend against ransomware, but good advice in general. But it’s becoming obsolete.

Everything is becoming a computer. Your microwave is a computer that makes things hot. Your refrigerator is a computer that keeps things cold. Your car and television, the traffic lights and signals in your city and our national power grid are all computers. This is the much-hyped Internet of Things (IoT). It’s coming, and it’s coming faster than you might think. And as these devices connect to the Internet, they become vulnerable to ransomware and other computer threats.

It’s only a matter of time before people get messages on their car screens saying that the engine has been disabled and it will cost $200 in bitcoin to turn it back on. Or a similar message on their phones about their Internet-enabled door lock: Pay $100 if you want to get into your house tonight. Or pay far more if they want their embedded heart defibrillator to keep working.

This isn’t just theoretical. Researchers have already demonstrated a ransomware attack against smart thermostats, which may sound like a nuisance at first but can cause serious property damage if it’s cold enough outside. If the device under attack has no screen, you’ll get the message on the smartphone app you control it from.

Hackers don’t even have to come up with these ideas on their own; the government agencies whose code was stolen were already doing it. One of the leaked CIA attack tools targets Internet-enabled Samsung smart televisions.

Even worse, the usual solutions won’t work with these embedded systems. You have no way to back up your refrigerator’s software, and it’s unclear whether that solution would even work if an attack targets the functionality of the device rather than its stored data.

These devices will be around for a long time. Unlike our phones and computers, which we replace every few years, cars are expected to last at least a decade. We want our appliances to run for 20 years or more, our thermostats even longer.

What happens when the company that made our smart washing machine — or just the computer part — goes out of business, or otherwise decides that they can no longer support older models? WannaCry affected Windows versions as far back as XP, a version that Microsoft no longer supports. The company broke with policy and released a patch for those older systems, but it has both the engineering talent and the money to do so.

That won’t happen with low-cost IoT devices.

Those devices are built on the cheap, and the companies that make them don’t have the dedicated teams of security engineers ready to craft and distribute security patches. The economics of the IoT doesn’t allow for it. Even worse, many of these devices aren’t patchable. Remember last fall when the Mirai botnet infected hundreds of thousands of Internet-enabled digital video recorders, webcams and other devices and launched a massive denial-of-service attack that resulted in a host of popular websites dropping off the Internet? Most of those devices couldn’t be fixed with new software once they were attacked. The way you update your DVR is to throw it away and buy a new one.

Solutions aren’t easy and they’re not pretty. The market is not going to fix this unaided. Security is a hard-to-evaluate feature against a possible future threat, and consumers have long rewarded companies that provide easy-to-compare features and a quick time-to-market at the expense of security. We need to assign liabilities to companies that write insecure software that harms people, and possibly even issue and enforce regulations that require companies to maintain software systems throughout their life cycle. We may need minimum security standards for critical IoT devices. And it would help if the NSA got more involved in securing our information infrastructure and less in keeping it vulnerable so the government can eavesdrop.

I know this all sounds politically impossible right now, but we simply cannot live in a future where everything (from the things we own to our nation’s infrastructure) can be held for ransom by criminals again and again.

This essay previously appeared in the Washington Post.

Hello World issue 2: celebrating ten years of Scratch

Post Syndicated from Carrie Anne Philbin original https://www.raspberrypi.org/blog/hello-world-issue-2/

We are very excited to announce that issue 2 of Hello World is out today! Hello World is our magazine about computing and digital making, written by educators, for educators. It is a collaboration between the Raspberry Pi Foundation and Computing at School, part of the British Computer Society.

We’ve been extremely fortunate to be granted an exclusive interview with Mitch Resnick, Leader of the Scratch Team at MIT, and it’s in the latest issue. All around the world, educators and enthusiasts are celebrating ten years of Scratch, MIT’s block-based programming language. Scratch has helped millions of people to learn the building blocks of computer programming through play, and is our go-to tool at Code Clubs everywhere.

Cover of issue 2 of Hello World magazine

A magazine by educators, for educators.

This packed edition of Hello World also includes news, features, lesson activities, research and opinions from Computing At School Master Teachers, Raspberry Pi Certified Educators, academics, informal learning leaders and brilliant classroom teachers. Highlights (for me) include:

  • A round-up of digital making research from Oliver Quinlan
  • Safeguarding children online by Penny Patterson
  • Embracing chaos inside and outside the classroom with Code Club’s Rik Cross, Raspberry Jam-maker-in-chief Ben Nuttall, Raspberry Pi Certified Educator Sway Grantham, and CPD trainer Alan O’Donohoe
  • How MicroPython on the Micro:bit is inspiring a generation, by Nicholas Tollervey
  • Incredibly useful lesson activities on programming graphical user interfaces (GUIs) with guizero, simulating logic gates in Minecraft, and introducing variables through storytelling
  • Exploring computing and gender through Girls Who Code, CyberFirst Girls, the BCS Lovelace Colloquium, and Computing At School’s #include initiative
  • A review of browser-based IDEs

Get your copy

Hello World is available as a free Creative Commons download for anyone around the world who is interested in Computer Science and digital making education. Grab the latest issue straight from the Hello World website.

Thanks to the very generous support of our sponsors BT, we are able to offer a free printed version of the magazine to serving educators in the UK. It’s for teachers, Code Club volunteers, teaching assistants, teacher trainers, and others who help children and young people learn about computing and digital making. Remember to subscribe to receive your free copy, posted directly to your home.

Get involved

Are you an educator? Then Hello World needs you! As a magazine for educators by educators, we want to hear about your experiences in teaching technology. If you hear a little niggling voice in your head say “I’m just a teacher, why would my contributions be useful to anyone else?” stop immediately. We want to hear from you, because you are amazing!

Get in touch at contact@helloworld.cc with your ideas, and we can help get them published.

 

The post Hello World issue 2: celebrating ten years of Scratch appeared first on Raspberry Pi.

Keylogger Found in HP Laptop Audio Drivers

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2017/05/keylogger_found.html

This is a weird story: researchers have discovered that an audio driver installed in some HP laptops includes a keylogger, which records all keystrokes to a local file. There seems to be nothing malicious about this, but it’s a vivid illustration of how hard it is to secure a modern computer. The operating system, drivers, processes, application software, and everything else are so complicated that it’s pretty much impossible to lock down every aspect of it. So many things are eavesdropping on different aspects of the computer’s operation, collecting personal data as they do so. If an attacker can get to the computer when the drive is unencrypted, he gets access to all sorts of information streams — and there’s often nothing the computer’s owner can do.

Using Wi-Fi to Get 3D Images of Surrounding Location

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2017/05/using_wi-fi_to_1.html

Interesting research:

The radio signals emitted by a commercial Wi-Fi router can act as a kind of radar, providing images of the transmitter’s environment, according to new experiments. Two researchers in Germany borrowed techniques from the field of holography to demonstrate Wi-Fi imaging. They found that the technique could potentially allow users to peer through walls and could provide images 10 times per second.

News article.

Law Professor Shows How to Fight Copyright Trolls

Post Syndicated from Ernesto original https://torrentfreak.com/law-professor-shows-how-to-fight-copyright-trolls-170514/

In recent years, file-sharers around the world have been pressured to pay significant settlement fees, or face legal repercussions.

These so-called “copyright trolling” efforts have been a common occurrence in the United States for more than half a decade, and still are.

While rightsholders should be able to take legitimate piracy claims to court, there are some who resort to dodgy and extortion-like tactics to extract money from alleged pirates, including people who are innocent.

This practice has been a thorn in the side of Matthew Sag, a professor at Loyola University Chicago School of Law, and the Associate Director for Intellectual Property at the Institute for Consumer Antitrust Studies.

“Over the past few years, I have seen one example after another of innocent defendants being victimized by these lawsuits,” Sag explains to TorrentFreak.

This motivated the professor to take action. One of the problems he identifies is that not all defense lawyers are familiar with these cases. They sometimes need dozens of hours to research them, which costs the defendant more than the cash settlement deal offered by the copyright holder.

As a result, paying off the trolls may seem like the most logical and safe option to the accused, even when they are innocent.

“Put simply, by the time your average lawyer has figured out what’s wrong with these cases and how to respond she has sunk 50 to 100 hours into a case that probably could’ve been settled for $2000 or $3000,” Sag notes.

“That makes no sense, so people settle cases with no merit. That, in turn, encourages meritless cases. We wanted to level the playing field and reduce the plaintiffs’ informational advantage,” he adds.

To balance the scales of justice, the professor wrote an article together with Jake Haskell, a recent Loyola University Law School graduate. Titled “Defense Against the Dark Arts of Copyright Trolling,” the paper provides a detailed overview of the various tactics the defense can use.

Not all cases filed by copyright holders can be characterized as “trolling.” According to Sag, copyright trolls can best be described as systematic opportunists, and he hopes that defense lawyers can use his article to prevent clear abuses.

Of course, judges play an important role as well, and some could certainly benefit from reading the paper.

“The federal courts should not be used as vending machines to issue indiscriminate hunting licenses. Judges need to keep a close eye on discovery and tactics used by the plaintiff to prolong proceedings or run up attorney’s fees,” Sag tells us.

“Hopefully, we have given defense lawyers a significant head start on figuring out how to defend these claims. If innocent defendants refused to settle, the plaintiffs would be forced to clean up their act,” he adds.

The article is a recommended read for everyone with an interest in copyright trolling, and well worth the time for anyone who wants to learn more about how these companies operate.


Build a Healthcare Data Warehouse Using Amazon EMR, Amazon Redshift, AWS Lambda, and OMOP

Post Syndicated from Ryan Hood original https://aws.amazon.com/blogs/big-data/build-a-healthcare-data-warehouse-using-amazon-emr-amazon-redshift-aws-lambda-and-omop/

In the healthcare field, data comes in all shapes and sizes. Despite efforts to standardize terminology, some concepts (e.g., blood glucose) are still often depicted in different ways. This post demonstrates how to convert an openly available dataset called MIMIC-III, which consists of de-identified medical data for about 40,000 patients, into an open source data model known as the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM). It describes the architecture and steps for analyzing data across various disconnected sources of health datasets so you can start applying Big Data methods to health research.

Note: If you arrived at this page looking for more info on the movie Mimic 3: Sentinel, you might not enjoy this post.

OMOP overview

The OMOP CDM helps standardize healthcare data and makes it easier to analyze outcomes at a large scale. The CDM is gaining a lot of traction in the health research community, which is deeply involved in developing and adopting a common data model. Community resources are available for converting datasets, and there are software tools to help unlock your data after it’s in the OMOP format. The great advantage of converting data sources into a standard data model like OMOP is that it allows for streamlined, comprehensive analytics and helps remove the variability associated with analyzing health records from different sources.

OMOP ETL with Apache Spark

Observational Health Data Sciences and Informatics (OHDSI) provides the OMOP CDM in a variety of formats, including Apache Impala, Oracle, PostgreSQL, and SQL Server. (See the OHDSI Common Data Model repo in GitHub.) In this scenario, the data is moved to AWS to take advantage of the unbounded scale of Amazon EMR and serverless technologies, and the variety of AWS services that can help make sense of the data in a cost-effective way—including Amazon Machine Learning, Amazon QuickSight, and Amazon Redshift.

This example demonstrates an architecture that can be used to run SQL-based extract, transform, load (ETL) jobs to map any data source to the OMOP CDM. It uses MIMIC ETL code provided by Md. Shamsuzzoha Bayzid. The code was modified to run in Amazon Redshift.

Getting access to the MIMIC-III data

Before you can retrieve the MIMIC-III data, you must request access on the PhysioNet website, which is hosted on Amazon S3 as part of the Amazon Web Services (AWS) Public Dataset Program. However, you don’t need access to the MIMIC-III data to follow along with this post.

Solution architecture and loading process

The following diagram shows the architecture that is used to convert the MIMIC-III dataset to the OMOP CDM.

The data conversion process includes the following steps:

  1. The entire infrastructure is spun up using an AWS CloudFormation template. This includes the Amazon EMR cluster, Amazon SNS topics/subscriptions, an AWS Lambda function and trigger, and AWS Identity and Access Management (IAM) roles.
  2. The MIMIC-III data is read in via an Apache Spark program that is running on Amazon EMR. The files are registered as tables in Spark so that they can be queried by Spark SQL.
  3. The transformation queries are located in a separate Amazon S3 location, which is read in by Spark and executed on the newly registered tables to convert the data into OMOP form.
  4. The data is then written to a staging S3 location, where it is ready to be copied into Amazon Redshift.
  5. As each file is loaded in OMOP form into S3, the Spark program sends a message to an SNS topic that signifies that the load completed successfully.
  6. After that message is pushed, it triggers a Lambda function that consumes the message and executes a COPY command from S3 into Amazon Redshift for the appropriate table.
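
To make the flow above more concrete, here is a minimal PySpark sketch of steps 2 through 5. It is an illustration only: the bucket names, file name, SQL key, and SNS topic ARN are placeholders rather than values from the actual AWS Labs repository.

import boto3
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("mimic-to-omop").getOrCreate()

# Step 2: read a raw MIMIC-III file and register it as a table for Spark SQL.
admissions = spark.read.csv("s3://my-mimic-bucket/ADMISSIONS.csv",
                            header=True, inferSchema=True)
admissions.createOrReplaceTempView("admissions")

# Step 3: load a transformation query from its own S3 location and run it
# against the registered tables to produce OMOP-shaped rows.
sql_lines = spark.sparkContext.textFile("s3://my-etl-bucket/sql/visit_occurrence.sql")
visit_occurrence = spark.sql("\n".join(sql_lines.collect()))

# Step 4: write the OMOP output to the staging location, ready for COPY.
visit_occurrence.write.mode("overwrite").csv("s3://my-staging-bucket/omop/visit_occurrence/")

# Step 5: signal that this table has landed in S3.
boto3.client("sns").publish(
    TopicArn="arn:aws:sns:us-east-1:123456789012:omop-load-complete",
    Message="visit_occurrence",
)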

This architecture provides a scalable way to use various healthcare sources and convert them to OMOP format, where the only changes needed are in the SQL transformation files. The transformation logic is stored in an S3 bucket and is completely de-coupled from the Apache Spark program that runs on EMR and converts the data into OMOP form. This makes the transformation code portable and allows the Spark jar to be reused if other data sources are added—for example, electronic health records (EHR), billing systems, and other research datasets.

Note: For larger files, you might experience the five-minute timeout limitation in Lambda. In that scenario you can use AWS Step Functions to split the file and load it one piece at a time.
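
For reference, the Lambda function in step 6 can be as small as the following Python sketch. The connection details, staging bucket, and IAM role are placeholders, and it assumes a PostgreSQL driver such as pg8000 is packaged with the function; it is not the exact code from the repository.

import os
import pg8000

def handler(event, context):
    # The SNS message carries the name of the OMOP table that just landed in S3.
    table = event["Records"][0]["Sns"]["Message"]

    conn = pg8000.connect(
        host=os.environ["REDSHIFT_HOST"],
        port=5439,
        database=os.environ["REDSHIFT_DB"],
        user=os.environ["REDSHIFT_USER"],
        password=os.environ["REDSHIFT_PASSWORD"],
    )
    cursor = conn.cursor()

    # COPY the staged files for this table from S3 into the matching Redshift table.
    cursor.execute(
        "COPY {t} FROM 's3://my-staging-bucket/omop/{t}/' "
        "IAM_ROLE 'arn:aws:iam::123456789012:role/my-redshift-copy-role' CSV".format(t=table)
    )
    conn.commit()
    conn.close()
    return {"loaded": table}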

Scaling the solution

The transformation code runs in a Spark container that can scale out based on how you define your EMR cluster. There are no single points of failure. As your data grows, your infrastructure can grow without requiring any changes to the underlying architecture.

If you add more data sources, such as EHRs and other research data, the high-level view of the ETL would look like the following:

In this case, the loads of the different systems are completely independent. If the EHR load is four times the size that you expected and uses all the resources, it has no impact on the Research Data or HR System loads because they are in separate containers.

You can scale your EMR cluster based on the size of the data that you anticipate. For example, you can have a 50-node cluster in your container for loading EHR data and a 2-node cluster for loading the HR System. This design helps you scale the resources based on what you consume, as opposed to expensive infrastructure sitting idle.

The only code that is unique to each execution is any diffs between the CloudFormation templates (e.g., cluster size and SQL file locations) and the transformation SQL that resides in S3 buckets. The Spark jar that is executed as an EMR step is reused across all three executions.

Upgrading versions

In this architecture, upgrading the versions of Amazon EMR, Apache Hadoop, or Spark requires a one-time change to one line of code in the CloudFormation template:

"EMRC2SparkBatch": {
      "Type": "AWS::EMR::Cluster",
      "Properties": {
        "Applications": [
          {
            "Name": "Hadoop"
          },
          {
            "Name": "Spark"
          }
        ],
        "Instances": {
          "MasterInstanceGroup": {
            "InstanceCount": 1,
            "InstanceType": "m3.xlarge",
            "Market": "ON_DEMAND",
            "Name": "Master"
          },
          "CoreInstanceGroup": {
            "InstanceCount": 1,
            "InstanceType": "m3.xlarge",
            "Market": "ON_DEMAND",
            "Name": "Core"
          },
          "TerminationProtected": false
        },
        "Name": "EMRC2SparkBatch",
        "JobFlowRole": { "Ref": "EMREC2InstanceProfile" },
        "ServiceRole": { "Ref": "EMRRole" },
        "ReleaseLabel": "emr-5.0.0",
        "VisibleToAllUsers": true
      }
    }

Note that this example uses a slightly lower version of EMR so that it can use Spark 2.0.0 instead of Spark 2.1.0, which does not support nulls in CSV files.

You can also select the version in the Release list in the General Configuration section of the EMR console:

The data sources all have different CloudFormation templates, so you can upgrade one data source at a time or upgrade them all together. As long as the reusable Spark jar is compatible with the new version, none of the transformation code has to change.

Executing queries on the data

After all the data is loaded, it’s easy to tear down the CloudFormation stack so you don’t pay for resources that aren’t being used:

CloudFormationManager cf = new CloudFormationManager(); 
cf.terminateStack(stack);    

This includes the EMR cluster, Lambda function, SNS topics and subscriptions, and temporary IAM roles that were created to push the data to Amazon Redshift. The S3 buckets that contain the raw MIMIC-III data and the data in OMOP form remain because they existed outside the CloudFormation stack.

You can now connect to the Amazon Redshift cluster and start executing queries on the ten OMOP tables that were created, as shown in the following example:

select *
from drug_exposure
limit 100;

OMOP analytics tools

For information about open source analytics tools that are built on top of the OMOP model, visit the OHDSI Software page.

The following are examples of data visualizations provided by Achilles, an open source visualization tool for OMOP.

Conclusion

This post demonstrated how to convert MIMIC-III data into OMOP form using data tools that are built for scale and flexibility. It compared the architecture against a traditional data warehouse and showed how this design scales by combining a scale-out technology (Amazon EMR) with a serverless technology (AWS Lambda). It also showed how you can lower your costs by using CloudFormation to create your data pipeline infrastructure. And by tearing down the stack after the data is loaded, you don’t pay for idle servers.

You can find all the code in the AWS Labs GitHub repo with detailed, step-by-step instructions on how to load the data from MIMIC-III to OMOP using this design.

If you have any questions or suggestions, please add them below.


About the Author

Ryan Hood is a Data Engineer for AWS. He works on big data projects leveraging the newest AWS offerings. In his spare time, he enjoys watching the Cubs win the World Series and attempting to Sous-vide anything he can find in his refrigerator.


Hiring a Content Director

Post Syndicated from Ahin Thomas original https://www.backblaze.com/blog/hiring-content-director/


Backblaze is looking to hire a full-time Content Director. This role is an essential piece of our team, reporting directly to our VP of Marketing. As the hiring manager, I’d like to tell you a little bit more about the role, how I’m thinking about the collaboration, and why I believe this to be a great opportunity.

A Little About Backblaze and the Role

Since 2007, Backblaze has earned a strong reputation as a leader in data storage. Our products are astonishingly easy to use and affordable to purchase. We have engaged customers and an involved community that helps drive our brand. Our audience numbers in the millions and our primary interaction point is the Backblaze blog. We publish content for engineers (data infrastructure, topics in the data storage world), consumers (how to’s, merits of backing up), and entrepreneurs (business insights). In all categories, our Content Director drives our earned position as leaders.

Backblaze has a culture focused on being fair and good (to each other and our customers). We have created a sustainable business that is profitable and growing. Our team places a premium on open communication, being cleverly unconventional, and helping each other out. The Content Director, specifically, balances our needs as a commercial enterprise (at the end of the day, we want to sell our products) with the custodianship of our blog (and the trust of our audience).

There’s a lot of ground to be covered at Backblaze. We have three discrete business lines:

  • Computer Backup -> a 10-year-old business focusing on backing up consumer computers.
  • B2 Cloud Storage -> Competing with Amazon, Google, and Microsoft… just at ¼ of the price (but with the same performance characteristics).
  • Business Backup -> Both Computer Backup and B2 Cloud Storage, but focused on SMBs and enterprise.

The Best Candidate Is…

An excellent writer – possessing a solid academic understanding of writing, the creative process, and delivering against deadlines. You know how to write with multiple voices for multiple audiences. We do not expect our Content Director to be a storage infrastructure expert; we do expect a facility with researching topics, accessing our engineering and infrastructure team for guidance, and generally translating the technical into something easy to understand. The best Content Director must be an active participant in the business, strategy, and editorial debates and then must execute with ruthless precision.

Our Content Director’s “day job” is making sure the blog is running smoothly and the sales team has compelling collateral (emails, case studies, white papers).

Specifically, the Perfect Content Director Excels at:

  • Creating well researched, elegantly constructed content on deadline. For example, each week, 2 articles should be published on our blog. Blog posts should rotate to address the constituencies for our 3 business lines – not all blog posts will appeal to everyone, but over the course of a month, we want multiple compelling pieces for each segment of our audience. Similarly, case studies (and outbound emails) should be tailored to our sales team’s proposed campaigns / audiences. The Content Director creates ~75% of all content but is responsible for editing 100%.
  • Understanding organic methods for weaving business needs into compelling content. The majority of our content (but not EVERY piece) must tie to some business strategy. We hate fluff and hold our promotional content to a standard of being worth someone’s time to read. To be effective, the Content Director must understand the target customer segments and use cases for our products.
  • Straddling both Consumer & SaaS mechanics. A key part of the job will be working to augment the collateral used by our sales team for both B2 Cloud Storage and Business Backup. This content should be compelling and optimized for converting leads. And our foundational business line, Computer Backup, deserves to be nurtured and grown.
  • Product marketing. The Content Director “owns” the blog, but also assists in writing case studies and white papers and creating collateral (email, trade shows). Each of these has its own calls to action and audiences. Direct experience is a plus; experience that will plausibly translate to these areas is a requirement.
  • Articulating views on storage, backup, and cloud infrastructure. Not everyone has experience with this. That’s fine, but if you do, it’s strongly beneficial.

A Thursday In The Life:

  • Coordinate Collaborators – We are a deliverables-driven culture, not a meeting-driven one. We expect you to collaborate with internal blog authors and the occasional guest poster.
  • Collaborate with Design – Ensure imagery for upcoming posts / collateral are on track.
  • Augment Sales team – Lock content for next week’s outbound campaign.
  • Self-directed blog agenda – Feedback for next Tuesday’s post is addressed; next Thursday’s post is circulated to the marketing team for feedback & SEO polish.
  • Review Editorial calendar, make any changes.

Oh! And We Have Great Perks:

  • Competitive healthcare plans
  • Competitive compensation and 401k
  • All employees receive Option grants
  • Unlimited vacation days
  • Strong coffee & fully stocked Micro kitchen
  • Catered breakfast and lunches
  • Awesome people who work on awesome projects
  • Childcare bonus
  • Normal work hours
  • Get to bring your pets into the office
  • San Mateo Office – located near Caltrain and Highways 101 & 280.

Interested in Joining Our Team?

Send an email to jobscontact@backblaze.com with the subject “Content Director”. Please include your resume and 3 brief abstracts for content pieces.
Some hints for each of your three abstracts:

  • Create a compelling headline
  • Write clearly and concisely
  • Be brief, each abstract should be 100 words or less – no longer
  • Target each abstract to a different specific audience that is relevant to our business lines

Thank you for taking the time to read and consider all this. I hope it sounds like a great opportunity for you or someone you know. Principals only need apply.

The post Hiring a Content Director appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Julia language for Raspberry Pi

Post Syndicated from Ben Nuttall original https://www.raspberrypi.org/blog/julia-language-raspberry-pi/

Julia is a free and open-source general-purpose programming language made specifically for scientific computing. It combines the ease of writing in high-level languages like Python and Ruby with the technical power of MATLAB and Mathematica and the speed of C. Julia is ideal for university-level scientific programming and it’s used in research.

Julia language logo

Some time ago Viral Shah, one of the language’s co-creators, got in touch with us at the Raspberry Pi Foundation to say his team was working on a port of Julia to the ARM platform, specifically for the Raspberry Pi. Since then, they’ve done sterling work to add support for ARM. We’re happy to announce that we’ve now added Julia to the Raspbian repository, and that all Raspberry Pi models are supported!

Not only did the Julia team port the language itself to the Pi, but they also added support for GPIO, the Sense HAT and Minecraft. What I find really interesting is that when they came to visit and show us a demo, they took a completely different approach to the Sense HAT than I’d seen before: Simon, one of the Julia developers, started by loading the Julia logo into a matrix within the Jupyter notebook and then displayed it on the Sense HAT LED matrix. He then did some matrix transformations and the Sense HAT showed the effect of these manipulations.

Viral says:

The combination of Julia’s performance and Pi’s hardware unlocks new possibilities. Julia on the Pi will attract new communities and drive applications in universities, research labs and compute modules. Instead of shipping the data elsewhere for advanced analytics, it can simply be processed on the Pi itself in Julia.

Our port to ARM took a while, since we started at a time when LLVM on ARM was not fully mature. We had a bunch of people contributing to it – chipping away for a long time. Yichao did a bunch of the hard work, since he was using it for his experiments. The folks at the Berkeley Race car project also put Julia and JUMP on their self-driving cars, giving a pretty compelling application. We think we will see many more applications.

I organised an Intro to Julia session for the Cambridge Python user group earlier this week, and rather than everyone having to install Julia, Jupyter and all the additional modules on their own laptops, we just set up a room full of Raspberry Pis and prepared an SD card image. This was much easier and also meant we could use the Sense HAT to display output.

Intro to Julia language session at Raspberry Pi Foundation
Getting started with Julia language on Raspbian
Julia language logo on the Sense HAT LED array

Simon kindly led the session, and before long we were using Julia to generate the Mandelbrot fractal and display it on the Sense HAT:

Ben Nuttall on Twitter

@richwareham’s Sense HAT Mandelbrot fractal with @JuliaLanguage at @campython https://t.co/8FK7Vrpwwf

Naturally, one of the attendees, Rich Wareham, progressed to the Julia set – find his code here: gist.github.com/bennuttall/…

Last year at JuliaCon, there were two talks about Julia on the Pi. You can watch them on YouTube.

Install Julia on your Raspberry Pi with:

sudo apt update
sudo apt install julia

You can install the Jupyter notebook for Julia with:

sudo apt install julia libzmq3-dev python3-zmq
sudo pip3 install jupyter
julia -e 'Pkg.add("IJulia");'

And you can easily install extra packages from the Julia console:

Pkg.add("SenseHat")

The Julia team have also created a resources website for getting started with Julia on the Pi: juliaberry.github.io

Julia team visiting Pi Towers

There never was a story of more joy / Than this of Julia and her Raspberry Pi

Many thanks to Viral Shah, Yichao Yu, Tim Besard, Valentin Churavy, Jameson Nash, Tony Kelman, Avik Sengupta and Simon Byrne for their work on the port. We’re all really excited to see what people do with Julia on Raspberry Pi, and we look forward to welcoming Julia programmers to the Raspberry Pi community.

The post Julia language for Raspberry Pi appeared first on Raspberry Pi.

YouTube Keeps People From Pirate Sites, Study Shows

Post Syndicated from Ernesto original https://torrentfreak.com/youtube-keeps-people-from-pirate-sites-study-shows-170511/

The music industry has witnessed some dramatic changes over the past decade and a half.

With the rise of digital, people’s music consumption habits evolved dramatically, followed by more change when subscription streaming services came along.

Another popular way for people to enjoy music nowadays is via YouTube. The video streaming platform offers free access to millions of songs, which are often uploaded by artists or the labels themselves.

Still, YouTube is getting little praise from the major labels. Instead, music insiders often characterize the video platform as a DMCA-protected piracy racketeer that exploits legal loopholes to profit from artists’ hard work.

YouTube is generating healthy profits at a minimal cost and drives people away from legal platforms, the argument goes.

In an attempt to change this perception, YouTube has commissioned a study from the research outfit RBB Economics to see how the service impacts the music industry. The first results, published today, are a positive start.

The study examined exclusive YouTube data and a survey of 1,500 users across Germany, France, Italy and the U.K., asking them about their consumption habits. In particular, they were asked if YouTube keeps them away from paid music alternatives.

According to YouTube, which just unveiled the results, the data paints a different picture.

“The study finds that this is not the case. In fact, if YouTube didn’t exist, 85% of time spent on YouTube would move to lower value channels, and would result in a significant increase in piracy,” YouTube’s Simon Morrison writes.

If YouTube disappeared overnight, roughly half of all the time spent there on music would be “lost.” Furthermore, a significant portion of YouTube users would switch to using pirate sites and services instead.

“The results suggest that if YouTube were no longer able to offer music, time spent listening to pirated content would increase by +29%. This is consistent with YouTube being a substitute for pirated content,” RBB Economics writes.

In addition, the researchers also found that blocking music on YouTube doesn’t lead to an increase in streaming on other platforms, such as Spotify.

While YouTube doesn’t highlight it, the report also finds that some people would switch to “higher value” (e.g. paid) services if YouTube weren’t available. This amounts to roughly 15% of the total.

In other words, if the music industry is willing to pass on the $1 billion YouTube currently pays out and accept a hefty increase in piracy, there would be a boost in revenue through other channels. Whether that’s worth it is up for debate of course.

YouTube believes that the results are pretty convincing though. They rely on RBB Economics’ conclusion that there is no evidence of “significant cannibalization” and believe that their service has a positive impact overall.

“The cumulative effect of these findings is that YouTube has a market expansion effect, not a cannibalizing one,” YouTube writes.

The full results are available here (pdf), courtesy of RBB Economics. YouTube announced that more of these reports will follow in the near future.


New – USASpending.gov on an Amazon RDS Snapshot

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/new-usaspending-gov-on-an-amazon-rds-snapshot/

My colleague Jed Sundwall runs the AWS Public Datasets program. He wrote the guest post below to tell you about an important new dataset that is available as an Amazon RDS Snapshot. In the post, Jed introduces the dataset and shows you how to create an Amazon RDS DB Instance from the snapshot.

Jeff;


I am very excited to announce that, starting today, the entire public USAspending.gov database is available for anyone to copy via Amazon Relational Database Service (RDS). USAspending.gov includes data on all spending by the federal government, including contracts, grants, loans, employee salaries, and more. The data is available via a PostgreSQL snapshot, which provides bulk access to the entire USAspending.gov database, and is updated nightly. At this time, the database includes all USAspending.gov data for the second quarter of fiscal year 2017, and data going back to the year 2000 will be added over the summer. You can learn more about the database and how to access it on its AWS Public Dataset landing page.

Through the AWS Public Datasets program, we work with AWS customers to experiment with ways that the cloud can make data more accessible to more people. Most of our AWS Public Datasets are made available through Amazon S3 because of its tremendous flexibility and ability to scale to serve any volume of any kind of data files. What’s exciting about the USAspending.gov database is that it provides a great example of how Amazon RDS can be used to share an entire relational database quickly and easily. Typically, sharing a relational database requires extract, transform, and load (ETL) processes that involve redundant storage capacity, time for data transfer, and often scripts to migrate your database schema from one database engine to another. ETL processes can be so intimidating and cumbersome that they’re effectively impossible for many people to carry out.

By making their data available as a public Amazon RDS snapshot, the team at USAspending.gov has made it easy for anyone to get a copy of their entire production database for their own use within minutes. This will be useful for researchers and businesses who want to work with real data about all US Government spending and quickly combine it with their own data or other data resources.

Deploying the USASpending.gov Database Using the AWS Management Console
Let’s go through the steps involved in deploying the database in your AWS account using the AWS Management Console.

  1. Sign in to the AWS Management Console and select the US East (N. Virginia) region in the menu bar.
  2. Open the Amazon RDS Console and choose Snapshots in the navigation pane.
  3. In the filter for the search bar, select All Public Snapshots and search for 515495268755:
  4. Select the snapshot named arn:aws:rds:us-east-1:515495268755:snapshot:usaspending-db.
  5. Select Snapshot Actions -> Restore Snapshot. Select an instance size, and enter the other details, then click on Restore DB Instance.
  6. You will see that a DB Instance is being created from the snapshot, within your AWS account.
  7. After a few minutes, the status of the instance will change to Available.
  8. You can see the endpoint for your database on the main page along with other useful info:

Deploying the USASpending.gov Database Using the AWS CLI
You can also install the AWS Command Line Interface (CLI) and use it to create a DB Instance from the snapshot. Here’s a sample command:

$ aws rds restore-db-instance-from-db-snapshot --db-instance-identifier my-test-db-cli \
  --db-snapshot-identifier arn:aws:rds:us-east-1:515495268755:snapshot:usaspending-db \
  --region us-east-1

This will give you an ARN (Amazon Resource Name) that you can use to reference the DB Instance. For example:

$ aws rds describe-db-instances \
  --db-instance-identifier arn:aws:rds:us-east-1:917192695859:db:my-test-db-cli

This command will display the Endpoint.Address that you use to connect to the database.
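
If you prefer to script the restore in Python, a rough boto3 equivalent of the CLI commands above looks like this; the instance identifier and instance class are values you would choose yourself:

import boto3

rds = boto3.client("rds", region_name="us-east-1")

# Restore a new DB instance from the public USAspending.gov snapshot.
rds.restore_db_instance_from_db_snapshot(
    DBInstanceIdentifier="my-test-db-boto3",
    DBSnapshotIdentifier="arn:aws:rds:us-east-1:515495268755:snapshot:usaspending-db",
    DBInstanceClass="db.m4.large",
)

# Wait until the instance is available, then print the endpoint to connect to.
rds.get_waiter("db_instance_available").wait(DBInstanceIdentifier="my-test-db-boto3")
instance = rds.describe_db_instances(DBInstanceIdentifier="my-test-db-boto3")["DBInstances"][0]
print(instance["Endpoint"]["Address"])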

Connecting to the DB Instance
After following the AWS Management Console or AWS CLI instructions above, you will have access to the full USAspending.gov database within this Amazon RDS DB instance, and you can connect to it using any PostgreSQL client using the following credentials:

  • Username: root
  • Password: password
  • Database: data_store_api

If you use psql, you can access the database using this command:

$ psql -h my-endpoint.rds.amazonaws.com -U root -d data_store_api

You should change the database password after you log in:

ALTER USER "root" WITH ENCRYPTED PASSWORD '{new password}';

If you can’t connect to your instance but think you should be able to, you may need to check your VPC Security Groups and make sure inbound and outbound traffic on the port (usually 5432) is allowed from your IP address.
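
If you need to open that port programmatically, a short boto3 sketch along these lines can help; the security group ID and CIDR range are placeholders that you would replace with your own values:

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Allow inbound PostgreSQL traffic to the DB instance's security group from one IP address.
ec2.authorize_security_group_ingress(
    GroupId="sg-0123456789abcdef0",  # the VPC security group attached to the DB instance
    IpProtocol="tcp",
    FromPort=5432,
    ToPort=5432,
    CidrIp="203.0.113.10/32",        # your public IP address
)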

Exploring the Data
The USAspending.gov data is very rich, so it will be hard to do it justice in this blog post, but hopefully these queries will give you an idea of what’s possible. To learn about the contents of the database, please review the USAspending.gov Data Dictionary.

The following query will return the total amount of money the government is obligated to pay for contracts awarded by NASA that include “Mars” or “Martian” in the description of the award:

select sum(total_obligation) from awards, subtier_agency 
  where (awards.description like '% MARTIAN %' OR awards.description like '% MARS %') 
  AND subtier_agency.name = 'National Aeronautics and Space Administration';

As I write this, the result I get for this query is $55,411,025.42. Note that the database is updated nightly and will include more historical data in the coming months, so you may get a different result if you run this query.

Now, here’s the same query, but looking for awards with “Jupiter” or “Jovian” in the description:

select sum(total_obligation) from awards, subtier_agency
  where (awards.description like '%JUPITER%' OR awards.description like '%JOVIAN%') 
  AND subtier_agency.name = 'National Aeronautics and Space Administration';

The result I get is $14,766,392.96.
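
If you'd rather script queries like these than run them interactively, psql can execute them directly from the command line. Here's a minimal sketch reusing the Mars query and the placeholder endpoint from earlier (psql will prompt for the password, or you can set it in the PGPASSWORD environment variable):

# -t prints just the value, without column headers or row counts
$ psql -h my-endpoint.rds.amazonaws.com -U root -d data_store_api -t \
  -c "select sum(total_obligation) from awards, subtier_agency where (awards.description like '% MARTIAN %' OR awards.description like '% MARS %') AND subtier_agency.name = 'National Aeronautics and Space Administration';"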

Questions & Comments
I’m looking forward to seeing what people can do with this data. If you have any questions about the data, please create an issue on the USAspending.gov API’s issue tracker on GitHub.

— Jed

Spotify’s Beta Used ‘Pirate’ MP3 Files, Some From Pirate Bay

Post Syndicated from Andy original https://torrentfreak.com/spotifys-beta-used-pirate-mp3-files-some-from-pirate-bay-170509/

While some pirates will probably never be tempted away from the digital high seas, over the past decade millions have ditched or tapered down their habit with the help of Spotify.

It’s no coincidence that from the very beginning more than a decade ago, the streaming service had more than a few things in common with the piracy scene.

Spotify CEO Daniel Ek originally worked with uTorrent creator Ludvig ‘Ludde’ Strigeus before the pair sold uTorrent to BitTorrent Inc. and began work on Spotify. Later, the company told TF that pirates were their target.

“Spotify is a new way of enjoying music. We believe Spotify provides a viable alternative to music piracy,” the company said.

“We think the way forward is to create a service better than piracy, thereby converting users into a legal, sustainable alternative which also enriches the total music experience.”

The technology deployed by Spotify was also familiar. Like the majority of ‘pirate’ platforms at the time, Spotify operated a peer-to-peer (P2P) system which grew to become one of the largest on the Internet. It was shut down in 2011.

But in the clearest nod to pirates, Spotify was available for free, supported by ads if the user desired. This was the platform’s greatest asset as it sought to win over a generation that had grown accustomed to gorging on free MP3s. Interestingly, however, an early Pirate Bay figure has now revealed that Spotify also had a use for the free content floating around the Internet.

As one of the early members of Sweden’s infamous Piratbyrån (piracy bureau), Rasmus Fleischer was also one of the key figures at The Pirate Bay. Over the years he’s been a writer, researcher, debater and musician, and in 2012 he finished his PhD thesis on “music’s political economy.”

As part of a five-person team, Fleischer is now writing a book about Spotify. Titled ‘Spotify Teardown – Inside the Black Box of Streaming Music’, the book aims to shine light on the history of the famous music service and also spills the beans on a few secrets.

In an interview with Sweden’s DI.se, Fleischer reveals that when Spotify was in early beta, the company used unlicensed music to kick-start the platform.

“Spotify’s beta version was originally a pirate service. It was distributing MP3 files that the employees happened to have on their hard drives,” he reveals.

Rumors that early versions of Spotify used ‘pirate’ MP3s have been floating around the Internet for years. People who had access to the service in the beginning later reported downloading tracks that contained ‘Scene’ labeling, tags, and formats, which are the tell-tale signs that content hadn’t been obtained officially.

Solid proof has been more difficult to come by but Fleischer says he knows for certain that Spotify was using music obtained not only from pirate sites, but the most famous pirate site of all.

According to the writer, a few years ago he was involved with a band that decided to distribute their music on The Pirate Bay instead of the usual outlets. Soon after, the album appeared on Spotify’s beta service.

“I thought that was funny. So I emailed Spotify and asked how they obtained it. They said that ‘now, during the test period, we will use music that we find’,” Fleischer recalls.

For a company that has attracting pirates built into its DNA, it’s perhaps fitting that it tempted them with the same bait found on pirate sites. Certainly, the company’s history of a pragmatic attitude towards piracy means that few will be shouting ‘hypocrites’ at the streaming platform now.

Indeed, according to Fleischer the successes and growth of Spotify are directly linked to the temporary downfall of The Pirate Bay following the raid on the site in 2006, and the lawsuits that followed.

“The entire Spotify beta period and its early launch history is in perfect sync with the Pirate Bay process,” Fleischer explains.

“They would not have had as much attention if they had not been able to surf that wave. The company’s early history coincides with the Pirate Party becoming a hot topic, and the trial of the Pirate Bay in the Stockholm District Court.”

In 2013, Fleischer told TF that The Pirate Bay had “helped catalyze so-called ‘new business models’,” and it now appears that Spotify is reaping the benefits and looks set to keep doing so into the future.

An in-depth interview with Rasmus Fleischer will be published here soon, including an interesting revelation detailing how TorrentFreak readers positively affected the launch of Spotify in the United States.

Spotify Teardown – Inside the Black Box of Streaming Music will be published in early 2018.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

Hard Drive Stats for Q1 2017

Post Syndicated from Andy Klein original https://www.backblaze.com/blog/hard-drive-failure-rates-q1-2017/

2017 hard drive stats

In this update, we’ll review the Q1 2017 and lifetime hard drive failure rates for all our current drive models, and we’ll look at a relatively new class of drives for us – “enterprise”. We’ll share our observations and insights, and as always, you can download the hard drive statistics data we use to create these reports.

Our Hard Drive Data Set

Backblaze has now recorded and saved daily hard drive statistics from the drives in our data centers for over 4 years. This data includes the SMART attributes reported by each drive, along with related information such as the drive serial number and failure status. As of March 31, 2017, we had 84,469 operational hard drives. Of those, 1,800 were boot drives and 82,669 were data drives. For our review, we remove drive models for which we have fewer than 45 drives, leaving us with 82,516 hard drives to analyze for this report. There are currently 17 different hard drive models, ranging in size from 3 to 8 TB. All of these models are 3½” drives.

Hard Drive Reliability Statistics for Q1 2017

Since our last report in Q4 2016, we have added 10,577 hard drives, bringing us to the 82,516 drives we’ll focus on. We’ll start by looking at the statistics for the period of January 1, 2017 through March 31, 2017 – Q1 2017. This covers the drives that were operational during that period, ranging in size from 3 to 8 TB as listed below.

hard drive failure rates by model

Observations and Notes on the Q1 Review

You’ll notice that some of the drive models have a failure rate of “0” (zero). Here, a failure rate of zero means there were no drive failures for that model during Q1 2017. Later, we will cover how these same drive models fared over their lifetime. Why is the quarterly data important? We use it to look for anything unusual. For example, in Q1 the 4 TB Seagate drive model ST4000DX000 has a high failure rate of 35.88%, while the lifetime annualized failure rate for this model is much lower, at 7.50%. In this case, we only have 170 drives of this particular model, so the failure rate is not statistically significant, but such information could be useful if we were using several thousand drives of this model.

There were a total of 375 drive failures in Q1. A drive is considered failed if one or more of the following conditions are met:

  • The drive will not spin up or connect to the OS.
  • The drive will not sync, or stay synced, in a RAID Array (see note below).
  • The SMART stats we use show values above our thresholds.
  • Note: Our stand-alone Storage Pods use RAID-6, while our Backblaze Vaults use our own open-sourced implementation of Reed-Solomon erasure coding. Both techniques have a concept of a drive not syncing or staying synced with the other member drives in its group.

The annualized hard drive failure rate for Q1 in our current population of drives is 2.11%. That’s a bit higher than previous quarters, but might be a function of us adding 10,577 new drives to our count in Q1. We’ve found that there is a slightly higher rate of drive failures early on, before the drives “get comfortable” in their new surroundings. This is seen in the drive failure rate “bathtub curve” we covered in a previous post.
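
If you want to reproduce the annualized numbers from the downloadable data yourself, the arithmetic is simple. Here's a minimal sketch as a small shell script, assuming the usual definition of annualized failure rate as failures divided by drive years of service (drive days / 365); plug in the failure and drive-day totals you compute from the data set:

#!/bin/sh
# Usage: ./afr.sh <drive_failures> <drive_days>
# Annualized failure rate = failures / (drive days / 365) * 100
failures="$1"
drive_days="$2"
awk -v f="$failures" -v d="$drive_days" \
  'BEGIN { printf "Annualized failure rate: %.2f%%\n", (f / (d / 365)) * 100 }'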

10,577 More Drives

The additional 10,577 drives are really the net of 11,002 drives added, less 425 drives removed. The removed drives are in addition to the 375 drives marked as failed, since failed drives were replaced one for one. The 425 drives were primarily removed from service due to migrations to higher-density drives.

The table below shows the breakdown of the drives added in Q1 2017 by drive size.

drive counts by size

Lifetime Hard Drive Failure Rates for Current Drives

The table below shows the failure rates for the hard drive models we had in service as of March 31, 2017. This is over the period beginning in April 2013 and ending March 31, 2017. If you are interested in the hard drive failure rates for all the hard drives we’ve used over the years, please refer to our 2016 hard drive review.

lifetime hard drive reliability rates

The annualized failure rate for the drive models listed above is 2.07%. This compares to 2.05% for the same collection of drive models as of the end of Q4 2016. The increase makes sense given the higher Q1 2017 failure rate, relative to previous quarters, noted earlier. No new models were added during the current quarter and no old models exited the collection.

Backblaze is Using Enterprise Drives – Oh My!

Some of you may have noticed we now have a significant number of enterprise drives in our data center, namely 2,459 Seagate 8 TB drives, model ST8000NM055. The HGST 8 TB drives were the first true enterprise drives we used as data drives in our data centers, but we only have 45 of them. So, why did we suddenly decide to purchase 2,400+ of the Seagate 8 TB enterprise drives? There was a very short period of time, as Seagate was introducing new drive models and phasing out old ones, during which the cost per terabyte of the 8 TB enterprise drives fell within our budget. Previously, we had purchased 60 of these drives to test in one Storage Pod and were satisfied they could work in our environment. When the opportunity arose to acquire the enterprise drives at a price we liked, we couldn’t resist.

Here’s a comparison of the 8 TB consumer drives versus the 8 TB enterprise drives to date:

enterprise vs. consumer hard drives

What have we learned so far…

  1. It is too early to compare failure rates – The oldest enterprise drives have only been in service for about 2 months, with most being placed into service just prior to the end of Q1. The Backblaze Vaults the enterprise drives reside in have yet to fill up with data. We’ll need at least 6 months before we can start comparing failure rates, as the data is still too volatile. For example, if the current enterprise drives were to experience just 2 failures in Q2, their lifetime annualized failure rate would be about 0.57%.
  2. The enterprise drives load data faster – The Backblaze Vaults containing the enterprise drives loaded data faster than the Backblaze Vaults containing consumer drives. The vaults with the enterprise drives loaded on average 140 TB per day, while the vaults with the consumer drives loaded on average 100 TB per day.
  3. The enterprise drives use more power – No surprise here: according to the Seagate specifications, the enterprise drives use 9W average at idle and 10W average in operation, while the consumer drives use 7.2W average at idle and 9W average in operation. For a single drive this may seem insignificant, but when you put 60 drives in a 4U Storage Pod chassis and then 10 chassis in a rack, the difference adds up quickly, as the quick calculation after this list shows.
  4. Enterprise drives have some nice features – The Seagate enterprise 8 TB drives we used have PowerChoice™ technology that gives us the option to use less power. The data loading times noted above were recorded after we changed to a lower power mode. In short, the enterprise drives in a low power mode still stored 40% more data per day on average than the consumer drives.
  5. While it is great that the enterprise drives can load data faster, drive speed has never been a bottleneck in our system. A system that can load data faster will just “get in line” more often and fill up faster. There is always extra capacity when it comes to accepting data from customers.
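
To make "the difference adds up quickly" concrete, here's a rough back-of-the-envelope sketch using the spec numbers from item 3 above (an illustration, not a measurement of our actual racks):

# 60 drives per Storage Pod x 10 pods per rack = 600 drives per rack
awk 'BEGIN {
  drives    = 60 * 10
  idle_diff = (9.0 - 7.2) * drives   # extra watts at idle, enterprise vs. consumer
  op_diff   = (10.0 - 9.0) * drives  # extra watts in operation
  printf "Extra power per rack: %.0f W idle, %.0f W operational\n", idle_diff, op_diff
}'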

Wrapping Up

We’ll continue to monitor the 8 TB enterprise drives and keep reporting our findings.

If you’d like to hear more about our Hard Drive Stats, Backblaze will be presenting at the 33rd International Conference on Massive Storage Systems and Technology (MSST 2017) being held at Santa Clara University in Santa Clara, California from May 15th – 19th. The conference will dedicate five days to computer-storage technology, including a day of tutorials, two days of invited papers, two days of peer-reviewed research papers, and a vendor exposition. Come join us.

As a reminder, the hard drive data we use is available on our Hard Drive Test Data page. You can download and use this data for free for your own purposes; all we ask is three things: 1) you cite Backblaze as the source if you use the data, 2) you accept that you are solely responsible for how you use the data, and 3) you do not sell this data to anyone; it is free.

Good luck and let us know if you find anything interesting.

The post Hard Drive Stats for Q1 2017 appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

OSS-Fuzz: Five months later, and rewarding projects

Post Syndicated from ris original https://lwn.net/Articles/722201/rss

Google Open Source Blog takes a look at the progress made by the OSS-Fuzz project. “OSS-Fuzz has found numerous security vulnerabilities in several critical open source projects: 10 in FreeType2, 17 in FFmpeg, 33 in LibreOffice, 8 in SQLite 3, 10 in GnuTLS, 25 in PCRE2, 9 in gRPC, and 7 in Wireshark, etc. We’ve also had at least one bug collision with another independent security researcher (CVE-2017-2801). (Some of the bugs are still view restricted so links may show smaller numbers.)” LWN covered OSS-Fuzz last January.

Using Ultrasonic Beacons to Track Users

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2017/05/using_ultrasoni.html

I’ve previously written about ad networks using ultrasonic communications to jump from one device to another. The idea is for devices like televisions to play ultrasonic codes in advertisements and for nearby smartphones to detect them. This way the two devices can be linked.

Creepy, yes. And also increasingly common, as this research demonstrates:

Privacy Threats through Ultrasonic Side Channels on Mobile Devices

by Daniel Arp, Erwin Quiring, Christian Wressnegger and Konrad Rieck

Abstract: Device tracking is a serious threat to the privacy of users, as it enables spying on their habits and activities. A recent practice embeds ultrasonic beacons in audio and tracks them using the microphone of mobile devices. This side channel allows an adversary to identify a user’s current location, spy on her TV viewing habits or link together her different mobile devices. In this paper, we explore the capabilities, the current prevalence and technical limitations of this new tracking technique based on three commercial tracking solutions. To this end, we develop detection approaches for ultrasonic beacons and Android applications capable of processing these. Our findings confirm our privacy concerns: We spot ultrasonic beacons in various web media content and detect signals in 4 of 35 stores in two European cities that are used for location tracking. While we do not find ultrasonic beacons in TV streams from 7 countries, we spot 234 Android applications that are constantly listening for ultrasonic beacons in the background without the user’s knowledge.

News article. BoingBoing post.