Tag Archives: soccer

Millions of UK Football Fans Seem Confused About Piracy

Post Syndicated from Andy original https://torrentfreak.com/millions-of-uk-football-fans-seem-confused-about-piracy/

Football, or soccer as it’s more commonly known in the US, is the most popular spectator sport in the UK. As a result, millions watch matches every week, both legally and illegally.

The latter method of consumption is a big thorn in the side of organizations such as the Premier League, which has been working hard to stamp out piracy in all its forms, often via aggressive enforcement. However, a new survey published today suggests more education is also needed.

Commissioned by betting tips service OLBG and carried out by market research company OnePoll in September, the survey looks at some of the habits of 1,000 football fan respondents.

The survey begins by noting that 16.6% of respondents usually attend live games, closely followed by 14.3% who “usually” watch in the pub. However, the largest audience (46.9%) are those who regularly watch matches live at home.

This, of course, opens up the opportunity for piracy. The report states that 22.4% of football fans surveyed admitted to knowingly using “unofficial streams” at some time in the past, a figure that is extrapolated in the report to “over five million UK football fans” admitting to illegal streaming.

Asking whether fans had watched a pirated stream in the past 12 months (or even “usually”) would have arguably been a little more useful, in order not to inflate the figures beyond current consumption habits. There will be fans in those millions who, in varying combinations, attend matches, watch legally in the pub, and on occasion, illegally at home too.

Nevertheless, the report provides some interesting data on the knowledge of those surveyed when it comes to illegal and legal consumption.

For example, just over 61% of respondents acknowledged that accessing streams from unofficial providers is illegal, meaning that almost 40% believe that watching matches from third-party sources is absolutely fine. That’s a pretty big problem for the Premier League and other broadcasters when four out of ten fans can’t tell the difference between a legal and illegal provider.

Strangely, the figure drops when respondents were asked about “Kodi-style” devices. Just 49% said that these boxes provide content illegally, meaning around half believe they offer football matches legally. Given the global drive to stamp out illegal use of these devices, this is also an eye-opener.

Moving to other methods of access, the figures are a little more predictable. Just under 29% felt that social media streams (Facebook Live etc) are illegal, which raises the possibility that respondents associated the perceived legitimacy of the platform with legality.

Password sharing is also tackled in the survey, with 32.5% of respondents stating that they believe that using someone else’s login to access football matches is illegal. If that happens outside the subscriber’s household, it might constitute a terms-of-service breach, but actual illegality is open to question, account stealing aside.

All that being said, according to the survey, just 11% have actually used a family member’s login to watch football during the past 12 months, a figure that drops to 9.8% when borrowing from a friend.

In common with the debate around password sharing on Netflix and other platforms, this issue is likely to receive greater attention in the future but how it will be tackled by providers is far from clear. At least at the moment, the problem seems limited.

Finally, and just returning to the headline “five million football pirates in the UK”, it’s worth noting that this refers to people who have “EVER” used an unofficial stream to watch football, so it’s not necessarily five million fans who don’t ever part with a penny.

As far as we could see, no question in the report tried to determine what percentage of fans currently freeload all of the time, which is undoubtedly the biggest problem for the Premier League.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN reviews, discounts, offers and coupons.

Premier League Claims Victories with Multi-Faceted Anti-Piracy Approach

Post Syndicated from Ernesto original https://torrentfreak.com/premier-league-claims-victories-with-multi-faceted-anti-piracy-approach-190909/

In many parts of the world, football – or soccer as some prefer to call it – is the number one spectator sport.

The English Premier League, widely regarded as one of the top competitions, draws hundreds of millions of viewers per year. Many of these pay for access to the matches, but there’s also a massive circuit of unauthorized streams.

In recent years the League has worked hard to decrease the availability of these live streams, which isn’t an easy feat. It has been a driving force behind criminal prosecutions, pursued dynamic blocking orders in court, and issued many takedown notices.

In the latest IP Crime and Enforcement Report, published by the UK Government’s Intellectual Property Office, the Premier League provides an overview of some of its key achievements over the past 12 months. These are the results of what the organization describes as a “multi-faceted” anti-piracy approach.

One of the key pillars of the anti-piracy drive is to reduce the availability of online streams and clips. This worked quite well, apparently, with the League reporting hundreds of thousands of removed or blocked live streams and other video content.

“In Season 2018/19 the Premier League removed or blocked over 210,000 live streams and over 360,000 clips of its matches that would otherwise have been available to view in the UK,” the report reads.

The dynamic blocking injunctions issued against UK ISPs are also listed as successes. With these, the Premier League can provide Internet providers with continuously updated lists of live streaming sources that need to be blocked during Premier League matches.

Another major achievement, which thus far hasn’t been publicized, is the Premier League’s involvement in the demise of the popular live stream subreddit ‘Soccerstreams’, which had over 420,000 subscribers. This subreddit was effectively shut down by its operators in January due to an increasing number of complaints.

Initially, the operators banned all user submissions, planning to use the subreddit for news announcements. However, not much later Reddit pulled it offline permanently for violating its repeat infringer policy.

Apparently, the Premier League was one of the main complainants, as the Soccerstreams ‘shutdown’ is listed among the organization’s largest successes of the past year. According to the report, the football league worked “with Reddit to close its ‘soccerstreams’ thread”.

A similar victory was claimed against another popular streaming site, Ronaldo7.net. While the site is still up and running, the Premier League notes that it previously secured the removal of all its content.

Progress was made offline as well. According to the League, it conducted over 6,000 investigative visits to pubs, clubs, and other commercial venues where its content was displayed. This helped to prevent an unspecified number of illegal broadcasts.

The biggest success in court this year came from a criminal prosecution. Together with FACT, the Premier League went after three men who sold pirate IPTV subscriptions to more than 1,000 pubs, clubs and homes throughout England and Wales.

Following a four-week trial against the “Dreambox” defendants, the private prosecution resulted in prison sentences ranging from three years and three months to seven years and four months.

The same prosecution is also highlighted in a FACT case study in the same ‘IP Crime and Enforcement Report’. This overview ends with a strong focus on press coverage and the associated “advertising” value of the prosecutions.

“This result gathered worldwide media interest. It was mentioned in a total of 51 articles in no less than 16 countries worldwide, reaching a potential worldwide audience of over 165 million people. A BBC News article (pictured) had the largest reach, with a potential 35 million readers.

“The advertising value equivalent for the press received is estimated at over £1.5 million,” the FACT case study adds.

The comparison with advertising value may seem odd in this context, but it makes sense. The goal of prosecutions of this nature is not just to stop the infringing activities. FACT and the Premier League also want to send a clear message to other people participating in similar businesses, hoping they will stop.

While there are still plenty of pirate streaming operations online, the Premier League’s overview shows that the organization is taking the issue rather seriously. As such, it will likely continue on the same path in the new season.


La Liga Fined €250K For Breaching GDPR While Spying on Piracy

Post Syndicated from Andy original https://torrentfreak.com/la-liga-fined-for-breaching-gdpr-while-spying-on-piracy-190612/

With millions of fans around the globe, Spain’s La Liga soccer league is one of the most popular in the game.

To allow fans to keep up with all the latest news, La Liga offers an Android app with a number of features including schedules, kick-off times, and the all-important results.

Controversially, however, the app also has a surprising trick up its sleeve.

After gaining consent from users, La Liga’s software turns fans’ phones into spying devices which are able to analyze their surroundings using the microphone, listening out for unauthorized broadcasts in bars and restaurants, for example. This audio, collected Shazam-style, is then paired with phone GPS data to pinpoint establishments airing matches without a license.

The feature was outlined in the app’s privacy policy along with stated uses that include combating piracy.

“The purposes for which this functionality will be used are: (i) to develop statistical patterns on soccer consumption and (ii) to detect fraudulent operations of the retransmissions of LaLiga football matches (piracy),” the policy read when first uncovered last summer.

While controversial, La Liga felt that it was on solid ground in respect of the feature and its declaration to app users. AEPD, Spain’s data protection agency (Agencia Española de Protección de Datos), fundamentally disagrees.

As a result, AEPD has hit La Liga with a significant 250,000 euro fine for not properly informing its users in respect of the ‘microphone’ feature, including not displaying a mic icon when recording.

The data protection agency said that La Liga’s actions breached several aspects of the EU’s GDPR, including a failure to gain consent every time the microphones in users’ devices were activated.

In a statement, La Liga says it “disagrees deeply” with the AEPD’s decision and believes the agency has “not made the effort to understand how the technology works.” Announcing it will go to court to challenge the ruling, La Liga says it has always complied with the GDPR and other relevant data protection regulations.

Noting that users of the app must “expressly, proactively and on two occasions give their consent” for the microphone to be used, La Liga further insists that the app does not “record, store or listen” to people’s conversations.

“[T]he technology used is designed to generate only a specific sound footprint (acoustic fingerprint). This fingerprint only contains 0.75% of the information, discarding the remaining 99.25%, so it is technically impossible to interpret the voice or human conversations. This footprint is transformed into an alphanumeric code (hash) that is not reversible to the original sound,” La Liga says.
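To make the irreversibility claim concrete, here is a minimal, illustrative Python sketch of the general technique La Liga describes: reduce each short audio frame to a coarse spectral feature, then run it through a one-way hash. Every name, parameter, and the 8 kHz sample rate here are invented for illustration; this is not La Liga’s actual code, and production fingerprinting systems use far more robust features than a single dominant frequency bin.

```python
import cmath
import hashlib
import math

def dominant_bin(frame, n_bins=32):
    """Naive DFT over the first n_bins frequency bins; return the loudest bin.
    A crude stand-in for the spectral peaks real fingerprinters extract."""
    n = len(frame)
    mags = []
    for k in range(n_bins):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(acc))
    return max(range(n_bins), key=lambda k: mags[k])

def acoustic_fingerprint(samples, frame_size=256):
    """Hash a coarse spectral feature per frame. The hash is one-way:
    the original waveform cannot be recovered from the fingerprint,
    which is the property La Liga describes."""
    hashes = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        feature = dominant_bin(frame)
        hashes.append(hashlib.sha256(str(feature).encode()).hexdigest()[:16])
    return hashes

# Two captures of the same broadcast audio produce matching fingerprints...
tone = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(512)]
assert acoustic_fingerprint(tone) == acoustic_fingerprint(list(tone))
# ...while different audio (a 700 Hz tone here) does not.
noise = [math.sin(2 * math.pi * 700 * t / 8000) for t in range(512)]
assert acoustic_fingerprint(tone) != acoustic_fingerprint(noise)
```

Matching a fingerprint captured in a bar against fingerprints of licensed broadcasts is then a set lookup, without ever transmitting or storing intelligible audio.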

AEPD has ordered La Liga to introduce new mechanisms to ensure that users are properly notified when the anti-piracy features of the app are in use. However, La Liga says it has no need to implement them because at the end of the current season (June 30, 2019), the functionality will be disabled.

“La Liga will continue to test and implement new technologies and innovations that allow us to improve the experience of our fans and, of course, fight against this very serious scourge that is piracy,” the league concludes.


LaLiga & Rights Alliance Win Dynamic Football Piracy Blocking Order

Post Syndicated from Andy original https://torrentfreak.com/la-liga-rights-alliance-win-dynamic-sports-piracy-blocking-order-190426/

Movies, TV shows, and music have all proven popular with online pirates for years but with fast Internet connections now widespread, streaming live television is on the increase.

This presents a unique problem for football leagues hoping to generate large revenues from fans keen to catch the big game on TV.

There are dozens of pirate sites available today willing to provide that content for free, a point not lost on Danish anti-piracy group Rights Alliance (RettighedsAlliancen). Figures provided by the group indicate that between February 2018 and February 2019, locals made 18 million visits to the most popular illegal live sports services.

To help counter this threat, Rights Alliance teamed up with Spanish top-tier football league LaLiga in legal action designed to prevent local Internet users from accessing the sites.

In a case filed on February 8, 2019, LaLiga demanded that local ISP Telenor should prevent its subscribers from accessing 10 sites (list below) that infringe its copyrights by showing live matches. LaLiga suggested DNS blocking as a possible method but Telenor asked for the whole injunction to be denied.

LaLiga stated that previous rulings from the European Court of Justice found that the English Premier League owned copyright in its broadcasts, which included videos, music, highlights of previous matches and graphics.

When these are made available to the public, it is only the rightsholder that holds an exclusive license to do so. The same holds true when such broadcasts are made available to a “new audience” – i.e. one that the rightsholder hadn’t initially taken into account, such as unauthorized streaming over the Internet of an otherwise terrestrial broadcast.

As is becoming typical in similar cases, the local court referenced other important rulings from the EU Court, including GS Media, BREIN v Ziggo and BREIN v Filmspeler, to determine if the protected works were being made available to the public in contravention of EU law.

Nine of the ‘pirate’ services listed in the complaint were ultimately deemed to be infringing due to them offering copyrighted works and generating revenue via advertising. The tenth, Spain-based RojaDirecta, requested more time to respond to LaLiga’s complaint, so the site will be dealt with at a later date.

On April 15, 2019, the Court of Frederiksberg handed down its order, which requires Telenor to block the listed sites using a “technical solution” such as DNS blocking. The provider is also required to block other domains that appear in future which facilitate access to the same sites. These will be advised by Rights Alliance under strict rules laid down by the Court.
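As a rough illustration of what a “dynamic” block implies on the ISP side, the sketch below models a blocklist that a rights holder can extend at any time, with lookups refused for blocked domains and any of their subdomains. The class and domain names are hypothetical, and real implementations live inside the ISP’s DNS resolvers rather than in application code.

```python
class DynamicBlocklist:
    """Toy model of dynamic DNS blocking: the rights holder can add new
    domains whenever mirrors appear, and any hostname under a blocked
    domain is refused rather than resolved."""

    def __init__(self):
        self._blocked = set()

    def add(self, domain):
        """Called when the rights holder notifies the ISP of a new domain."""
        self._blocked.add(domain.lower().rstrip("."))

    def is_blocked(self, hostname):
        # Walk up the label hierarchy so subdomains of a blocked site
        # (e.g. 'live.example-streams.com') are caught as well.
        labels = hostname.lower().rstrip(".").split(".")
        return any(".".join(labels[i:]) in self._blocked for i in range(len(labels)))

blocklist = DynamicBlocklist()
blocklist.add("example-streams.com")
assert blocklist.is_blocked("live.example-streams.com")
assert not blocklist.is_blocked("legit-broadcaster.example")
# A dynamic update when a new mirror domain appears mid-season:
blocklist.add("example-streams.net")
assert blocklist.is_blocked("example-streams.net")
```

The “dynamic” part of the order is essentially the `add` step: new domains reported by Rights Alliance take effect without a fresh court hearing.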

Under the Danish ISP Code of Conduct, other major ISPs in Denmark will also implement the blocks against the sites in the complaint.

This is an important case in Denmark for both LaLiga and Rights Alliance, one that paves the way for the blocking of unlicensed live sports services and general TV portals.

LaLiga’s Audiovisual Director Melcior Soler welcomed the decision.

“Audiovisual Piracy is illegal and has great consequences, not only for us, but for the league and the future of the game, so we are very happy that RettighedsAlliancen has joined us in the fight. We know that Denmark is at the forefront of the development of digital tools to fight online piracy, and this is a big issue for us,” Soler said.

“We are now looking forward to seeing the effects of the blockings and hope that they can serve as an example for other countries, so that we can stand together in the fight against online piracy.”

The full order (supplied to TF by Rights Alliance) can be found here (pdf).

The site names, which are partially redacted in the order, are as follows:

  • livetv*****
  • tvron*****
  • ronaldo7****
  • kora-star****
  • live.harleyquinnwidget****
  • myfeed2all****
  • stream2watch****
  • jokerlivestream****
  • kora-online****


World Cup fever: Raspberry Pi football projects to try

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/raspberry-pi-football/

Rumour has it that there’s a worldwide football tournament on, and that England, surprisingly, are doing quite well. In celebration, here are some soccer-themed Raspberry Pi projects for you to try out at home between (or during) matches.

FutureLearn Football

Uploaded by Raspberry Pi on 2018-07-09.

Beat the goalie

Score as many goals as you can in 30 seconds with our code-it-yourself Beat the Goalie game for Scratch. You can access Scratch in any web browser, or offline with your Raspberry Pi.

Beat the goalie scratch raspberry pi

Start by coding a moving football in Scratch, and work through the project to build a game that tallies your successful attempts on goal within a time limit that you choose. Up the stakes by upgrading your game to include second-player control of the penguin goalie.

Table football

Once you’ve moved on from penalty practice, it’s time to recruit the whole team!

Table football Scratch

Our Table Football project – free, like all of our learning projects – comes with all the ingredients you need to recreate the classic game, including player sprites, graphics, and sounds.

Instant replay!

Scratch is all well and good, but it’s time we had some real-life table football, with all the snazzy upgrades you can add using a Raspberry Pi.

Foosball Instant Replay

Demo of the Foosball Instant Replay system. More info: https://github.com/swehner/foos and https://github.com/netsuso/foos-tournament. Music: http://freemusicarchive.org/music/Jahzzar/Blinded_by_dust/Magic_Mountain_1877

Stefan Wehner’s build is fully documented, so you can learn how to add automatic goal detection, slow-motion instant replay, scorekeeping, tallying, and more.

Ball tracking with Marty

Marty is a 3D-printable educational robot powered by a Raspberry Pi. With the capacity to add the Raspberry Pi camera module, Marty is a great tool for practising object tracking – in this case, ball tracking – for some football fun with robots!

Teaching Marty the Robot to Play Football

In this video we start to program Marty The Robot to play football, using a camera and Raspberry Pi on board to detect the ball and the goal. With the camera, Marty can spot a ball, and detect a pattern next to the goal.

You can also check out Circuit Digest’s ball-tracking robot using a Raspberry Pi, and this ball tracking tutorial by amey_s on Instructables.

What did we miss?

Have you built a football-themed project using a Raspberry Pi? What projects did we miss in our roundup? Share them with us here in the comments, or on social media.

The post World Cup fever: Raspberry Pi football projects to try appeared first on Raspberry Pi.

Safety first: a Raspberry Pi safety helmet

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/safety-helmet/

Jennifer Fox is back, this time with a Raspberry Pi Zero–controlled impact force monitor that will notify you if your collision is worth a trip to the doctor.

Make an Impact Force Monitor!

Check out my latest Hacker in Residence project for SparkFun Electronics: the Helmet Guardian! It’s a Pi Zero powered impact force monitor that turns on an LED if your head/body experiences a potentially dangerous impact. Install in your sports helmets, bicycle, or car to keep track of impact and inform you when it’s time to visit the doctor.

Concussion

We’ve all knocked our heads at least once in our lives, maybe due to tripping over a loose paving slab, or to falling off a bike, or to walking into the corner of the overhead cupboard door for the third time this week — will I ever learn?! More often than not, even when we’re seeing stars, we brush off the accident and continue with our day, oblivious to the long-term damage we may be doing.

Force of impact

After some thorough research, Jennifer Fox, founder of FoxBot Industries, concluded that forces of 4 to 6 G sustained for more than a few seconds are dangerous to the human body. With this in mind, she decided to use a Raspberry Pi Zero W and an accelerometer to create a helmet with an impact force monitor that notifies its wearer if this level of G-force has been met.

Jennifer Fox Raspberry Pi Impact Force Monitor

Obviously, if you do have a serious fall, you should always seek medical advice. This project is an example of how affordable technology can be used to create medical and citizen science builds, and not a replacement for professional medical services.

Setting up the impact monitor

Jennifer’s monitor requires only a few pieces of tech: a Zero W, an accelerometer and breakout board, a rechargeable USB battery, and an LED, plus the standard wires and resistors for these components.

After installing Raspbian, Jennifer enabled SSH and I2C on the Zero W to make it run headlessly, and then accessed it from a laptop. This allowed her to control the Pi without physically connecting to it, and it makes for a wireless finished project.

Jen wired the Pi to the accelerometer breakout board and LED as shown in the schematic below.

Jennifer Fox Raspberry Pi Impact Force Monitor

The LED acts as a signal of significant impacts, turning on when the G-force threshold is reached, and not turning off again until the program is reset.
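The latching behavior described above can be sketched in a few lines of Python. This is an illustrative stand-in, not Jennifer’s actual code (her full tutorial covers the real build): the class and function names are invented, readings are passed in directly in units of g, and a real build would poll the accelerometer over I2C, drive a GPIO pin for the LED, and also check how long the force is sustained.

```python
import math

G_THRESHOLD = 4.0  # lower end of the ~4-6 g range cited in the post

def magnitude_g(x, y, z):
    """Combine the three accelerometer axes into one g-force value."""
    return math.sqrt(x * x + y * y + z * z)

class ImpactMonitor:
    """Latching impact flag: once tripped it stays on, like the LED,
    until explicitly reset. Readings are in units of g."""

    def __init__(self, threshold=G_THRESHOLD):
        self.threshold = threshold
        self.led_on = False

    def process(self, x, y, z):
        if magnitude_g(x, y, z) >= self.threshold:
            self.led_on = True  # a real build would drive a GPIO pin here
        return self.led_on

    def reset(self):
        self.led_on = False

monitor = ImpactMonitor()
monitor.process(0.0, 0.0, 1.0)   # at rest: ~1 g from gravity, LED stays off
assert not monitor.led_on
monitor.process(3.1, 2.4, 1.8)   # sharp knock: magnitude ~4.3 g, LED latches on
assert monitor.led_on
monitor.process(0.0, 0.0, 1.0)   # LED stays on until reset() is called
assert monitor.led_on
```

Note that even at rest the magnitude is about 1 g because gravity is always present, which is why the threshold sits well above 1.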

Jennifer Fox Raspberry Pi Impact Force Monitor

Make your own and more

Jennifer’s full code for the impact monitor is on GitHub, and she’s put together a complete tutorial on SparkFun’s website.

For more tutorials from Jennifer Fox, such as her ‘Bark Back’ IoT Pet Monitor, be sure to follow her on YouTube. And for similar projects, check out Matt’s smart bike light and Amelia Day’s physical therapy soccer ball.

The post Safety first: a Raspberry Pi safety helmet appeared first on Raspberry Pi.

Success at Apache: A Newbie’s Narrative

Post Syndicated from mikesefanov original https://yahooeng.tumblr.com/post/170536010891


Kuhu Shukla (bottom center) and team at the 2017 DataWorks Summit


By Kuhu Shukla

This post first appeared here on the Apache Software Foundation blog as part of ASF’s “Success at Apache” monthly blog series.

As I sit at my desk on a rather frosty morning with my coffee, looking up new JIRAs from the previous day in the Apache Tez project, I feel rather pleased. The latest community release vote is complete, the bug fixes that we so badly needed are in and the new release that we tested out internally on our many thousand strong cluster is looking good. Today I am looking at a new stack trace from a different Apache project process and it is hard to miss how much of the exceptional code I get to look at every day comes from people all around the globe. A contributor leaves a JIRA comment before he goes on to pick up his kid from soccer practice while someone else wakes up to find that her effort on a bug fix for the past two months has finally come to fruition through a binding +1.

Yahoo – which joined AOL, HuffPost, Tumblr, Engadget, and many more brands to form the Verizon subsidiary Oath last year – has been at the frontier of open source adoption and contribution since before I was in high school. So while I have no historical trajectories to share, I do have a story on how I found myself in an epic journey of migrating all of Yahoo’s jobs from Apache MapReduce to Apache Tez, a then-new DAG-based execution engine.

Oath grid infrastructure is through and through driven by Apache technologies, be it storage through HDFS, resource management through YARN, job execution frameworks with Tez, and user interface engines such as Hive, Hue, Pig, Sqoop, Spark, and Storm. Our grid solution is specifically tailored to Oath’s business-critical data pipeline needs using the polymorphic technologies hosted, developed and maintained by the Apache community.

On the third day of my job at Yahoo in 2015, I received a YouTube link on An Introduction to Apache Tez. I watched it carefully, trying to keep up with all the questions I had, and recognized a few names from my academic readings of YARN ACM papers. I continued to ramp up on YARN and HDFS, the foundational Apache technologies Oath heavily contributes to even today. For the first few weeks I spent time picking out my favorite (necessary) mailing lists to subscribe to and getting started on setting up a pseudo-distributed Hadoop cluster. I continued to find my footing with newbie contributions and being ever more careful with whitespaces in my patches. One thing was clear – Tez was the next big thing for us. By the time I could truly call myself a contributor in the Hadoop community, nearly 80-90% of the Yahoo jobs were running with Tez. But just like hiking up the Grand Canyon, the last 20% is where all the pain was. Being a part of the solution to this challenge was a happy prospect, and thankfully contributing to Tez became a goal in my next quarter.

The next sprint planning meeting ended with me getting my first major Tez assignment – progress reporting. The progress reporting in Tez was non-existent – “Just needs an API fix,”  I thought. Like almost all bugs in this ecosystem, it was not easy. How do you define progress? How is it different for different kinds of outputs in a graph? The questions were many.

I, however, did not have to go far to get answers. The Tez community actively came to a newbie’s rescue, finding answers and posing important questions. I started attending the bi-weekly Tez community sync-up calls and asking existing contributors and committers for course correction. Suddenly the team was much bigger, the goals much more chiseled. This was new to anyone like me who came from the networking industry, where the most open parts of the code are the RFCs and the implementation details are often hidden. These meetings served as a clean room for our coding ideas and experiments. Ideas were shared, down to which data structure we should pick and what a future user of Tez would take from it. In between the usual status updates, extensive knowledge transfers were made.

Oath uses Apache Pig and Apache Hive extensively and most of the urgent requirements and requests came from Pig and Hive developers and users. Each issue led to a community JIRA and as we started running Tez at Oath scale, new feature ideas and bugs around performance and resource utilization materialized. Every year most of the Hadoop team at Oath travels to the Hadoop Summit where we meet our cohorts from the Apache community and we stand for hours discussing the state of the art and what is next for the project. One such discussion set the course for the next year and a half for me.

We needed an innovative way to shuffle data. Frameworks like MapReduce and Tez have a shuffle phase in their processing lifecycle wherein the data from upstream producers is made available to downstream consumers. Even though Apache Tez was designed with a feature set corresponding to optimization requirements in Pig and Hive, the Shuffle Handler Service was retrofitted from MapReduce at the time of the project’s inception. With several thousand jobs on our clusters leveraging these features in Tez, the Shuffle Handler Service became a clear performance bottleneck. So as we stood talking about our experience with Tez with our friends from the community, we decided to implement a new Shuffle Handler for Tez. All the conversation points were now tracked through an umbrella JIRA, TEZ-3334, and the to-do list was long. I picked a few JIRAs and as I started reading through, I realized this was all new code I would get to contribute to and review. There might be a better way to put this, but to be honest it was just a lot of fun! All the whiteboards were full, the team took walks post lunch and discussed how to go about defining the API. Countless hours were spent debugging hangs while fetching data and looking at stack traces and Wireshark captures from our test runs. Six months in and we had the feature on our sandbox clusters. There were moments ranging from sheer frustration to absolute exhilaration with high fives as we continued to address review comments and fix big and small issues with this evolving feature.

As much as owning your code is valued everywhere in the software community, I would never go on to say “I did this!” In fact, “we did!” It is this strong sense of shared ownership and fluid team structure that makes the open source experience at Apache truly rewarding. This is just one example. A lot of the work that was done in Tez was leveraged by the Hive and Pig community and cross Apache product community interaction made the work ever more interesting and challenging. Triaging and fixing issues with the Tez rollout led us to hit a 100% migration score last year and we also rolled the Tez Shuffle Handler Service out to our research clusters. As of last year we have run around 100 million Tez DAGs with a total of 50 billion tasks over almost 38,000 nodes.

In 2018 as I move on to explore Hadoop 3.0 as our future release, I hope that if someone outside the Apache community is reading this, it will inspire and intrigue them to contribute to a project of their choice. As an astronomy aficionado, going from a newbie Apache contributor to a newbie Apache committer was very much like looking through my telescope - it has endless possibilities and challenges you to be your best.

About the Author:

Kuhu Shukla is a software engineer at Oath and did her Masters in Computer Science at North Carolina State University. She works on the Big Data Platforms team on Apache Tez, YARN and HDFS with a lot of talented Apache PMCs and Committers in Champaign, Illinois. A recent Apache Tez Committer herself, she continues to contribute to YARN and HDFS, and spoke at the 2017 DataWorks Hadoop Summit on “Tez Shuffle Handler: Shuffling At Scale With Apache Hadoop”. Prior to that she worked on Juniper Networks’ router and switch configuration APIs. She likes to participate in open source conferences and women-in-tech events. In her spare time she loves singing Indian classical and jazz, laughing, whale watching, hiking and peering through her Dobsonian telescope.

Analyze OpenFDA Data in R with Amazon S3 and Amazon Athena

Post Syndicated from Ryan Hood original https://aws.amazon.com/blogs/big-data/analyze-openfda-data-in-r-with-amazon-s3-and-amazon-athena/

One of the great benefits of Amazon S3 is the ability to host, share, or consume public data sets. This provides transparency into data to which an external data scientist or developer might not normally have access. By exposing the data to the public, you can glean many insights that would have been difficult with a data silo.

The openFDA project creates easy access to the high value, high priority, and public access data of the Food and Drug Administration (FDA). The data has been formatted and documented in consumer-friendly standards. Critical data related to drugs, devices, and food has been harmonized and can easily be called by application developers and researchers via API calls. OpenFDA has published two whitepapers that drill into the technical underpinnings of the API infrastructure as well as how to properly analyze the data in R. In addition, FDA makes openFDA data available on S3 in raw format.

In this post, I show how to use S3, Amazon EMR, and Amazon Athena to analyze the drug adverse events dataset. A drug adverse event is an undesirable experience associated with the use of a drug, including serious drug side effects, product use errors, product quality programs, and therapeutic failures.

Data considerations

Keep in mind that this data has limitations. In the United States, adverse events are submitted to the FDA voluntarily by consumers, so there may not be reports for all events that occurred. There is also no certainty that a reported event was actually caused by the product: the FDA does not require that a causal relationship between a product and an event be proven, and reports do not always contain enough detail to evaluate an event. Because of this, there is no way to identify the true number of events. The important takeaway is that the information in this data has not been verified to establish cause-and-effect relationships. Despite this disclaimer, many interesting insights can be derived from the data to accelerate drug safety research.

Data analysis using SQL

For application developers who want to perform targeted searching and lookups, the API endpoints provided by the openFDA project are “ready to go” for software integration using a standard API powered by Elasticsearch, NodeJS, and Docker. However, for data analysis purposes, it is often easier to work with the data using SQL and statistical packages that expect a SQL table structure. For large-scale analysis, APIs often have query limits, such as 5000 records per query. This can cause extra work for data scientists who want to analyze the full dataset instead of small subsets of data.
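To make the query-limit problem concrete, here is a minimal Python sketch of paging through a capped endpoint. The `fetch_page` function is a stand-in for a real HTTP call, and the `limit`/`skip` parameter names mirror the openFDA API's paging style; the loop itself is the pattern a data scientist would need to repeat thousands of times to pull the full dataset through the API.

```python
# Sketch of paging through an openFDA-style endpoint that caps the number
# of records returned per call. fetch_page stands in for an HTTP GET such
# as https://api.fda.gov/drug/event.json?limit=...&skip=... (hypothetical
# wiring; only the paging logic is the point here).
def fetch_page(dataset, limit, skip):
    # Stand-in for the real API call: return one page of results.
    return dataset[skip:skip + limit]

def fetch_all(dataset, limit=100):
    """Accumulate every record by advancing `skip` until a short page."""
    records, skip = [], 0
    while True:
        page = fetch_page(dataset, limit, skip)
        records.extend(page)
        if len(page) < limit:   # a short (or empty) page means we're done
            return records
        skip += limit

# Simulate a dataset larger than one page.
data = list(range(250))
assert fetch_all(data, limit=100) == data
```

With a 5000-record cap and millions of reports, this loop runs for a long time, which is exactly why a bulk S3 copy of the data is so much more convenient for analysis.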

To address the concern of requiring all the data in a single dataset, the openFDA project released the full 100 GB of harmonized data files that back the openFDA project onto S3. Athena is an interactive query service that makes it easy to analyze data in S3 using standard SQL. It’s a quick and easy way to answer your questions about adverse events and aspirin that does not require you to spin up databases or servers.

While you could point tools directly at the openFDA S3 files, you can find greatly improved performance and use of the data by following some of the preparation steps later in this post.

Architecture

This post explains how to use the following architecture to take the raw data provided by openFDA, leverage several AWS services, and derive meaning from the underlying data.

Steps:

  1. Load the openFDA /drug/event dataset into Spark and convert it to gzip to allow for streaming.
  2. Transform the data in Spark and save the results as a Parquet file in S3.
  3. Query the S3 Parquet file with Athena.
  4. Perform visualization and analysis of the data in R and Python on Amazon EC2.

Optimizing public data sets: A primer on data preparation

Those who want to jump right into preparing the files for Athena may want to skip ahead to the next section.

Transforming, or pre-processing, files is a common task when working with public data sets. Before you jump into the specific steps for transforming the openFDA data files into a format optimized for Athena, it is worth a quick exploration of the problem.

Making a dataset in S3 efficiently accessible with minimal transformation for the end user has two key elements:

  1. Partitioning the data into objects that contain a complete part of the data (such as data created within a specific month).
  2. Using file formats that make it easy for applications to locate subsets of data (for example, gzip, Parquet, ORC, etc.).

With these two key elements in mind, you can now apply transformations to the openFDA adverse event data to prepare it for Athena. You might find the data techniques employed in this post to be applicable to many of the questions you might want to ask of the public data sets stored in Amazon S3.
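As a toy illustration of the first element, keys that encode a time-based partition let a reader skip irrelevant objects entirely. The key layout below is hypothetical, but it mirrors the quarter-based partitioning of the openFDA download discussed later:

```python
# Hypothetical S3-style key layout, partitioned by quarter. A query
# scoped to one quarter only needs to touch objects under that prefix.
keys = [
    "drug/event/2014q4/part-0001.json.gz",
    "drug/event/2015q1/part-0001.json.gz",
    "drug/event/2015q4/part-0001.json.gz",
]

def objects_for_quarter(keys, quarter):
    """Partition pruning: keep only objects whose prefix matches."""
    prefix = "drug/event/{}/".format(quarter)
    return [k for k in keys if k.startswith(prefix)]

# A reader asking about 2015q4 touches one object instead of the
# full history.
print(objects_for_quarter(keys, "2015q4"))
```

Query engines such as Athena apply the same idea automatically when a table's partitions align with the object layout.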

Before you get started, I encourage those who are interested in doing deeper healthcare analysis on AWS to make sure that you first read the AWS HIPAA Compliance whitepaper. This covers the information necessary for processing and storing patient health information (PHI).

Also, the adverse event analysis shown for aspirin is strictly for demonstration purposes and should not be used for any real decision or taken as anything other than a demonstration of AWS capabilities. However, there have been robust case studies published that have explored a causal relationship between aspirin and adverse reactions using OpenFDA data. If you are seeking research on aspirin or its risks, visit organizations such as the Centers for Disease Control and Prevention (CDC) or the Institute of Medicine (IOM).

Preparing data for Athena

For this walkthrough, you will start with the FDA adverse events dataset, which is stored as JSON files within zip archives on S3. You then convert it to Parquet for analysis. Why do you need to convert it? The original data download is stored in objects that are partitioned by quarter.

Here is a small sample of what you find in the adverse events (/drugs/event) section of the openFDA website.

If you were looking for events that happened in a specific quarter, this is not a bad solution. For most other scenarios, such as looking across the full history of aspirin events, it requires you to access a lot of data that you won’t need. The zip file format is not ideal for using data in place because zip readers must have random access to the file, which means the data can’t be streamed. Additionally, the zip files contain large JSON objects.

To read the data in these JSON files, you must either use a streaming JSON decoder or decode the JSON on a computer with a significant amount of RAM. Opening up these files for public consumption is a great start. However, with a few lines of Spark code you can prepare the data so that the JSON can be streamed.
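To see why gzip sidesteps the random-access problem, here is a small self-contained Python demonstration using only the standard library: gzip data written once can be decoded line by line with purely forward reads, which is exactly what a streaming JSON Lines reader needs.

```python
import gzip
import io
import json

# Write a few JSON Lines records into an in-memory gzip stream.
records = [{"id": i} for i in range(3)]
buf = io.BytesIO()
with gzip.GzipFile(fileobj=buf, mode="wb") as gz:
    for r in records:
        gz.write((json.dumps(r) + "\n").encode())

# Read it back with purely forward, line-by-line reads -- no seeking,
# so the same code would work on a network stream.
buf.seek(0)
with gzip.GzipFile(fileobj=buf, mode="rb") as gz:
    decoded = [json.loads(line) for line in gz]

assert decoded == records
```

A zip archive, by contrast, keeps its central directory at the end of the file, so a reader generally needs random access before it can extract anything.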

Step 1:  Convert the file types

Using Apache Spark on EMR, you can extract all of the zip files and pull out the events from the JSON files. To do this, use the Scala code below to unpack each zip file and create a text file. In addition, compress the JSON files with gzip to improve Spark’s performance and reduce your overall storage footprint. The Scala code can be run in either the Spark shell or in an Apache Zeppelin notebook on your EMR cluster.

If you are unfamiliar with either Apache Zeppelin or the Spark Shell, the following posts serve as great references:


import scala.io.Source
import java.util.zip.ZipInputStream
import org.apache.spark.input.PortableDataStream
import org.apache.hadoop.io.compress.GzipCodec

// Input Directory
val inputFile = "s3://download.open.fda.gov/drug/event/2015q4/*.json.zip";

// Output Directory
val outputDir = "s3://{YOUR OUTPUT BUCKET HERE}/output/2015q4/";

// Read the zip files from S3 as (path, stream) pairs
val zipFiles = sc.binaryFiles(inputFile);

// Process each zip file to extract the JSON as text, strip whitespace,
// and save the lines gzip-compressed in the output directory
val rdd = zipFiles.flatMap((file: (String, PortableDataStream)) => {
    val zipStream = new ZipInputStream(file._2.open)
    val entry = zipStream.getNextEntry
    val iter = Source.fromInputStream(zipStream).getLines
    iter
}).map(_.replaceAll("\\s+", "")).saveAsTextFile(outputDir, classOf[GzipCodec])

Step 2:  Transform JSON into Parquet

With just a few more lines of Scala code, you can use Spark’s abstractions to convert the JSON into a Spark DataFrame and then export the data back to S3 in Parquet format.

Spark requires the JSON to be in JSON Lines format to be parsed correctly into a DataFrame.

// Output Parquet directory
val outputDir = "s3://{YOUR OUTPUT BUCKET NAME}/output/drugevents"
// Input json file
val inputJson = "s3://{YOUR OUTPUT BUCKET NAME}/output/2015q4/*"
// Load dataframe from json file multiline 
val df = spark.read.json(sc.wholeTextFiles(inputJson).values)
// Extract results from dataframe
val results = df.select("results")
// Save it to Parquet
results.write.parquet(outputDir)
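To make the JSON Lines requirement concrete, here is a small Python illustration (field names are simplified from the real openFDA schema): a nested `results` array is flattened to one compact JSON object per line, mirroring the whitespace-stripping step in the Scala code above.

```python
import json

# A single large JSON document, as found inside the openFDA zips:
# everything lives under one top-level "results" array.
big_doc = {"results": [{"safetyreportid": "1"}, {"safetyreportid": "2"}]}

# JSON Lines: one complete, compact JSON object per line. This is the
# shape Spark's default json reader parses into a DataFrame row-by-row.
json_lines = "\n".join(
    json.dumps(rec, separators=(",", ":")) for rec in big_doc["results"]
)
print(json_lines)
```

Each line can now be parsed independently, which is what allows the file to be split and streamed across Spark executors.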

Step 3:  Create an Athena table

With the data cleanly prepared and stored in S3 using the Parquet format, you can now place an Athena table on top of it to get a better understanding of the underlying data.

Because the openFDA data structure incorporates several layers of nesting, it can be a complex process to try to manually derive the underlying schema in a Hive-compatible format. To shorten this process, you can load the top row of the DataFrame from the previous step into a Hive table within Zeppelin and then extract the “create table” statement from SparkSQL.

results.createOrReplaceTempView("data")

val top1 = spark.sql("select * from data tablesample(1 rows)")

top1.write.format("parquet").mode("overwrite").saveAsTable("drugevents")

val show_cmd = spark.sql("show create table drugevents").show(1, false)

This returns a “create table” statement that you can almost paste directly into the Athena console. Make some small modifications (adding the word “external” and replacing “using” with “stored as”), and then execute the code in the Athena query editor. The table is created.

For the openFDA data, the DDL returns all string fields, as the date format used in this dataset does not conform to the yyyy-mm-dd hh:mm:ss[.f...] format required by Hive. For your analysis, the string format works appropriately, but it would be possible to extend this code to use a Presto function to convert the strings into timestamps.
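As a quick illustration of that conversion (the `receiptdate` value below is hypothetical, assuming the compact `YYYYMMDD` layout used elsewhere in this walkthrough), here is what a year truncation and a full timestamp parse look like in Python; a Presto `date_parse` call with the same format string would serve the same purpose inside Athena:

```python
from datetime import datetime

# A hypothetical openFDA-style date string in compact YYYYMMDD form.
receiptdate = "20150321"

# Full parse into a real timestamp (what a Presto date_parse could do).
ts = datetime.strptime(receiptdate, "%Y%m%d")
assert (ts.year, ts.month, ts.day) == (2015, 3, 21)

# The simple truncation the SQL queries in this post rely on: the first
# four characters of the string are the report year.
assert receiptdate[:4] == "2015"
```

For year-level aggregation the string truncation is all that is needed, which is why the queries below get away with string-typed date columns.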

CREATE EXTERNAL TABLE  drugevents (
   companynumb  string, 
   safetyreportid  string, 
   safetyreportversion  string, 
   receiptdate  string, 
   patientagegroup  string, 
   patientdeathdate  string, 
   patientsex  string, 
   patientweight  string, 
   serious  string, 
   seriousnesscongenitalanomali  string, 
   seriousnessdeath  string, 
   seriousnessdisabling  string, 
   seriousnesshospitalization  string, 
   seriousnesslifethreatening  string, 
   seriousnessother  string, 
   actiondrug  string, 
   activesubstancename  string, 
   drugadditional  string, 
   drugadministrationroute  string, 
   drugcharacterization  string, 
   drugindication  string, 
   drugauthorizationnumb  string, 
   medicinalproduct  string, 
   drugdosageform  string, 
   drugdosagetext  string, 
   reactionoutcome  string, 
   reactionmeddrapt  string, 
   reactionmeddraversionpt  string)
STORED AS parquet
LOCATION
  's3://{YOUR TARGET BUCKET}/output/drugevents'

With the Athena table in place, you can start to explore the data by running ad hoc queries within Athena or doing more advanced statistical analysis in R.

Using SQL and R to analyze adverse events

Using the openFDA data with Athena makes it very easy to translate your questions into SQL code and perform quick analysis on the data. After you have prepared the data for Athena, you can begin to explore the relationship between aspirin and adverse drug events, as an example. One of the most common metrics to measure adverse drug events is the Proportional Reporting Ratio (PRR). It is defined as:

PRR = (m/n) / ((M-m)/(N-n))

where:
m = number of reports with the drug and the event
n = number of reports with the drug
M = number of reports with the event in the database
N = number of reports in the database
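Plugging illustrative (entirely made-up) counts into the formula shows how the ratio behaves:

```python
# Worked PRR example with made-up counts, purely for illustration.
m = 150     # reports with the drug AND the event
n = 2000    # reports with the drug
M = 5000    # reports with the event anywhere in the database
N = 400000  # all reports in the database

PRR = (m / n) / ((M - m) / (N - n))
print(round(PRR, 2))

# A PRR well above 1 means the event is reported disproportionately
# often for this drug relative to the rest of the database.
```

Here the drug's reports mention the event 7.5% of the time versus roughly 1.2% for all other reports, giving a PRR of about 6.15.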

Gastrointestinal haemorrhage has the highest PRR of any reaction to aspirin when viewed in aggregate. One question you may want to ask is how the PRR has trended on a yearly basis for gastrointestinal haemorrhage since 2005.

Using the following query in Athena, you can see the PRR trend of “GASTROINTESTINAL HAEMORRHAGE” reactions with “ASPIRIN” since 2005:

with drug_and_event as 
(select rpad(receiptdate, 4, 'NA') as receipt_year
    , reactionmeddrapt
    , count(distinct (concat(safetyreportid,receiptdate,reactionmeddrapt))) as reports_with_drug_and_event 
from fda.drugevents
where rpad(receiptdate,4,'NA') 
     between '2005' and '2015' 
     and medicinalproduct = 'ASPIRIN'
     and reactionmeddrapt= 'GASTROINTESTINAL HAEMORRHAGE'
group by reactionmeddrapt, rpad(receiptdate, 4, 'NA') 
), reports_with_drug as 
(
select rpad(receiptdate, 4, 'NA') as receipt_year
    , count(distinct (concat(safetyreportid,receiptdate,reactionmeddrapt))) as reports_with_drug 
 from fda.drugevents 
 where rpad(receiptdate,4,'NA') 
     between '2005' and '2015' 
     and medicinalproduct = 'ASPIRIN'
group by rpad(receiptdate, 4, 'NA') 
), reports_with_event as 
(
   select rpad(receiptdate, 4, 'NA') as receipt_year
    , count(distinct (concat(safetyreportid,receiptdate,reactionmeddrapt))) as reports_with_event 
   from fda.drugevents
   where rpad(receiptdate,4,'NA') 
     between '2005' and '2015' 
     and reactionmeddrapt= 'GASTROINTESTINAL HAEMORRHAGE'
   group by rpad(receiptdate, 4, 'NA')
), total_reports as 
(
   select rpad(receiptdate, 4, 'NA') as receipt_year
    , count(distinct (concat(safetyreportid,receiptdate,reactionmeddrapt))) as total_reports 
   from fda.drugevents
   where rpad(receiptdate,4,'NA') 
     between '2005' and '2015' 
   group by rpad(receiptdate, 4, 'NA')
)
select  drug_and_event.receipt_year, 
(1.0 * drug_and_event.reports_with_drug_and_event/reports_with_drug.reports_with_drug)/ (1.0 * (reports_with_event.reports_with_event- drug_and_event.reports_with_drug_and_event)/(total_reports.total_reports-reports_with_drug.reports_with_drug)) as prr
, drug_and_event.reports_with_drug_and_event
, reports_with_drug.reports_with_drug
, reports_with_event.reports_with_event
, total_reports.total_reports
from drug_and_event
    inner join reports_with_drug on  drug_and_event.receipt_year = reports_with_drug.receipt_year   
    inner join reports_with_event on  drug_and_event.receipt_year = reports_with_event.receipt_year
    inner join total_reports on  drug_and_event.receipt_year = total_reports.receipt_year
order by  drug_and_event.receipt_year


One nice feature of Athena is that you can quickly connect to it via R or any other tool that can use a JDBC driver to visualize the data and understand it more clearly.

With this quick R script, which can be run in RStudio either locally or on an EC2 instance, you can create a visualization of the PRR and reporting odds ratio (ROR) for “GASTROINTESTINAL HAEMORRHAGE” reactions from “ASPIRIN” since 2005 to better understand these trends.

# Connect to Athena over JDBC (drv is the Athena JDBC driver loaded earlier)
conn <- dbConnect(drv, '<Your JDBC URL>',
                  s3_staging_dir = "<Your S3 Location>",
                  user = Sys.getenv("USER_NAME"),
                  password = Sys.getenv("USER_PASSWORD"))

# Declare Adverse Event
adverseEvent <- "'GASTROINTESTINAL HAEMORRHAGE'"

# Build SQL Blocks
sqlFirst <- "SELECT rpad(receiptdate, 4, 'NA') as receipt_year, count(DISTINCT safetyreportid) as event_count FROM fda.drugevents WHERE rpad(receiptdate,4,'NA') between '2005' and '2015'"
sqlEnd <- "GROUP BY rpad(receiptdate, 4, 'NA') ORDER BY receipt_year"

# Extract Aspirin with adverse event counts
sql <- paste(sqlFirst,"AND medicinalproduct ='ASPIRIN' AND reactionmeddrapt=",adverseEvent, sqlEnd,sep=" ")
aspirinAdverseCount = dbGetQuery(conn,sql)

# Extract Aspirin counts
sql <- paste(sqlFirst,"AND medicinalproduct ='ASPIRIN'", sqlEnd,sep=" ")
aspirinCount = dbGetQuery(conn,sql)

# Extract adverse event counts
sql <- paste(sqlFirst,"AND reactionmeddrapt=",adverseEvent, sqlEnd,sep=" ")
adverseCount = dbGetQuery(conn,sql)

# All Drug Adverse event Counts
sql <- paste(sqlFirst, sqlEnd,sep=" ")
allDrugCount = dbGetQuery(conn,sql)

# Select correct rows
selAll =  allDrugCount$receipt_year == aspirinAdverseCount$receipt_year
selAspirin = aspirinCount$receipt_year == aspirinAdverseCount$receipt_year
selAdverse = adverseCount$receipt_year == aspirinAdverseCount$receipt_year

# Calculate Numbers
m <- c(aspirinAdverseCount$event_count)
n <- c(aspirinCount[selAspirin,2])
M <- c(adverseCount[selAdverse,2])
N <- c(allDrugCount[selAll,2])

# Calculate the proportional reporting ratio (PRR)
PRR = (m/n)/((M-m)/(N-n))

# Calculate reporting Odds Ratio
d = n-m
D = N-M
ROR = (m/d)/(M/D)

# Plot the PRR and ROR
g_range <- range(0, PRR, ROR)
g_range[2] <- g_range[2] + 3
yearLen = length(aspirinAdverseCount$receipt_year)
ax <- aspirinAdverseCount$receipt_year  # x-axis labels (report years)
plot(PRR, type="o", col="blue", ylim=g_range, axes=FALSE, ann=FALSE)
axis(1, 1:yearLen, lab=ax)
axis(2, las=1, at=1*0:g_range[2])
box()
lines(ROR, type="o", pch=22, lty=2, col="red")

As you can see, the PRR and ROR have both remained fairly steady over this time range. With the R script above, all you need to do is change the adverseEvent variable from GASTROINTESTINAL HAEMORRHAGE to another type of reaction to analyze and compare those trends.

Summary

In this walkthrough:

  • You used a Scala script on EMR to convert the openFDA zip files to gzip.
  • You then transformed the JSON blobs into flattened Parquet files using Spark on EMR.
  • You created an Athena DDL so that you could query these Parquet files residing in S3.
  • Finally, you pointed the R package at the Athena table to analyze the data without pulling it into a database or creating your own servers.

If you have questions or suggestions, please comment below.


Next Steps

Take your skills to the next level. Learn how to optimize Amazon S3 for an architecture commonly used to enable genomic data analysis. Also, be sure to read more about running R on Amazon Athena.


About the Authors

Ryan Hood is a Data Engineer for AWS. He works on big data projects leveraging the newest AWS offerings. In his spare time, he enjoys watching the Cubs win the World Series and attempting to Sous-vide anything he can find in his refrigerator.

Vikram Anand is a Data Engineer for AWS. He works on big data projects leveraging the newest AWS offerings. In his spare time, he enjoys playing soccer and watching the NFL & European Soccer leagues.

Dave Rocamora is a Solutions Architect at Amazon Web Services on the Open Data team. Dave is based in Seattle and when he is not opening data, he enjoys biking and drinking coffee outside.


Physical therapy with a pressure-sensing football

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/physical-therapy-pressure-sensing-football/

Every year, eighth-grade science teacher Michele Chamberlain challenges her students to find a solution to a real-world problem. The solution must be environmentally friendly, and must demonstrate their sense of global awareness.

Amelia Day

Amelia with her project.

One of Michele’s students, 14-year-old Amelia Day, knew she wanted to create something that would help her practice her favourite sport, and approached Chamberlain with an idea for a football-related project.

“I know you said to choose a project you love,” Amelia explained. “I love soccer and I want to do something with engineering. I know I want to compete.”

Originally, the tool was built to help budding football players practise how to kick a ball correctly. The ball, tethered to a parasol shaft, uses a Raspberry Pi, LEDs, Bluetooth, and pressure points; together, these help athletes to connect with the ball with the right degree of force at the appropriate spot.

However, after a conversation with her teacher, it became apparent that Amelia’s ball could be used for so much more. As a result, the project was gradually redirected towards working with stroke therapy patients.

“It uses the aspect of a soccer training tool and that interface makes it fun, but it also uses Bluetooth audio feedback to rebuild the neural pathways inside the brain, and this is what is needed to recover from a stroke,” explains Amelia. 

“DE3MYSC Submission – [Press-Sure Soccer Ball]”

Uploaded by Amelia Day on 2016-04-20.

The video above is part of Amelia’s submission for the Discovery Education 3M Young Scientist Challenge 2016, a national competition for fifth- to eighth-grade students from across the USA.

One of the last ten finalists, Amelia travelled to 3M HQ in Minnesota this October, where she presented her project to a panel of judges. She placed third runner-up and received a cash prize.

LMS Hawks on Twitter

Our very own Amelia Day placed 3rd runner up @ the 3M National Junior Scientist competition this week. Proud to call her a Hawk!📓✏️🔎⚽️ #LMS

We’re always so proud to see young makers working to change the world and we wish Amelia the best of luck with her future. We expect to see great things from this Lakeridge Middle School Hawk.

The post Physical therapy with a pressure-sensing football appeared first on Raspberry Pi.

Instant-replay table football

Post Syndicated from Liz Upton original https://www.raspberrypi.org/blog/instant-replay-table-football/

So, England, nominally the home of football, is out of the European Cup, having lost to Iceland. Iceland is a country with a population of 330,000 hardy Vikings, whose national sport is handball. England’s population is over 53 million. And we invented soccer.

Iceland’s only football pitch is under snow for much of the year, and their part-time manager is a full-time dentist.

I think perhaps England should refocus their sporting efforts on something a little less challenging. Like table football. With a Raspberry Pi on hand, you can even make it feel stadium-like, with automatic goal detection, slow-motion instant replay, score-keeping, tallying for a league of competitors and more. Come on, nation. I feel that we could do quite well with this; and given that it cuts the size of the team down to two people, it’d keep player salaries at a minimum.

Foosball Instant Replay

Demo of Foosball Instant Replay system More info here: * https://github.com/swehner/foos * https://github.com/netsuso/foos-tournament Music: http://freemusicarchive.org/music/Jahzzar/Blinded_by_dust/Magic_Mountain_1877

This build comes from Stefan Wehner, who has documented it meticulously on GitHub. You’ll find full build instructions and a parts list (which starts with a football table), along with all the code you’ll need.

Well done Iceland, by the way. We’re not bitter or anything.

The post Instant-replay table football appeared first on Raspberry Pi.