Tag Archives: books

Subtitle Heroes: Fansubbing Movie Criticized For Piracy Promotion

Post Syndicated from Andy original https://torrentfreak.com/subtitle-heroes-fansubbing-movie-criticized-for-piracy-promotion-180217/

With many thousands of movies and TV shows being made available illegally online every year, a significant number will be enjoyed by speakers of languages other than that presented in the original production.

When Hollywood blockbusters appear online, small armies of individuals around the world spring into action, translating the dialog into Chinese and Czech, Dutch and Danish, French and Farsi, Russian and Romanian, plus a dozen languages in between. TV shows, particularly those produced in the US, get the same immediate treatment.

For many years, subtitling (‘fansubbing’) communities have provided an incredible service to citizens around the globe, from those seeking to experience new culture and languages to the hard of hearing and profoundly deaf. Now, following in the footsteps of movies like TPB:AFK and Kim Dotcom: Caught in the Web, a new movie has premiered in Italy which celebrates this extraordinary movement.

Subs Heroes from writer and director Franco Dipietro hit cinemas at the end of January. It documents the contribution fansubbing has made to Italian culture in a country that under fascism in 1934 banned the use of foreign languages in films, books, newspapers and everyday speech.

The movie centers on the large subtitle site ItalianSubs.net. Founded by a group of teenagers in 2006, it is now run by a team of men and women who maintain their identities as regular citizens during the day but transform into “superheroes of fansubbing” at night.

Needless to say, not everyone is pleased with this depiction of the people behind the now-infamous 500,000 member site.

For many years, fansubbing attracted very little heat but over time anti-piracy groups have been turning up the pressure, accusing subtitling teams of fueling piracy. This notion is shared by local anti-piracy outfit FAPAV (Federation for the Protection of Audiovisual and Multimedia Content), which has accused Dipietro’s movie of glamorizing criminal activity.

In a statement following the release of Subs Heroes, FAPAV made its position crystal clear: sites like ItalianSubs do not contribute to the development of the audiovisual market in Italy.

“It is necessary to clarify: when a protected work is subtitled and there is no right to do so, a crime is committed,” the anti-piracy group says.

“[Italiansubs] translates and makes available subtitles of audiovisual works (films and television series) in many cases not yet distributed on the Italian market. All this without having requested the consent of the rights holders. Ergo the Italiansubs community is illegal.”

Italiansubs (note ad for movie, top right)

FAPAV General Secretary Federico Bagnoli Rossi says that the impact that fansubbers have on the market is significant, causing damage not only to companies distributing the content but also to those who invest in official translations.

The fact that fansubbers often translate content that is not yet available in the region only compounds matters, Rossi says, noting that unofficial translations can also have “direct consequences” on those who have language dubbing as an occupation.

“The audiovisual market today needs to be supported and the protection and fight against illicit behaviors are as fundamental as investments and creative ideas,” Rossi notes.

“Everyone must do their part, respecting the rules and with a competitive and global cultural vision. There are no ‘superheroes’ or noble goals behind piracy, but only great damage to the audiovisual sector and all its workers.”

Also piling on the criticism is the chief of the National Cinema Exhibitors’ Association, who wrote to all of the companies involved to remind them that unauthorized subtitling is a crime. According to local reports, there seems to be an underlying tone that people should avoid becoming associated with the movie.

This did not please director Franco Dipietro who is defending his right to document the fansubbing movement, whether the industry likes it or not.

“We invite those who perhaps think differently to deepen the discussion and maybe organize an event to talk about it together. The film is made to confront and talk about a phenomenon that, whether we like it or not, exists and we can not pretend that it is not there,” Dipietro concludes.



Subs Heroes Trailer 1 from Duel: on Vimeo.



Subs Heroes Trailer 2 from Duel: on Vimeo.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN discounts, offers and coupons

[$] Jupyter: notebooks for education and collaboration

Post Syndicated from jake original https://lwn.net/Articles/746386/rss

The popular interpreted language Python shares a mode of interaction
with many other languages, from Lisp to APL to Julia: the REPL (read-eval-print-loop)
allows the user to experiment with and explore their code, while maintaining a
workspace of global variables and functions. This is in contrast with
languages such as Fortran and C, which must be compiled and run as complete
programs (a mode of operation available to the REPL-enabled languages as
well). But using a REPL is a solitary task; one can write a program to
share based on their explorations, but the REPL session itself not easily
shareable. So REPLs have gotten more sophisticated over time, evolving
into shareable notebooks, such as what IPython, and its more recent
descendant, Jupyter, have. Here we look at Jupyter: its history,
notebooks, and how it enables better collaboration in languages well beyond
its Python roots.

Community Profile: Dr. Lucy Rogers

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/community-profile-lucy-rogers/

This column is from The MagPi issue 58. You can download a PDF of the full issue for free, or subscribe to receive the print edition through your letterbox or the digital edition on your tablet. All proceeds from the print and digital editions help the Raspberry Pi Foundation achieve our charitable goals.

Dr Lucy Rogers calls herself a Transformer. “I transform simple electronics into cool gadgets, I transform science into plain English, I transform problems into opportunities. I am also a catalyst. I am interested in everything around me, and can often see ways of putting two ideas from very different fields together into one package. If I cannot do this myself, I connect the people who can.”

Dr Lucy Rogers Raspberry Pi The MagPi Community Profile

Among many other projects, Dr Lucy Rogers currently focuses much of her attention on reducing the damage from space debris

It’s a pretty wide range of interests and skills for sure. But it only takes a brief look at Lucy’s résumé to realise that she means it. When she says she’s interested in everything around her, this interest reaches from electronics to engineering, wearable tech, space, robotics, and robotic dinosaurs. And she can be seen talking about all of these things across various companies’ social media, such as IBM, websites including the Women’s Engineering Society, and books, including her own.

Dr Lucy Rogers Raspberry Pi The MagPi Community Profile

With her bright LED boots, Lucy was one of the wonderful Pi community members invited to join us and HRH The Duke of York at St James’s Palace just over a year ago

When not attending conferences as guest speaker, tinkering with electronics, or creating engaging IoT tutorials, she can be found retrofitting Raspberry Pis into the aforementioned robotic dinosaurs at Blackgang Chine Land of Imagination, writing, and judging battling bots for the BBC’s Robot Wars.

Dr Lucy Rogers Raspberry Pi The MagPi Community Profile

First broadcast in the UK between 1998 and 2004, Robot Wars was revived in 2016 with a new look and new judges, including Dr Lucy Rogers. Competitors battle their home-brew robots, and Lucy, together with the other two judges, awards victories among the carnage of robotic remains

Lucy graduated from Lancaster University with a degree in Mechanical Engineering. After that, she spent seven years at Rolls-Royce Industrial Power Group as a graduate trainee before becoming a chartered engineer and earning her PhD in bubbles.

Bubbles?

“Foam formation in low‑expansion fire-fighting equipment. I investigated the equipment to determine how the bubbles were formed,” she explains. Obviously. Bubbles!

Dr Lucy Rogers Raspberry Pi The MagPi Community Profile

Lucy graduated from the Singularity University Graduate Studies Program in 2011, focusing on how robotics, nanotech, medicine, and various technologies can tackle the challenges facing the world

She then went on to become a fellow of the Royal Astronomical Society (RAS) in 2005 and, later, a fellow of both the Institution of Mechanical Engineers (IMechE) and British Interplanetary Society. As a member of the Association of British Science Writers, Lucy wrote It’s ONLY Rocket Science: an Introduction in Plain English.

Dr Lucy Rogers Raspberry Pi The MagPi Community Profile

In It’s Only Rocket Science: An Introduction in Plain English Lucy explains that ‘hard to understand’ isn’t the same as ‘impossible to understand’, and takes her readers through the journey of building a rocket, leaving Earth, and travelling the cosmos

As a standout member of the industry, and all-round fun person to be around, Lucy has quickly established herself as a valued member of the Pi community.

In 2014, with the help of Neil Ford and Andy Stanford-Clark, Lucy worked with the UK’s oldest amusement park, Blackgang Chine Land of Imagination, on the Isle of Wight, with the aim of updating its animatronic dinosaurs. The original Blackgang Chine dinosaurs had a limited range of behaviour: able to roar, move their heads, and stomp a foot in a somewhat repetitive action.

When she contacted Raspberry Pi back in the November of that same year, the team were working on more creative, varied behaviours, giving each dinosaur a new Raspberry Pi-sized brain. This later evolved into a very successful dino-hacking Raspberry Jam.

Dr Lucy Rogers Raspberry Pi The MagPi Community Profile

Lucy, Neil Ford, and Andy Stanford-Clark used several Raspberry Pis and Node-RED to visualise flows of events when updating the robotic dinosaurs at Blackgang Chine. They went on to create the successful WightPi Raspberry Jam event, where visitors could join in with the unique hacking opportunity.

Given her love for tinkering with tech, and a love for stand-up comedy that can be uncovered via a quick YouTube search, it’s no wonder that Lucy was asked to help judge the first round of the ‘Make us laugh’ Pioneers challenge for Raspberry Pi. Alongside comedian Bec Hill, Code Club UK director Maria Quevedo, and the face of the first challenge, Owen Daughtery, Lucy lent her expertise to help name winners in the various categories of the teens event, and offered her support to future Pioneers.

The post Community Profile: Dr. Lucy Rogers appeared first on Raspberry Pi.

Blame privacy activists for the Memo??

Post Syndicated from Robert Graham original http://blog.erratasec.com/2018/02/blame-privacy-activists-for-memo.html

Former FBI agent Asha Rangappa @AshaRangappa_ has a smart post debunking the Nunes Memo, then takes it all back again with an op-ed on the NYTimes blaming us privacy activists. She presents an obviously false narrative that the FBI and FISA courts are above suspicion.

I know from first hand experience the FBI is corrupt. In 2007, they threatened me, trying to get me to cancel a talk that revealed security vulnerabilities in a large corporation’s product. Such abuses occur because there is no transparency and oversight. FBI agents write down our conversation in their little notebooks instead of recording it, so that they can control the narrative of what happened, presenting their version of the converstion (leaving out the threats). In this day and age of recording devices, this is indefensible.

She writes “I know firsthand that it’s difficult to get a FISA warrant“. Yes, the process was difficult for her, an underling, to get a FISA warrant. The process is different when a leader tries to do the same thing.

I know this first hand having casually worked as an outsider with intelligence agencies. I saw two processes in place: one for the flunkies, and one for those above the system. The flunkies constantly complained about how there is too many process in place oppressing them, preventing them from getting their jobs done. The leaders understood the system and how to sidestep those processes.

That’s not to say the Nunes Memo has merit, but it does point out that privacy advocates have a point in wanting more oversight and transparency in such surveillance of American citizens.

Blaming us privacy advocates isn’t the way to go. It’s not going to succeed in tarnishing us, but will push us more into Trump’s camp, causing us to reiterate that we believe the FBI and FISA are corrupt.

Udemy Targets ‘Pirate’ Site Giving Away its Paid Courses For Free

Post Syndicated from Andy original https://torrentfreak.com/udemy-targets-pirate-site-giving-away-its-paid-courses-for-free-180129/

While there’s no shortage of people who advocate free sharing of movies and music, passions are often raised when it comes to the availability of educational information.

Significant numbers of people believe that learning should be open to all and that texts and associated materials shouldn’t be locked away by copyright holders trying to monetize knowledge. Of course, people who make a living creating learning materials see the position rather differently.

A clash of these ideals is brewing in the United States where online learning platform Udemy has been trying to have some of its courses taken down from FreeTutorials.us, a site that makes available premium tutorials and other learning materials for free.

Early December 2017, counsel acting for Udemy and a number of its individual and corporate instructors (Maximilian Schwarzmüller, Academind GmbH, Peter Dalmaris, Futureshock Enterprises, Jose Marcial Portilla, and Pierian Data) wrote to FreeTutorials.us with DMCA takedown notice.

“Pursuant to 17 U.S.C. § 512(c)(3)(A) of the Digital Millennium Copyright Act (‘DMCA’), this communication serves as a notice of infringement and request for removal of certain web content available on freetutorials.us,” the letter reads.

“I hereby request that you remove or disable access to the material listed in Exhibit A in as expedient a fashion as possible. This communication does not constitute a waiver of any right to recover damages incurred by virtue of any such unauthorized activities, and such rights as well as claims for other relief are expressly retained.”

A small sample of Exhibit A

On January 10, 2018, the same law firm wrote to Cloudflare, which provides services to FreeTutorials. The DMCA notice asked Cloudflare to disable access to the same set of infringing content listed above.

It seems likely that whatever happened next wasn’t to Udemy’s satisfaction. On January 16, an attorney from the same law firm filed a DMCA subpoena at a district court in California. A DMCA subpoena can enable a copyright holder to obtain the identity of an alleged infringer without having to file a lawsuit and without needing a signature from a judge.

The subpoena was directed at Cloudflare, which provides services to FreeTutorials. The company was ordered to hand over “all identifying information identifying the owner, operator and/or contact person(s) associated with the domain www.freetutorials.us, including but not limited to name(s), address(es), telephone number(s), email address(es), Internet protocol connection records, administrative records and billing records from the time the account was established to the present.”

On January 26, the date by which Cloudflare was ordered to hand over the information, Cloudflare wrote to FreeTutorials with a somewhat late-in-the-day notification.

“We received the attached subpoena regarding freetutorials.us, a domain managed through your Cloudflare account. The subpoena requires us to provide information in our systems related to this website,” the company wrote.

“We have determined that this is a valid subpoena, and we are required to provide the requested information. In accordance with our Privacy Policy, we are informing you before we provide any of the requested subscriber information. We plan to turn over documents in response to the subpoena on January 26th, 2018, unless you intervene in the case.”

With that deadline passing last Friday, it’s safe to say that Cloudflare has complied with the subpoena as the law requires. However, TorrentFreak spoke with FreeTutorials who told us that the company doesn’t hold anything useful on them.

“No, they have nothing,” the team explained.

Noting that they’ll soon dispense with the services of Cloudflare, the team confirmed that they had received emails from Udemy and its instructors but hadn’t done a lot in response.

“How about a ‘NO’? was our answer to all the DMCA takedown requests from Udemy and its Instructors,” they added.

FreeTutorials (FTU) are affiliated with FreeCoursesOnline (FCO) and seem passionate about what they do. In common with others who distribute learning materials online, they express a belief in free education for all, irrespective of financial resources.

“We, FTU and FCO, are a group of seven members assorted as a team from different countries and cities. We are JN, SRZ aka SunRiseZone, Letap, Lihua Google Drive, Kaya, Zinnia, Faiz MeemBazooka,” a spokesperson revealed.

“We’re all members and colleagues and we also have our own daily work and business stuff to do. We have been through that phase of life when we didn’t have enough money to buy books and get tuition or even apply for a good course that we always wanted to have, so FTU & FCO are just our vision to provide Free Education For Everyone.

“We would love to change our priorities towards our current and future projects, only if we manage to get some faithful FTU’ers to join in and help us to grow together and make FTU a place it should be.”

TorrentFreak requested comment from Udemy but at the time of publication, we were yet to hear back. However, we did manage to get in touch with Jonathan Levi, an Udemy instructor who sent this takedown notice to the site in October 2017:

“I’m writing to you on behalf of SuperHuman Enterprises, LLC. You are in violation of our copyright, using our images, and linking to pirated copies of our courses. Remove them IMMEDIATELY or face severe legal action….You have 48 hours to comply,” he wrote, adding:

“And in case you’re going to say I don’t have evidence that I own the files, it’s my fucking face in the videos.”

Levi says that the site had been non-responsive so now things are being taken to the next level.

“They don’t reply to takedowns, so we’ve joined a class action lawsuit against FTU lead by Udemy and a law firm specializing in this type of thing,” Levi concludes.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN discounts, offers and coupons

Hollywood Says Only Site-Blocking Left to Beat Piracy in New Zealand

Post Syndicated from Andy original https://torrentfreak.com/hollywood-says-only-site-blocking-left-to-beat-piracy-in-new-zealand-180123/

The Motion Picture Distributors’ Association (MPDA) is a non-profit organisation which represents major international film studios in New Zealand.

With companies including Fox, Sony, Paramount, Roadshow, Disney, and Universal on the books, the MPDA sings from the same sheet as the MPAA and MPA. It also hopes to achieve in New Zealand what its counterparts have achieved in Europe and Australia but cannot on home soil – mass pirate site blocking.

In a release heralding the New Zealand screen industry’s annual contribution of around NZ$1.05 billion to GDP and NZ$706 million to exports, MPDA Managing Director Matthew Cheetham says that despite the successes, serious challenges lie ahead.

“When we have the illegal file sharing site the Pirate Bay as New Zealand’s 19th most popular site in New Zealand, it is clear that legitimate movie and TV distribution channels face challenges,” Cheetham says.

MPDA members in New Zealand

In common with movie bosses in many regions, Cheetham is hoping that the legal system will rise to the challenge and assist distributors to tackle the piracy problem. In New Zealand, that might yet require a change in the law but given recent changes in Australia, that doesn’t seem like a distant proposition.

Last December, the New Zealand government announced an overhaul of the country’s copyright laws. A review of the Copyright Act 1994 was announced by the previous government and is now scheduled to go ahead this year. The government has already indicated a willingness to consider amendments to the Act in order to meet the objectives of New Zealand’s copyright regime.

“In New Zealand, piracy is almost an accepted thing, because no one’s really doing anything about it, because no one actually can do anything about it,” Cheetham said last month.

It’s quite unusual for Hollywood’s representatives to say nothing can be done about piracy. However, there was a small ray of hope this morning when Cheetham said that there is actually one option left.

“There’s nothing we can do in New Zealand apart from site blocking,” Cheetham said.

So, as the MPDA appears to pin its hopes on legislative change, other players in the entertainment industry are testing the legal system as it stands today.

Last September, Sky TV began a pioneering ‘pirate’ site-blocking challenge in the New Zealand High Court, applying for an injunction against several local ISPs to prevent their subscribers from accessing several pirate sites.

The boss of Vocus, one of the ISP groups targeted, responded angrily, describing Sky’s efforts as “dinosaur behavior” and something one would expect in North Korea, not in New Zealand.

“It isn’t our job to police the Internet and it sure as hell isn’t SKY’s either, all sites should be equal and open,” General Manager Taryn Hamilton said.

The response from ISPs suggests that even when the matter of site-blocking is discussed as part of the Copyright Act review, introducing specific legislation may not be smooth sailing. In that respect, all eyes will turn to the Sky process, to see if some precedent can be set there.

Finally, another familiar problem continues to raise its head down under. So-called “Kodi boxes” – the now generic phrase often used to describe set-top devices configured for piracy – are also on the content industries’ radar.

There are a couple of cases still pending against sellers, including one in which a budding entrepreneur sent out marketing letters claiming that his service was better than Sky’s offering. For seller Krish Reddy, this didn’t turn out well as the company responded with a NZ$1m lawsuit.

Generally, however, both content industries and consumers are having a good time in New Zealand but the MPDA’s Cheetham says that taking on pirates is never easy.

“It’s been called the golden age of television and a lot of premium movies have been released in the last 12 or 18 months. Content providers and distributors have really upped their game in the last five or 10 years to meet what people want but it’s very difficult to compete with free,” Cheetham concludes.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN discounts, offers and coupons

Copyright Trolls Obtained Details of 200,000 Finnish Internet Users

Post Syndicated from Andy original https://torrentfreak.com/copyright-trolls-obtained-details-of-200000-finnish-internet-users-180118/

Fifteen years ago, the RIAA was contacting alleged file-sharers in the United States, demanding cash payments to make supposed lawsuits go away. In the years that followed, dozens of companies followed in their footsteps – not as a deterrent – but as a way to turn piracy into profit.

The practice is now widespread, not just in the United States, but also in Europe where few major countries have avoided the clutches of trolls. Germany has been hit particularly hard, with millions of cases. The UK has also seen tens of thousands of individuals targeted since 2006 although more recently the trolls there have been in retreat. The same cannot be said about Finland, however.

From a relatively late start in 2013, trolls have been stepping up their game in leaps and bounds but the true scale of developments in this Scandinavian country will probably come as a surprise to even the most seasoned of troll-watchers.

According to data compiled by NGO activist Ritva Puolakka, the business in Finland has grown to epidemic proportions. In fact, between 2013 and 2017 the Market Court (which deals with Intellectual Property matters, among other things) has ordered local Internet service providers to hand over the details of almost 200,000 Finnish Internet subscribers.

Published on the Ministry of Education and Culture website (via mikrobitti.fi) the data (pdf) reveals hundreds of processes against major Finnish ISPs.

Notably, every single case has been directed at a core group of three providers – Elisa, TeliaSonera and DNA – while customers of other ISPs seem to have been completely overlooked. Exactly why isn’t clear but in other jurisdictions it’s proven more cost-effective to hone a process with a small number of ISPs, rather than spread out to those with fewer customers.

Only one legal process is listed for 2013 but that demanded the identities of people behind 50 IP addresses. In 2014 there was a 14-fold increase in processes and the number of IP addresses targeted grew to 1,387.

For 2015, a total of 117 processes are listed, demanding the identities of people behind 37,468 IP addresses. In 2016 the trolls really upped their game. A total of 131 processes demanded the details of individuals behind 98,966 IP addresses. For last year, 79 processes are on the books, which in total amounted to 60,681 potential defendants in settlement cases.

In total, between 2013 and 2017 the Market Court ordered the ISPs to hand over the personal details of people behind a staggering 198,552 IP addresses. While it should be noted that each might not lead to a unique individual, the number is huge when one considers the potential returns if everyone pays up hundreds of euros to make supposed court cases go away.

But despite the significant scale, it will probably come as no surprise that very few companies are involved. Troll operations tend to be fairly centralized, often using the same base services to track and collect evidence against alleged pirates.

In the order they entered the settlement business in Finland the companies involved are: LFP Video Group LLC, International Content Holding B.V., Dallas Buyers Club LLC, Crystalis Entertainment UG, Scanbox Entertainment A/S, Fairway Film Alliance LLC, Copyright Collections Ltd, Mircom International Content Management, Interallip LLP, and Oy Atlantic Film Finland Ab.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN discounts, offers and coupons

timeShift(GrafanaBuzz, 1w) Issue 30

Post Syndicated from Blogs on Grafana Labs Blog original https://grafana.com/blog/2018/01/19/timeshiftgrafanabuzz-1w-issue-30/

Welcome to TimeShift

We’re only 6 weeks away from the next GrafanaCon and here at Grafana Labs we’re buzzing with excitement. We have some great talks lined up that you won’t want to miss.

This week’s TimeShift covers Grafana’s annotation functionality, monitoring with Prometheus, integrating Grafana with NetFlow and a peek inside Stream’s monitoring stack. Enjoy!


Latest Stable Release

Grafana 4.6.3 is now available. Latest bugfixes include:

  • Gzip: Fixes bug Gravatar images when gzip was enabled #5952
  • Alert list: Now shows alert state changes even after adding manual annotations on dashboard #99513
  • Alerting: Fixes bug where rules evaluated as firing when all conditions was false and using OR operator. #93183
  • Cloudwatch: CloudWatch no longer display metrics’ default alias #101514, thx @mtanda

Download Grafana 4.6.3 Now


From the Blogosphere

Walkthrough: Watch your Ansible deployments in Grafana!: Your graphs start spiking and your platform begins behaving abnormally. Did the config change in a deployment, causing the problem? This article covers Grafana’s new annotation functionality, and specifically, how to create deployment annotations via Ansible playbooks.

Application Monitoring in OpenShift with Prometheus and Grafana: There are many article describing how to monitor OpenShift with Prometheus running in the same cluster, but what if you don’t have admin permissions to the cluster you need to monitor?

Spring Boot Metrics Monitoring Using Prometheus & Grafana: As the title suggests, this post walks you through how to configure Prometheus and Grafana to monitor you Spring Boot application metrics.

How to Integrate Grafana with NetFlow: Learn how to monitor NetFlow from Scrutinizer using Grafana’s SimpleJSON data source.

Stream & Go: News Feeds for Over 300 Million End Users: Stream lets you build scalable newsfeeds and activity streams via their API, which is used by more than 300 million end users. In this article, they discuss their monitoring stack and why they chose particular components and technologies.


GrafanaCon EU Tickets are Going Fast!

We’re six weeks from kicking off GrafanaCon EU! Join us for talks from Google, Bloomberg, Tinder, eBay and more! You won’t want to miss two great days of open source monitoring talks and fun in Amsterdam. Get your tickets before they sell out!

Get Your Ticket Now


Grafana Plugins

We have a couple of plugin updates to share this week that add some new features and improvements. Updating your plugins is easy. For on-prem Grafana, use the Grafana-cli tool, or update with 1 click on your Hosted Grafana.

UPDATED PLUGIN

Druid Data Source – This new update is packed with new features. Notable enhancement include:

  • Post Aggregation feature
  • Support for thetaSketch
  • Improvements to the Query editor

Update Now

UPDATED PLUGIN

Breadcrumb Panel – The Breadcrumb Panel is a small panel you can include in your dashboard that tracks other dashboards you have visited – making it easy to navigate back to a previously visited dashboard. The latest release adds support for dashboards loaded from a file.

Update Now


Upcoming Events

In between code pushes we like to speak at, sponsor and attend all kinds of conferences and meetups. We also like to make sure we mention other Grafana-related events happening all over the world. If you’re putting on just such an event, let us know and we’ll list it here.

SnowCamp 2018: Yves Brissaud – Application metrics with Prometheus and Grafana | Grenoble, France – Jan 24, 2018:
We’ll take a look at how Prometheus, Grafana and a bit of code make it possible to obtain temporal data to visualize the state of our applications as well as to help with development and debugging.

Register Now

Women Who Go Berlin: Go Workshop – Monitoring and Troubleshooting using Prometheus and Grafana | Berlin, Germany – Jan 31, 2018: In this workshop we will learn about one of the most important topics in making apps production ready: Monitoring. We will learn how to use tools you’ve probably heard a lot about – Prometheus and Grafana, and using what we learn we will troubleshoot a particularly buggy Go app.

Register Now

FOSDEM | Brussels, Belgium – Feb 3-4, 2018: FOSDEM is a free developer conference where thousands of developers of free and open source software gather to share ideas and technology. There is no need to register; all are welcome.

Jfokus | Stockholm, Sweden – Feb 5-7, 2018:
Carl Bergquist – Quickie: Monitoring? Not OPS Problem

Why should we monitor our system? Why can’t we just rely on the operations team anymore? They use to be able to do that. What’s currently changing? Presentation content: – Why do we monitor our system – How did it use to work? – Whats changing – Why do we need to shift focus – Everyone should be on call. – Resilience is the goal (Best way of having someone care about quality is to make them responsible).

Register Now

Jfokus | Stockholm, Sweden – Feb 5-7, 2018:
Leonard Gram – Presentation: DevOps Deconstructed

What’s a Site Reliability Engineer and how’s that role different from the DevOps engineer my boss wants to hire? I really don’t want to be on call, should I? Is Docker the right place for my code or am I better of just going straight to Serverless? And why should I care about any of it? I’ll try to answer some of these questions while looking at what DevOps really is about and how commodisation of servers through “the cloud” ties into it all. This session will be an opinionated piece from a developer who’s been on-call for the past 6 years and would like to convince you to do the same, at least once.

Register Now

Stockholm Metrics and Monitoring | Stockholm, Sweden – Feb 7, 2018:
Observability 3 ways – Logging, Metrics and Distributed Tracing

Let’s talk about often confused telemetry tools: Logging, Metrics and Distributed Tracing. We’ll show how you capture latency using each of the tools and how they work differently. Through examples and discussion, we’ll note edge cases where certain tools have advantages over others. By the end of this talk, we’ll better understand how each of Logging, Metrics and Distributed Tracing aids us in different ways to understand our applications.

Register Now

OpenNMS – Introduction to “Grafana” | Webinar – Feb 21, 2018:
IT monitoring helps detect emerging hardware damage and performance bottlenecks in the enterprise network before any consequential damage or disruption to business processes occurs. The powerful open-source OpenNMS software monitors a network, including all connected devices, and provides logging of a variety of data that can be used for analysis and planning purposes. In our next OpenNMS webinar on February 21, 2018, we introduce “Grafana” – a web-based tool for creating and displaying dashboards from various data sources, which can be perfectly combined with OpenNMS.

Register Now


Tweet of the Week

We scour Twitter each week to find an interesting/beautiful dashboard and show it off! #monitoringLove

As we say with pie charts, use emojis wisely 😉


Grafana Labs is Hiring!

We are passionate about open source software and thrive on tackling complex challenges to build the future. We ship code from every corner of the globe and love working with the community. If this sounds exciting, you’re in luck – WE’RE HIRING!

Check out our Open Positions


How are we doing?

That wraps up our 30th issue of TimeShift. What do you think? Are there other types of content you’d like to see here? Submit a comment on this issue below, or post something at our community forum.

Follow us on Twitter, like us on Facebook, and join the Grafana Labs community.

UK Government Teaches 7-Year-Olds That Piracy is Stealing

Post Syndicated from Ernesto original https://torrentfreak.com/uk-government-teaches-7-year-olds-that-piracy-is-stealing-180118/

In 2014, Mike Weatherley, the UK Government’s top IP advisor at the time, offered a recommendation that copyright education should be added to the school curriculum, starting with the youngest kids in primary school.

New generations should learn copyright moral and ethics, the idea was, and a few months later the first version of the new “Cracking Ideas” curriculum was made public.

In the years that followed new course material was added, published by the UK’s Intellectual Property Office (IPO) with support from the local copyright industry. The teaching material is aimed at a variety of ages, including those who have just started primary school.

Part of the education features a fictitious cartoon band called Nancy and the Meerkats. With help from their manager, they learn key copyright insights and this week several new videos were published, BBC points out.

The videos try to explain concepts including copyright, trademarks, and how people can protect the things they’ve created. Interestingly, the videos themselves use names of existing musicians, with puns such as Ed Shealing, Justin Beaver, and the evil Kitty Perry. Even Nancy and the Meerkats appears to be a play on the classic 1970s cartoon series Josie and the Pussycats, featuring a pop band of the same name.

The play on Ed Sheeran’s name is interesting, to say the least. While he’s one of the most popular artists today, he also mentioned in the past that file-sharing made his career.

“…illegal fire sharing was what made me. It was students in England going to university, sharing my songs with each other,” Sheeran said in an interview with CBS last year.

But that didn’t stop the IPO from using his likeness for their anti-file-sharing campaign. According to Catherine Davies of IPO’s education outreach department, knowledge about key intellectual property issues is a “life skill” nowadays.

“In today’s digital environment, even very young people are IP consumers, accessing online digital content independently and regularly,” she tells the BBC. “A basic understanding of IP and a respect for others’ IP rights is therefore a key life skill.”

While we doubt that these concepts will appeal to the average five-year-old, the course material does it best to simplify complex copyright issues. Perhaps that’s also where the danger lies.

The program is in part backed by copyright-reliant industries, who have a different view on the matter than many others. For example, a previously published video of Nancy and the Meerkats deals with the topic of file-sharing.

After the Meerkats found out that people were downloading their tracks from pirate sites and became outraged, their manager Big Joe explained that file-sharing is just the same as stealing a CD from a physical store.

“In a way, all those people who downloaded free copies are doing the same thing as walking out of the shop with a CD and forgetting to go the till,” he says.

“What these sites are doing is sometimes called piracy. It not only affects music but also videos, books, and movies.If someone owns the copyright to something, well, it is stealing. Simple as that,” Big Joe adds.

The Pirates of the Internet!

While we won’t go into the copying vs. stealing debate, it’s interesting that there is no mention of more liberal copyright licenses. There are thousands of artists who freely share their work after all, by adopting Creative Commons licenses for example. Downloading these tracks is certainly not stealing.

Jim Killock, director of the Open Rights Group, notes that the campaign is a bit extreme at points.

“Infringing copyright is a bad thing, but it is not the same as physical theft. Many children will guess that making a copy is not the same as making off with the local store’s chocolate bars,” he says.

“Children aren’t born bureaucrats, and they are surrounded by stupid rules made by stupid adults. Presumably, the IPO doesn’t want children to conclude that copyright is just another one, so they should be a bit more careful with how they explain things.”

Killock also stresses that children copy a lot of things in school, which would normally violate copyright. However, thanks to the educational exceptions they’re not getting in trouble. The IPO could pay more attention to these going forward.

Perhaps Nancy and the Meerkats could decide to release a free to share track in a future episode, for example, and encourage kids to use it for their own remixes, or other creative projects. Creativity and copyright are not all about restrictions, after all.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN discounts, offers and coupons

US Govt Brands Torrent, Streaming & Cyberlocker Sites As Notorious Markets

Post Syndicated from Andy original https://torrentfreak.com/us-govt-brands-torrent-streaming-cyberlocker-sites-as-notorious-markets-180115/

In its annual “Out-of-Cycle Review of Notorious Markets” the office of the United States Trade Representative (USTR) has listed a long list of websites said to be involved in online piracy.

The list is compiled with high-level input from various trade groups, including the MPAA and RIAA who both submitted their recommendations (1,2) during early October last year.

With the word “allegedly” used more than two dozen times in the report, the US government notes that its report does not constitute cast-iron proof of illegal activity. However, it urges the countries from where the so-called “notorious markets” operate to take action where they can, while putting owners and facilitators on notice that their activities are under the spotlight.

“A goal of the List is to motivate appropriate action by owners, operators, and service providers in the private sector of these and similar markets, as well as governments, to reduce piracy and counterfeiting,” the report reads.

“USTR highlights the following marketplaces because they exemplify global counterfeiting and piracy concerns and because the scale of infringing activity in these marketplaces can cause significant harm to U.S. intellectual property (IP) owners, consumers, legitimate online platforms, and the economy.”

The report begins with a page titled “Issue Focus: Illicit Streaming Devices”. Unsurprisingly, particularly given their place in dozens of headlines last year, the segment focus on the set-top box phenomenon. The piece doesn’t list any apps or software tools as such but highlights the general position, claiming a cost to the US entertainment industry of $4-5 billion a year.

Torrent Sites

In common with previous years, the USTR goes on to list several of the world’s top torrent sites but due to changes in circumstances, others have been delisted. ExtraTorrent, which shut down May 2017, is one such example.

As the world’s most famous torrent site, The Pirate Bay gets a prominent mention, with the USTR noting that the site is of “symbolic importance as one of the longest-running and most vocal torrent sites. The USTR underlines the site’s resilience by noting its hydra-like form while revealing an apparent secret concerning its hosting arrangements.

“The Pirate Bay has allegedly had more than a dozen domains hosted in various countries around the world, applies a reverse proxy service, and uses a hosting provider in Vietnam to evade further enforcement action,” the USTR notes.

Other torrent sites singled out for criticism include RARBG, which was nominated for the listing by the movie industry. According to the USTR, the site is hosted in Bosnia and Herzegovina and has changed hosting services to prevent shutdowns in recent years.

1337x.to and the meta-search engine Torrentz2 are also given a prime mention, with the USTR noting that they are “two of the most popular torrent sites that allegedly infringe U.S. content industry’s copyrights.” Russia’s RuTracker is also targeted for criticism, with the government noting that it’s now one of the most popular torrent sites in the world.

Streaming & Cyberlockers

While torrent sites are still important, the USTR reserves considerable space in its report for streaming portals and cyberlocker-type services.

4Shared.com, a file-hosting site that has been targeted by dozens of millions of copyright notices, is reportedly no longer able to use major US payment providers. Nevertheless, the British Virgin Islands company still collects significant sums from premium accounts, advertising, and offshore payment processors, USTR notes.

Cyberlocker Rapidgator gets another prominent mention in 2017, with the USTR noting that the Russian-hosted platform generates millions of dollars every year through premium memberships while employing rewards and affiliate schemes.

Due to its increasing popularity as a hosting and streaming operation, Openload.co (Romania) is now a big target for the USTR. “The site is used frequently in combination with add-ons in illicit streaming devices. In November 2017, users visited Openload.co a staggering 270 million times,” the USTR writes.

Owned by a Swiss company and hosted in the Netherlands, the popular site Uploaded is also criticized by the US alongside France’s 1Fichier.com, which allegedly hosts pirate games while being largely unresponsive to takedown notices. Dopefile.pk, a Pakistan-based storage outfit, is also highlighted.

On the video streaming front, it’s perhaps no surprise that the USTR focuses on sites like FMovies (Sweden), GoStream (Vietnam), Movie4K.tv (Russia) and PrimeWire. An organization collectively known as the MovShare group which encompasses Nowvideo.sx, WholeCloud.net, NowDownload.cd, MeWatchSeries.to and WatchSeries.ac, among others, is also listed.

Unauthorized music / research papers

While most of the above are either focused on video or feature it as part of their repertoire, other sites are listed for their attention to music. Convert2MP3.net is named as one of the most popular stream-ripping sites in the world and is highlighted due to the prevalence of YouTube-downloader sites and the 2017 demise of YouTube-MP3.

“Convert2MP3.net does not appear to have permission from YouTube or other sites and does not have permission from right holders for a wide variety of music represented by major U.S. labels,” the USTR notes.

Given the amount of attention the site has received in 2017 as ‘The Pirate Bay of Research’, Libgen.io and Sci-Hub.io (not to mention the endless proxy and mirror sites that facilitate access) are given a detailed mention in this year’s report.

“Together these sites make it possible to download — all without permission and without remunerating authors, publishers or researchers — millions of copyrighted books by commercial publishers and university presses; scientific, technical and medical journal articles; and publications of technological standards,” the USTR writes.

Service providers

But it’s not only sites that are being put under pressure. Following a growing list of nominations in previous years, Swiss service provider Private Layer is again singled out as a rogue player in the market for hosting 1337x.to and Torrentz2.eu, among others.

“While the exact configuration of websites changes from year to year, this is the fourth consecutive year that the List has stressed the significant international trade impact of Private Layer’s hosting services and the allegedly infringing sites it hosts,” the USTR notes.

“Other listed and nominated sites may also be hosted by Private Layer but are using
reverse proxy services to obfuscate the true host from the public and from law enforcement.”

The USTR notes Switzerland’s efforts to close a legal loophole that restricts enforcement and looks forward to a positive outcome when the draft amendment is considered by parliament.

Perhaps a little surprisingly given its recent anti-piracy efforts and overtures to the US, Russia’s leading social network VK.com again gets a place on the new list. The USTR recognizes VK’s efforts but insists that more needs to be done.

Social networking and e-commerce

“In 2016, VK reached licensing agreements with major record companies, took steps to limit third-party applications dedicated to downloading infringing content from the site, and experimented with content recognition technologies,” the USTR writes.

“Despite these positive signals, VK reportedly continues to be a hub of infringing activity and the U.S. motion picture industry reports that they find thousands of infringing files on the site each month.”

Finally, in addition to traditional pirate sites, the US also lists online marketplaces that allegedly fail to meet appropriate standards. Re-added to the list in 2016 after a brief hiatus in 2015, China’s Alibaba is listed again in 2017. The development provoked an angry response from the company.

Describing his company as a “scapegoat”, Alibaba Group President Michael Evans said that his platform had achieved a 25% drop in takedown requests and has even been removing infringing listings before they make it online.

“In light of all this, it’s clear that no matter how much action we take and progress we make, the USTR is not actually interested in seeing tangible results,” Evans said in a statement.

The full list of sites in the Notorious Markets Report 2017 (pdf) can be found below.

– 1fichier.com – (cyberlocker)
– 4shared.com – (cyberlocker)
– convert2mp3.net – (stream-ripper)
– Dhgate.com (e-commerce)
– Dopefile.pl – (cyberlocker)
– Firestorm-servers.com (pirate gaming service)
– Fmovies.is, Fmovies.se, Fmovies.to – (streaming)
– Gostream.is, Gomovies.to, 123movieshd.to (streaming)
– Indiamart.com (e-commerce)
– Kinogo.club, kinogo.co (streaming host, platform)
– Libgen.io, sci-hub.io, libgen.pw, sci-hub.cc, sci-hub.bz, libgen.info, lib.rus.ec, bookfi.org, bookzz.org, booker.org, booksc.org, book4you.org, bookos-z1.org, booksee.org, b-ok.org (research downloads)
– Movshare Group – Nowvideo.sx, wholecloud.net, auroravid.to, bitvid.sx, nowdownload.ch, cloudtime.to, mewatchseries.to, watchseries.ac (streaming)
– Movie4k.tv (streaming)
– MP3VA.com (music)
– Openload.co (cyberlocker / streaming)
– 1337x.to (torrent site)
– Primewire.ag (streaming)
– Torrentz2, Torrentz2.me, Torrentz2.is (torrent site)
– Rarbg.to (torrent site)
– Rebel (domain company)
– Repelis.tv (movie and TV linking)
– RuTracker.org (torrent site)
– Rapidgator.net (cyberlocker)
– Taobao.com (e-commerce)
– The Pirate Bay (torrent site)
– TVPlus, TVBrowser, Kuaikan (streaming apps and addons, China)
– Uploaded.net (cyberlocker)
– VK.com (social networking)

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN discounts, offers and coupons

AWS Glue Now Supports Scala Scripts

Post Syndicated from Mehul Shah original https://aws.amazon.com/blogs/big-data/aws-glue-now-supports-scala-scripts/

We are excited to announce AWS Glue support for running ETL (extract, transform, and load) scripts in Scala. Scala lovers can rejoice because they now have one more powerful tool in their arsenal. Scala is the native language for Apache Spark, the underlying engine that AWS Glue offers for performing data transformations.

Beyond its elegant language features, writing Scala scripts for AWS Glue has two main advantages over writing scripts in Python. First, Scala is faster for custom transformations that do a lot of heavy lifting because there is no need to shovel data between Python and Apache Spark’s Scala runtime (that is, the Java virtual machine, or JVM). You can build your own transformations or invoke functions in third-party libraries. Second, it’s simpler to call functions in external Java class libraries from Scala because Scala is designed to be Java-compatible. It compiles to the same bytecode, and its data structures don’t need to be converted.

To illustrate these benefits, we walk through an example that analyzes a recent sample of the GitHub public timeline available from the GitHub archive. This site is an archive of public requests to the GitHub service, recording more than 35 event types ranging from commits and forks to issues and comments.

This post shows how to build an example Scala script that identifies highly negative issues in the timeline. It pulls out issue events in the timeline sample, analyzes their titles using the sentiment prediction functions from the Stanford CoreNLP libraries, and surfaces the most negative issues.

Getting started

Before we start writing scripts, we use AWS Glue crawlers to get a sense of the data—its structure and characteristics. We also set up a development endpoint and attach an Apache Zeppelin notebook, so we can interactively explore the data and author the script.

Crawl the data

The dataset used in this example was downloaded from the GitHub archive website into our sample dataset bucket in Amazon S3, and copied to the following locations:

s3://aws-glue-datasets-<region>/examples/scala-blog/githubarchive/data/

Choose the best folder by replacing <region> with the region that you’re working in, for example, us-east-1. Crawl this folder, and put the results into a database named githubarchive in the AWS Glue Data Catalog, as described in the AWS Glue Developer Guide. This folder contains 12 hours of the timeline from January 22, 2017, and is organized hierarchically (that is, partitioned) by year, month, and day.

When finished, use the AWS Glue console to navigate to the table named data in the githubarchive database. Notice that this data has eight top-level columns, which are common to each event type, and three partition columns that correspond to year, month, and day.

Choose the payload column, and you will notice that it has a complex schema—one that reflects the union of the payloads of event types that appear in the crawled data. Also note that the schema that crawlers generate is a subset of the true schema because they sample only a subset of the data.

Set up the library, development endpoint, and notebook

Next, you need to download and set up the libraries that estimate the sentiment in a snippet of text. The Stanford CoreNLP libraries contain a number of human language processing tools, including sentiment prediction.

Download the Stanford CoreNLP libraries. Unzip the .zip file, and you’ll see a directory full of jar files. For this example, the following jars are required:

  • stanford-corenlp-3.8.0.jar
  • stanford-corenlp-3.8.0-models.jar
  • ejml-0.23.jar

Upload these files to an Amazon S3 path that is accessible to AWS Glue so that it can load these libraries when needed. For this example, they are in s3://glue-sample-other/corenlp/.

Development endpoints are static Spark-based environments that can serve as the backend for data exploration. You can attach notebooks to these endpoints to interactively send commands and explore and analyze your data. These endpoints have the same configuration as that of AWS Glue’s job execution system. So, commands and scripts that work there also work the same when registered and run as jobs in AWS Glue.

To set up an endpoint and a Zeppelin notebook to work with that endpoint, follow the instructions in the AWS Glue Developer Guide. When you are creating an endpoint, be sure to specify the locations of the previously mentioned jars in the Dependent jars path as a comma-separated list. Otherwise, the libraries will not be loaded.

After you set up the notebook server, go to the Zeppelin notebook by choosing Dev Endpoints in the left navigation pane on the AWS Glue console. Choose the endpoint that you created. Next, choose the Notebook Server URL, which takes you to the Zeppelin server. Log in using the notebook user name and password that you specified when creating the notebook. Finally, create a new note to try out this example.

Each notebook is a collection of paragraphs, and each paragraph contains a sequence of commands and the output for that command. Moreover, each notebook includes a number of interpreters. If you set up the Zeppelin server using the console, the (Python-based) pyspark and (Scala-based) spark interpreters are already connected to your new development endpoint, with pyspark as the default. Therefore, throughout this example, you need to prepend %spark at the top of your paragraphs. In this example, we omit these for brevity.

Working with the data

In this section, we use AWS Glue extensions to Spark to work with the dataset. We look at the actual schema of the data and filter out the interesting event types for our analysis.

Start with some boilerplate code to import libraries that you need:

%spark

import com.amazonaws.services.glue.DynamicRecord
import com.amazonaws.services.glue.GlueContext
import com.amazonaws.services.glue.util.GlueArgParser
import com.amazonaws.services.glue.util.Job
import com.amazonaws.services.glue.util.JsonOptions
import com.amazonaws.services.glue.types._
import org.apache.spark.SparkContext

Then, create the Spark and AWS Glue contexts needed for working with the data:

@transient val spark: SparkContext = SparkContext.getOrCreate()
val glueContext: GlueContext = new GlueContext(spark)

You need the transient decorator on the SparkContext when working in Zeppelin; otherwise, you will run into a serialization error when executing commands.

Dynamic frames

This section shows how to create a dynamic frame that contains the GitHub records in the table that you crawled earlier. A dynamic frame is the basic data structure in AWS Glue scripts. It is like an Apache Spark data frame, except that it is designed and optimized for data cleaning and transformation workloads. A dynamic frame is well-suited for representing semi-structured datasets like the GitHub timeline.

A dynamic frame is a collection of dynamic records. In Spark lingo, it is an RDD (resilient distributed dataset) of DynamicRecords. A dynamic record is a self-describing record. Each record encodes its columns and types, so every record can have a schema that is unique from all others in the dynamic frame. This is convenient and often more efficient for datasets like the GitHub timeline, where payloads can vary drastically from one event type to another.

The following creates a dynamic frame, github_events, from your table:

val github_events = glueContext
                    .getCatalogSource(database = "githubarchive", tableName = "data")
                    .getDynamicFrame()

The getCatalogSource() method returns a DataSource, which represents a particular table in the Data Catalog. The getDynamicFrame() method returns a dynamic frame from the source.

Recall that the crawler created a schema from only a sample of the data. You can scan the entire dataset, count the rows, and print the complete schema as follows:

github_events.count
github_events.printSchema()

The result looks like the following:

The data has 414,826 records. As before, notice that there are eight top-level columns, and three partition columns. If you scroll down, you’ll also notice that the payload is the most complex column.

Run functions and filter records

This section describes how you can create your own functions and invoke them seamlessly to filter records. Unlike filtering with Python lambdas, Scala scripts do not need to convert records from one language representation to another, thereby reducing overhead and running much faster.

Let’s create a function that picks only the IssuesEvents from the GitHub timeline. These events are generated whenever someone posts an issue for a particular repository. Each GitHub event record has a field, “type”, that indicates the kind of event it is. The issueFilter() function returns true for records that are IssuesEvents.

def issueFilter(rec: DynamicRecord): Boolean = { 
    rec.getField("type").exists(_ == "IssuesEvent") 
}

Note that the getField() method returns an Option[Any] type, so you first need to check that it exists before checking the type.

You pass this function to the filter transformation, which applies the function on each record and returns a dynamic frame of those records that pass.

val issue_events =  github_events.filter(issueFilter)

Now, let’s look at the size and schema of issue_events.

issue_events.count
issue_events.printSchema()

It’s much smaller (14,063 records), and the payload schema is less complex, reflecting only the schema for issues. Keep a few essential columns for your analysis, and drop the rest using the ApplyMapping() transform:

val issue_titles = issue_events.applyMapping(Seq(("id", "string", "id", "string"),
                                                 ("actor.login", "string", "actor", "string"), 
                                                 ("repo.name", "string", "repo", "string"),
                                                 ("payload.action", "string", "action", "string"),
                                                 ("payload.issue.title", "string", "title", "string")))
issue_titles.show()

The ApplyMapping() transform is quite handy for renaming columns, casting types, and restructuring records. The preceding code snippet tells the transform to select the fields (or columns) that are enumerated in the left half of the tuples and map them to the fields and types in the right half.

Estimating sentiment using Stanford CoreNLP

To focus on the most pressing issues, you might want to isolate the records with the most negative sentiments. The Stanford CoreNLP libraries are Java-based and offer sentiment-prediction functions. Accessing these functions through Python is possible, but quite cumbersome. It requires creating Python surrogate classes and objects for those found on the Java side. Instead, with Scala support, you can use those classes and objects directly and invoke their methods. Let’s see how.

First, import the libraries needed for the analysis:

import java.util.Properties
import edu.stanford.nlp.ling.CoreAnnotations
import edu.stanford.nlp.neural.rnn.RNNCoreAnnotations
import edu.stanford.nlp.pipeline.{Annotation, StanfordCoreNLP}
import edu.stanford.nlp.sentiment.SentimentCoreAnnotations
import scala.collection.convert.wrapAll._

The Stanford CoreNLP libraries have a main driver that orchestrates all of their analysis. The driver setup is heavyweight, setting up threads and data structures that are shared across analyses. Apache Spark runs on a cluster with a main driver process and a collection of backend executor processes that do most of the heavy sifting of the data.

The Stanford CoreNLP shared objects are not serializable, so they cannot be distributed easily across a cluster. Instead, you need to initialize them once for every backend executor process that might need them. Here is how to accomplish that:

val props = new Properties()
props.setProperty("annotators", "tokenize, ssplit, parse, sentiment")
props.setProperty("parse.maxlen", "70")

object myNLP {
    lazy val coreNLP = new StanfordCoreNLP(props)
}

The properties tell the libraries which annotators to execute and how many words to process. The preceding code creates an object, myNLP, with a field coreNLP that is lazily evaluated. This field is initialized only when it is needed, and only once. So, when the backend executors start processing the records, each executor initializes the driver for the Stanford CoreNLP libraries only one time.

Next is a function that estimates the sentiment of a text string. It first calls Stanford CoreNLP to annotate the text. Then, it pulls out the sentences and takes the average sentiment across all the sentences. The sentiment is a double, from 0.0 as the most negative to 4.0 as the most positive.

def estimatedSentiment(text: String): Double = {
    if ((text == null) || (!text.nonEmpty)) { return Double.NaN }
    val annotations = myNLP.coreNLP.process(text)
    val sentences = annotations.get(classOf[CoreAnnotations.SentencesAnnotation])
    sentences.foldLeft(0.0)( (csum, x) => { 
        csum + RNNCoreAnnotations.getPredictedClass(x.get(classOf[SentimentCoreAnnotations.SentimentAnnotatedTree])) 
    }) / sentences.length
}

Now, let’s estimate the sentiment of the issue titles and add that computed field as part of the records. You can accomplish this with the map() method on dynamic frames:

val issue_sentiments = issue_titles.map((rec: DynamicRecord) => { 
    val mbody = rec.getField("title")
    mbody match {
        case Some(mval: String) => { 
            rec.addField("sentiment", ScalarNode(estimatedSentiment(mval)))
            rec }
        case _ => rec
    }
})

The map() method applies the user-provided function on every record. The function takes a DynamicRecord as an argument and returns a DynamicRecord. The code above computes the sentiment, adds it in a top-level field, sentiment, to the record, and returns the record.

Count the records with sentiment and show the schema. This takes a few minutes because Spark must initialize the library and run the sentiment analysis, which can be involved.

issue_sentiments.count
issue_sentiments.printSchema()

Notice that all records were processed (14,063), and the sentiment value was added to the schema.

Finally, let’s pick out the titles that have the lowest sentiment (less than 1.5). Count them and print out a sample to see what some of the titles look like.

val pressing_issues = issue_sentiments.filter(_.getField("sentiment").exists(_.asInstanceOf[Double] < 1.5))
pressing_issues.count
pressing_issues.show(10)

Next, write them all to a file so that you can handle them later. (You’ll need to replace the output path with your own.)

glueContext.getSinkWithFormat(connectionType = "s3", 
                              options = JsonOptions("""{"path": "s3://<bucket>/out/path/"}"""), 
                              format = "json")
            .writeDynamicFrame(pressing_issues)

Take a look in the output path, and you can see the output files.

Putting it all together

Now, let’s create a job from the preceding interactive session. The following script combines all the commands from earlier. It processes the GitHub archive files and writes out the highly negative issues:

import com.amazonaws.services.glue.DynamicRecord
import com.amazonaws.services.glue.GlueContext
import com.amazonaws.services.glue.util.GlueArgParser
import com.amazonaws.services.glue.util.Job
import com.amazonaws.services.glue.util.JsonOptions
import com.amazonaws.services.glue.types._
import org.apache.spark.SparkContext
import java.util.Properties
import edu.stanford.nlp.ling.CoreAnnotations
import edu.stanford.nlp.neural.rnn.RNNCoreAnnotations
import edu.stanford.nlp.pipeline.{Annotation, StanfordCoreNLP}
import edu.stanford.nlp.sentiment.SentimentCoreAnnotations
import scala.collection.convert.wrapAll._

object GlueApp {

    object myNLP {
        val props = new Properties()
        props.setProperty("annotators", "tokenize, ssplit, parse, sentiment")
        props.setProperty("parse.maxlen", "70")

        lazy val coreNLP = new StanfordCoreNLP(props)
    }

    def estimatedSentiment(text: String): Double = {
        if ((text == null) || (!text.nonEmpty)) { return Double.NaN }
        val annotations = myNLP.coreNLP.process(text)
        val sentences = annotations.get(classOf[CoreAnnotations.SentencesAnnotation])
        sentences.foldLeft(0.0)( (csum, x) => { 
            csum + RNNCoreAnnotations.getPredictedClass(x.get(classOf[SentimentCoreAnnotations.SentimentAnnotatedTree])) 
        }) / sentences.length
    }

    def main(sysArgs: Array[String]) {
        val spark: SparkContext = SparkContext.getOrCreate()
        val glueContext: GlueContext = new GlueContext(spark)

        val dbname = "githubarchive"
        val tblname = "data"
        val outpath = "s3://<bucket>/out/path/"

        val github_events = glueContext
                            .getCatalogSource(database = dbname, tableName = tblname)
                            .getDynamicFrame()

        val issue_events =  github_events.filter((rec: DynamicRecord) => {
            rec.getField("type").exists(_ == "IssuesEvent")
        })

        val issue_titles = issue_events.applyMapping(Seq(("id", "string", "id", "string"),
                                                         ("actor.login", "string", "actor", "string"), 
                                                         ("repo.name", "string", "repo", "string"),
                                                         ("payload.action", "string", "action", "string"),
                                                         ("payload.issue.title", "string", "title", "string")))

        val issue_sentiments = issue_titles.map((rec: DynamicRecord) => { 
            val mbody = rec.getField("title")
            mbody match {
                case Some(mval: String) => { 
                    rec.addField("sentiment", ScalarNode(estimatedSentiment(mval)))
                    rec }
                case _ => rec
            }
        })

        val pressing_issues = issue_sentiments.filter(_.getField("sentiment").exists(_.asInstanceOf[Double] < 1.5))

        glueContext.getSinkWithFormat(connectionType = "s3", 
                              options = JsonOptions(s"""{"path": "$outpath"}"""), 
                              format = "json")
                    .writeDynamicFrame(pressing_issues)
    }
}

Notice that the script is enclosed in a top-level object called GlueApp, which serves as the script’s entry point for the job. (You’ll need to replace the output path with your own.) Upload the script to an Amazon S3 location so that AWS Glue can load it when needed.

To create the job, open the AWS Glue console. Choose Jobs in the left navigation pane, and then choose Add job. Create a name for the job, and specify a role with permissions to access the data. Choose An existing script that you provide, and choose Scala as the language.

For the Scala class name, type GlueApp to indicate the script’s entry point. Specify the Amazon S3 location of the script.

Choose Script libraries and job parameters. In the Dependent jars path field, enter the Amazon S3 locations of the Stanford CoreNLP libraries from earlier as a comma-separated list (without spaces). Then choose Next.

No connections are needed for this job, so choose Next again. Review the job properties, and choose Finish. Finally, choose Run job to execute the job.

You can simply edit the script’s input table and output path to run this job on whatever GitHub timeline datasets that you might have.

Conclusion

In this post, we showed how to write AWS Glue ETL scripts in Scala via notebooks and how to run them as jobs. Scala has the advantage that it is the native language for the Spark runtime. With Scala, it is easier to call Scala or Java functions and third-party libraries for analyses. Moreover, data processing is faster in Scala because there’s no need to convert records from one language runtime to another.

You can find more example of Scala scripts in our GitHub examples repository: https://github.com/awslabs/aws-glue-samples. We encourage you to experiment with Scala scripts and let us know about any interesting ETL flows that you want to share.

Happy Glue-ing!

 


Additional Reading

If you found this post useful, be sure to check out Simplify Querying Nested JSON with the AWS Glue Relationalize Transform and Genomic Analysis with Hail on Amazon EMR and Amazon Athena.

 


About the Authors

Mehul Shah is a senior software manager for AWS Glue. His passion is leveraging the cloud to build smarter, more efficient, and easier to use data systems. He has three girls, and, therefore, he has no spare time.

 

 

 

Ben Sowell is a software development engineer at AWS Glue.

 

 

 

 
Vinay Vivili is a software development engineer for AWS Glue.

 

 

 

Susan Landau’s New Book: Listening In

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2018/01/susan_landaus_n.html

Susan Landau has written a terrific book on cybersecurity threats and why we need strong crypto. Listening In: Cybersecurity in an Insecure Age. It’s based in part on her 2016 Congressional testimony in the Apple/FBI case; it examines how the Digital Revolution has transformed society, and how law enforcement needs to — and can — adjust to the new realities. The book is accessible to techies and non-techies alike, and is strongly recommended.

And if you’ve already read it, give it a review on Amazon. Reviews sell books, and this one needs more of them.

Daniel Miessler on My Writings about IoT Security

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2018/01/daniel_miessler.html

Daniel Miessler criticizes my writings about IoT security:

I know it’s super cool to scream about how IoT is insecure, how it’s dumb to hook up everyday objects like houses and cars and locks to the internet, how bad things can get, and I know it’s fun to be invited to talk about how everything is doom and gloom.

I absolutely respect Bruce Schneier a lot for what he’s contributed to InfoSec, which makes me that much more disappointed with this kind of position from him.

InfoSec is full of those people, and it’s beneath people like Bruce to add their voices to theirs. Everyone paying attention already knows it’s going to be a soup sandwich — a carnival of horrors — a tragedy of mistakes and abuses of trust.

It’s obvious. Not interesting. Not novel. Obvious. But obvious or not, all these things are still going to happen.

I actually agree with everything in his essay. “We should obviously try to minimize the risks, but we don’t do that by trying to shout down the entire enterprise.” Yes, definitely.

I don’t think the IoT must be stopped. I do think that the risks are considerable, and will increase as these systems become more pervasive and susceptible to class breaks. And I’m trying to write a book that will help navigate this. I don’t think I’m the prophet of doom, and don’t want to come across that way. I’ll give the manuscript another read with that in mind.

Tech Companies Meet EC to Discuss Removal of Pirate & Illegal Content

Post Syndicated from Andy original https://torrentfreak.com/tech-companies-meet-ec-to-discuss-removal-of-pirate-illegal-content-180109/

Thousands perhaps millions of pieces of illegal content flood onto the Internet every single day, a problem that’s only increasing with each passing year.

In the early days of the Internet very little was done to combat the problem but with the rise of social media and millions of citizens using it to publish whatever they like – not least terrorist propaganda and racist speech – governments around the world are beginning to take notice.

Of course, running parallel is the multi-billion dollar issue of intellectual property infringement. Eighteen years on from the first wave of mass online piracy and the majority of popular movies, TV shows, games, software and books are still available to download.

Over the past couple of years and increasingly in recent months, there have been clear signs that the EU in particular wishes to collectively mitigate the spread of all illegal content – from ISIS videos to pirated Hollywood movies – with assistance from major tech companies.

Google, YouTube, Facebook and Twitter are all expected to do their part, with the looming stick of legislation behind the collaborative carrots, should they fail to come up with a solution.

To that end, five EU Commissioners – Dimitris Avramopoulos, Elżbieta Bieńkowska, Věra Jourová, Julian King and Mariya Gabriel – will meet today in Brussels with representatives of several online platforms to discuss progress made in dealing with the spread of the aforementioned material.

In a joint statement together with EC Vice-President Andrus Ansip, the Commissioners describe all illegal content as a threat to security, safety, and fundamental rights, demanding a “collective response – from all actors, including the internet industry.”

They note that online platforms have committed significant resources towards removing violent and extremist content, including via automated removal, but more needs to be done to tackle the issue.

“This is starting to achieve results. However, even if tens of thousands of pieces of illegal content have been taken down, there are still hundreds of thousands more out there,” the Commissioners writes.

“And removal needs to be speedy: the longer illegal material stays online, the greater its reach, the more it can spread and grow. Building on the current voluntary approach, more efforts and progress have to be made.”

The Commission says it is relying on online platforms such as Google and Facebook to “step up and speed up their efforts to tackle these threats quickly and comprehensively.” This should include closer cooperation with law enforcement, sharing of information with other online players, plus action to ensure that once taken down, illegal content does not simply reappear.

While it’s clear that that the EC would prefer to work collaboratively with the platforms to find a solution to the illegal content problem, as expected there’s the veiled threat of them being compelled by law to do so, should they fall short of their responsibilities.

“We will continue to promote cooperation with social media companies to detect and remove terrorist and other illegal content online, and if necessary, propose legislation to complement the existing regulatory framework,” the EC warns.

Today’s discussions run both in parallel and in tandem with others specifically targeted at intellectual property abuses. Late November the EC presented a set of new measures to ensure that copyright holders are well protected both online and in the physical realm.

A key aim is to focus on large-scale facilitators, such as pirate site operators, while cutting their revenue streams.

“The Commission seeks to deprive commercial-scale IP infringers of the revenue flows that make their criminal activity lucrative – this is the so-called ‘follow the money’ approach which focuses on the ‘big fish’ rather than individuals,” the Commission explained.

This presentation followed on the heels of a proposal last September which had the EC advocating the take-down-stay-down principle, with pirate content being taken down, automated filters ensuring infringement can be tackled proactively, with measures being taken against repeat infringers.

Again, the EC warned that should cooperation with Internet platforms fail to come up with results, future legislation cannot be ruled out.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN discounts, offers and coupons

Wanted: Fixed Assets Accountant

Post Syndicated from Yev original https://www.backblaze.com/blog/wanted-fixed-assets-accountant/

As Backblaze continues to grow, we’re expanding our accounting team! We’re looking for a seasoned Fixed Asset Accountant to help us with fixed assets and equipment leases.

Job Duties:

  • Maintain and review fixed assets.
  • Record fixed asset acquisitions and dispositions.
  • Review and update the detailed schedule of fixed assets and accumulated depreciation.
  • Calculate depreciation for all fixed assets.
  • Investigate the potential obsolescence of fixed assets.
  • Coordinate with Operations team data center asset dispositions.
  • Conduct periodic physical inventory counts of fixed assets. Work with Operations team on cycle counts.
  • Reconcile the balance in the fixed asset subsidiary ledger to the summary-level account in the general ledger.
  • Track company expenditures for fixed assets in comparison to the capital budget and management authorizations.
  • Prepare audit schedules relating to fixed assets, and assist the auditors in their inquiries.
  • Recommend to management any updates to accounting policies related to fixed assets.
  • Manage equipment leases.
  • Engage and negotiate acquisition of new equipment lease lines.
  • Overall control of original lease documentation and maintenance of master lease files.
  • Facilitate and track routing and execution of various lease related: agreements — documents/forms/lease documents.
  • Establish and maintain proper controls to track expirations, renewal options, and all other critical dates.
  • Perform other duties and special projects as assigned.

Qualifications:

  • 5-6 years relevant accounting experience.
  • Knowledge of inventory and cycle counting preferred.
  • Quickbooks, Excel, Word experience desired.
  • Organized, with excellent attention to detail, meticulous, quick-learner.
  • Good interpersonal skills and a team player.
  • Flexibility and ability to adapt and wear different hats.

Backblaze Employees Have:

  • Good attitude and willingness to do whatever it takes to get the job done.
  • Strong desire to work for a small, fast-paced company.
  • Desire to learn and adapt to rapidly changing technologies and work environment.
  • Comfortable with well-behaved pets in the office.

This position is located in San Mateo, California. Regular attendance in the office is expected. Backblaze is an Equal Opportunity Employer and we offer competitive salary and benefits, including our no policy vacation policy.

If This Sounds Like You:
Send an email to [email protected] with:

  1. Fixed Asset Accountant in the subject line
  2. Your resume attached
  3. An overview of your relevant experience

The post Wanted: Fixed Assets Accountant appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

New Book Coming in September: "Click Here to Kill Everybody"

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2018/01/new_book_coming.html

My next book is still on track for a September 2018 publication. Norton is still the publisher. The title is now Click Here to Kill Everybody: Peril and Promise on a Hyperconnected Planet, which I generally refer to as CH2KE.

The table of contents has changed since I last blogged about this, and it now looks like this:

  • Introduction: Everything is Becoming a Computer
  • Part 1: The Trends
    • 1. Computers are Still Hard to Secure
    • 2. Everyone Favors Insecurity
    • 3. Autonomy and Physical Agency Bring New Dangers
    • 4. Patching is Failing as a Security Paradigm
    • 5. Authentication and Identification are Getting Harder
    • 6. Risks are Becoming Catastrophic
  • Part 2: The Solutions
    • 7. What a Secure Internet+ Looks Like
    • 8. How We Can Secure the Internet+
    • 9. Government is Who Enables Security
    • 10. How Government Can Prioritize Defense Over Offense
    • 11. What’s Likely to Happen, and What We Can Do in Response
    • 12. Where Policy Can Go Wrong
    • 13. How to Engender Trust on the Internet+
  • Conclusion: Technology and Policy, Together

Two questions for everyone.

1. I’m not really happy with the subtitle. It needs to be descriptive, to counterbalance the admittedly clickbait title. It also needs to telegraph: “everyone needs to read this book.” I’m taking suggestions.

2. In the book I need a word for the Internet plus the things connected to it plus all the data and processing in the cloud. I’m using the word “Internet+,” and I’m not really happy with it. I don’t want to invent a new word, but I need to strongly signal that what’s coming is much more than just the Internet — and I can’t find any existing word. Again, I’m taking suggestions.

Start off the New Year by earning AWS Certified Solutions Architect – Associate

Post Syndicated from Janna Pellegrino original https://aws.amazon.com/blogs/architecture/start-off-the-new-year-by-earning-aws-certified-solutions-architect-associate/

Do you design applications and systems on AWS? Want to demonstrate your AWS Cloud skills? Ring in 2018 by becoming an AWS Certified Solutions Architect – Associate. It’s a way to validate your expertise with an industry-recognized credential and give your career a boost.

Why get certified, you ask? According to the 2017 Global Knowledge IT Skills and Salary Report, cloud certifications, including AWS Certified Solutions Architect – Associate, generally have salaries well above average. For example, a typical U.S. salary for AWS Certified IT staff is 27.5 percent higher than the normal salary rate. Looking ahead, the report also finds that the IT industry will continue investing heavily in certification as a way to validating employees’ skills and expertise.

Here are our tips for preparing for the AWS Certified Solutions Architect – Associate exam—which we hope you’ll pass with flying colors.

Learn About the Exam

View the AWS Certified Solutions Architect – Associate Exam Guide. It covers concepts within the exam and gives you a blueprint of what you need to study.

The exam tests your technical expertise in designing and deploying scalable, highly-available, and fault-tolerant systems on AWS. It’s for anyone with one or more years of hands-on experience designing distributed applications and systems on the AWS platform.

Continue with Digital and Classroom Training

Next, brush up on key AWS services covered in the exam with our new free digital training offerings at aws.training. Our 100+ bite-sized online courses are each 10 minutes long so you learn AWS fundamentals at your own pace.

Just getting started learning the fundamentals of the AWS Cloud? We recommend you take our AWS Cloud Practitioner Essentials course, part of our free digital training offerings.

For more in-depth technical training, register for our immersive Architecting on AWS course. It’s three days of instructor-led classroom training, books, and labs, built and taught by AWS experts.

Study with Exam Prep Resources

Once you have an idea of what’s on the exam, and you’ve taken training to prepare, it’s time to prepare for the exam itself.

Dig deeper into the exam’s concepts and topics with the AWS Certified Solutions Architect – Associate Exam: Official Study Guide. It provides access to content written by AWS experts, real-world knowledge, key exam essentials, chapter review questions, an interactive online learning environment, and much more.

Next, study AWS whitepapers and FAQs with content related to the exam. You can find links to our suggested whitepapers at FAQs at https://aws.amazon.com/certification/certification-prep/ under the Solutions Architect – Associate tab.

You can also take an Exam Prep Workshop and learn exam strategies from a certified technical instructor.

Once you’re ready, put your knowledge to the (practice) test with sample questions. Register for an online practice exam to test your knowledge in a timed environment.

Schedule Your Exam and Get Certified

Now you’re ready to take the exam! Go to aws.training to schedule an exam at a testing center near you at. Once you’ve passed and are AWS Certified, you’ll enjoy AWS Certification benefits like access to the AWS Certified LinkedIn Community, invitations to AWS Certification Appreciation Receptions, digital AWS Certified badges, access to AWS Certified merchandise, and more.

Learn More

Visit us at aws.amazon.com/training for more information on digital training, classroom training, and AWS Certifications.

Simplify Querying Nested JSON with the AWS Glue Relationalize Transform

Post Syndicated from Trevor Roberts original https://aws.amazon.com/blogs/big-data/simplify-querying-nested-json-with-the-aws-glue-relationalize-transform/

AWS Glue has a transform called Relationalize that simplifies the extract, transform, load (ETL) process by converting nested JSON into columns that you can easily import into relational databases. Relationalize transforms the nested JSON into key-value pairs at the outermost level of the JSON document. The transformed data maintains a list of the original keys from the nested JSON separated by periods.

Let’s look at how Relationalize can help you with a sample use case.

An example of Relationalize in action

Suppose that the developers of a video game want to use a data warehouse like Amazon Redshift to run reports on player behavior based on data that is stored in JSON. Sample 1 shows example user data from the game. The player named “user1” has characteristics such as race, class, and location in nested JSON data. Further down, the player’s arsenal information includes additional nested JSON data. If the developers want to ETL this data into their data warehouse, they might have to resort to nested loops or recursive functions in their code.

Sample 1: Nested JSON

{
	"player": {
		"username": "user1",
		"characteristics": {
			"race": "Human",
			"class": "Warlock",
			"subclass": "Dawnblade",
			"power": 300,
			"playercountry": "USA"
		},
		"arsenal": {
			"kinetic": {
				"name": "Sweet Business",
				"type": "Auto Rifle",
				"power": 300,
				"element": "Kinetic"
			},
			"energy": {
				"name": "MIDA Mini-Tool",
				"type": "Submachine Gun",
				"power": 300,
				"element": "Solar"
			},
			"power": {
				"name": "Play of the Game",
				"type": "Grenade Launcher",
				"power": 300,
				"element": "Arc"
			}
		},
		"armor": {
			"head": "Eye of Another World",
			"arms": "Philomath Gloves",
			"chest": "Philomath Robes",
			"leg": "Philomath Boots",
			"classitem": "Philomath Bond"
		},
		"location": {
			"map": "Titan",
			"waypoint": "The Rig"
		}
	}
}

Instead, the developers can use the Relationalize transform. Sample 2 shows what the transformed data looks like.

Sample 2: Flattened JSON

{
    "player.username": "user1",
    "player.characteristics.race": "Human",
    "player.characteristics.class": "Warlock",
    "player.characteristics.subclass": "Dawnblade",
    "player.characteristics.power": 300,
    "player.characteristics.playercountry": "USA",
    "player.arsenal.kinetic.name": "Sweet Business",
    "player.arsenal.kinetic.type": "Auto Rifle",
    "player.arsenal.kinetic.power": 300,
    "player.arsenal.kinetic.element": "Kinetic",
    "player.arsenal.energy.name": "MIDA Mini-Tool",
    "player.arsenal.energy.type": "Submachine Gun",
    "player.arsenal.energy.power": 300,
    "player.arsenal.energy.element": "Solar",
    "player.arsenal.power.name": "Play of the Game",
    "player.arsenal.power.type": "Grenade Launcher",
    "player.arsenal.power.power": 300,
    "player.arsenal.power.element": "Arc",
    "player.armor.head": "Eye of Another World",
    "player.armor.arms": "Philomath Gloves",
    "player.armor.chest": "Philomath Robes",
    "player.armor.leg": "Philomath Boots",
    "player.armor.classitem": "Philomath Bond",
    "player.location.map": "Titan",
    "player.location.waypoint": "The Rig"
}

You can then write the data to a database or to a data warehouse. You can also write it to delimited text files, such as in comma-separated value (CSV) format, or columnar file formats such as Optimized Row Columnar (ORC) format. You can use either of these format types for long-term storage in Amazon S3. Storing the transformed files in S3 provides the additional benefit of being able to query this data using Amazon Athena or Amazon Redshift Spectrum. You can further extend the usefulness of the data by performing joins between data stored in S3 and the data stored in an Amazon Redshift data warehouse.

Before we get started…

In my example, I took two preparatory steps that save some time in your ETL code development:

  1. I stored my data in an Amazon S3 bucket and used an AWS Glue crawler to make my data available in the AWS Glue data catalog. You can find instructions on how to do that in Cataloging Tables with a Crawler in the AWS Glue documentation. The AWS Glue database name I used was “blog,” and the table name was “players.” You can see these values in use in the sample code that follows.
  2. I deployed a Zeppelin notebook using the automated deployment available within AWS Glue. If you already used an AWS Glue development endpoint to deploy a Zeppelin notebook, you can skip the deployment instructions. Otherwise, let’s quickly review how to deploy Zeppelin.

Deploying a Zeppelin notebook with AWS Glue

The following steps are outlined in the AWS Glue documentation, and I include a few screenshots here for clarity.

First, create two IAM roles:

Next, in the AWS Glue Management Console, choose Dev endpoints, and then choose Add endpoint.

Specify a name for the endpoint and the AWS Glue IAM role that you created.

On the networking screen, choose Skip Networking because our code only communicates with S3.

Complete the development endpoint process by providing a Secure Shell (SSH) public key and confirming your settings.

When your new development endpoint’s Provisioning status changes from PROVISIONING to READY, choose your endpoint, and then for Actions choose Create notebook server.

Enter the notebook server details, including the role you previously created and a security group with inbound access allowed on TCP port 443.

Doing this automatically launches an AWS CloudFormation template. The output specifies the URL that you can use to access your Zeppelin notebook with the username and password you specified in the wizard.

How do we flatten nested JSON?

With my data loaded and my notebook server ready, I accessed Zeppelin, created a new note, and set my interpreter to spark. I used some Python code that AWS Glue previously generated for another job that outputs to ORC. Then I added the Relationalize transform. You can see the resulting Python code in Sample 3.­

Sample 3: Python code to transform the nested JSON and output it to ORC

import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
#from awsglue.transforms import Relationalize

# Begin variables to customize with your information
glue_source_database = "blog"
glue_source_table = "players"
glue_temp_storage = "s3://blog-example-edz/temp"
glue_relationalize_output_s3_path = "s3://blog-example-edz/output-flat"
dfc_root_table_name = "root" #default value is "roottable"
# End variables to customize with your information

glueContext = GlueContext(spark.sparkContext)
datasource0 = glueContext.create_dynamic_frame.from_catalog(database = glue_source_database, table_name = glue_source_table, transformation_ctx = "datasource0")
dfc = Relationalize.apply(frame = datasource0, staging_path = glue_temp_storage, name = dfc_root_table_name, transformation_ctx = "dfc")
blogdata = dfc.select(dfc_root_table_name)
blogdataoutput = glueContext.write_dynamic_frame.from_options(frame = blogdata, connection_type = "s3", connection_options = {"path": glue_relationalize_output_s3_path}, format = "orc", transformation_ctx = "blogdataoutput")

What exactly is going on in this script?

After the import statements, we instantiate a GlueContext object, which allows us to work with the data in AWS Glue. Next, we create a DynamicFrame (datasource0) from the “players” table in the AWS Glue “blog” database. We use this DynamicFrame to perform any necessary operations on the data structure before it’s written to our desired output format. The source files remain unchanged.

We then run the Relationalize transform (Relationalize.apply()) with our datasource0 as one of the parameters. Another important parameter is the name parameter, which is a key that identifies our data after the transformation completes.

The Relationalize.apply() method returns a DynamicFrameCollection, and this is stored in the dfc variable. Before we can write our data to S3, we need to select the DynamicFrame from the DynamicFrameCollection object. We do this with the dfc.select() method. The correct DynamicFrame is stored in the blogdata variable.

You might be curious why a DynamicFrameCollection was returned when we started with a single DynamicFrame. This return value comes from the way Relationalize treats arrays in the JSON document: A DynamicFrame is created for each array. Together with the root data structure, each generated DynamicFrame is added to a DynamicFrameCollection when Relationalize completes its work. Although we didn’t have any arrays in our data, it’s good to keep this in mind. Finally, we output (blogdataoutput) the root DynamicFrame to ORC files in S3.

Using the transformed data

One of the use cases we discussed earlier was using Amazon Athena or Amazon Redshift Spectrum to query the ORC files.

I used the following SQL DDL statements to create external tables in both services to enable queries of my data stored in Amazon S3.

Sample 4: Amazon Athena DDL

CREATE EXTERNAL TABLE IF NOT EXISTS blog.blog_data_athena_test (
  `characteristics_race` string,
  `characteristics_class` string,
  `characteristics_subclass` string,
  `characteristics_power` int,
  `characteristics_playercountry` string,
  `kinetic_name` string,
  `kinetic_type` string,
  `kinetic_power` int,
  `kinetic_element` string,
  `energy_name` string,
  `energy_type` string,
  `energy_power` int,
  `energy_element` string,
  `power_name` string,
  `power_type` string,
  `power_power` int,
  `power_element` string,
  `armor_head` string,
  `armor_arms` string,
  `armor_chest` string,
  `armor_leg` string,
  `armor_classitem` string,
  `map` string,
  `waypoint` string 
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
WITH SERDEPROPERTIES (
  'serialization.format' = '1'
) LOCATION 's3://blog-example-edz/output-flat/'
TBLPROPERTIES ('has_encrypted_data'='false');

 

Sample 5: Amazon Redshift Spectrum DDL

-- Create a Schema
-- A single schema can be used with multiple external tables.
-- This step is only required once for the external tables you create.
create external schema spectrum 
from data catalog 
database 'blog' 
iam_role 'arn:aws:iam::0123456789:role/redshift-role'
create external database if not exists;

-- Create an external table in the schema
create external table spectrum.blog(
  username VARCHAR,
  characteristics_race VARCHAR,
  characteristics_class VARCHAR,
  characteristics_subclass VARCHAR,
  characteristics_power INTEGER,
  characteristics_playercountry VARCHAR,
  kinetic_name VARCHAR,
  kinetic_type VARCHAR,
  kinetic_power INTEGER,
  kinetic_element VARCHAR,
  energy_name VARCHAR,
  energy_type VARCHAR,
  energy_power INTEGER,
  energy_element VARCHAR,
  power_name VARCHAR,
  power_type VARCHAR,
  power_power INTEGER,
  power_element VARCHAR,
  armor_head VARCHAR,
  armor_arms VARCHAR,
  armor_chest VARCHAR,
  armor_leg VARCHAR,
  armor_classItem VARCHAR,
  map VARCHAR,
  waypoint VARCHAR)
stored as orc
location 's3://blog-example-edz/output-flat';

I even ran a query, shown in Sample 6, that joined my Redshift Spectrum table (spectrum.playerdata) with data in an Amazon Redshift table (public.raids) to generate advanced reports. In the where clause, I join the two tables based on the username values that are common to both data sources.

Sample 6: Select statement with a join of Redshift Spectrum data with Amazon Redshift data

-- Get Total Raid Completions for the Hunter Class.
select spectrum.playerdata.characteristics_class as class, sum(public.raids."completions.val.raids.leviathan") as "Total Hunter Leviathan Raid Completions" from spectrum.playerdata, public.raids
where spectrum.playerdata.username = public.raids."completions.val.username"
and spectrum.playerdata.characteristics_class = 'Hunter'
group by spectrum.playerdata.characteristics_class;

Summary

This post demonstrated how simple it can be to flatten nested JSON data with AWS Glue, using the Relationalize transform to automate the conversion of nested JSON. AWS Glue also automates the deployment of Zeppelin notebooks that you can use to develop your Python automation script. Finally, AWS Glue can output the transformed data directly to a relational database, or to files in Amazon S3 for further analysis with tools such as Amazon Athena and Amazon Redshift Spectrum.

As great as Relationalize is, it’s not the only transform available with AWS Glue. You can see a complete list of available transforms in Built-In Transforms in the AWS Glue documentation. Try them out today!


Additional Reading

If you found this post useful, be sure to check out Using Amazon Redshift Spectrum, Amazon Athena and AWS Glue with Node.js in Production and Build a Data Lake Foundation with AWS Glue and Amazon S3.


About the Author

Trevor Roberts Jr is a Solutions Architect with AWS. He provides architectural guidance to help customers achieve success in the cloud. In his spare time, Trevor enjoys traveling to new places and spending time with family.