Tag Archives: USTR

Subtitle Heroes: Fansubbing Movie Criticized For Piracy Promotion

Post Syndicated from Andy original https://torrentfreak.com/subtitle-heroes-fansubbing-movie-criticized-for-piracy-promotion-180217/

With many thousands of movies and TV shows being made available illegally online every year, a significant number will be enjoyed by speakers of languages other than that presented in the original production.

When Hollywood blockbusters appear online, small armies of individuals around the world spring into action, translating the dialog into Chinese and Czech, Dutch and Danish, French and Farsi, Russian and Romanian, plus a dozen languages in between. TV shows, particularly those produced in the US, get the same immediate treatment.

For many years, subtitling (‘fansubbing’) communities have provided an incredible service to citizens around the globe, from those seeking to experience new culture and languages to the hard of hearing and profoundly deaf. Now, following in the footsteps of movies like TPB:AFK and Kim Dotcom: Caught in the Web, a new movie has premiered in Italy which celebrates this extraordinary movement.

Subs Heroes from writer and director Franco Dipietro hit cinemas at the end of January. It documents the contribution fansubbing has made to Italian culture in a country that under fascism in 1934 banned the use of foreign languages in films, books, newspapers and everyday speech.

The movie centers on the large subtitle site ItalianSubs.net. Founded by a group of teenagers in 2006, it is now run by a team of men and women who maintain their identities as regular citizens during the day but transform into “superheroes of fansubbing” at night.

Needless to say, not everyone is pleased with this depiction of the people behind the now-infamous 500,000 member site.

For many years, fansubbing attracted very little heat but over time anti-piracy groups have been turning up the pressure, accusing subtitling teams of fueling piracy. This notion is shared by local anti-piracy outfit FAPAV (Federation for the Protection of Audiovisual and Multimedia Content), which has accused Dipietro’s movie of glamorizing criminal activity.

In a statement following the release of Subs Heroes, FAPAV made its position crystal clear: sites like ItalianSubs do not contribute to the development of the audiovisual market in Italy.

“It is necessary to clarify: when a protected work is subtitled and there is no right to do so, a crime is committed,” the anti-piracy group says.

“[Italiansubs] translates and makes available subtitles of audiovisual works (films and television series) in many cases not yet distributed on the Italian market. All this without having requested the consent of the rights holders. Ergo the Italiansubs community is illegal.”

Italiansubs (note ad for movie, top right)

FAPAV General Secretary Federico Bagnoli Rossi says that the impact that fansubbers have on the market is significant, causing damage not only to companies distributing the content but also to those who invest in official translations.

The fact that fansubbers often translate content that is not yet available in the region only compounds matters, Rossi says, noting that unofficial translations can also have “direct consequences” on those who have language dubbing as an occupation.

“The audiovisual market today needs to be supported and the protection and fight against illicit behaviors are as fundamental as investments and creative ideas,” Rossi notes.

“Everyone must do their part, respecting the rules and with a competitive and global cultural vision. There are no ‘superheroes’ or noble goals behind piracy, but only great damage to the audiovisual sector and all its workers.”

Also piling on the criticism is the chief of the National Cinema Exhibitors’ Association, who wrote to all of the companies involved to remind them that unauthorized subtitling is a crime. According to local reports, there seems to be an underlying tone that people should avoid becoming associated with the movie.

This did not please director Franco Dipietro who is defending his right to document the fansubbing movement, whether the industry likes it or not.

“We invite those who perhaps think differently to deepen the discussion and maybe organize an event to talk about it together. The film is made to confront and talk about a phenomenon that, whether we like it or not, exists and we can not pretend that it is not there,” Dipietro concludes.



Subs Heroes Trailer 1 from Duel: on Vimeo.



Subs Heroes Trailer 2 from Duel: on Vimeo.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN discounts, offers and coupons

Major US Sports Leagues Report Top Piracy Nations to Government

Post Syndicated from Ernesto original https://torrentfreak.com/major-us-sports-leagues-report-top-piracy-nations-to-government-180216/

While pirated Hollywood blockbusters often score the big headlines, there are several other industries that have been battling with piracy over the years. This includes sports organizations.

Many of the major US leagues including the NBA, NFL, NHL, MLB and the Tennis Association, are bundling their powers in the Sports Coalition, to try and curb the availability of pirated streams and videos.

A few days ago the Sports Coalition put the piracy problem on the agenda of the United States Trade Representative (USTR).

“Sports organizations, including Sports Coalition members, are heavily affected by live sports telecast piracy, including the unauthorized live retransmission of sports telecasts over the Internet,” the Sports Coalition wrote.

“The Internet piracy of live sports telecasts is not only a persistent problem, but also a global one, often involving bad actors in more than one nation.”

The USTR asked the public for comments on which countries play a central role in copyright infringement issues. In its response, the Sports Coalition stresses that piracy is a global issue but singles out several nations as particularly problematic.

The coalition recommends that the USTR should put the Netherlands and Switzerland on the “Priority Watch List” of its 2018 Special 301 Report, followed by Russia, Saudi Arabia, Seychelles and Sweden, which get a regular “Watch List” recommendation.

The main problem with these countries is that hosting providers and content distribution networks don’t do enough to curb piracy.

In the Netherlands, sawlive.tv, strikezoneme, wizlnet, AltusHost, Host Palace, Quasi Networks and SNEL pirated or provided services contributing to sports piracy, the coalition writes. In Switzerland, mlbstreamme, robinwidgetorg, strikeoutmobi, BlackHOST, Private Layer and Solar Communications are doing the same.

According to the major sports leagues, the US Government should encourage these countries to step up their anti-piracy game. This is not only important for US copyright holders, but also for licensees in other countries.

“Clearly, there is common ground – both in terms of shared economic interests and legal obligations to protect and enforce intellectual property and related rights – for the United States and the nations with which it engages in international trade to work cooperatively to stop Internet piracy of sports programming.”

Whether any of these countries will make it into the USTR’s final list has yet to be seen. For Switzerland it wouldn’t be the first time but for the Netherlands it would be new, although it has been considered before.

A document we received through a FOIA request earlier this year revealed that the US Embassy reached out to the Dutch Government in the past, to discuss similar complaints from the Sports Coalition.

The same document also revealed that local anti-piracy group BREIN consistently urged the entertainment industries it represents not to advocate placing the Netherlands on the 301 Watch List but to solve the problems behind the scenes instead.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN discounts, offers and coupons

This IoT Pet Monitor barks back

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/iot-pet-monitor/

Jennifer Fox, founder of FoxBot Industries, uses a Raspberry Pi pet monitor to check the sound levels of her home while she is out, allowing her to keep track of when her dog Marley gets noisy or agitated, and to interact with the gorgeous furball accordingly.

Bark Back Project Demo

A quick overview and demo of the Bark Back, a project to monitor and interact with Check out the full tutorial here: https://learn.sparkfun.com/tutorials/bark-back-interactive-pet-monitor For any licensing requests please contact [email protected]

Marley, bark!

Using a Raspberry Pi 3, speakers, SparkFun’s MEMS microphone breakout board, and an analogue-to-digital converter (ADC), the IoT Pet Monitor is fairly easy to recreate, all thanks to Jennifer’s full tutorial on the FoxBot website.

Building the pet monitor

In a nutshell, once the Raspberry Pi and the appropriate bits and pieces are set up, you’ll need to sign up at CloudMQTT — it’s free if you select the Cute Cat account. CloudMQTT will create an invisible bridge between your home and wherever you are that isn’t home, so that you can check in on your pet monitor.

Screenshot CloudMQTT account set-up — IoT Pet Monitor Bark Back Raspberry Pi

Image c/o FoxBot Industries

Within the project code, you’ll be able to calculate the peak-to-peak amplitude of sound the microphone picks up. Then you can decide how noisy is too noisy when it comes to the occasional whine and bark of your beloved pup.

MEMS microphone breakout board — IoT Pet Monitor Bark Back Raspberry Pi

The MEMS microphone breakout board collects sound data and relays it back to the Raspberry Pi via the ADC.
Image c/o FoxBot Industries

Next you can import sounds to a preset song list that will be played back when the volume rises above your predefined threshold. As Jennifer states in the tutorial, the sounds can easily be recorded via apps such as Garageband, or even on your mobile phone.

Using the pet monitor

Whenever the Bark Back IoT Pet Monitor is triggered to play back audio, this information is fed to the CloudMQTT service, allowing you to see if anything is going on back home.

A sitting dog with a doll in its mouth — IoT Pet Monitor Bark Back Raspberry Pi

*incoherent coos of affection from Alex*
Image c/o FoxBot Industries

And as Jennifer recommends, a update of the project could include a camera or sensors to feed back more information about your home environment.

If you’ve created something similar, be sure to let us know in the comments. And if you haven’t, but you’re now planning to build your own IoT pet monitor, be sure to let us know in the comments. And if you don’t have a pet but just want to say hi…that’s right, be sure to let us know in the comments.

The post This IoT Pet Monitor barks back appeared first on Raspberry Pi.

Court Orders Spanish ISPs to Block Pirate Sites For Hollywood

Post Syndicated from Andy original https://torrentfreak.com/court-orders-spanish-isps-to-block-pirate-sites-for-hollywood-180216/

Determined to reduce levels of piracy globally, Hollywood has become one of the main proponents of site-blocking on the planet. To date there have been multiple lawsuits in far-flung jurisdictions, with Europe one of the primary targets.

Following complaints from Disney, 20th Century Fox, Paramount, Sony, Universal and Warner, Spain has become one of the latest targets. According to the studios a pair of sites – HDFull.tv and Repelis.tv – infringe their copyrights on a grand scale and need to be slowed down by preventing users from accessing them.

HDFull is a platform that provides movies and TV shows in both Spanish and English. Almost 60% its traffic comes from Spain and after a huge surge in visitors last July, it’s now the 337th most popular site in the country according to Alexa. Visitors from Mexico, Argentina, United States and Chile make up the rest of its audience.

Repelis.tv is a similar streaming portal specializing in movies, mainly in Spanish. A third of the site’s visitors hail from Mexico with the remainder coming from Argentina, Columbia, Spain and Chile. In common with HDFull, Repelis has been building its visitor numbers quickly since 2017.

The studios demanding more blocks

With a ruling in hand from the European Court of Justice which determined that sites can be blocked on copyright infringement grounds, the studios asked the courts to issue an injunction against several local ISPs including Telefónica, Vodafone, Orange and Xfera. In an order handed down this week, Barcelona Commercial Court No. 6 sided with the studios and ordered the ISPs to begin blocking the sites.

“They damage the legitimate rights of those who own the films and series, which these pages illegally display and with which they profit illegally through the advertising revenues they generate,” a statement from the Spanish Federation of Cinematographic Distributors (FEDECINE) reads.

FEDECINE General director Estela Artacho said that changes in local law have helped to provide the studios with a new way to protect audiovisual content released in Spain.

“Thanks to the latest reform of the Civil Procedure Law, we have in this jurisdiction a new way to exercise different possibilities to protect our commercial film offering,” Artacho said.

“Those of us who are part of this industry work to make culture accessible and offer the best cinematographic experience in the best possible conditions, guaranteeing the continuity of the sector.”

The development was also welcomed by Stan McCoy, president of the Motion Picture Association’s EMEA division, which represents the plaintiffs in the case.

“We have just taken a welcome step which we consider crucial to face the problem of piracy in Spain,” McCoy said.

“These actions are necessary to maintain the sustainability of the creative community both in Spain and throughout Europe. We want to ensure that consumers enjoy the entertainment offer in a safe and secure environment.”

After gaining experience from blockades and subsequent circumvention in other regions, the studios seem better prepared to tackle fallout in Spain. In addition to blocking primary domains, the ruling handed down by the court this week also obliges ISPs to block any other domain, subdomain or IP address whose purpose is to facilitate access to the blocked platforms.

News of Spain’s ‘pirate’ blocks come on the heels of fresh developments in Germany, where this week a court ordered ISP Vodafone to block KinoX, one of the country’s most popular streaming portals.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN discounts, offers and coupons

Australian Government Launches Pirate Site-Blocking Review

Post Syndicated from Andy original https://torrentfreak.com/australian-government-launches-pirate-site-blocking-review-180214/

Following intense pressure from entertainment industry groups, in 2014 Australia began developing legislation which would allow ‘pirate’ sites to be blocked at the ISP level.

In March 2015 the Copyright Amendment (Online Infringement) Bill 2015 (pdf) was introduced to parliament and after just three months of consideration, the Australian Senate passed the legislation into law.

Soon after, copyright holders began preparing their first cases and in December 2016, the Australian Federal Court ordered dozens of local Internet service providers to block The Pirate Bay, Torrentz, TorrentHound, IsoHunt, SolarMovie, plus many proxy and mirror services.

Since then, more processes have been launched establishing site-blocking as a permanent fixture on the Aussie anti-piracy agenda. But with yet more applications for injunction looming on the horizon, how is the mechanism performing and does anything else need to be done to improve or amend it?

Those are the questions now being asked by the responsible department of the Australian Government via a consultation titled Review of Copyright Online Infringement Amendment. The review should’ve been carried out 18 months after the law’s introduction in 2015 but the department says that it delayed the consultation to let more evidence emerge.

“The Department of Communications and the Arts is seeking views from stakeholders on the questions put forward in this paper. The Department welcomes single, consolidated submissions from organizations or parties, capturing all views on the Copyright Amendment (Online Infringement) Act 2015 (Online Infringement Amendment),” the consultation paper begins.

The three key questions for response are as follows:

– How effective and efficient is the mechanism introduced by the Online Infringement Amendment?

– Is the application process working well for parties and are injunctions operating well, once granted?

– Are any amendments required to improve the operation of the Online Infringement Amendment?

Given the tendency for copyright holders to continuously demand more bang for their buck, it will perhaps come as a surprise that at least for now there is a level of consensus that the system is working as planned.

“Case law and survey data suggests the Online Infringement Amendment has enabled copyright owners to work with [Internet service providers] to reduce large-scale online copyright infringement. So far, it appears that copyright owners and [ISPs] find the current arrangement acceptable, clear and effective,” the paper reads.

Thus far under the legislation there have been four applications for injunctions through the Federal Court, notably against leading torrent indexes and browser-based streaming sites, which were both granted.

The other two processes, which began separately but will be heard together, at least in part, involve the recent trend of set-top box based streaming.

Village Roadshow, Disney, Universal, Warner Bros, Twentieth Century Fox, and Paramount are currently presenting their case to the Federal Court. Along with Hong Kong-based broadcaster Television Broadcasts Limited (TVB), which has a separate application, the companies have been told to put together quality evidence for an April 2018 hearing.

With these applications already in the pipeline, yet more are on the horizon. The paper notes that more applications are expected to reach the Federal Court shortly, with the Department of Communications monitoring to assess whether current arrangements are refined as additional applications are filed.

Thus far, however, steady progress appears to have been made. The paper cites various precedents established as a result of the blocking process including the use of landing pages to inform Internet users why sites are blocked and who is paying.

“Either a copyright owner or [ISP] can establish a landing page. If an [ISP] wishes to avoid the cost of its own landing page, it can redirect customers to one that the copyright owner would provide. Another precedent allocates responsibility for compliance costs. Cases to date have required copyright owners to pay all or a significant proportion of compliance costs,” the paper notes.

But perhaps the issue of most importance is whether site-blocking as a whole has had any effect on the levels of copyright infringement in Australia.

The Government says that research carried out by Kantar shows that downloading “fell slightly from 2015 to 2017” with a 5-10% decrease in individuals consuming unlicensed content across movies, music and television. It’s worth noting, however, that Netflix didn’t arrive on Australian shores until May 2015, just a month before the new legislation was passed.

Research commissioned by the Department of Communications and published a year later in 2016 (pdf) found that improved availability of legal streaming alternatives was the main contributor to falling infringement rates. In a juicy twist, the report also revealed that Aussie pirates were the entertainment industries’ best customers.

“The Department is aware that other factors — such as the increasing availability of television, music and film streaming services and of subscription gaming services — may also contribute to falling levels of copyright infringement,” the paper notes.

Submissions to the consultation (pdf) are invited by 5.00 pm AEST on Friday 16 March 2018 via the government’s website.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN discounts, offers and coupons

EFF Urges US Copyright Office To Reject Proactive ‘Piracy’ Filters

Post Syndicated from Andy original https://torrentfreak.com/eff-urges-us-copyright-office-to-reject-proactive-piracy-filters-180213/

Faced with millions of individuals consuming unlicensed audiovisual content from a variety of sources, entertainment industry groups have been seeking solutions closer to the roots of the problem.

As widespread site-blocking attempts to tackle ‘pirate’ sites in the background, greater attention has turned to legal platforms that host both licensed and unlicensed content.

Under current legislation, these sites and services can do business relatively comfortably due to the so-called safe harbor provisions of the US Digital Millennium Copyright Act (DMCA) and the European Union Copyright Directive (EUCD).

Both sets of legislation ensure that Internet platforms can avoid being held liable for the actions of others provided they themselves address infringement when they are made aware of specific problems. If a video hosting site has a copy of an unlicensed movie uploaded by a user, for example, it must be removed within a reasonable timeframe upon request from the copyright holder.

However, in both the US and EU there is mounting pressure to make it more difficult for online services to achieve ‘safe harbor’ protections.

Entertainment industry groups believe that platforms use the law to turn a blind eye to infringing content uploaded by users, content that is often monetized before being taken down. With this in mind, copyright holders on both sides of the Atlantic are pressing for more proactive regimes, ones that will see Internet platforms install filtering mechanisms to spot and discard infringing content before it can reach the public.

While such a system would be welcomed by rightsholders, Internet companies are fearful of a future in which they could be held more liable for the infringements of others. They’re supported by the EFF, who yesterday presented a petition to the US Copyright Office urging caution over potential changes to the DMCA.

“As Internet users, website owners, and online entrepreneurs, we urge you to preserve and strengthen the Digital Millennium Copyright Act safe harbors for Internet service providers,” the EFF writes.

“The DMCA safe harbors are key to keeping the Internet open to all. They allow anyone to launch a website, app, or other service without fear of crippling liability for copyright infringement by users.”

It is clear that pressure to introduce mandatory filtering is a concern to the EFF. Filters are blunt instruments that cannot fathom the intricacies of fair use and are liable to stifle free speech and stymie innovation, they argue.

“Major media and entertainment companies and their surrogates want Congress to replace today’s DMCA with a new law that would require websites and Internet services to use automated filtering to enforce copyrights.

“Systems like these, no matter how sophisticated, cannot accurately determine the copyright status of a work, nor whether a use is licensed, a fair use, or otherwise non-infringing. Simply put, automated filters censor lawful and important speech,” the EFF warns.

While its introduction was voluntary and doesn’t affect the company’s safe harbor protections, YouTube already has its own content filtering system in place.

ContentID is able to detect the nature of some content uploaded by users and give copyright holders a chance to remove or monetize it. The company says that the majority of copyright disputes are now handled by ContentID but the system is not perfect and mistakes are regularly flagged by users and mentioned in the media.

However, ContentID was also very expensive to implement so expecting smaller companies to deploy something similar on much more limited budgets could be a burden too far, the EFF warns.

“What’s more, even deeply flawed filters are prohibitively expensive for all but the largest Internet services. Requiring all websites to implement filtering would reinforce the market power wielded by today’s large Internet services and allow them to stifle competition. We urge you to preserve effective, usable DMCA safe harbors, and encourage Congress to do the same,” the EFF notes.

The same arguments, for and against, are currently raging in Europe where the EU Commission proposed mandatory upload filtering in 2016. Since then, opposition to the proposals has been fierce, with warnings of potential human rights breaches and conflicts with existing copyright law.

Back in the US, there are additional requirements for a provider to qualify for safe harbor, including having a named designated agent tasked with receiving copyright infringement notifications. This person’s name must be listed on a platform’s website and submitted to the US Copyright Office, which maintains a centralized online directory of designated agents’ contact information.

Under new rules, agents must be re-registered with the Copyright Office every three years, despite that not being a requirement under the DMCA. The EFF is concerned that by simply failing to re-register an agent, an otherwise responsible website could lose its safe harbor protections, even if the agent’s details have remained the same.

“We’re concerned that the new requirement will particularly disadvantage small and nonprofit websites. We ask you to reconsider this rule,” the EFF concludes.

The EFF’s letter to the Copyright Office can be found here.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN discounts, offers and coupons

AWS Hot Startups for February 2018: Canva, Figma, InVision

Post Syndicated from Tina Barr original https://aws.amazon.com/blogs/aws/aws-hot-startups-for-february-2018-canva-figma-invision/

Note to readers! Starting next month, we will be publishing our monthly Hot Startups blog post on the AWS Startup Blog. Please come check us out.

As visual communication—whether through social media channels like Instagram or white space-heavy product pages—becomes a central part of everyone’s life, accessible design platforms and tools become more and more important in the world of tech. This trend is why we have chosen to spotlight three design-related startups—namely Canva, Figma, and InVision—as our hot startups for the month of February. Please read on to learn more about these design-savvy companies and be sure to check out our full post here.

Canva (Sydney, Australia)

For a long time, creating designs required expensive software, extensive studying, and time spent waiting for feedback from clients or colleagues. With Canva, a graphic design tool that makes creating designs much simpler and accessible, users have the opportunity to design anything and publish anywhere. The platform—which integrates professional design elements, including stock photography, graphic elements, and fonts for users to build designs either entirely from scratch or from thousands of free templates—is available on desktop, iOS, and Android, making it possible to spin up an invitation, poster, or graphic on a smartphone at any time.

To learn more about Canva, read our full interview with CEO Melanie Perkins here.

Figma (San Francisco, CA)

Figma is a cloud-based design platform that empowers designers to communicate and collaborate more effectively. Using recent advancements in WebGL, Figma offers a design tool that doesn’t require users to install any software or special operating systems. It also allows multiple people to work in a file at the same time—a crucial feature.

As the need for new design talent increases, the industry will need plenty of junior designers to keep up with the demand. Figma is prepared to help students by offering their platform for free. Through this, they “hope to give young designers the resources necessary to kick-start their education and eventually, their careers.”

For more about Figma, check out our full interview with CEO Dylan Field here.

InVision (New York, NY)

Founded in 2011 with the goal of helping improve every digital experience in the world, digital product design platform InVision helps users create a streamlined and scalable product design process, build and iterate on prototypes, and collaborate across organizations. The company, which raised a $100 million series E last November, bringing the company’s total funding to $235 million, currently powers the digital product design process at more than 80 percent of the Fortune 100 and brands like Airbnb, HBO, Netflix, and Uber.

Learn more about InVision here.

Be sure to check out our full post on the AWS Startups blog!

-Tina

How I built a data warehouse using Amazon Redshift and AWS services in record time

Post Syndicated from Stephen Borg original https://aws.amazon.com/blogs/big-data/how-i-built-a-data-warehouse-using-amazon-redshift-and-aws-services-in-record-time/

This is a customer post by Stephen Borg, the Head of Big Data and BI at Cerberus Technologies.

Cerberus Technologies, in their own words: Cerberus is a company founded in 2017 by a team of visionary iGaming veterans. Our mission is simple – to offer the best tech solutions through a data-driven and a customer-first approach, delivering innovative solutions that go against traditional forms of working and process. This mission is based on the solid foundations of reliability, flexibility and security, and we intend to fundamentally change the way iGaming and other industries interact with technology.

Over the years, I have developed and created a number of data warehouses from scratch. Recently, I built a data warehouse for the iGaming industry single-handedly. To do it, I used the power and flexibility of Amazon Redshift and the wider AWS data management ecosystem. In this post, I explain how I was able to build a robust and scalable data warehouse without the large team of experts typically needed.

In two of my recent projects, I ran into challenges when scaling our data warehouse using on-premises infrastructure. Data was growing at many tens of gigabytes per day, and query performance was suffering. Scaling required major capital investment for hardware and software licenses, and also significant operational costs for maintenance and technical staff to keep it running and performing well. Unfortunately, I couldn’t get the resources needed to scale the infrastructure with data growth, and these projects were abandoned. Thanks to cloud data warehousing, the bottleneck of infrastructure resources, capital expense, and operational costs have been significantly reduced or have totally gone away. There is no more excuse for allowing obstacles of the past to delay delivering timely insights to decision makers, no matter how much data you have.

With Amazon Redshift and AWS, I delivered a cloud data warehouse to the business very quickly, and with a small team: me. I didn’t have to order hardware or software, and I no longer needed to install, configure, tune, or keep up with patches and version updates. Instead, I easily set up a robust data processing pipeline and we were quickly ingesting and analyzing data. Now, my data warehouse team can be extremely lean, and focus more time on bringing in new data and delivering insights. In this post, I show you the AWS services and the architecture that I used.

Handling data feeds

I have several different data sources that provide everything needed to run the business. The data includes activity from our iGaming platform, social media posts, clickstream data, marketing and campaign performance, and customer support engagements.

To handle the diversity of data feeds, I developed abstract integration applications using Docker that run on Amazon EC2 Container Service (Amazon ECS) and feed data to Amazon Kinesis Data Streams. These data streams can be used for real time analytics. In my system, each record in Kinesis is preprocessed by an AWS Lambda function to cleanse and aggregate information. My system then routes it to be stored where I need on Amazon S3 by Amazon Kinesis Data Firehose. Suppose that you used an on-premises architecture to accomplish the same task. A team of data engineers would be required to maintain and monitor a Kafka cluster, develop applications to stream data, and maintain a Hadoop cluster and the infrastructure underneath it for data storage. With my stream processing architecture, there are no servers to manage, no disk drives to replace, and no service monitoring to write.

Setting up a Kinesis stream can be done with a few clicks, and the same for Kinesis Firehose. Firehose can be configured to automatically consume data from a Kinesis Data Stream, and then write compressed data every N minutes to Amazon S3. When I want to process a Kinesis data stream, it’s very easy to set up a Lambda function to be executed on each message received. I can just set a trigger from the AWS Lambda Management Console, as shown following.

I also monitor the duration of function execution using Amazon CloudWatch and AWS X-Ray.

Regardless of the format I receive the data from our partners, I can send it to Kinesis as JSON data using my own formatters. After Firehose writes this to Amazon S3, I have everything in nearly the same structure I received but compressed, encrypted, and optimized for reading.

This data is automatically crawled by AWS Glue and placed into the AWS Glue Data Catalog. This means that I can immediately query the data directly on S3 using Amazon Athena or through Amazon Redshift Spectrum. Previously, I used Amazon EMR and an Amazon RDS–based metastore in Apache Hive for catalog management. Now I can avoid the complexity of maintaining Hive Metastore catalogs. Glue takes care of high availability and the operations side so that I know that end users can always be productive.

Working with Amazon Athena and Amazon Redshift for analysis

I found Amazon Athena extremely useful out of the box for ad hoc analysis. Our engineers (me) use Athena to understand new datasets that we receive and to understand what transformations will be needed for long-term query efficiency.

For our data analysts and data scientists, we’ve selected Amazon Redshift. Amazon Redshift has proven to be the right tool for us over and over again. It easily processes 20+ million transactions per day, regardless of the footprint of the tables and the type of analytics required by the business. Latency is low and query performance expectations have been more than met. We use Redshift Spectrum for long-term data retention, which enables me to extend the analytic power of Amazon Redshift beyond local data to anything stored in S3, and without requiring me to load any data. Redshift Spectrum gives me the freedom to store data where I want, in the format I want, and have it available for processing when I need it.

To load data directly into Amazon Redshift, I use AWS Data Pipeline to orchestrate data workflows. I create Amazon EMR clusters on an intra-day basis, which I can easily adjust to run more or less frequently as needed throughout the day. EMR clusters are used together with Amazon RDS, Apache Spark 2.0, and S3 storage. The data pipeline application loads ETL configurations from Spring RESTful services hosted on AWS Elastic Beanstalk. The application then loads data from S3 into memory, aggregates and cleans the data, and then writes the final version of the data to Amazon Redshift. This data is then ready to use for analysis. Spark on EMR also helps with recommendations and personalization use cases for various business users, and I find this easy to set up and deliver what users want. Finally, business users use Amazon QuickSight for self-service BI to slice, dice, and visualize the data depending on their requirements.

Each AWS service in this architecture plays its part in saving precious time that’s crucial for delivery and getting different departments in the business on board. I found the services easy to set up and use, and all have proven to be highly reliable for our use as our production environments. When the architecture was in place, scaling out was either completely handled by the service, or a matter of a simple API call, and crucially doesn’t require me to change one line of code. Increasing shards for Kinesis can be done in a minute by editing a stream. Increasing capacity for Lambda functions can be accomplished by editing the megabytes allocated for processing, and concurrency is handled automatically. EMR cluster capacity can easily be increased by changing the master and slave node types in Data Pipeline, or by using Auto Scaling. Lastly, RDS and Amazon Redshift can be easily upgraded without any major tasks to be performed by our team (again, me).

In the end, using AWS services including Kinesis, Lambda, Data Pipeline, and Amazon Redshift allows me to keep my team lean and highly productive. I eliminated the cost and delays of capital infrastructure, as well as the late night and weekend calls for support. I can now give maximum value to the business while keeping operational costs down. My team pushed out an agile and highly responsive data warehouse solution in record time and we can handle changing business requirements rapidly, and quickly adapt to new data and new user requests.


Additional Reading

If you found this post useful, be sure to check out Deploy a Data Warehouse Quickly with Amazon Redshift, Amazon RDS for PostgreSQL and Tableau Server and Top 8 Best Practices for High-Performance ETL Processing Using Amazon Redshift.


About the Author

Stephen Borg is the Head of Big Data and BI at Cerberus Technologies. He has a background in platform software engineering, and first became involved in data warehousing using the typical RDBMS, SQL, ETL, and BI tools. He quickly became passionate about providing insight to help others optimize the business and add personalization to products. He is now the Head of Big Data and BI at Cerberus Technologies.

 

 

 

The Early Days of Mass Internet Piracy Were Awesome Yet Awful

Post Syndicated from Andy original https://torrentfreak.com/the-early-days-of-mass-internet-piracy-were-awesome-yet-awful-180211/

While Napster certainly put the digital cats among the pigeons in 1999, the organized chaos of mass Internet file-sharing couldn’t be truly appreciated until the advent of decentralized P2P networks a year or so later.

In the blink of an eye, everyone with a “shared folder” client became both a consumer and publisher, sucking in files from strangers and sharing them with like-minded individuals all around the planet. While today’s piracy narrative is all about theft and danger, in the early 2000s the sharing community felt more like distant friends who hadn’t met, quietly trading cards together.

Satisfying to millions, those who really engaged found shared folder sharing a real adrenaline buzz, as English comedian Seann Walsh noted on Conan this week.

“Click. 20th Century Fox comes up. No pixels. No shaky cam. No silhouettes of heads at the bottom of the screen, people coming in five minutes late. None of that,” Walsh said, recalling his experience of downloading X-Men 2 (X2) from LimeWire.

“We thought: ‘We’ve done it!!’ This was incredible! We were going to have to go to the cinema. We weren’t going to have to wait for the film to come out on video. We weren’t going to have to WALK to blockbuster!”

But while the nostalgia has an air of magic about it, Walsh’s take on the piracy experience is bittersweet. While obtaining X2 without having to trudge to a video store was a revelation, there were plenty of drawbacks too.

Downloading the pirate copy took a week, which pre-BitTorrent wasn’t a completely bad result but still a considerable commitment. There were also serious problems with quality control.

“20th Century fades, X Men 2 comes up. We’ve done it! We’re not taking it for granted – we’re actually hugging. Yes! Yes! We’ve done it! This is the future! We look at the screen, Wolverine turns round…,” …..and Walsh launches into a broadside of pseudo-German babble, mimicking the unexpectedly-dubbed superhero.

After a week of downloading and getting a quality picture on launch, that is a punch in the gut, to say the least. Arguably no less than a pirate deserves, some will argue, but a fat lip nonetheless, and one many a pirate has suffered over the years. Nevertheless, as Walsh notes, it’s a pain that kids in 2018 simply cannot comprehend.

“Children today are living the childhood I dreamed of. If they want to hear a song – touch – they stream it. They’ve got it now. Bang. Instantly. They don’t know the pain of LimeWire.

“Start downloading a song, go to school, come back. HOPE that it’d finished! That download bar messing with you. Four minutes left…..nine HOURS and 28 minutes left? Thirty seconds left…..52 hours and 38 minutes left? JUST TELL ME THE TRUTH!!!!!” Walsh pleaded.

While this might sound comical now, this was the reality of people downloading from clients such as LimeWire and Kazaa. While X2 in German would’ve been torture for a non-German speaker, the misery of watching an English language copy of 28 Days Later somehow crammed into a 30Mb file is right up there too.

Mislabeled music with microscopic bitrates? That was pretty much standard.

But against the odds, these frankly second-rate experiences still managed to capture the hearts and minds of the digitally minded. People were prepared to put up with nonsense and regular disappointment in order to consume content in a way fit for the 21st century. Yet somehow the combined might of the entertainment industries couldn’t come up with anything substantially better for a number of years.

Of course, broadband availability and penetration played its part but looking back, something could have been done. Not only didn’t the Internet’s popularity come as a surprise, people’s expectations were dramatically lower than they are today too. In any event, beating the pirates should have been child’s play. After all, it was just regular people sharing files in a Windows folder.

Any fool could do it – and millions did. Surprisingly, they have proven unstoppable.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN discounts, offers and coupons

Voksi Releases Detailed Denuvo-Cracking Video Tutorial

Post Syndicated from Andy original https://torrentfreak.com/voksi-releases-detailed-denuvo-cracking-video-tutorial-180210/

Earlier this week, version 4.9 of the Denuvo anti-tamper system, which had protected Assassins Creed Origin for the past several months, was defeated by Italian cracking group CPY.

While Denuvo would probably paint four months of protection as a success, the company would certainly have preferred for things to have gone on a bit longer, not least following publisher Ubisoft’s decision to use VMProtect technology on top.

But while CPY do their thing in Italy there’s another rival whittling away at whatever the giants at Denuvo (and new owner Irdeto) can come up with. The cracker – known only as Voksi – hails from Bulgaria and this week he took the unusual step of releasing a 90-minute video (embedded below) in which he details how to defeat Denuvo’s V4 anti-tamper technology.

The video is not for the faint-hearted so those with an aversion to issues of a highly technical nature might feel the urge to look away. However, it may surprise readers to learn that not so long ago, Voksi knew absolutely nothing about coding.

“You will find this very funny and unbelievable,” Voksi says, recalling the events of 2012.

“There was one game called Sanctum and on one free [play] weekend [on Steam], I and my best friend played through it and saw how great the cooperative action was. When the free weekend was over, we wanted to keep playing, but we didn’t have any money to buy the game.

“So, I started to look for alternative ways, LAN emulators, anything! Then I decided I need to crack it. That’s how I got into reverse engineering. I started watching some shitty YouTube videos with bad quality and doing some tutorials. Then I found about Steam exploits and that’s how I got into making Steamworks fixes, allowing cracked multiplayer between players.”

Voksi says his entire cracking career began with this one indie game and his desire to play it with his best friend. Prior to that, he had absolutely no experience at all. He says he’s taken no university courses or any course at all for that matter. Everything he knows has come from material he’s found online. But the intrigue doesn’t stop there.

“I don’t even know how to code properly in high-level language like C#, C++, etc. But I understand assembly [language] perfectly fine,” he explains.

For those who code, that’s generally a little bit back to front, with low-level languages usually posing the most difficulties. But Voksi says that with assembly, everything “just clicked.”

Of course, it’s been six years since the 21-year-old was first motivated to crack a game due to lack of funds. In the more than half decade since, have his motivations changed at all? Is it the thrill of solving the puzzle or are there other factors at play?

“I just developed an urge to provide paid stuff for free for people who can’t afford it and specifically, co-op and multiplayer cracks. Of course, i’m not saying don’t support the developers if you have the money and like the game. You should do that,” he says.

“The challenge of cracking also motivates me, especially with an abomination like Denuvo. It is pure cancer for the gaming industry, it doesn’t help and it only causes issues for the paying customers.”

Those who follow Voksi online will know that as well as being known in his own right, he’s part of the REVOLT group, a collective that has Voksi’s core interests and goals as their own.

“REVOLT started as a group with one and only goal – to provide multiplayer support for cracked games. No other group was doing it until that day. It was founded by several members, from which I’m currently the only one active, still releasing cracks.

“Our great achievements are in first place, of course, cracking Denuvo V4, making us one of the four groups/people who were able to break the protection. In second place are our online fixes for several AAA games, allowing you to play on legit servers with legit players. In third place, our ordinary Steamworks fixes allowing you to play multiplayer between cracked users.”

In communities like /r/crackwatch on Reddit and those less accessible, Voksi and others doing similar work are often held up as Internet heroes, cracking games in order to give the masses access to something that might’ve been otherwise inaccessible. But how does this fame sit with him?

“Well, I don’t see myself as a hero, just another ordinary person doing what he loves. I love seeing people happy because of my work, that’s also a big motivation, but nothing more than that,” he says.

Finally, what’s up next for Voksi and what are his hopes for the rest of the year?

“In an ideal world, Denuvo would die. As for me, I don’t know, time will tell,” he concludes.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN discounts, offers and coupons

Big Birthday Weekend 2018: find a Jam near you!

Post Syndicated from Ben Nuttall original https://www.raspberrypi.org/blog/big-birthday-weekend-2018-find-a-jam-near-you/

We’re just over three weeks away from the Raspberry Jam Big Birthday Weekend 2018, our community celebration of Raspberry Pi’s sixth birthday. Instead of an event in Cambridge, as we’ve held in the past, we’re coordinating Raspberry Jam events to take place around the world on 3–4 March, so that as many people as possible can join in. Well over 100 Jams have been confirmed so far.

Raspberry Pi Big Birthday Weekend Jam

Find a Jam near you

There are Jams planned in Argentina, Australia, Bolivia, Brazil, Bulgaria, Cameroon, Canada, Colombia, Dominican Republic, France, Germany, Greece, Hungary, India, Iran, Ireland, Italy, Japan, Kenya, Malaysia, Malta, Mexico, Netherlands, Norway, Papua New Guinea, Peru, Philippines, Poland, South Africa, Spain, Taiwan, Turkey, United Kingdom, United States, and Zimbabwe.

Take a look at the events map and the full list (including those who haven’t added their event to the map quite yet).

Raspberry Jam Big Birthday Weekend 2018 event map

We will have Raspberry Jams in 35 countries across six continents

Birthday kits

We had some special swag made especially for the birthday, including these T-shirts, which we’ve sent to Jam organisers:

Raspberry Jam Big Birthday Weekend 2018 T-shirt

There is also a poster with a list of participating Jams, which you can download:

Raspberry Jam Big Birthday Weekend 2018 list

Raspberry Jam photo booth

I created a Raspberry Jam photo booth that overlays photos with the Big Birthday Weekend logo and then tweets the picture from your Jam’s account — you’ll be seeing plenty of those if you follow the #PiParty hashtag on 3–4 March.

Check out the project on GitHub, and feel free to set up your own booth, or modify it to your own requirements. We’ve included text annotations in several languages, and more contributions are very welcome.

There’s still time…

If you can’t find a Jam near you, there’s still time to organise one for the Big Birthday Weekend. All you need to do is find a venue — a room in a school or library will do — and think about what you’d like to do at the event. Some Jams have Raspberry Pis set up for workshops and practical activities, some arrange tech talks, some put on show-and-tell — it’s up to you. To help you along, there’s the Raspberry Jam Guidebook full of advice and tips from Jam organisers.

Raspberry Pi on Twitter

The packed. And they packed. And they packed some more. Who’s expecting one of these #rjam kits for the Raspberry Jam Big Birthday Weekend?

Download the Raspberry Jam branding pack, and the special birthday branding pack, where you’ll find logos, graphical assets, flyer templates, worksheets, and more. When you’re ready to announce your event, create a webpage for it — you can use a site like Eventbrite or Meetup — and submit your Jam to us so it will appear on the Jam map!

We are six

We’re really looking forward to celebrating our birthday with thousands of people around the world. Over 48 hours, people of all ages will come together at more than 100 events to learn, share ideas, meet people, and make things during our Big Birthday Weekend.

Raspberry Jam Manchester
Raspberry Jam Manchester
Raspberry Jam Manchester

Since we released the first Raspberry Pi in 2012, we’ve sold 17 million of them. We’re also reaching almost 200000 children in 130 countries around the world through Code Club and CoderDojo, we’ve trained over 1500 Raspberry Pi Certified Educators, and we’ve sent code written by more than 6800 children into space. Our magazines are read by a quarter of a million people, and millions more use our free online learning resources. There’s plenty to celebrate and even more still to do: we really hope you’ll join us from a Jam near you on 3–4 March.

The post Big Birthday Weekend 2018: find a Jam near you! appeared first on Raspberry Pi.

Man Handed Conditional Prison Sentence for Spreading Popcorn Time Information

Post Syndicated from Andy original https://torrentfreak.com/man-handed-conditional-prison-sentence-spreading-popcorn-time-information-180208/

In August 2015, police in Denmark announced they had arrested a man in his thirties said to be the operator of a Popcorn Time-focused website. Popcorntime.dk was subsequently shut down and its domain placed under the control of the state prosecutor.

“The Danish State Prosecutor for Serious Economic and International Crime is presently conducting a criminal investigation that involves this domain name,” a seizure notice on the site reads.

“As part of the investigation the state prosecutor has requested a Danish District Court to transfer the rights of the domain name to the state prosecutor. The District Court has complied with the request.”

In a circumstance like this, it’s common to conclude that the site was offering copyright-infringing content or software. That wasn’t the case though, not even close.

PopcornTime.dk was an information resource, offering news on Popcorn Time-related developments, guides, plus tips on how to use the software while staying anonymous.

PopcornTime.dk as it appeared in 2015

Importantly, PopcornTime.dk hosted no software, preferring to link to other sites where the application could be downloaded instead. That didn’t prevent an aggressive prosecution though and now, two-and-half years later, the verdict’s in and it’s bound to raise more than a few eyebrows.

On Wednesday, a court in Odense, Denmark, handed the now 39-year-old man behind PopcornTime.dk a six-month conditional prison sentence for spreading information about the controversial movie streaming service.

Senior prosecutor Dorte Køhler Frandsen from SØIK (State Attorney for Special Economic and International Crime), who was behind the criminal proceedings, described the successful prosecution as a first-of-its-kind moment for the entire region.

“Never before has a person been convicted of helping to spread streaming services. The judgment is therefore an important step in combating illegal streaming on the Internet and will reverberate throughout Europe,” Frandsen said.

According to a statement from the prosecutor, the 39-year-old earned 506,003 Danish Krone ($83,363) in advertising revenue from his website in 2015. In addition to forfeiting this amount and having his domain confiscated, the man will also be required to complete 120 hours of community service.

“The verdict is a clear signal to those who spread illegal pirate services. The film industry and others lose billions in revenue each year because criminals illegally offer films for free. It’s a loss for everyone. Also the consumer,” Frandsen added.

The convicted man now has two weeks to decide whether he will take his appeal to the Østre Landsret, one of Denmark’s two High Courts.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN discounts, offers and coupons

RIAA: Cox Ruling Shows that Grande Can Be Liable for Piracy Too

Post Syndicated from Ernesto original https://torrentfreak.com/riaa-cox-ruling-shows-that-grande-can-be-liable-for-piracy-too-180207/

Regular Internet providers are being put under increasing pressure for not doing enough to curb copyright infringement.

Last year several major record labels, represented by the RIAA, filed a lawsuit in a Texas District Court, accusing ISP Grande Communications of turning a blind eye on its pirating subscribers.

“Despite their knowledge of repeat infringements, Defendants have permitted repeat infringers to use the Grande service to continue to infringe Plaintiffs’ copyrights without consequence,” the RIAA’s complaint read.

Grande disagreed with this assertion and filed a motion to dismiss the case. The ISP argued that it doesn’t encourage any of its customers to download copyrighted works, and that it has no control over the content subscribers access.

The Internet provider didn’t deny that it received millions of takedown notices through the piracy tracking company Rightscorp. However, it believed that these notices are flawed and not worthy of acting upon.

The case shows a lot of similarities with the legal battle between BMG and Cox Communications, in which the Fourth Circuit Court of Appeals issued an important verdict last week.

The appeals court overturned the $25 million piracy damages verdict against Cox due to an erroneous jury instruction but held that the ISP lost its safe harbor protection because it failed to implement a meaningful repeat infringer policy.

This week, the RIAA used the Fourth Circuit ruling as further evidence that Grande’s motion to dismiss should be denied.

The RIAA points out that both Cox and Grande used similar arguments in their defense, some of which were denied by the appeals court. The Fourth Circuit held, for example, that an ISP’s substantial non-infringing uses does not immunize it from liability for contributory copyright infringement.

In addition, the appeals court also clarified that if an ISP wilfully blinds itself to copyright infringements, that is sufficient to satisfy the knowledge requirement for contributory copyright infringement.

According to the RIAA’s filing at a Texas District Court this week, Grande has already admitted that it willingly ‘ignored’ takedown notices that were submitted on behalf of third-party copyright holders.

“Grande has already admitted that it received notices from Rightscorp and, to use Grande’s own phrase, did not ‘meaningfully investigate’ them,” the RIAA writes.

“Thus, even if this Court were to apply the Fourth Circuit’s ‘willful blindness’ standard, the level of knowledge that Grande has effectively admitted exceeds the level of knowledge that the Fourth Circuit held was ‘powerful evidence’ sufficient to establish liability for contributory infringement.”

As such, the motion to dismiss the case should be denied, the RIAA argues.

What’s not mentioned in the RIAA’s filing, however, is why Grande chose not to act upon these takedown notices. In its defense, the ISP previously explained that Rightcorp’s notices lacked specificity and were incapable of detecting actual infringements.

Grande argued that if they acted on these notices without additional proof, its subscribers could lose their Internet access even though they are using it for legal purposes. The ISP may, therefore, counter that it wasn’t willfully blind, as it saw no solid proof for the alleged infringements to begin with.

“To merely treat these allegations as true without investigation would be a disservice to Grande’s subscribers, who would run the risk of having their Internet service permanently terminated despite using Grande’s services for completely legitimate purposes,” Grande previously wrote.

This brings up a tricky issue. The Fourth Circuit made it clear last week that ISPs require a meaningful policy against repeat infringers in respond to takedown notices from copyright holders. But what are the requirements for a proper takedown notice? Do any and all notices count?

Grande clearly has no faith in the accuracy of Rightscorp’s technology but if their case goes in the same direction as Cox’s, that might not make much of a difference.

A copy of the RIAA’s summary of supplemental authority is available here (pdf).

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN discounts, offers and coupons

Cabinet of Secret Documents from Australia

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2018/02/cabinet_of_secr.html

This story of leaked Australian government secrets is unlike any other I’ve heard:

It begins at a second-hand shop in Canberra, where ex-government furniture is sold off cheaply.

The deals can be even cheaper when the items in question are two heavy filing cabinets to which no-one can find the keys.

They were purchased for small change and sat unopened for some months until the locks were attacked with a drill.

Inside was the trove of documents now known as The Cabinet Files.

The thousands of pages reveal the inner workings of five separate governments and span nearly a decade.

Nearly all the files are classified, some as “top secret” or “AUSTEO”, which means they are to be seen by Australian eyes only.

Yes, that really happened. The person who bought and opened the file cabinets contacted the Australian Broadcasting Corp, who is now publishing a bunch of it.

There’s lots of interesting (and embarassing) stuff in the documents, although most of it is local politics. I am more interested in the government’s reaction to the incident: they’re pushing for a law making it illegal for the press to publish government secrets it received through unofficial channels.

“The one thing I would point out about the legislation that does concern me particularly is that classified information is an element of the offence,” he said.

“That is to say, if you’ve got a filing cabinet that is full of classified information … that means all the Crown has to prove if they’re prosecuting you is that it is classified ­ nothing else.

“They don’t have to prove that you knew it was classified, so knowledge is beside the point.”

[…]

Many groups have raised concerns, including media organisations who say they unfairly target journalists trying to do their job.

But really anyone could be prosecuted just for possessing classified information, regardless of whether they know about it.

That might include, for instance, if you stumbled across a folder of secret files in a regular skip bin while walking home and handed it over to a journalist.

This illustrates a fundamental misunderstanding of the threat. The Australian Broadcasting Corp gets their funding from the government, and was very restrained in what they published. They waited months before publishing as they coordinated with the Australian government. They allowed the government to secure the files, and then returned them. From the government’s perspective, they were the best possible media outlet to receive this information. If the government makes it illegal for the Australian press to publish this sort of material, the next time it will be sent to the BBC, the Guardian, the New York Times, or Wikileaks. And since people no longer read their news from newspapers sold in stores but on the Internet, the result will be just as many people reading the stories with far fewer redactions.

The proposed law is older than this leak, but the leak is giving it new life. The Australian opposition party is being cagey on whether they will support the law. They don’t want to appear weak on national security, so I’m not optimistic.

EDITED TO ADD (2/8): The Australian government backed down on that new security law.

EDITED TO ADD (2/13): Excellent political cartoon.

Build a Multi-Tenant Amazon EMR Cluster with Kerberos, Microsoft Active Directory Integration and EMRFS Authorization

Post Syndicated from Songzhi Liu original https://aws.amazon.com/blogs/big-data/build-a-multi-tenant-amazon-emr-cluster-with-kerberos-microsoft-active-directory-integration-and-emrfs-authorization/

One of the challenges faced by our customers—especially those in highly regulated industries—is balancing the need for security with flexibility. In this post, we cover how to enable multi-tenancy and increase security by using EMRFS (EMR File System) authorization, the Amazon S3 storage-level authorization on Amazon EMR.

Amazon EMR is an easy, fast, and scalable analytics platform enabling large-scale data processing. EMRFS authorization provides Amazon S3 storage-level authorization by configuring EMRFS with multiple IAM roles. With this functionality enabled, different users and groups can share the same cluster and assume their own IAM roles respectively.

Simply put, on Amazon EMR, we can now have an Amazon EC2 role per user assumed at run time instead of one general EC2 role at the cluster level. When the user is trying to access Amazon S3 resources, Amazon EMR evaluates against a predefined mappings list in EMRFS configurations and picks up the right role for the user.

In this post, we will discuss what EMRFS authorization is (Amazon S3 storage-level access control) and show how to configure the role mappings with detailed examples. You will then have the desired permissions in a multi-tenant environment. We also demo Amazon S3 access from HDFS command line, Apache Hive on Hue, and Apache Spark.

EMRFS authorization for Amazon S3

There are two prerequisites for using this feature:

  1. Users must be authenticated, because EMRFS needs to map the current user/group/prefix to a predefined user/group/prefix. There are several authentication options. In this post, we launch a Kerberos-enabled cluster that manages the Key Distribution Center (KDC) on the master node, and enable a one-way trust from the KDC to a Microsoft Active Directory domain.
  2. The application must support accessing Amazon S3 via Applications that have their own S3FileSystem APIs (for example, Presto) are not supported at this time.

EMRFS supports three types of mapping entries: user, group, and Amazon S3 prefix. Let’s use an example to show how this works.

Assume that you have the following three identities in your organization, and they are defined in the Active Directory:

To enable all these groups and users to share the EMR cluster, you need to define the following IAM roles:

In this case, you create a separate Amazon EC2 role that doesn’t give any permission to Amazon S3. Let’s call the role the base role (the EC2 role attached to the EMR cluster), which in this example is named EMR_EC2_RestrictedRole. Then, you define all the Amazon S3 permissions for each specific user or group in their own roles. The restricted role serves as the fallback role when the user doesn’t belong to any user/group, nor does the user try to access any listed Amazon S3 prefixes defined on the list.

Important: For all other roles, like emrfs_auth_group_role_data_eng, you need to add the base role (EMR_EC2_RestrictedRole) as the trusted entity so that it can assume other roles. See the following example:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "ec2.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    },
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::511586466501:role/EMR_EC2_RestrictedRole"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

The following is an example policy for the admin user role (emrfs_auth_user_role_admin_user):

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "s3:*",
            "Resource": "*"
        }
    ]
}

We are assuming the admin user has access to all buckets in this example.

The following is an example policy for the data science group role (emrfs_auth_group_role_data_sci):

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Resource": [
                "arn:aws:s3:::emrfs-auth-data-science-bucket-demo/*",
                "arn:aws:s3:::emrfs-auth-data-science-bucket-demo"
            ],
            "Action": [
                "s3:*"
            ]
        }
    ]
}

This role grants all Amazon S3 permissions to the emrfs-auth-data-science-bucket-demo bucket and all the objects in it. Similarly, the policy for the role emrfs_auth_group_role_data_eng is shown below:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Resource": [
                "arn:aws:s3:::emrfs-auth-data-engineering-bucket-demo/*",
                "arn:aws:s3:::emrfs-auth-data-engineering-bucket-demo"
            ],
            "Action": [
                "s3:*"
            ]
        }
    ]
}

Example role mappings configuration

To configure EMRFS authorization, you use EMR security configuration. Here is the configuration we use in this post

Consider the following scenario.

First, the admin user admin1 tries to log in and run a command to access Amazon S3 data through EMRFS. The first role emrfs_auth_user_role_admin_user on the mapping list, which is a user role, is mapped and picked up. Then admin1 has access to the Amazon S3 locations that are defined in this role.

Then a user from the data engineer group (grp_data_engineering) tries to access a data bucket to run some jobs. When EMRFS sees that the user is a member of the grp_data_engineering group, the group role emrfs_auth_group_role_data_eng is assumed, and the user has proper access to Amazon S3 that is defined in the emrfs_auth_group_role_data_eng role.

Next, the third user comes, who is not an admin and doesn’t belong to any of the groups. After failing evaluation of the top three entries, EMRFS evaluates whether the user is trying to access a certain Amazon S3 prefix defined in the last mapping entry. This type of mapping entry is called the prefix type. If the user is trying to access s3://emrfs-auth-default-bucket-demo/, then the prefix mapping is in effect, and the prefix role emrfs_auth_prefix_role_default_s3_prefix is assumed.

If the user is not trying to access any of the Amazon S3 paths that are defined on the list—which means it failed the evaluation of all the entries—it only has the permissions defined in the EMR_EC2RestrictedRole. This role is assumed by the EC2 instances in the cluster.

In this process, all the mappings defined are evaluated in the defined order, and the first role that is mapped is assumed, and the rest of the list is skipped.

Setting up an EMR cluster and mapping Active Directory users and groups

Now that we know how EMRFS authorization role mapping works, the next thing we need to think about is how we can use this feature in an easy and manageable way.

Active Directory setup

Many customers manage their users and groups using Microsoft Active Directory or other tools like OpenLDAP. In this post, we create the Active Directory on an Amazon EC2 instance running Windows Server and create the users and groups we will be using in the example below. After setting up Active Directory, we use the Amazon EMR Kerberos auto-join capability to establish a one-way trust from the KDC running on the EMR master node to the Active Directory domain on the EC2 instance. You can use your own directory services as long as it talks to the LDAP (Lightweight Directory Access Protocol).

To create and join Active Directory to Amazon EMR, follow the steps in the blog post Use Kerberos Authentication to Integrate Amazon EMR with Microsoft Active Directory.

After configuring Active Directory, you can create all the users and groups using the Active Directory tools and add users to appropriate groups. In this example, we created users like admin1, dataeng1, datascientist1, grp_data_engineering, and grp_data_science, and then add the users to the right groups.

Join the EMR cluster to an Active Directory domain

For clusters with Kerberos, Amazon EMR now supports automated Active Directory domain joins. You can use the security configuration to configure the one-way trust from the KDC to the Active Directory domain. You also configure the EMRFS role mappings in the same security configuration.

The following is an example of the EMR security configuration with a trusted Active Directory domain EMRKRB.TEST.COM and the EMRFS role mappings as we discussed earlier:

The EMRFS role mapping configuration is shown in this example:

We will also provide an example AWS CLI command that you can run.

Launching the EMR cluster and running the tests

Now you have configured Kerberos and EMRFS authorization for Amazon S3.

Additionally, you need to configure Hue with Active Directory using the Amazon EMR configuration API in order to log in using the AD users created before. The following is an example of Hue AD configuration.

[
  {
    "Classification":"hue-ini",
    "Properties":{

    },
    "Configurations":[
      {
        "Classification":"desktop",
        "Properties":{

        },
        "Configurations":[
          {
            "Classification":"ldap",
            "Properties":{

            },
            "Configurations":[
              {
                "Classification":"ldap_servers",
                "Properties":{

                },
                "Configurations":[
                  {
                    "Classification":"AWS",
                    "Properties":{
                      "base_dn":"DC=emrkrb,DC=test,DC=com",
                      "ldap_url":"ldap://emrkrb.test.com",
                      "search_bind_authentication":"false",
                      "bind_dn":"CN=adjoiner,CN=users,DC=emrkrb,DC=test,DC=com",
                      "bind_password":"Abc123456",
                      "create_users_on_login":"true",
                      "nt_domain":"emrkrb.test.com"
                    },
                    "Configurations":[

                    ]
                  }
                ]
              }
            ]
          },
          {
            "Classification":"auth",
            "Properties":{
              "backend":"desktop.auth.backend.LdapBackend"
            },
            "Configurations":[

            ]
          }
        ]
      }
    ]
  }

Note: In the preceding configuration JSON file, change the values as required before pasting it into the software setting section in the Amazon EMR console.

Now let’s use this configuration and the security configuration you created before to launch the cluster.

In the Amazon EMR console, choose Create cluster. Then choose Go to advanced options. On the Step1: Software and Steps page, under Edit software settings (optional), paste the configuration in the box.

The rest of the setup is the same as an ordinary cluster setup, except in the Security Options section. In Step 4: Security, under Permissions, choose Custom, and then choose the RestrictedRole that you created before.

Choose the appropriate subnets (these should meet the base requirement in order for a successful Active Directory join—see the Amazon EMR Management Guide for more details), and choose the appropriate security groups to make sure it talks to the Active Directory. Choose a key so that you can log in and configure the cluster.

Most importantly, choose the security configuration that you created earlier to enable Kerberos and EMRFS authorization for Amazon S3.

You can use the following AWS CLI command to create a cluster.

aws emr create-cluster --name "TestEMRFSAuthorization" \ 
--release-label emr-5.10.0 \ --instance-type m3.xlarge \ 
--instance-count 3 \ 
--ec2-attributes InstanceProfile=EMR_EC2_DefaultRole,KeyName=MyEC2KeyPair \ --service-role EMR_DefaultRole \ 
--security-configuration MyKerberosConfig \ 
--configurations file://hue-config.json \
--applications Name=Hadoop Name=Hive Name=Hue Name=Spark \ 
--kerberos-attributes Realm=EC2.INTERNAL, \ KdcAdminPassword=<YourClusterKDCAdminPassword>, \ ADDomainJoinUser=<YourADUserLogonName>,ADDomainJoinPassword=<YourADUserPassword>, \ 
CrossRealmTrustPrincipalPassword=<MatchADTrustPwd>

Note: If you create the cluster using CLI, you need to save the JSON configuration for Hue into a file named hue-config.json and place it on the server where you run the CLI command.

After the cluster gets into the Waiting state, try to connect by using SSH into the cluster using the Active Directory user name and password.

ssh -l [email protected] <EMR IP or DNS name>

Quickly run two commands to show that the Active Directory join is successful:

  1. id [user name] shows the mapped AD users and groups in Linux.
  2. hdfs groups [user name] shows the mapped group in Hadoop.

Both should return the current Active Directory user and group information if the setup is correct.

Now, you can test the user mapping first. Log in with the admin1 user, and run a Hadoop list directory command:

hadoop fs -ls s3://emrfs-auth-data-science-bucket-demo/

Now switch to a user from the data engineer group.

Retry the previous command to access the admin’s bucket. It should throw an Amazon S3 Access Denied exception.

When you try listing the Amazon S3 bucket that a data engineer group member has accessed, it triggers the group mapping.

hadoop fs -ls s3://emrfs-auth-data-engineering-bucket-demo/

It successfully returns the listing results. Next we will test Apache Hive and then Apache Spark.

 

To run jobs successfully, you need to create a home directory for every user in HDFS for staging data under /user/<username>. Users can configure a step to create a home directory at cluster launch time for every user who has access to the cluster. In this example, you use Hue since Hue will create the home directory in HDFS for the user at the first login. Here Hue also needs to be integrated with the same Active Directory as explained in the example configuration described earlier.

First, log in to Hue as a data engineer user, and open a Hive Notebook in Hue. Then run a query to create a new table pointing to the data engineer bucket, s3://emrfs-auth-data-engineering-bucket-demo/table1_data_eng/.

You can see that the table was created successfully. Now try to create another table pointing to the data science group’s bucket, where the data engineer group doesn’t have access.

It failed and threw an Amazon S3 Access Denied error.

Now insert one line of data into the successfully create table.

Next, log out, switch to a data science group user, and create another table, test2_datasci_tb.

The creation is successful.

The last task is to test Spark (it requires the user directory, but Hue created one in the previous step).

Now let’s come back to the command line and run some Spark commands.

Login to the master node using the datascientist1 user:

Start the SparkSQL interactive shell by typing spark-sql, and run the show tables command. It should list the tables that you created using Hive.

As a data science group user, try select on both tables. You will find that you can only select the table defined in the location that your group has access to.

Conclusion

EMRFS authorization for Amazon S3 enables you to have multiple roles on the same cluster, providing flexibility to configure a shared cluster for different teams to achieve better efficiency. The Active Directory integration and group mapping make it much easier for you to manage your users and groups, and provides better auditability in a multi-tenant environment.


Additional Reading

If you found this post useful, be sure to check out Use Kerberos Authentication to Integrate Amazon EMR with Microsoft Active Directory and Launching and Running an Amazon EMR Cluster inside a VPC.


About the Authors

Songzhi Liu is a Big Data Consultant with AWS Professional Services. He works closely with AWS customers to provide them Big Data & Machine Learning solutions and best practices on the Amazon cloud.

 

 

 

 

All-In on Unlimited Backup

Post Syndicated from Gleb Budman original https://www.backblaze.com/blog/all-in-on-unlimited-backup/

chips on computer with cloud backup

The cloud backup industry has seen its share of tumultuousness. BitCasa, Dell DataSafe, Xdrive, and a dozen others have closed up shop. Mozy, Amazon, and Microsoft offered, but later canceled, their unlimited offerings. Recently, CrashPlan for Home customers were notified that their service was being end-of-lifed. Then today we’ve heard from Carbonite customers who are frustrated by this morning’s announcement of a price increase from Carbonite.

We believe that the fundamental goal of a cloud backup is having peace-of-mind: knowing your data — all of it — is safe. For over 10 years Backblaze has been providing that peace-of-mind by offering completely unlimited cloud backup to our customers. And we continue to be committed to that. Knowing that your cloud backup vendor is not going to disappear or fundamentally change their service is an essential element in achieving that peace-of-mind.

Committed to Unlimited Backup

When Mozy discontinued their unlimited backup on Jan 31, 2011, a lot of people asked, “Does this mean Backblaze will discontinue theirs as well?” At that time I wrote the blog post Backblaze is committed to unlimited backup. That was seven years ago. Since then we’ve continued to make Backblaze cloud backup better: dramatically speeding up backups and restores, offering the unique and very popular Restore Return Refund program, enabling direct access and sharing of any file in your backup, and more. We also introduced Backblaze Groups to enable businesses and families to manage backups — all at no additional cost.

How That’s Possible

I’d like to answer the question of “How have you been able to do this when others haven’t?

First, commitment. It’s not impossible to offer unlimited cloud backup, but it’s not easy. The Backblaze team has been committed to unlimited as a core tenet.

Second, we have pursued the technical, business, and cultural steps required to make it happen. We’ve designed our own servers, written our cloud storage software, run our own operations, and been continually focused on every place we could optimize a penny out of the cost of storage. We’ve built a culture at Backblaze that cares deeply about that.

Ensuring Peace-of-Mind

Price increases and plan changes happen in our industry, but Backblaze has consistently been the low price leader, and continues to stand by the foundational element of our service — truly unlimited backup storage. Carbonite just announced a price increase from $60 to $72/year, and while that’s not an astronomical increase, it’s important to keep in mind the service that they are providing at that rate. The basic Carbonite plan provides a service that doesn’t back up videos or external hard drives by default. We think that’s dangerous. No one wants to discover that their videos weren’t backed up after their computer dies, or have to worry about the safety and durability of their data. That is why we have continued to build on our foundation of unlimited, as well as making our service faster and more accessible. All of these serve the goal of ensuring peace-of-mind for our customers.

3 Months Free For You & A Friend

As part of our commitment to unlimited, refer your friends to receive three months of Backblaze service through March 15, 2018. When you Refer-a-Friend with your personal referral link, and they subscribe, both of you will receive three months of service added to your account. See promotion details on our Refer-a-Friend page.

Want A Reminder When Your Carbonite Subscription Runs Out?

If you’re considering switching from Carbonite, we’d love to be your new backup provider. Enter your email and the date you’d like to be reminded in the form below and you’ll get a friendly reminder email from us to start a new backup plan with Backblaze. Or, you could start a free trial today.

We think you’ll be glad you switched, and you’ll have a chance to experience some of that Backblaze peace-of-mind for your data.

Please Send Me a Reminder When I Need a New Backup Provider



 

The post All-In on Unlimited Backup appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Anti-Piracy Video Scares Kids With ‘Fake’ Malware Info

Post Syndicated from Ernesto original https://torrentfreak.com/anti-piracy-video-scares-kids-with-fake-malware-info-180206/

Today is Safer Internet Day, a global awareness campaign to educate the public on all sorts of threats that people face online.

It is a laudable initiative supported by the Industry Trust for IP Awareness which, together with the children’s charity Into Film, has released an informative video and associated course materials.

The organizations have created a British version of an animation previously released as part of the Australian “Price of Piracy” campaign. While the video includes an informative description of the various types of malware, there appears to be a secondary agenda.

Strangely enough, the video itself contains no advice on how to avoid malware at all, other than to avoid pirate sites. In that sense, it looks more like an indirect anti-piracy ad.

While there’s no denying that kids might run into malware if they randomly click on pirate site ads, this problem is certainly not exclusive to these sites. Email and social media are frequently used to link to malware too, and YouTube comments can pose the same risk. The problem is everywhere.

What really caught our eye, however, is the statement that pirate sites are the most used propagation method for malware. “Did you know, the number one way we infect your device is via illegal pirate sites,” an animated piece of malware claims in the video.

Forget about email attachments, spam links, compromised servers, or even network attacks. Pirate sites are the number one spot through which malware spreads. According to the video at least. But where do they get this knowledge?

Meet the malwares

When we asked the Industry Trust for IP Awareness for further details, the organization checked with their Australian colleagues, who pointed us to a working paper (pdf) from 2014. This paper includes the following line: “Illegal streaming websites are now the number one propagation mechanism for malicious software as 97% of them contain malware.”

Unfortunately, there’s a lot wrong with this claim.

Through another citation, the 97% figure points to this unpublished study of which only the highlights were shared. This “malware” research looked at the prevalence of malware and other unwanted software linked to pirate sites. Not just streaming sites as the other paper said, but let’s ignore that last bit.

What the study actually found is that of the 30 researched pirate sites, “90% contained malware or other ‘Potentially Unwanted Programmes’.” Note that this is not the earlier mentioned 97%, and that this broad category not only includes malware but also popup ads, which were most popular. This means that the percentage of actual malware on these sites can be anywhere from 0.1% to 90%.

Importantly, none of the malware found in this research was installed without an action performed by the user, such as clicking on a flashy download button or installing a mysterious .exe file.

Aside from clearly erroneous references, the more worrying issue is that even the original incorrect statement that “97% of all pirate sites contain malware” provides no evidence for the claim in the video that pirate sites are “the number one way” through which malware spreads.

Even if 100% of all pirate sites link to malware, that’s no proof that it’s the most used propagation method.

The malware issue has been a popular talking point for a while, but after searching for answers for days, we couldn’t find a grain of evidence. There are a lot of malware propagation methods, including email, which traditionally is a very popular choice.

Even more confusingly, the same paper that was cited as a source for the pirate site malware claim notes that 80% of all web-based malware is hosted on “innocent” but compromised websites.

As the provided evidence gave no answers, we asked the experts to chime in. Luckily, security company Malwarebytes was willing to share its assessment. As leaders in the anti-malware industry, they should know better than researchers who have their numbers and terminology mixed up.

“These days, most common infections come from malicious spam campaigns and drive-by exploit attacks,” Adam Kujawa, Director of Malware Intelligence at Malwarebytes informs us.

“Torrent sites are still frequently used by criminals to host malware disguised as something the user wants, like an application, movie, etc. However they are really only a threat to people who use torrent sites regularly and those people have likely learned how to avoid malicious torrents,” he adds.

In other words, most people who regularly visit pirate sites know how to avoid these dangers. That doesn’t mean that they are not a threat to unsuspecting kids who visit them for the first time of course.

“Now, if users who were not familiar with torrent and pirate sites started using these services, there is a high probability that they could encounter some kind of malware. However, many of these sites have user review processes to let other users know if a particular torrent or download is likely malicious.

“So, unless a user is completely new to this process and ignores all the warning signs, they could walk away from a pirate site without getting infected,” Kujawa says.

Overall, the experts at Malwarebytes see no evidence for the claim that pirate sites are the number one propagation method for malware.

“So in summary, I don’t think the claim that ‘pirate sites’ are the number one way to infect users is accurate at all,” Kujawa concludes.

While it’s always a good idea to avoid places that can have a high prevalence of malware, including pirate sites, the claims in the video are not backed up by real evidence. There are tens of thousands of non-pirate sites that pose similar or worse risks, so it’s always a good idea to have anti-malware and virus software installed.

The organizations and people involved in the British “Meet the Malwares” video might not have been aware of the doubtful claims, but it’s unfortunate that they didn’t opt for a broader campaign instead of the focused anti-piracy message.

Finally, since it’s still Safer Internet Day, we encourage kids to take a close look at the various guides on how to avoid “fake news” while engaging in critical thinking.

Be safe!

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN discounts, offers and coupons

Reactive Microservices Architecture on AWS

Post Syndicated from Sascha Moellering original https://aws.amazon.com/blogs/architecture/reactive-microservices-architecture-on-aws/

Microservice-application requirements have changed dramatically in recent years. These days, applications operate with petabytes of data, need almost 100% uptime, and end users expect sub-second response times. Typical N-tier applications can’t deliver on these requirements.

Reactive Manifesto, published in 2014, describes the essential characteristics of reactive systems including: responsiveness, resiliency, elasticity, and being message driven.

Being message driven is perhaps the most important characteristic of reactive systems. Asynchronous messaging helps in the design of loosely coupled systems, which is a key factor for scalability. In order to build a highly decoupled system, it is important to isolate services from each other. As already described, isolation is an important aspect of the microservices pattern. Indeed, reactive systems and microservices are a natural fit.

Implemented Use Case
This reference architecture illustrates a typical ad-tracking implementation.

Many ad-tracking companies collect massive amounts of data in near-real-time. In many cases, these workloads are very spiky and heavily depend on the success of the ad-tech companies’ customers. Typically, an ad-tracking-data use case can be separated into a real-time part and a non-real-time part. In the real-time part, it is important to collect data as fast as possible and ask several questions including:,  “Is this a valid combination of parameters?,””Does this program exist?,” “Is this program still valid?”

Because response time has a huge impact on conversion rate in advertising, it is important for advertisers to respond as fast as possible. This information should be kept in memory to reduce communication overhead with the caching infrastructure. The tracking application itself should be as lightweight and scalable as possible. For example, the application shouldn’t have any shared mutable state and it should use reactive paradigms. In our implementation, one main application is responsible for this real-time part. It collects and validates data, responds to the client as fast as possible, and asynchronously sends events to backend systems.

The non-real-time part of the application consumes the generated events and persists them in a NoSQL database. In a typical tracking implementation, clicks, cookie information, and transactions are matched asynchronously and persisted in a data store. The matching part is not implemented in this reference architecture. Many ad-tech architectures use frameworks like Hadoop for the matching implementation.

The system can be logically divided into the data collection partand the core data updatepart. The data collection part is responsible for collecting, validating, and persisting the data. In the core data update part, the data that is used for validation gets updated and all subscribers are notified of new data.

Components and Services

Main Application
The main application is implemented using Java 8 and uses Vert.x as the main framework. Vert.x is an event-driven, reactive, non-blocking, polyglot framework to implement microservices. It runs on the Java virtual machine (JVM) by using the low-level IO library Netty. You can write applications in Java, JavaScript, Groovy, Ruby, Kotlin, Scala, and Ceylon. The framework offers a simple and scalable actor-like concurrency model. Vert.x calls handlers by using a thread known as an event loop. To use this model, you have to write code known as “verticles.” Verticles share certain similarities with actors in the actor model. To use them, you have to implement the verticle interface. Verticles communicate with each other by generating messages in  a single event bus. Those messages are sent on the event bus to a specific address, and verticles can register to this address by using handlers.

With only a few exceptions, none of the APIs in Vert.x block the calling thread. Similar to Node.js, Vert.x uses the reactor pattern. However, in contrast to Node.js, Vert.x uses several event loops. Unfortunately, not all APIs in the Java ecosystem are written asynchronously, for example, the JDBC API. Vert.x offers a possibility to run this, blocking APIs without blocking the event loop. These special verticles are called worker verticles. You don’t execute worker verticles by using the standard Vert.x event loops, but by using a dedicated thread from a worker pool. This way, the worker verticles don’t block the event loop.

Our application consists of five different verticles covering different aspects of the business logic. The main entry point for our application is the HttpVerticle, which exposes an HTTP-endpoint to consume HTTP-requests and for proper health checking. Data from HTTP requests such as parameters and user-agent information are collected and transformed into a JSON message. In order to validate the input data (to ensure that the program exists and is still valid), the message is sent to the CacheVerticle.

This verticle implements an LRU-cache with a TTL of 10 minutes and a capacity of 100,000 entries. Instead of adding additional functionality to a standard JDK map implementation, we use Google Guava, which has all the features we need. If the data is not in the L1 cache, the message is sent to the RedisVerticle. This verticle is responsible for data residing in Amazon ElastiCache and uses the Vert.x-redis-client to read data from Redis. In our example, Redis is the central data store. However, in a typical production implementation, Redis would just be the L2 cache with a central data store like Amazon DynamoDB. One of the most important paradigms of a reactive system is to switch from a pull- to a push-based model. To achieve this and reduce network overhead, we’ll use Redis pub/sub to push core data changes to our main application.

Vert.x also supports direct Redis pub/sub-integration, the following code shows our subscriber-implementation:

vertx.eventBus().<JsonObject>consumer(REDIS_PUBSUB_CHANNEL_VERTX, received -> {

JsonObject value = received.body().getJsonObject("value");

String message = value.getString("message");

JsonObject jsonObject = new JsonObject(message);

eb.send(CACHE_REDIS_EVENTBUS_ADDRESS, jsonObject);

});

redis.subscribe(Constants.REDIS_PUBSUB_CHANNEL, res -> {

if (res.succeeded()) {

LOGGER.info("Subscribed to " + Constants.REDIS_PUBSUB_CHANNEL);

} else {

LOGGER.info(res.cause());

}

});

The verticle subscribes to the appropriate Redis pub/sub-channel. If a message is sent over this channel, the payload is extracted and forwarded to the cache-verticle that stores the data in the L1-cache. After storing and enriching data, a response is sent back to the HttpVerticle, which responds to the HTTP request that initially hit this verticle. In addition, the message is converted to ByteBuffer, wrapped in protocol buffers, and send to an Amazon Kinesis Data Stream.

The following example shows a stripped-down version of the KinesisVerticle:

public class KinesisVerticle extends AbstractVerticle {

private static final Logger LOGGER = LoggerFactory.getLogger(KinesisVerticle.class);

private AmazonKinesisAsync kinesisAsyncClient;

private String eventStream = "EventStream";

@Override

public void start() throws Exception {

EventBus eb = vertx.eventBus();

kinesisAsyncClient = createClient();

eventStream = System.getenv(STREAM_NAME) == null ? "EventStream" : System.getenv(STREAM_NAME);

eb.consumer(Constants.KINESIS_EVENTBUS_ADDRESS, message -> {

try {

TrackingMessage trackingMessage = Json.decodeValue((String)message.body(), TrackingMessage.class);

String partitionKey = trackingMessage.getMessageId();

byte [] byteMessage = createMessage(trackingMessage);

ByteBuffer buf = ByteBuffer.wrap(byteMessage);

sendMessageToKinesis(buf, partitionKey);

message.reply("OK");

}

catch (KinesisException exc) {

LOGGER.error(exc);

}

});

}

Kinesis Consumer
This AWS Lambda function consumes data from an Amazon Kinesis Data Stream and persists the data in an Amazon DynamoDB table. In order to improve testability, the invocation code is separated from the business logic. The invocation code is implemented in the class KinesisConsumerHandler and iterates over the Kinesis events pulled from the Kinesis stream by AWS Lambda. Each Kinesis event is unwrapped and transformed from ByteBuffer to protocol buffers and converted into a Java object. Those Java objects are passed to the business logic, which persists the data in a DynamoDB table. In order to improve duration of successive Lambda calls, the DynamoDB-client is instantiated lazily and reused if possible.

Redis Updater
From time to time, it is necessary to update core data in Redis. A very efficient implementation for this requirement is using AWS Lambda and Amazon Kinesis. New core data is sent over the AWS Kinesis stream using JSON as data format and consumed by a Lambda function. This function iterates over the Kinesis events pulled from the Kinesis stream by AWS Lambda. Each Kinesis event is unwrapped and transformed from ByteBuffer to String and converted into a Java object. The Java object is passed to the business logic and stored in Redis. In addition, the new core data is also sent to the main application using Redis pub/sub in order to reduce network overhead and converting from a pull- to a push-based model.

The following example shows the source code to store data in Redis and notify all subscribers:

public void updateRedisData(final TrackingMessage trackingMessage, final Jedis jedis, final LambdaLogger logger) {

try {

ObjectMapper mapper = new ObjectMapper();

String jsonString = mapper.writeValueAsString(trackingMessage);

Map<String, String> map = marshal(jsonString);

String statusCode = jedis.hmset(trackingMessage.getProgramId(), map);

}

catch (Exception exc) {

if (null == logger)

exc.printStackTrace();

else

logger.log(exc.getMessage());

}

}

public void notifySubscribers(final TrackingMessage trackingMessage, final Jedis jedis, final LambdaLogger logger) {

try {

ObjectMapper mapper = new ObjectMapper();

String jsonString = mapper.writeValueAsString(trackingMessage);

jedis.publish(Constants.REDIS_PUBSUB_CHANNEL, jsonString);

}

catch (final IOException e) {

log(e.getMessage(), logger);

}

}

Similarly to our Kinesis Consumer, the Redis-client is instantiated somewhat lazily.

Infrastructure as Code
As already outlined, latency and response time are a very critical part of any ad-tracking solution because response time has a huge impact on conversion rate. In order to reduce latency for customers world-wide, it is common practice to roll out the infrastructure in different AWS Regions in the world to be as close to the end customer as possible. AWS CloudFormation can help you model and set up your AWS resources so that you can spend less time managing those resources and more time focusing on your applications that run in AWS.

You create a template that describes all the AWS resources that you want (for example, Amazon EC2 instances or Amazon RDS DB instances), and AWS CloudFormation takes care of provisioning and configuring those resources for you. Our reference architecture can be rolled out in different Regions using an AWS CloudFormation template, which sets up the complete infrastructure (for example, Amazon Virtual Private Cloud (Amazon VPC), Amazon Elastic Container Service (Amazon ECS) cluster, Lambda functions, DynamoDB table, Amazon ElastiCache cluster, etc.).

Conclusion
In this blog post we described reactive principles and an example architecture with a common use case. We leveraged the capabilities of different frameworks in combination with several AWS services in order to implement reactive principles—not only at the application-level but also at the system-level. I hope I’ve given you ideas for creating your own reactive applications and systems on AWS.

About the Author

Sascha Moellering is a Senior Solution Architect. Sascha is primarily interested in automation, infrastructure as code, distributed computing, containers and JVM. He can be reached at [email protected]

 

 

Success at Apache: A Newbie’s Narrative

Post Syndicated from mikesefanov original https://yahooeng.tumblr.com/post/170536010891

yahoodevelopers:

Kuhu Shukla (bottom center) and team at the 2017 DataWorks Summit


By Kuhu Shukla

This post first appeared here on the Apache Software Foundation blog as part of ASF’s “Success at Apache” monthly blog series.

As I sit at my desk on a rather frosty morning with my coffee, looking up new JIRAs from the previous day in the Apache Tez project, I feel rather pleased. The latest community release vote is complete, the bug fixes that we so badly needed are in and the new release that we tested out internally on our many thousand strong cluster is looking good. Today I am looking at a new stack trace from a different Apache project process and it is hard to miss how much of the exceptional code I get to look at every day comes from people all around the globe. A contributor leaves a JIRA comment before he goes on to pick up his kid from soccer practice while someone else wakes up to find that her effort on a bug fix for the past two months has finally come to fruition through a binding +1.

Yahoo – which joined AOL, HuffPost, Tumblr, Engadget, and many more brands to form the Verizon subsidiary Oath last year – has been at the frontier of open source adoption and contribution since before I was in high school. So while I have no historical trajectories to share, I do have a story on how I found myself in an epic journey of migrating all of Yahoo jobs from Apache MapReduce to Apache Tez, a then-new DAG based execution engine.

Oath grid infrastructure is through and through driven by Apache technologies be it storage through HDFS, resource management through YARN, job execution frameworks with Tez and user interface engines such as Hive, Hue, Pig, Sqoop, Spark, Storm. Our grid solution is specifically tailored to Oath’s business-critical data pipeline needs using the polymorphic technologies hosted, developed and maintained by the Apache community.

On the third day of my job at Yahoo in 2015, I received a YouTube link on An Introduction to Apache Tez. I watched it carefully trying to keep up with all the questions I had and recognized a few names from my academic readings of Yarn ACM papers. I continued to ramp up on YARN and HDFS, the foundational Apache technologies Oath heavily contributes to even today. For the first few weeks I spent time picking out my favorite (necessary) mailing lists to subscribe to and getting started on setting up on a pseudo-distributed Hadoop cluster. I continued to find my footing with newbie contributions and being ever more careful with whitespaces in my patches. One thing was clear – Tez was the next big thing for us. By the time I could truly call myself a contributor in the Hadoop community nearly 80-90% of the Yahoo jobs were now running with Tez. But just like hiking up the Grand Canyon, the last 20% is where all the pain was. Being a part of the solution to this challenge was a happy prospect and thankfully contributing to Tez became a goal in my next quarter.

The next sprint planning meeting ended with me getting my first major Tez assignment – progress reporting. The progress reporting in Tez was non-existent – “Just needs an API fix,”  I thought. Like almost all bugs in this ecosystem, it was not easy. How do you define progress? How is it different for different kinds of outputs in a graph? The questions were many.

I, however, did not have to go far to get answers. The Tez community actively came to a newbie’s rescue, finding answers and posing important questions. I started attending the bi-weekly Tez community sync up calls and asking existing contributors and committers for course correction. Suddenly the team was much bigger, the goals much more chiseled. This was new to anyone like me who came from the networking industry, where the most open part of the code are the RFCs and the implementation details are often hidden. These meetings served as a clean room for our coding ideas and experiments. Ideas were shared, to the extent of which data structure we should pick and what a future user of Tez would take from it. In between the usual status updates and extensive knowledge transfers were made.

Oath uses Apache Pig and Apache Hive extensively and most of the urgent requirements and requests came from Pig and Hive developers and users. Each issue led to a community JIRA and as we started running Tez at Oath scale, new feature ideas and bugs around performance and resource utilization materialized. Every year most of the Hadoop team at Oath travels to the Hadoop Summit where we meet our cohorts from the Apache community and we stand for hours discussing the state of the art and what is next for the project. One such discussion set the course for the next year and a half for me.

We needed an innovative way to shuffle data. Frameworks like MapReduce and Tez have a shuffle phase in their processing lifecycle wherein the data from upstream producers is made available to downstream consumers. Even though Apache Tez was designed with a feature set corresponding to optimization requirements in Pig and Hive, the Shuffle Handler Service was retrofitted from MapReduce at the time of the project’s inception. With several thousands of jobs on our clusters leveraging these features in Tez, the Shuffle Handler Service became a clear performance bottleneck. So as we stood talking about our experience with Tez with our friends from the community, we decided to implement a new Shuffle Handler for Tez. All the conversation points were tracked now through an umbrella JIRA TEZ-3334 and the to-do list was long. I picked a few JIRAs and as I started reading through I realized, this is all new code I get to contribute to and review. There might be a better way to put this, but to be honest it was just a lot of fun! All the whiteboards were full, the team took walks post lunch and discussed how to go about defining the API. Countless hours were spent debugging hangs while fetching data and looking at stack traces and Wireshark captures from our test runs. Six months in and we had the feature on our sandbox clusters. There were moments ranging from sheer frustration to absolute exhilaration with high fives as we continued to address review comments and fixing big and small issues with this evolving feature.

As much as owning your code is valued everywhere in the software community, I would never go on to say “I did this!” In fact, “we did!” It is this strong sense of shared ownership and fluid team structure that makes the open source experience at Apache truly rewarding. This is just one example. A lot of the work that was done in Tez was leveraged by the Hive and Pig community and cross Apache product community interaction made the work ever more interesting and challenging. Triaging and fixing issues with the Tez rollout led us to hit a 100% migration score last year and we also rolled the Tez Shuffle Handler Service out to our research clusters. As of last year we have run around 100 million Tez DAGs with a total of 50 billion tasks over almost 38,000 nodes.

In 2018 as I move on to explore Hadoop 3.0 as our future release, I hope that if someone outside the Apache community is reading this, it will inspire and intrigue them to contribute to a project of their choice. As an astronomy aficionado, going from a newbie Apache contributor to a newbie Apache committer was very much like looking through my telescope - it has endless possibilities and challenges you to be your best.

About the Author:

Kuhu Shukla is a software engineer at Oath and did her Masters in Computer Science at North Carolina State University. She works on the Big Data Platforms team on Apache Tez, YARN and HDFS with a lot of talented Apache PMCs and Committers in Champaign, Illinois. A recent Apache Tez Committer herself she continues to contribute to YARN and HDFS and spoke at the 2017 Dataworks Hadoop Summit on “Tez Shuffle Handler: Shuffling At Scale With Apache Hadoop”. Prior to that she worked on Juniper Networks’ router and switch configuration APIs. She likes to participate in open source conferences and women in tech events. In her spare time she loves singing Indian classical and jazz, laughing, whale watching, hiking and peering through her Dobsonian telescope.