
Dutch Continue to Curb Illegal Downloading But What About Streaming?

Post Syndicated from Andy original https://torrentfreak.com/dutch-continue-to-curb-illegal-downloading-but-what-about-streaming-180222/

After many years of downloading content with impunity, 2014 brought a culture shock to the Dutch.

Citizens were previously allowed to obtain content for their own use due to a levy on blank media that compensated rightsholders. However, the European Court of Justice found that system to be illegal and the government quickly moved to ban downloading from unauthorized sources.

In the four years that have passed since the ban, the downloading landscape has undergone change. That’s according to a study published by the Consumer Insights panel at Telecompaper which found that while 41% of respondents downloaded movies, TV shows, music and games from unauthorized sources in 2013, the figure had plunged to 27% at the end of 2016. There was a further drop to 24% by the end of 2017.

Of the people who continue to download illegally, men are overrepresented, the study found. While 27% of men obtained media for free during the last year to October 2017, only 21% of women did likewise.

While as many as 150 million people still use P2P technologies such as BitTorrent worldwide, there is a general decline in usage and this is reflected in the report.

In 2013, 18% of Dutch respondents used torrent-like systems to download, a figure that had fallen to 8% in 2016 and 6% last year. Again, male participants were overrepresented, outnumbering women by two to one. However, people appear to be visiting P2P networks less.

“The study showed that people who reported using P2P to download content, have done so on average 37 times a year [to October 2017]. In January of 2017 it was significantly higher, 61 times,” the study notes. P2P usage in November 2015 was rated at 98 instances per year.

Perhaps surprisingly, one of the oldest methods of downloading content has maintained its userbase in more recent years. Usenet, otherwise known as the newsgroups, accounted for 9% of downloaders in 2013 but after falling to around 6% of downloaders in 2016, that figure remained unchanged in 2017. Almost five times more men used newsgroups than women.

At the same time as showing a steady trend in terms of users, instances of newsgroup downloading are reportedly up in the latest count. In November 2015, people used the system an average of 98 times per year but in January 2017 that had fallen to 66 times. The latest figures find an average use of 68 times per year.

Drilling down into more obscure systems, 2% of respondents told Telecompaper that they’d used an FTP server during the past year, a method that was entirely dominated by men.

While the Dutch downloading ban of 2014 may have played some part in changing perceptions, the increased availability of legal offers cannot be ignored. Films and TV shows are now widely available on services such as Netflix and Amazon, while music is strongly represented via Spotify, Apple, Deezer and similar platforms.

Indeed, 12% of respondents said they are now downloading less illegally because it’s easier to obtain paid content, compared with 11% at the start of 2017 and just 3% in 2013. Interestingly, 14% of respondents this time around said their illegal downloads are down because they have more restrictions on their time.

Another interesting reason given for downloading less is that pirate content is becoming harder to find. In 2013, just 4% cited this as a cause for reduction yet in 2017, this had jumped to 8% of respondents, with blocked sites proving a stumbling block for some users.

On the other hand, 3% of respondents said that since content had become easier to find, they are now downloading more. However, that figure is down from 13% in November 2013 and 6% in January 2017.

While legal streaming is certainly making its mark in the Netherlands, illegal streaming isn’t directly addressed in the report. It is likely that a considerable number of citizens are now using this method to obtain their content fix in a way that’s not as easily trackable as torrent-like systems.

Furthermore, given the plans of local film distributor Dutch FilmWorks to chase and demand cash settlements from BitTorrent users, it’s likely that traffic to streaming sites will only increase in the months to come, at least for those looking to consume TV shows and movies.


qrocodile: the kid-friendly Sonos system

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/qrocodile-kid-friendly-sonos-system/

Chris Campbell’s qrocodile uses a Raspberry Pi, a camera, and QR codes to allow Chris’s children to take full control of the Sonos home sound system. And we love it!

qrocodile

Introducing qrocodile, a kid-friendly system for controlling your Sonos with QR codes. Source code is available at: https://github.com/chrispcampbell/qrocodile Learn more at: http://labonnesoupe.org https://twitter.com/chrscmpbll

Sonos

SONOS is SONOS backwards. It’s also SONOS upside down, and SONOS upside down and backwards. I just learnt that this means SONOS is an ambigram. Hurray for learning!

Sonos (the product, not the ambigram) is a multi-room speaker system controlled by an app. Speakers in different rooms can play different tracks or join forces to play one track for a smooth musical atmosphere throughout your home.


If you have a Sonos system in your home, I would highly recommend setting it up so you can access it from outside your home and have it play the Imperial March as you walk through the front door. Why wouldn’t you?

qrocodile

One day, Chris’s young children wanted to play an album while eating dinner. This one request inspired him to create qrocodile, a musical jukebox that lets his children control which songs Sonos plays, and where it plays them, via QR codes.

It all started one night at the dinner table over winter break. The kids wanted to put an album on the turntable (hooked up to the line-in on a Sonos PLAY:5 in the dining room). They’re perfectly capable of putting vinyl on the turntable all by themselves, but using the Sonos app to switch over to play from the line-in is a different story.

The QR codes represent commands (such as Play in the living room, Use the turntable, or Build a song list) and artists (such as my current musical crush Courtney Barnett or the Ramones).


A camera attached to a Raspberry Pi 3 feeds the Pi the QR code that’s presented, and the Pi runs a script that recognises the code and sends instructions to Sonos accordingly.
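Chris’s recognition script is written in Python, but the capture-and-decode step can be illustrated with common command-line tools. The snippet below is only a rough sketch, not Chris’s actual code; it assumes the Raspberry Pi camera utility raspistill and the zbarimg decoder from the zbar package are installed, and it simply prints whatever text the QR code in front of the camera encodes.

# Grab a single frame from the Pi camera module
$ raspistill -o /tmp/frame.jpg -w 640 -h 480 -t 1
# Decode any QR code in that frame and print its raw payload
$ zbarimg --quiet --raw /tmp/frame.jpg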


Chris used a custom version of the Sonos HTTP API created by Jimmy Shimizu to gain access to Sonos from his Raspberry Pi. To build the QR codes, he wrote a script that utilises the Spotify API via the Spotipy library.
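The HTTP API layer means the Pi only has to issue web requests once a code is recognised. As a loose illustration, assuming the node-sonos-http-api defaults (port 5005 and room-name-then-action routes) rather than anything specific to Chris’s customised version:

# List the Sonos zones the API can see
$ curl http://localhost:5005/zones
# Tell a named room to start or pause playback
$ curl "http://localhost:5005/Dining%20Room/play"
$ curl "http://localhost:5005/Dining%20Room/pause"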

His children are now able to present recognisable album art to the camera in order to play their desired track.

It’s been interesting seeing the kids putting the thing through its paces during their frequent “dance parties”, queuing up their favorite songs and uncovering new ones. I really like that they can use tangible objects to discover music in much the same way I did when I was their age, looking through my parents’ records, seeing which ones had interesting artwork or reading the song titles on the back, listening and exploring.

Chris has provided all the scripts for the project, along with a tutorial on how to set it up, on his GitHub — have a look if you want to recreate it or learn more about his code. Also check out Chris’s website for more on qrocodile and to see some of his other creations.

The post qrocodile: the kid-friendly Sonos system appeared first on Raspberry Pi.

Google on Collision Course With Movie Biz Over Piracy & Safe Harbor

Post Syndicated from Andy original https://torrentfreak.com/google-on-collision-course-with-movie-biz-over-piracy-safe-harbor-180219/

Wherever Google has a presence, rightsholders are around to accuse the search giant of not doing enough to deal with piracy.

Over the past several years, the company has been attacked by both the music and movie industries but despite overtures from Google, criticism still floods in.

In Australia, things are definitely heating up. Village Roadshow, one of the nation’s foremost movie companies, has been an extremely vocal Google critic since 2015 but now its co-chief, the outspoken Graham Burke, seems to want to take things to the next level.

As part of yet another broadside against Google, Burke has for the second time in a month accused Google of playing a large part in online digital crime.

“My view is they are complicit and they are facilitating crime,” Burke said, adding that if Google wants to sue him over his comments, they’re very welcome to do so.

It’s highly unlikely that Google will take the bait. Burke’s attempt at pushing the issue further into the spotlight will have been spotted a mile off but in any event, legal battles with Google aren’t really something that Burke wants to get involved in.

Australia is currently in the midst of a consultation process for the Copyright Amendment (Service Providers) Bill 2017 which would extend the country’s safe harbor provisions to a broader range of service providers including educational institutions, libraries, archives, key cultural institutions and organizations assisting people with disabilities.

For its part, Village Roadshow is extremely concerned that these provisions may be extended to other providers – specifically Google – who might then use expanded safe harbor to deflect more liability in respect of piracy.

“Village Roadshow….urges that there be no further amendments to safe harbor and in particular there is no advantage to Australia in extending safe harbor to Google,” Burke wrote in his company’s recent submission to the government.

“It is very unlikely given their size and power that as content owners we would ever sue them but if we don’t have that right then we stand naked. Most importantly if Google do the right thing by Australia on the question of piracy then there will be no issues. However, they are very far from this position and demonstrably are facilitating crime.”

Accusations of crime facilitation are nothing new for Google, with rightsholders in the US and Europe having accused the company of the same a number of times over the years. In response, Google always insists that it abides by relevant laws and actually goes much further in tackling piracy than legislation currently requires.

On the safe harbor front, Google begins by saying that not expanding provisions to service providers will have a seriously detrimental effect on business development in the region.

“[Excluding] online service providers falls far short of a balanced, pro-innovation environment for Australia. Further, it takes Australia out of step with other digital economies by creating regulatory uncertainty for [venture capital] investment and startup/entrepreneurial success,” Google’s submission reads.

“[T]he Draft Bill’s narrow safe harbor scheme places Australian-based startups and online service providers — including individual bloggers, websites, small startups, video-hosting services, enterprise cloud companies, auction sites, online marketplaces, hosting providers for real-estate listings, photo hosting services, search engines, review sites, and online platforms —in a disadvantaged position compared with global startups in countries that have strong safe harbor frameworks, such as the United States, Canada, United Kingdom, Singapore, South Korea, Japan, and other EU countries.

“Under the new scheme, Australian-based startups and service providers, unlike their international counterparts, will not receive clear and consistent legal protection when they respond to complaints from rightsholders about alleged instances of online infringement by third-party users on their services,” Google notes.

Interestingly, Google then delivers what appears to be a loosely veiled threat.

One of the key anti-piracy strategies touted by the mainstream entertainment companies is collaboration between rightsholders and service providers, including the latter providing voluntary tools to police infringement online. Google says that if service providers are given a raw deal on safe harbor, the extent of future cooperation may be at risk.

“If Australian-based service providers are carved out of the new safe harbor regime post-reform, they will operate from a lower incentive to build and test new voluntary tools to combat online piracy, potentially reducing their contributions to innovation in best practices in both Australia and international markets,” the company warns.

But while Village Roadshow argues against safe harbors and warns that piracy could kill the movie industry, it is quietly optimistic that the tide is turning.

In a presentation to investors last week, the company said that reducing piracy would have “only an upside” for its business but also added that new research indicates that “piracy growth [is] getting arrested.” As a result, the company says that it will build on the notion that “74% of people see piracy as ‘wrong/theft’” and will call on Australians to do the right thing.

In the meantime, the pressure on Google will continue but lawsuits – in either direction – won’t provide an answer.

Village Roadshow’s submission can be found here, Google’s here (pdf).


Tech wishes for 2018

Post Syndicated from Eevee original https://eev.ee/blog/2018/02/18/tech-wishes-for-2018/

Anonymous asks, via money:

What would you like to see happen in tech in 2018?

(answer can be technical, social, political, combination, whatever)

Hmm.

Less of this

I’m not really qualified to speak in depth about either of these things, but let me put my foot in my mouth anyway:

The Blockchain™

Bitcoin was a neat idea. No, really! Decentralization is cool. Overhauling our terrible financial infrastructure is cool. Hash functions are cool.

Unfortunately, it seems to have devolved into mostly a get-rich-quick scheme for nerds, and by nearly any measure it’s turning into a spectacular catastrophe. Its “success” is measured in how much a bitcoin is worth in US dollars, which is pretty close to an admission from its own investors that its only value is in converting back to “real” money — all while that same “success” is making it less useful as a distinct currency.

Blah, blah, everyone already knows this.

What concerns me slightly more is the gold rush hype cycle, which is putting cryptocurrency and “blockchain” in the news and lending it all legitimacy. People have raked in millions of dollars on ICOs of novel coins I’ve never heard mentioned again. (Note: again, that value is measured in dollars.) Most likely, none of the investors will see any return whatsoever on that money. They can’t, really, unless a coin actually takes off as a currency, and that seems at odds with speculative investing since everyone either wants to hoard or ditch their coins. When the coins have no value themselves, the money can only come from other investors, and eventually the hype winds down and you run out of other investors.

I fear this will hurt a lot of people before it’s over, so I’d like for it to be over as soon as possible.


That said, the hype itself has gotten way out of hand too. First it was the obsession with “blockchain” like it’s a revolutionary technology, but hey, Git is a fucking blockchain. The novel part is the way it handles distributed consensus (which in Git is basically left for you to figure out), and that’s uniquely important to currency because you want to be pretty sure that money doesn’t get duplicated or lost when moved around.

But now we have startups trying to use blockchains for website backends and file storage and who knows what else? Why? What advantage does this have? When you say “blockchain”, I hear “single Git repository” — so when you say “email on the blockchain”, I have an aneurysm.

Bitcoin seems to have sparked imagination in large part because it’s decentralized, but I’d argue it’s actually a pretty bad example of a decentralized network, since people keep forking it. The ability to fork is a feature, sure, but the trouble here is that the Bitcoin family has no notion of federation — there is one canonical Bitcoin ledger and it has no notion of communication with any other. That’s what you want for currency, not necessarily other applications. (Bitcoin also incentivizes frivolous forking by giving the creator an initial pile of coins to keep and sell.)

And federation is much more interesting than decentralization! Federation gives us email and the web. Federation means I can set up my own instance with my own rules and still be able to meaningfully communicate with the rest of the network. Federation has some amount of tolerance for changes to the protocol, so such changes are more flexible and rely more heavily on consensus.

Federation is fantastic, and it feels like a massive tragedy that this rekindled interest in decentralization is mostly focused on peer-to-peer networks, which do little to address our current problems with centralized platforms.

And hey, you know what else is federated? Banks.

AI

Again, the tech is cool and all, but the marketing hype is getting way out of hand.

Maybe what I really want from 2018 is less marketing?

For one, I’ve seen a huge uptick in uncritically referring to any software that creates or classifies creative work as “AI”. Can we… can we not. It’s not AI. Yes, yes, nerds, I don’t care about the hair-splitting about the nature of intelligence — you know that when we hear “AI” we think of a human-like self-aware intelligence. But we’re applying it to stuff like a weird dog generator. Or to whatever neural network a website threw into production this week.

And this is dangerously misleading — we already had massive tech companies scapegoating The Algorithm™ for the poor behavior of their software, and now we’re talking about those algorithms as though they were self-aware, untouchable, untameable, unknowable entities of pure chaos whose decisions we are arbitrarily bound to. Ancient, powerful gods who exist just outside human comprehension or law.

It’s weird to see this stuff appear in consumer products so quickly, too. It feels quick, anyway. The latest iPhone can unlock via facial recognition, right? I’m sure a lot of effort was put into ensuring that the same person’s face would always be recognized… but how confident are we that other faces won’t be recognized? I admit I don’t follow all this super closely, so I may be imagining a non-problem, but I do know that humans are remarkably bad at checking for negative cases.

Hell, take the recurring problem of major platforms like Twitter and YouTube classifying anything mentioning “bisexual” as pornographic — because the word is also used as a porn genre, and someone threw a list of porn terms into a filter without thinking too hard about it. That’s just a word list, a fairly simple thing that any human can review; but suddenly we’re confident in opaque networks of inferred details?

I don’t know. “Traditional” classification and generation are much more comforting, since they’re a set of fairly abstract rules that can be examined and followed. Machine learning, as I understand it, is less about rules and much more about pattern-matching; it’s built out of the fingerprints of the stuff it’s trained on. Surely that’s just begging for tons of edge cases. They’re practically made of edge cases.


I’m reminded of a point I saw made a few days ago on Twitter, something I’d never thought about but should have. TurnItIn is a service for universities that checks whether students’ papers match any others, in order to detect cheating. But this is a paid service, one that fundamentally hinges on its corpus: a large collection of existing student papers. So students pay money to attend school, where they’re required to let their work be given to a third-party company, which then profits off of it? What kind of a goofy business model is this?

And my thoughts turn to machine learning, which is fundamentally different from an algorithm you can simply copy from a paper, because it’s all about the training data. And to get good results, you need a lot of training data. Where is that all coming from? How many for-profit companies are setting a neural network loose on the web — on millions of people’s work — and then turning around and selling the result as a product?

This is really a question of how intellectual property works in the internet era, and it continues our proud decades-long tradition of just kinda doing whatever we want without thinking about it too much. Nothing if not consistent.

More of this

A bit tougher, since computers are pretty alright now and everything continues to chug along. Maybe we should just quit while we’re ahead. There’s some real pie-in-the-sky stuff that would be nice, but it certainly won’t happen within a year, and may never happen except in some horrific Algorithmic™ form designed by people that don’t know anything about the problem space and only works 60% of the time but is treated as though it were bulletproof.

Federation

The giants are getting more giant. Maybe too giant? Granted, it could be much worse than Google and Amazon — it could be Apple!

Amazon has its own delivery service and brick-and-mortar stores now, as well as providing the plumbing for vast amounts of the web. They’re not doing anything particularly outrageous, but they kind of loom.

Ad company Google just put ad blocking in its majority-share browser — albeit for the ambiguously-noble goal of only blocking obnoxious ads so that people will be less inclined to install a blanket ad blocker.

Twitter is kind of a nightmare but no one wants to leave. I keep trying to use Mastodon as well, but I always forget about it after a day, whoops.

Facebook sounds like a total nightmare but no one wants to leave that either, because normies don’t use anything else, which is itself direly concerning.

IRC is rapidly bleeding mindshare to Slack and Discord, both of which are far better at the things IRC sadly never tried to do and absolutely terrible at the exact things IRC excels at.

The problem is the same as ever: there’s no incentive to interoperate. There’s no fundamental technical reason why Twitter and Tumblr and MySpace and Facebook can’t intermingle their posts; they just don’t, because why would they bother? It’s extra work that makes it easier for people to not use your ecosystem.

I don’t know what can be done about that, except to hope that a really big player decides to play nice out of the kindness of their heart. The really big federated success stories — say, the web — mostly won out because they came along first. At this point, how does a federated social network take over? I don’t know.

Social progress

I… don’t really have a solid grasp on what’s happening in tech socially at the moment. I’ve drifted a bit away from the industry part, which is where that all tends to come up. I have the vague sense that things are improving, but that might just be because the Rust community is the one I hear the most about, and it puts a lot of effort into being inclusive and welcoming.

So… more projects should be like Rust? Do whatever Rust is doing? And not so much what Linus is doing.

Open source funding

I haven’t heard this brought up much lately, but it would still be nice to see. The Bay Area runs on open source and is raking in zillions of dollars on its back; pump some of that cash back into the ecosystem, somehow.

I’ve seen a couple open source projects on Patreon, which is fantastic, but feels like a very small solution given how much money is flowing through the commercial tech industry.

Ad blocking

Nice. Fuck ads.

One might wonder where the money to host a website comes from, then? I don’t know. Maybe we should loop this in with the above thing and find a more informal way to pay people for the stuff they make when we find it useful, without the financial and cognitive overhead of A Transaction or Giving Someone My Damn Credit Card Number. You know, something like Bitco— ah, fuck.

Year of the Linux Desktop

I don’t know. What are we working on at the moment? Wayland? Do Wayland, I guess. Oh, and hi-DPI, which I hear sucks. And please fix my sound drivers so PulseAudio stops blaming them when it fucks up.

FOSS Project Spotlight: LinuxBoot (Linux Journal)

Post Syndicated from jake original https://lwn.net/Articles/747380/rss

Linux Journal takes a look at the newly announced LinuxBoot project. LWN covered a related talk back in November. “Modern firmware generally consists of two main parts: hardware initialization (early stages) and OS loading (late stages). These parts may be divided further depending on the implementation, but the overall flow is similar across boot firmware. The late stages have gained many capabilities over the years and often have an environment with drivers, utilities, a shell, a graphical menu (sometimes with 3D animations) and much more. Runtime components may remain resident and active after firmware exits. Firmware, which used to fit in an 8 KiB ROM, now contains an OS used to boot another OS and doesn’t always stop running after the OS boots. LinuxBoot replaces the late stages with a Linux kernel and initramfs, which are used to load and execute the next stage, whatever it may be and wherever it may come from. The Linux kernel included in LinuxBoot is called the ‘boot kernel’ to distinguish it from the ‘target kernel’ that is to be booted and may be something other than Linux.”

How to Patch Linux Workloads on AWS

Post Syndicated from Koen van Blijderveen original https://aws.amazon.com/blogs/security/how-to-patch-linux-workloads-on-aws/

Most malware tries to compromise your systems by using a known vulnerability that the operating system maker has already patched. As best practices to help prevent malware from affecting your systems, you should apply all operating system patches and actively monitor your systems for missing patches.

In this blog post, I show you how to patch Linux workloads using AWS Systems Manager. To accomplish this, I will show you how to use the AWS Command Line Interface (AWS CLI) to:

  1. Launch an Amazon EC2 instance for use with Systems Manager.
  2. Configure Systems Manager to patch your Amazon EC2 Linux instances.

In two previous blog posts (Part 1 and Part 2), I showed how to use the AWS Management Console to perform the necessary steps to patch, inspect, and protect Microsoft Windows workloads. You can implement those same processes for your Linux instances running in AWS by changing the instance tags and types shown in the previous blog posts.

Because most Linux system administrators are more familiar with using a command line, I show how to patch Linux workloads by using the AWS CLI in this blog post. The steps to use the Amazon EBS Snapshot Scheduler and Amazon Inspector are identical for both Microsoft Windows and Linux.

What you should know first

To follow along with the solution in this post, you need one or more Amazon EC2 instances. You may use existing instances or create new instances. For this post, I assume the instances are running Amazon Linux and were launched from an Amazon Linux AMI (Amazon Machine Image).

Systems Manager is a collection of capabilities that helps you automate management tasks for AWS-hosted instances on Amazon EC2 and your on-premises servers. In this post, I use Systems Manager for two purposes: to run remote commands and apply operating system patches. To learn about the full capabilities of Systems Manager, see What Is AWS Systems Manager?

As of Amazon Linux 2017.09, the AMI comes preinstalled with the Systems Manager agent. Systems Manager Patch Manager also supports Red Hat and Ubuntu. To install the agent on these Linux distributions or an older version of Amazon Linux, see Installing and Configuring SSM Agent on Linux Instances.

If you are not familiar with how to launch an Amazon EC2 instance, see Launching an Instance. I also assume you launched or will launch your instance in a private subnet. You must make sure that the Amazon EC2 instance can connect to the internet using a network address translation (NAT) instance or NAT gateway to communicate with Systems Manager. The following diagram shows how you should structure your VPC.

Diagram showing how to structure your VPC

Later in this post, you will assign tasks to a maintenance window to patch your instances with Systems Manager. To do this, the IAM user you are using for this post must have the iam:PassRole permission. This permission allows the IAM user assigning tasks to pass their own IAM permissions to the AWS service; in this example, when you assign a task to a maintenance window, IAM passes your credentials to Systems Manager. You also should authorize your IAM user to use Amazon EC2 and Systems Manager. As mentioned before, you will be using the AWS CLI for most of the steps in this blog post. Our documentation shows you how to get started with the AWS CLI. Make sure you have the AWS CLI installed and configured with an AWS access key and secret access key that belong to an IAM user that has the following AWS managed policies attached: AmazonEC2FullAccess and AmazonSSMFullAccess.
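If you want to set up that IAM user from the command line as well, the two managed policies can be attached directly. This is a minimal sketch; YourIamUser is just a placeholder for the user you are running this walkthrough as, and the iam:PassRole permission mentioned above still needs to be granted to that user separately, as described in the documentation.

$ aws iam attach-user-policy --user-name YourIamUser --policy-arn arn:aws:iam::aws:policy/AmazonEC2FullAccess
$ aws iam attach-user-policy --user-name YourIamUser --policy-arn arn:aws:iam::aws:policy/AmazonSSMFullAccess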

Step 1: Launch an Amazon EC2 Linux instance

In this section, I show you how to launch an Amazon EC2 instance so that you can use Systems Manager with the instance. This step requires you to do three things:

  1. Create an IAM role for Systems Manager before launching your Amazon EC2 instance.
  2. Launch your Amazon EC2 instance with Amazon EBS and the IAM role for Systems Manager.
  3. Add tags to the instances so that you can add your instances to a Systems Manager maintenance window based on tags.

A. Create an IAM role for Systems Manager

Before launching an Amazon EC2 instance, I recommend that you first create an IAM role for Systems Manager, which you will use to update the Amazon EC2 instance. AWS already provides a preconfigured policy that you can use for the new role, called AmazonEC2RoleforSSM.

  1. Create a JSON file named trustpolicy-ec2ssm.json that contains the following trust policy. This policy describes which principal (an entity that can take action on an AWS resource) is allowed to assume the role we are going to create. In this example, the principal is the Amazon EC2 service.
    {
      "Version": "2012-10-17",
      "Statement": {
        "Effect": "Allow",
        "Principal": {"Service": "ec2.amazonaws.com"},
        "Action": "sts:AssumeRole"
      }
    }

  2. Use the following command to create a role named EC2SSM that has the AWS managed policy AmazonEC2RoleforSSM attached to it. This generates JSON-based output that describes the role and its parameters, if the command is successful.
    $ aws iam create-role --role-name EC2SSM --assume-role-policy-document file://trustpolicy-ec2ssm.json

  3. Use the following command to attach the AWS managed IAM policy (AmazonEC2RoleforSSM) to your newly created role.
    $ aws iam attach-role-policy --role-name EC2SSM --policy-arn arn:aws:iam::aws:policy/service-role/AmazonEC2RoleforSSM

  4. Use the following commands to create the IAM instance profile and add the role to the instance profile. The instance profile is needed to attach the role we created earlier to your Amazon EC2 instance.
    $ aws iam create-instance-profile --instance-profile-name EC2SSM-IP
    $ aws iam add-role-to-instance-profile --instance-profile-name EC2SSM-IP --role-name EC2SSM

B. Launch your Amazon EC2 instance

To follow along, you need an Amazon EC2 instance that is running Amazon Linux. You can use any existing instance you may have or create a new instance.

Whether you launch a new Amazon EC2 instance or use an existing one, the instance must have the EC2SSM-IP instance profile attached:

  1. Use the following command to launch a new Amazon EC2 instance using an Amazon Linux AMI available in the US East (N. Virginia) Region (also known as us-east-1). Replace YourKeyPair and YourSubnetId with your information. For more information about creating a key pair, see the create-key-pair documentation. Write down the InstanceId that is in the output because you will need it later in this post.
    $ aws ec2 run-instances --image-id ami-cb9ec1b1 --instance-type t2.micro --key-name YourKeyPair --subnet-id YourSubnetId --iam-instance-profile Name=EC2SSM-IP

  2. If you are using an existing Amazon EC2 instance, you can use the following command to attach the instance profile you created earlier to your instance.
    $ aws ec2 associate-iam-instance-profile --instance-id YourInstanceId --iam-instance-profile Name=EC2SSM-IP

C. Add tags

The final step of configuring your Amazon EC2 instances is to add tags. You will use these tags to configure Systems Manager in Step 2 of this post. For this example, I add a tag named Patch Group and set the value to Linux Servers. I could have other groups of Amazon EC2 instances that I treat differently by having the same tag name but a different tag value. For example, I might have a collection of other servers with the tag name Patch Group with a value of Web Servers.

  • Use the following command to add the Patch Group tag to your Amazon EC2 instance.
    $ aws ec2 create-tags --resources YourInstanceId --tags Key="Patch Group",Value="Linux Servers"

Note: You must wait a few minutes until the Amazon EC2 instance is available before you can proceed to the next section. To make sure your Amazon EC2 instance is online and ready, you can use the following AWS CLI command:

$ aws ec2 describe-instance-status --instance-ids YourInstanceId
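If you prefer to block until the instance passes its status checks instead of re-running the command above, the AWS CLI ships with a waiter that does this for you (an optional convenience, not required for the rest of the walkthrough):

$ aws ec2 wait instance-status-ok --instance-ids YourInstanceId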

At this point, you now have at least one Amazon EC2 instance you can use to configure Systems Manager.

Step 2: Configure Systems Manager

In this section, I show you how to configure and use Systems Manager to apply operating system patches to your Amazon EC2 instances, and how to manage patch compliance.

To start, I provide some background information about Systems Manager. Then, I cover how to:

  1. Create the Systems Manager IAM role so that Systems Manager is able to perform patch operations.
  2. Create a Systems Manager patch baseline and associate it with your instance to define which patches Systems Manager should apply.
  3. Define a maintenance window to make sure Systems Manager patches your instance when you tell it to.
  4. Monitor patch compliance to verify the patch state of your instances.

You must meet two prerequisites to use Systems Manager to apply operating system patches. First, you must attach the IAM role you created in the previous section, EC2SSM, to your Amazon EC2 instance. Second, you must install the Systems Manager agent on your Amazon EC2 instance. If you have used a recent Amazon Linux AMI, Amazon has already installed the Systems Manager agent on your Amazon EC2 instance. You can confirm this by logging in to an Amazon EC2 instance and checking the Systems Manager agent log files that are located at /var/log/amazon/ssm/.
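For example, assuming the agent’s default log file name (amazon-ssm-agent.log), you can inspect the most recent entries after connecting to the instance over SSH:

$ sudo tail -n 50 /var/log/amazon/ssm/amazon-ssm-agent.log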

To install the Systems Manager agent on an instance that does not have the agent preinstalled or if you want to use the Systems Manager agent on your on-premises servers, see Installing and Configuring the Systems Manager Agent on Linux Instances. If you forgot to attach the newly created role when launching your Amazon EC2 instance or if you want to attach the role to already running Amazon EC2 instances, see Attach an AWS IAM Role to an Existing Amazon EC2 Instance by Using the AWS CLI or use the AWS Management Console.

A. Create the Systems Manager IAM role

For a maintenance window to be able to run any tasks, you must create a new role for Systems Manager. This role is a different kind of role than the one you created earlier: this role will be used by Systems Manager instead of Amazon EC2. Earlier, you created the role, EC2SSM, with the policy, AmazonEC2RoleforSSM, which allowed the Systems Manager agent on your instance to communicate with Systems Manager. In this section, you need a new role with the policy, AmazonSSMMaintenanceWindowRole, so that the Systems Manager service can execute commands on your instance.

To create the new IAM role for Systems Manager:

  1. Create a JSON file named trustpolicy-maintenancewindowrole.json that contains the following trust policy. This policy describes which principal is allowed to assume the role you are going to create. This trust policy allows not only Amazon EC2 to assume this role, but also Systems Manager.
    {
       "Version":"2012-10-17",
       "Statement":[
          {
             "Sid":"",
             "Effect":"Allow",
             "Principal":{
                "Service":[
                   "ec2.amazonaws.com",
                   "ssm.amazonaws.com"
               ]
             },
             "Action":"sts:AssumeRole"
          }
       ]
    }

  2. Use the following command to create a role named MaintenanceWindowRole that has the AWS managed policy, AmazonSSMMaintenanceWindowRole, attached to it. This command generates JSON-based output that describes the role and its parameters, if the command is successful.
    $ aws iam create-role --role-name MaintenanceWindowRole --assume-role-policy-document file://trustpolicy-maintenancewindowrole.json

  3. Use the following command to attach the AWS managed IAM policy (AmazonSSMMaintenanceWindowRole) to your newly created role.
    $ aws iam attach-role-policy --role-name MaintenanceWindowRole --policy-arn arn:aws:iam::aws:policy/service-role/AmazonSSMMaintenanceWindowRole

B. Create a Systems Manager patch baseline and associate it with your instance

Next, you will create a Systems Manager patch baseline and associate it with your Amazon EC2 instance. A patch baseline defines which patches Systems Manager should apply to your instance. Before you can associate the patch baseline with your instance, though, you must determine if Systems Manager recognizes your Amazon EC2 instance. Use the following command to list all instances managed by Systems Manager. The --filters option ensures you look only for your newly created Amazon EC2 instance.

$ aws ssm describe-instance-information --filters Key=InstanceIds,Values=YourInstanceId

{
    "InstanceInformationList": [
        {
            "IsLatestVersion": true,
            "ComputerName": "ip-10-50-2-245",
            "PingStatus": "Online",
            "InstanceId": "YourInstanceId",
            "IPAddress": "10.50.2.245",
            "ResourceType": "EC2Instance",
            "AgentVersion": "2.2.120.0",
            "PlatformVersion": "2017.09",
            "PlatformName": "Amazon Linux AMI",
            "PlatformType": "Linux",
            "LastPingDateTime": 1515759143.826
        }
    ]
}

If your instance is missing from the list, verify that:

  1. Your instance is running.
  2. You attached the Systems Manager IAM role, EC2SSM.
  3. You deployed a NAT gateway in your public subnet to ensure your VPC reflects the diagram shown earlier in this post so that the Systems Manager agent can connect to the Systems Manager internet endpoint.
  4. The Systems Manager agent logs don’t include any unaddressed errors.

Now that you have checked that Systems Manager can manage your Amazon EC2 instance, it is time to create a patch baseline. With a patch baseline, you define which patches are approved to be installed on all Amazon EC2 instances associated with the patch baseline. The Patch Group resource tag you defined earlier will determine to which patch group an instance belongs. If you do not specifically define a patch baseline, the default AWS-managed patch baseline is used.

To create a patch baseline:

  1. Use the following command to create a patch baseline named AmazonLinuxServers. With approval rules, you can determine the approved patches that will be included in your patch baseline. In this example, you add all Critical severity patches to the patch baseline as soon as they are released, by setting the Auto approval delay to 0 days. By setting the Auto approval delay to 2 days, you add to this patch baseline the Important, Medium, and Low severity patches two days after they are released.
    $ aws ssm create-patch-baseline --name "AmazonLinuxServers" --description "Baseline containing all updates for Amazon Linux" --operating-system AMAZON_LINUX --approval-rules "PatchRules=[{PatchFilterGroup={PatchFilters=[{Values=[Critical],Key=SEVERITY}]},ApproveAfterDays=0,ComplianceLevel=CRITICAL},{PatchFilterGroup={PatchFilters=[{Values=[Important,Medium,Low],Key=SEVERITY}]},ApproveAfterDays=2,ComplianceLevel=HIGH}]"
    
    {
        "BaselineId": "YourBaselineId"
    }

  2. Use the following command to register the patch baseline you created with your instance. To do so, you use the Patch Group tag that you added to your Amazon EC2 instance.
    $ aws ssm register-patch-baseline-for-patch-group --baseline-id YourPatchBaselineId --patch-group "Linux Servers"
    
    {
        "PatchGroup": "Linux Servers",
        "BaselineId": "YourBaselineId"
    }

C.  Define a maintenance window

Now that you have successfully set up a role, created a patch baseline, and registered your Amazon EC2 instance with your patch baseline, you will define a maintenance window so that you can control when your Amazon EC2 instances will receive patches. By creating multiple maintenance windows and assigning them to different patch groups, you can make sure your Amazon EC2 instances do not all reboot at the same time.

To define a maintenance window:

  1. Use the following command to define a maintenance window. In this example command, the maintenance window will start every Saturday at 10:00 P.M. UTC. It will have a duration of 4 hours and will not start any new tasks 1 hour before the end of the maintenance window.
    $ aws ssm create-maintenance-window --name SaturdayNight --schedule "cron(0 0 22 ? * SAT *)" --duration 4 --cutoff 1 --allow-unassociated-targets
    
    {
        "WindowId": "YourMaintenanceWindowId"
    }

For more information about defining a cron-based schedule for maintenance windows, see Cron and Rate Expressions for Maintenance Windows.

  2. After defining the maintenance window, you must register the Amazon EC2 instance with the maintenance window so that Systems Manager knows which Amazon EC2 instance it should patch in this maintenance window. You can register the instance by using the same Patch Group tag you used to associate the Amazon EC2 instance with the AWS-provided patch baseline, as shown in the following command.
    $ aws ssm register-target-with-maintenance-window --window-id YourMaintenanceWindowId --resource-type INSTANCE --targets "Key=tag:Patch Group,Values=Linux Servers"
    
    {
        "WindowTargetId": "YourWindowTargetId"
    }

  3. Assign a task to the maintenance window that will install the operating system patches on your Amazon EC2 instance. The following command includes the following options.
    1. name is the name of your task and is optional. I named mine Patching.
    2. task-arn is the name of the task document you want to run.
    3. max-concurrency allows you to specify how many of your Amazon EC2 instances Systems Manager should patch at the same time. max-errors determines when Systems Manager should abort the task. For patching, this number should not be too low, because you do not want your entire patch task to stop on all instances if one instance fails. You can set this, for example, to 20%.
    4. service-role-arn is the Amazon Resource Name (ARN) of the AmazonSSMMaintenanceWindowRole role you created earlier in this blog post.
    5. task-invocation-parameters defines the parameters that are specific to the AWS-RunPatchBaseline task document and tells Systems Manager that you want to install patches with a timeout of 600 seconds (10 minutes).
      $ aws ssm register-task-with-maintenance-window --name "Patching" --window-id "YourMaintenanceWindowId" --targets "Key=WindowTargetIds,Values=YourWindowTargetId" --task-arn AWS-RunPatchBaseline --service-role-arn "arn:aws:iam::123456789012:role/MaintenanceWindowRole" --task-type "RUN_COMMAND" --task-invocation-parameters "RunCommand={Comment=,TimeoutSeconds=600,Parameters={SnapshotId=[''],Operation=[Install]}}" --max-concurrency "500" --max-errors "20%"
      
      {
          "WindowTaskId": "YourWindowTaskId"
      }

Now, you must wait for the maintenance window to run at least once according to the schedule you defined earlier. If your maintenance window has expired, you can check the status of any maintenance tasks Systems Manager has performed by using the following command.

$ aws ssm describe-maintenance-window-executions --window-id "YourMaintenanceWindowId"

{
    "WindowExecutions": [
        {
            "Status": "SUCCESS",
            "WindowId": "YourMaintenanceWindowId",
            "WindowExecutionId": "b594984b-430e-4ffa-a44c-a2e171de9dd3",
            "EndTime": 1515766467.487,
            "StartTime": 1515766457.691
        }
    ]
}

D.  Monitor patch compliance

You also can see the overall patch compliance of all Amazon EC2 instances using the following command in the AWS CLI.

$ aws ssm list-compliance-summaries

This command returns, in JSON format, the number of instances that are compliant and the number that are not, for each compliance type.
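If you want to drill down from that summary to a single instance, Systems Manager can also report the patch state for a specific instance, including counts of installed, missing, and failed patches. This is an optional extra check that is not part of the steps above:

$ aws ssm describe-instance-patch-states --instance-ids YourInstanceId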

You also can see overall patch compliance by choosing Compliance under Insights in the navigation pane of the Systems Manager console. You will see a visual representation of how many Amazon EC2 instances are up to date, how many Amazon EC2 instances are noncompliant, and how many Amazon EC2 instances are compliant in relation to the earlier defined patch baseline.

Screenshot of the Compliance page of the Systems Manager console

In this section, you have set everything up for patch management on your instance. Now you know how to patch your Amazon EC2 instance in a controlled manner and how to check if your Amazon EC2 instance is compliant with the patch baseline you have defined. Of course, I recommend that you apply these steps to all Amazon EC2 instances you manage.

Summary

In this blog post, I showed how to use Systems Manager to create a patch baseline and maintenance window to keep your Amazon EC2 Linux instances up to date with the latest security patches. Remember that by creating multiple maintenance windows and assigning them to different patch groups, you can make sure your Amazon EC2 instances do not all reboot at the same time.

If you have comments about this post, submit them in the “Comments” section below. If you have questions about or issues implementing any part of this solution, start a new thread on the Amazon EC2 forum or contact AWS Support.

– Koen

Backblaze and GDPR

Post Syndicated from Andy Klein original https://www.backblaze.com/blog/gdpr-compliance/


Over the next few months the noise over GDPR will finally reach a crescendo. For the uninitiated, “GDPR” stands for “General Data Protection Regulation” and it goes into effect on May 25th of this year. GDPR is designed to protect how personal information of EU (European Union) citizens is collected, stored, and shared. The regulation should also improve transparency as to how personal information is managed by a business or organization.

Backblaze fully expects to be GDPR compliant when May 25th rolls around and we thought we’d share our experience along the way. We’ll start with this post as an introduction to GDPR. In future posts, we’ll dive into some of the details of the process we went through in meeting the GDPR objectives.

GDPR: A Two Way Street

To ensure we are GDPR compliant, Backblaze has assembled a dedicated internal team, engaged outside counsel in the United Kingdom, and consulted with other tech companies on best practices. While it is a sizable effort on our part, we view this as a waypoint in our ongoing effort to secure and protect our customers’ data and to be transparent in how we work as a company.

In addition to the effort we are putting into complying with the regulation, we think it is important to underscore and promote the idea that data privacy and security is a two-way street. We can spend millions of dollars on protecting the security of our systems, but we can’t stop a bad actor from finding and using your account credentials left on a note stuck to your monitor. We can give our customers tools like two factor authentication and private encryption keys, but it is the partnership with our customers that is the most powerful protection. The same thing goes for your digital privacy — we’ll do our best to protect your information, but we will need your help to do so.

Why GDPR is Important

At the center of GDPR is the protection of Personally Identifiable Information, or “PII.” PII is information that can be used, either on its own or in concert with other information, to identify a specific person. This includes obvious data such as name, address, and phone number; less obvious data such as email address and IP address; and other data such as credit card numbers and unique identifiers that can be decoded back to the person.

How Will GDPR Affect You as an Individual

If you are a citizen in the EU, GDPR is designed to protect your private information from being used or shared without your permission. Technically, this only applies when your data is collected, processed, stored or shared outside of the EU, but it’s a good practice to hold all of your service providers to the same standard. For example, when you are deciding to sign up with a service, you should be able to quickly access and understand what personal information is being collected, why it is being collected, and what the business can do with that information. These terms are typically found in “Terms and Conditions” and “Privacy Policy” documents, or perhaps in a written contract you signed before starting to use a given service or product.

Even if you are not a citizen of the EU, GDPR will still affect you. Why? Because nearly every company you deal with, especially online, will have customers that live in the EU. It makes little sense for Backblaze, or any other service provider or vendor, to create a separate set of rules for just EU citizens. In practice, protection of private information should be more accountable and transparent with GDPR.

How Will GDPR Affect You as a Backblaze Customer

Over the coming months Backblaze customers will see changes to our current “Terms and Conditions,” “Privacy Policy,” and to our Backblaze services. While the changes to the Backblaze services are expected to be minimal, the “terms and privacy” documents will change significantly. The changes will include among other things the addition of a group of model clauses and related materials. These clauses will be generally consistent across all GDPR compliant vendors and are meant to be easily understood so that a customer can easily determine how their PII is being collected and used.

Common GDPR Questions:

Here are a few of the more common questions we have heard regarding GDPR.

  1. GDPR will only affect citizens in the EU.
    Answer: The changes that are being made by companies such as Backblaze to comply with GDPR will almost certainly apply to customers from all countries. And that’s a good thing. The protections afforded to EU citizens by GDPR are something all users of our service should benefit from.
  2. After May 25, 2018, a citizen of the EU will not be allowed to use any applications or services that store data outside of the EU.
    Answer: False, no one will stop you as an EU citizen from using the internet-based service you choose. But, you should make sure you know where your data is being collected, processed, and stored. If any of those activities occur outside the EU, make sure the company is following the GDPR guidelines.
  3. My business only has a few EU citizens as customers, so I don’t need to care about GDPR?
    Answer: False, even if you have just one EU citizen as a customer, and you capture, process, or store their PII outside of the EU, you need to comply with GDPR.
  4. Companies can be fined millions of dollars for not complying with GDPR.
    Answer:
    True, but: the regulation allows for companies to be fined up to €20 million or 4% of global annual revenue (whichever is greater) if they don’t comply with GDPR. In practice, the feeling is that such fines will be reserved (at least initially) for egregious violators that ignore or merely give “lip-service” to GDPR.
  5. You’ll be able to tell a company is GDPR compliant because they have a “GDPR Certified” badge on their website.
    Answer: There is no official GDPR certification or an official GDPR certification program. Companies that comply with GDPR are expected to follow the articles in the regulation and it should be clear from the outside looking in that they have followed the regulations. For example, their “Terms and Conditions,” and “Privacy Policy” should clearly spell out how and why they collect, use, and share your information. At some point a real GDPR certification program may be adopted, but not yet.

For all the hoopla about GDPR, the regulation is reasonably well thought out and addresses a very important issue — people’s privacy online. Creating a best practices document, or in this case a regulation, that companies such as Backblaze can follow is a good idea. The document isn’t perfect, and over the coming years we expect there to be changes. One thing we hope for is that the countries within the EU continue to stand behind one regulation and not fragment the document into multiple versions, each applying to themselves. We believe that having multiple different GDPR versions for different EU countries would lead to less protection overall of EU citizens.

In summary, GDPR changes are coming over the next few months. Backblaze has our internal staff and our EU-based legal counsel working diligently to ensure that we will be GDPR compliant by May 25th. We believe that GDPR will have a positive effect in enhancing the protection of personally identifiable information for not only EU citizens, but all of our Backblaze customers.

The post Backblaze and GDPR appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Australian Government Launches Pirate Site-Blocking Review

Post Syndicated from Andy original https://torrentfreak.com/australian-government-launches-pirate-site-blocking-review-180214/

Following intense pressure from entertainment industry groups, in 2014 Australia began developing legislation which would allow ‘pirate’ sites to be blocked at the ISP level.

In March 2015 the Copyright Amendment (Online Infringement) Bill 2015 (pdf) was introduced to parliament and after just three months of consideration, the Australian Senate passed the legislation into law.

Soon after, copyright holders began preparing their first cases and in December 2016, the Australian Federal Court ordered dozens of local Internet service providers to block The Pirate Bay, Torrentz, TorrentHound, IsoHunt, SolarMovie, plus many proxy and mirror services.

Since then, more processes have been launched establishing site-blocking as a permanent fixture on the Aussie anti-piracy agenda. But with yet more applications for injunction looming on the horizon, how is the mechanism performing and does anything else need to be done to improve or amend it?

Those are the questions now being asked by the responsible department of the Australian Government via a consultation titled Review of Copyright Online Infringement Amendment. The review should’ve been carried out 18 months after the law’s introduction in 2015 but the department says that it delayed the consultation to let more evidence emerge.

“The Department of Communications and the Arts is seeking views from stakeholders on the questions put forward in this paper. The Department welcomes single, consolidated submissions from organizations or parties, capturing all views on the Copyright Amendment (Online Infringement) Act 2015 (Online Infringement Amendment),” the consultation paper begins.

The three key questions for response are as follows:

– How effective and efficient is the mechanism introduced by the Online Infringement Amendment?

– Is the application process working well for parties and are injunctions operating well, once granted?

– Are any amendments required to improve the operation of the Online Infringement Amendment?

Given the tendency for copyright holders to continuously demand more bang for their buck, it will perhaps come as a surprise that at least for now there is a level of consensus that the system is working as planned.

“Case law and survey data suggests the Online Infringement Amendment has enabled copyright owners to work with [Internet service providers] to reduce large-scale online copyright infringement. So far, it appears that copyright owners and [ISPs] find the current arrangement acceptable, clear and effective,” the paper reads.

Thus far under the legislation there have been four applications for injunctions through the Federal Court. The first two, notably against leading torrent indexes and browser-based streaming sites, were both granted.

The other two processes, which began separately but will be heard together, at least in part, involve the recent trend of set-top box based streaming.

Village Roadshow, Disney, Universal, Warner Bros, Twentieth Century Fox, and Paramount are currently presenting their case to the Federal Court. Along with Hong Kong-based broadcaster Television Broadcasts Limited (TVB), which has a separate application, the companies have been told to put together quality evidence for an April 2018 hearing.

With these applications already in the pipeline, yet more are on the horizon. The paper notes that more applications are expected to reach the Federal Court shortly, with the Department of Communications monitoring to assess whether current arrangements are refined as additional applications are filed.

Thus far, however, steady progress appears to have been made. The paper cites various precedents established as a result of the blocking process including the use of landing pages to inform Internet users why sites are blocked and who is paying.

“Either a copyright owner or [ISP] can establish a landing page. If an [ISP] wishes to avoid the cost of its own landing page, it can redirect customers to one that the copyright owner would provide. Another precedent allocates responsibility for compliance costs. Cases to date have required copyright owners to pay all or a significant proportion of compliance costs,” the paper notes.

But perhaps the issue of most importance is whether site-blocking as a whole has had any effect on the levels of copyright infringement in Australia.

The Government says that research carried out by Kantar shows that downloading “fell slightly from 2015 to 2017” with a 5-10% decrease in individuals consuming unlicensed content across movies, music and television. It’s worth noting, however, that Netflix didn’t arrive on Australian shores until May 2015, just a month before the new legislation was passed.

Research commissioned by the Department of Communications and published a year later in 2016 (pdf) found that improved availability of legal streaming alternatives was the main contributor to falling infringement rates. In a juicy twist, the report also revealed that Aussie pirates were the entertainment industries’ best customers.

“The Department is aware that other factors — such as the increasing availability of television, music and film streaming services and of subscription gaming services — may also contribute to falling levels of copyright infringement,” the paper notes.

Submissions to the consultation (pdf) are invited by 5.00 pm AEST on Friday 16 March 2018 via the government’s website.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN discounts, offers and coupons

Early Challenges: Making Critical Hires

Post Syndicated from Gleb Budman original https://www.backblaze.com/blog/early-challenges-making-critical-hires/

row of potential employee hires sitting waiting for an interview

In 2009, Google disclosed that they had 400 recruiters on staff working to hire nearly 10,000 people. Someday, that might be your challenge, but most companies in their early days are looking to hire a handful of people — the right people — each year. Assuming you are closer to startup stage than Google stage, let’s look at who you need to hire, when to hire them, where to find them (and how to help them find you), and how to get them to join your company.

Who Should Be Your First Hires

In later stage companies, the roles in the company have been well fleshed out, don’t change often, and each role can be segmented to focus on a specific area. A large company may have an entire department focused on just cubicle layout; at a smaller company you may not have a single person whose actual job encompasses all of facilities. At Backblaze, our CTO has a passion and knack for facilities and mostly led that charge. Also, the needs of a smaller company are quick to change. One of our first hires was a QA person, Sean, who ended up being 100% focused on data center infrastructure. In the early stage, things can shift quite a bit and you need people that are broadly capable, flexible, and most of all willing to pitch in where needed.

That said, there are times you may need an expert. At a previous company we hired Jon, a PhD in Bayesian statistics, because we needed algorithmic analysis for spam fighting. However, even that person was not only able and willing to do the math, but also code, and to not only focus on Bayesian statistics but explore a plethora of spam fighting options.

When To Hire

If you’ve raised a lot of cash and are willing to burn it with mistakes, you can guess at all the roles you might need and start hiring for them. No judgement: that’s a reasonable strategy if you’re cash-rich and time-poor.

If your cash is limited, try to see what you and your team are already doing and then hire people to take those jobs. It may sound counterintuitive, but if you’re already doing it, presumably it needs to be done, you have a good sense of the type of skills required to do it, and you can bring someone on-board and get them up to speed quickly. That then frees you up to focus on tasks that can’t be done by someone else. At Backblaze, I ran marketing internally for years before hiring a VP of Marketing, making it easier for me to know what we needed. Once I was hiring, my primary goal was to find someone I could trust to take that role completely off of me so I could focus solely on my CEO duties.

Where To Find the Right People

Finding great people is always difficult, particularly when the skillsets you’re looking for are highly in-demand by larger companies with lots of cash and cachet. You, however, have one massive advantage: you need to hire 5 people, not 5,000.

People You Worked With

The absolute best people to hire are ones you’ve worked with before, whom you already know are good in a work situation. Consider your last job, the one before, and the one before that. A significant number of the people we recruited at Backblaze came from our previous startup MailFrontier. We knew what they could do and how they would fit into the culture, and they knew us and thus could quickly meld into the environment. If you didn’t have a previous job, consider people you went to school with or perhaps individuals with whom you’ve done projects previously.

People You Know

Hiring friends, family, and others can be risky, but should be considered. Sometimes a friend can be a “great buddy,” but is not able to do the job or isn’t a good fit for the organization. Having to let go of someone who is a friend or family member can be rough. Have the conversation up front with them about that possibility, so you have the ability to stay friends if the position doesn’t work out. Having said that, if you get along with someone as a friend, that’s one critical component of succeeding together at work. At Backblaze we’ve hired a number of people successfully that were friends of someone in the organization.

Friends Of People You Know

Your network is likely larger than you imagine. Your employees, investors, advisors, spouses, friends, and other folks all know people who might be a great fit for you. Make sure they know the roles you’re hiring for and ask them if they know anyone that would fit. Search LinkedIn for the titles you’re looking for and see who comes up; if they’re a 2nd degree connection, ask your connection for an introduction.

People You Know About

Sometimes the person you want isn’t someone anyone knows, but you may have read something they wrote, used a product they’ve built, or seen a video of a presentation they gave. Reach out. You may get a great hire: worst case, you’ll let them know they were appreciated, and make them aware of your organization.

Other Places to Find People

There are a million other places to find people, including job sites, community groups, Facebook/Twitter, GitHub, and more. Consider where the people you’re looking for are likely to congregate online and in person.

A Comment on Diversity

Hiring “People You Know” can often result in “Hiring People Like You” with the same workplace experiences, culture, background, and perceptions. Some studies have shown [1, 2, 3, 4] that homogeneous groups deliver faster, while heterogeneous groups are more creative. Also, “Hiring People Like You” often propagates the lack of women and minorities in tech and leadership positions in general. When looking for people you know, take care not to discount those who don’t share your cultural background.

Helping People To Find You

Reaching out proactively to people is the most direct way to find someone, but you want potential hires coming to you as well. To do this, they have to a) be aware of you, b) know you have a role they’re interested in, and c) think they would want to work there. Let’s tackle a) and b) first below.

Your Blog

I started writing our blog before we launched the product and talked about anything I found interesting related to our space. For several years now our team has owned the content on the blog and in 2017 over 1.5 million people read it. Each time we have a position open it’s published to the blog. If someone finds reading about backup and storage interesting, perhaps they’d want to dig in deeper from the inside. Many of the people we’ve recruited have mentioned reading the blog as either how they found us or as a factor in why they wanted to work here.
[BTW, this is Gleb’s 200th post on Backblaze’s blog. The first was in 2008. — Editor]

Your Email List

In addition to the emails our blog subscribers receive, we send regular emails to our customers, partners, and prospects. These are largely focused on content we think is directly useful or interesting for them. However, once every few months we include a small mention that we’re hiring, and the positions we’re looking for. Often a small blurb is all you need to capture people’s imaginations whether they might find the jobs interesting or can think of someone that might fit the bill.

Your Social Involvement

Whether it’s Twitter or Facebook, Hacker News or Slashdot, your potential hires are engaging in various communities. Being socially involved helps make people aware of you, reminds them of you when they’re considering a job, and paints a picture of what working with you and your company would be like. Adam was in a Reddit thread where we were discussing our Storage Pods, and that interaction was ultimately part of the reason he left Apple to come to Backblaze.

Convincing People To Join

Once you’ve found someone or they’ve found you, how do you convince them to join? They may be currently employed, have other offers, or have to relocate. Again, while the biggest companies have a number of advantages, you might have more unique advantages than you realize.

Why Should They Join You

Here are a set of items that you may be able to offer which larger organizations might not:

Role: Consider the strengths of the role. Perhaps it will have broader scope? More visibility at the executive level? No micromanagement? Ability to take risks? Option to create their own role?

Compensation: In addition to salary, will their options potentially be worth more since they’re getting in early? Can they trade-off salary for more options? Do they get option refreshes?

Benefits: In addition to healthcare, food, and 401(k) plans, are there unique benefits of your company? One company I knew took the entire team for a one-month working retreat abroad each year.

Location: Most people prefer to work close to home. If you’re located outside of the San Francisco Bay Area, you might be at a disadvantage for not being in the heart of tech. But if you find employees close to you, you’ve got a huge advantage. Sometimes it’s micro; even in the Bay Area the difference of 5 miles can save 20 minutes each way every day. We located the Backblaze headquarters in San Mateo, a middle-ground that made it accessible to those coming from San Jose and San Francisco. We also chose a downtown location near a train, restaurants, and cafes: all to make it easier and more pleasant. Also, are you flexible in letting your employees work remotely? Our systems administrator Elliott is about to embark on a long-term cross-country journey working from an RV.

Environment: Open office, cubicle, cafe, work-from-home? Loud/quiet? Social or focused? 24×7 or work-life balance? Different environments appeal to different people.

Team: Who will they be working with? A company with 100,000 people might have 100 brilliant ones you’d want to work with, but ultimately we work with our core team. Who will your prospective hires be working with?

Market: Some people are passionate about gaming, others biotech, still others food. The market you’re targeting will get different people excited.

Product: Have an amazing product people love? Highlight that. If you’re lucky, your potential hire is already a fan.

Mission: Curing cancer, making people happy, and other company missions inspire people to strive to be part of the journey. Our mission is to make storing data astonishingly easy and low-cost. If you care about data, information, knowledge, and progress, our mission helps drive all of them.

Culture: I left this for last, but believe it’s the most important. What is the culture of your company? Finding people who want to work in the culture of your organization is critical. If they like the culture, they’ll fit and continue it. We’ve worked hard to build a culture that’s collaborative, friendly, supportive, and open; one in which people like coming to work. For example, the five founders started with (and still have) the same compensation and equity. That started a culture of “we’re all in this together.” Build a culture that will attract the people you want, and convey what the culture is.

Writing The Job Description

Most job descriptions focus on all the requirements the candidate must meet. While important to communicate, the job description should first sell the job. Why would the appropriate candidate want the job? Then share some of the requirements you think are critical. Remember that people read not just what you say but how you say it. Try to write in a way that conveys what it is like to actually be at the company. Ahin, our VP of Marketing, said the job description itself was one of the things that attracted him to the company.

Orchestrating Interviews

Much can be said about interviewing well. I’m just going to say this: make sure that everyone who is interviewing knows that their job is not only to evaluate the candidate, but give them a sense of the culture, and sell them on the company. At Backblaze, we often have one person interview core prospects solely for company/culture fit.

Onboarding

Hiring success shouldn’t be defined by finding and hiring the right person, but instead by the right person being successful and happy within the organization. Ensure someone (usually their manager) provides them guidance on what they should concentrate on during their first day, first week, and thereafter. Giving new employees opportunities and guidance so that they can achieve early wins and feel socially integrated into the company does wonders for bringing people on board smoothly.

In Closing

Our Director of Production Systems, Chris, said to me the other day that he looks for companies where he can work on “interesting problems with nice people.” I’m hoping you’ll find your own version of that and find this post useful in looking for your early and critical hires.

Of course, I’d be remiss if I didn’t say, if you know of anyone looking for a place with “interesting problems with nice people,” Backblaze is hiring. 😉

The post Early Challenges: Making Critical Hires appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Amazon Relational Database Service – Looking Back at 2017

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/amazon-relational-database-service-looking-back-at-2017/

The Amazon RDS team launched nearly 80 features in 2017. Some of them were covered in this blog, others on the AWS Database Blog, and the rest in What’s New or Forum posts. To wrap up my week, I thought it would be worthwhile to give you an organized recap. So here we go!

Certification & Security

Features

Engine Versions & Features

Regional Support

Instance Support

Price Reductions

And That’s a Wrap
I’m pretty sure that’s everything. As you can see, 2017 was quite the year! I can’t wait to see what the team delivers in 2018.

Jeff;

Troubleshooting event publishing issues in Amazon SES

Post Syndicated from Dustin Taylor original https://aws.amazon.com/blogs/ses/troubleshooting-event-publishing-issues-in-amazon-ses/

Over the past year, we’ve released several features that make it easier to track the metrics that are associated with your Amazon SES account. The first of these features, launched in November of last year, was event publishing.

Initially, event publishing let you capture basic metrics related to your email sending and publish them to other AWS services, such as Amazon CloudWatch and Amazon Kinesis Data Firehose. Some examples of these basic metrics include the number of emails that were sent and delivered, as well as the number that bounced or received complaints. A few months ago, we expanded this feature by adding engagement metrics—specifically, information about the number of emails that your customers opened or engaged with by clicking links.

As a former Cloud Support Engineer, I’ve seen Amazon SES customers do some amazing things with event publishing, but I’ve also seen some common issues. In this article, we look at some of these issues, and discuss the steps you can take to resolve them.

Before we begin

This post assumes that your Amazon SES account is already out of the sandbox, that you’ve verified an identity (such as an email address or domain), and that you have the necessary permissions to use Amazon SES and the service that you’ll publish event data to (such as Amazon SNS, CloudWatch, or Kinesis Data Firehose).

We also assume that you’re familiar with the process of creating configuration sets and specifying event destinations for those configuration sets. For more information, see Using Amazon SES Configuration Sets in the Amazon SES Developer Guide.

Amazon SNS event destinations

If you want to receive notifications when events occur—such as when recipients click a link in an email, or when they report an email as spam—you can use Amazon SNS as an event destination.

Occasionally, customers ask us why they’re not receiving notifications when they use an Amazon SNS topic as an event destination. One of the most common reasons for this issue is that they haven’t configured subscriptions for their Amazon SNS topic yet.

A single topic in Amazon SNS can have one or more subscriptions. When you subscribe to a topic, you tell that topic which endpoints (such as email addresses or mobile phone numbers) to contact when it receives a notification. If you haven’t set up any subscriptions, nothing will happen when an email event occurs.
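
For example, here is a minimal sketch of creating a topic and adding an email subscription with boto3; the topic name and address are placeholders, and the recipient still has to confirm the subscription before notifications arrive:

import boto3

sns = boto3.client("sns", region_name="us-east-1")

# Create (or fetch) the topic that the SES event destination publishes to.
topic = sns.create_topic(Name="ses-email-events")  # placeholder topic name

# Without at least one confirmed subscription, published events go nowhere.
sns.subscribe(
    TopicArn=topic["TopicArn"],
    Protocol="email",
    Endpoint="alerts@example.com",  # placeholder address; the confirmation email must be accepted
)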

For more information about setting up topics and subscriptions, see Getting Started in the Amazon SNS Developer Guide. For information about publishing Amazon SES events to Amazon SNS topics, see Set Up an Amazon SNS Event Destination for Amazon SES Event Publishing in the Amazon SES Developer Guide.

Kinesis Data Firehose event destinations

If you want to store your Amazon SES event data for the long term, choose Amazon Kinesis Data Firehose as a destination for Amazon SES events. With Kinesis Data Firehose, you can stream data to Amazon S3 or Amazon Redshift for storage and analysis.

The process of setting up Kinesis Data Firehose as an event destination is similar to the process for setting up Amazon SNS: you choose the types of events (such as deliveries, opens, clicks, or bounces) that you want to export, and the name of the Kinesis Data Firehose stream that you want to export to. However, there’s one important difference. When you set up a Kinesis Data Firehose event destination, you must also choose the IAM role that Amazon SES uses to send event data to Kinesis Data Firehose.

When you set up the Kinesis Data Firehose event destination, you can choose to have Amazon SES create the IAM role for you automatically. For many users, this is the best solution—it ensures that the IAM role has the appropriate permissions to move event data from Amazon SES to Kinesis Data Firehose.

Customers occasionally run into issues with the Kinesis Data Firehose event destination when they use an existing IAM role. If you use an existing IAM role, or create a new role for this purpose, make sure that the role includes the firehose:PutRecord and firehose:PutRecordBatch permissions. If the role doesn’t include these permissions, then the Amazon SES event data isn’t published to Kinesis Data Firehose. For more information, see Controlling Access with Amazon Kinesis Data Firehose in the Amazon Kinesis Data Firehose Developer Guide.
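
As a rough illustration, the following boto3 snippet attaches an inline policy with those two actions to an existing role; the role name, stream name, and account ID are placeholders, and the role must already trust ses.amazonaws.com:

import json
import boto3

iam = boto3.client("iam")

# Inline policy granting the two Firehose actions that SES event publishing needs.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["firehose:PutRecord", "firehose:PutRecordBatch"],
        "Resource": "arn:aws:firehose:us-east-1:111122223333:deliverystream/ses-events",
    }],
}

iam.put_role_policy(
    RoleName="ses-firehose-role",         # placeholder: an existing role that trusts ses.amazonaws.com
    PolicyName="AllowSesEventPublishing",
    PolicyDocument=json.dumps(policy),
)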

CloudWatch event destinations

By publishing your Amazon SES event data to Amazon CloudWatch, you can create dashboards that track your sending statistics in real time, as well as alarms that notify you when your event metrics reach certain thresholds.

The amount that you’re charged for using CloudWatch is based on several factors, including the number of metrics you use. In order to give you more control over the specific metrics you send to CloudWatch—and to help you avoid unexpected charges—you can limit the email sending events that are sent to CloudWatch.

When you choose CloudWatch as an event destination, you must choose a value source. The value source can be one of three options: a message tag, a link tag, or an email header. After you choose a value source, you then specify a name and a value. When you send an email using a configuration set that refers to a CloudWatch event destination, it only sends the metrics for that email to CloudWatch if the email contains the name and value that you specified as the value source. This requirement is commonly overlooked.

For example, assume that you chose Message Tag as the value source, and specified “CategoryId” as the dimension name and “31415” as the dimension value. When you want to send events for an email to CloudWatch, you must specify the name of the configuration set that uses the CloudWatch destination. You must also include a tag in your message. The name of the tag must be “CategoryId” and the value must be “31415”.
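
Continuing that example, a send_email call made with boto3 might look like the sketch below (the addresses and configuration set name are placeholders); note that the tag name and value match the dimension configured on the event destination:

import boto3

ses = boto3.client("ses", region_name="us-east-1")

ses.send_email(
    Source="sender@example.com",                        # placeholder; must be a verified identity
    Destination={"ToAddresses": ["recipient@example.com"]},
    Message={
        "Subject": {"Data": "Monthly report"},
        "Body": {"Text": {"Data": "Hello!"}, "Html": {"Data": "<p>Hello!</p>"}},
    },
    ConfigurationSetName="my-config-set",               # placeholder configuration set
    # Must match the dimension name and value chosen for the CloudWatch event destination.
    Tags=[{"Name": "CategoryId", "Value": "31415"}],
)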

For more information about adding tags and email headers to your messages, see Send Email Using Amazon SES Event Publishing in the Amazon SES Developer Guide. For more information about adding tags to links, see Amazon SES Email Sending Metrics FAQs in the Amazon SES Developer Guide.

Troubleshooting event publishing for open and click data

Occasionally, customers ask why they’re not seeing open and click data for their emails. This issue most often occurs when the customer only sends text versions of their emails. Because of the way Amazon SES tracks open and click events, you can only see open and click data for emails that are sent as HTML. For more information about how Amazon SES modifies your emails when you enable open and click tracking, see Amazon SES Email Sending Metrics FAQs in the Amazon SES Developer Guide.

The process that you use to send HTML emails varies based on the email sending method you use. The Code Examples section of the Amazon SES Developer Guide contains examples of several methods of sending email by using the Amazon SES SMTP interface or an AWS SDK. All of the examples in this section include methods for sending HTML (as well as text-only) emails.
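
For instance, if you send through the SMTP interface, a multipart message that includes an HTML part alongside the text part might look like this sketch; the SMTP endpoint is the one for US East (N. Virginia), and the credentials and addresses are placeholders:

import smtplib
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

msg = MIMEMultipart("alternative")
msg["Subject"] = "Monthly report"
msg["From"] = "sender@example.com"       # placeholder; must be a verified identity
msg["To"] = "recipient@example.com"

# Open and click tracking only applies to the HTML part, so include one.
msg.attach(MIMEText("Hello!", "plain"))
msg.attach(MIMEText('<p>Hello! <a href="https://example.com">Read more</a></p>', "html"))

with smtplib.SMTP("email-smtp.us-east-1.amazonaws.com", 587) as server:
    server.starttls()
    server.login("SMTP_USERNAME", "SMTP_PASSWORD")   # placeholder SMTP credentials
    server.sendmail(msg["From"], [msg["To"]], msg.as_string())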

If you encounter any issues that weren’t covered in this post, please open a case in the Support Center and we’d be more than happy to assist.

How I built a data warehouse using Amazon Redshift and AWS services in record time

Post Syndicated from Stephen Borg original https://aws.amazon.com/blogs/big-data/how-i-built-a-data-warehouse-using-amazon-redshift-and-aws-services-in-record-time/

This is a customer post by Stephen Borg, the Head of Big Data and BI at Cerberus Technologies.

Cerberus Technologies, in their own words: Cerberus is a company founded in 2017 by a team of visionary iGaming veterans. Our mission is simple – to offer the best tech solutions through a data-driven and a customer-first approach, delivering innovative solutions that go against traditional forms of working and process. This mission is based on the solid foundations of reliability, flexibility and security, and we intend to fundamentally change the way iGaming and other industries interact with technology.

Over the years, I have developed and created a number of data warehouses from scratch. Recently, I built a data warehouse for the iGaming industry single-handedly. To do it, I used the power and flexibility of Amazon Redshift and the wider AWS data management ecosystem. In this post, I explain how I was able to build a robust and scalable data warehouse without the large team of experts typically needed.

In two of my recent projects, I ran into challenges when scaling our data warehouse using on-premises infrastructure. Data was growing at many tens of gigabytes per day, and query performance was suffering. Scaling required major capital investment for hardware and software licenses, and also significant operational costs for maintenance and technical staff to keep it running and performing well. Unfortunately, I couldn’t get the resources needed to scale the infrastructure with data growth, and these projects were abandoned. Thanks to cloud data warehousing, the bottlenecks of infrastructure resources, capital expense, and operational costs have been significantly reduced or have gone away entirely. There is no more excuse for allowing obstacles of the past to delay delivering timely insights to decision makers, no matter how much data you have.

With Amazon Redshift and AWS, I delivered a cloud data warehouse to the business very quickly, and with a small team: me. I didn’t have to order hardware or software, and I no longer needed to install, configure, tune, or keep up with patches and version updates. Instead, I easily set up a robust data processing pipeline and we were quickly ingesting and analyzing data. Now, my data warehouse team can be extremely lean, and focus more time on bringing in new data and delivering insights. In this post, I show you the AWS services and the architecture that I used.

Handling data feeds

I have several different data sources that provide everything needed to run the business. The data includes activity from our iGaming platform, social media posts, clickstream data, marketing and campaign performance, and customer support engagements.

To handle the diversity of data feeds, I developed abstract integration applications using Docker that run on Amazon EC2 Container Service (Amazon ECS) and feed data to Amazon Kinesis Data Streams. These data streams can be used for real time analytics. In my system, each record in Kinesis is preprocessed by an AWS Lambda function to cleanse and aggregate information. My system then routes it to be stored where I need on Amazon S3 by Amazon Kinesis Data Firehose. Suppose that you used an on-premises architecture to accomplish the same task. A team of data engineers would be required to maintain and monitor a Kafka cluster, develop applications to stream data, and maintain a Hadoop cluster and the infrastructure underneath it for data storage. With my stream processing architecture, there are no servers to manage, no disk drives to replace, and no service monitoring to write.

Setting up a Kinesis stream can be done with a few clicks, and the same for Kinesis Firehose. Firehose can be configured to automatically consume data from a Kinesis Data Stream, and then write compressed data every N minutes to Amazon S3. When I want to process a Kinesis data stream, it’s very easy to set up a Lambda function to be executed on each message received. I can just set a trigger from the AWS Lambda Management Console, as shown following.
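
To give a flavor of what that looks like, here is a minimal sketch of a Lambda handler for a Kinesis trigger (the field names are hypothetical, and the real cleansing and aggregation logic is of course more involved):

import base64
import json

def handler(event, context):
    """Cleanse each record delivered by the Kinesis trigger (sketch only)."""
    cleaned = []
    for record in event["Records"]:
        # Kinesis record payloads arrive base64-encoded.
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        # Hypothetical cleansing step: drop empty fields and normalize the timestamp key.
        payload = {k: v for k, v in payload.items() if v not in (None, "")}
        payload["event_time"] = payload.pop("ts", None)
        cleaned.append(payload)
    return {"records_processed": len(cleaned)}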

I also monitor the duration of function execution using Amazon CloudWatch and AWS X-Ray.

Regardless of the format in which I receive data from our partners, I can send it to Kinesis as JSON using my own formatters. After Firehose writes this to Amazon S3, I have everything in nearly the same structure I received but compressed, encrypted, and optimized for reading.

This data is automatically crawled by AWS Glue and placed into the AWS Glue Data Catalog. This means that I can immediately query the data directly on S3 using Amazon Athena or through Amazon Redshift Spectrum. Previously, I used Amazon EMR and an Amazon RDS–based metastore in Apache Hive for catalog management. Now I can avoid the complexity of maintaining Hive Metastore catalogs. Glue takes care of high availability and the operations side so that I know that end users can always be productive.

Working with Amazon Athena and Amazon Redshift for analysis

I found Amazon Athena extremely useful out of the box for ad hoc analysis. Our engineers (me) use Athena to understand new datasets that we receive and to understand what transformations will be needed for long-term query efficiency.
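
A typical ad hoc query against one of those crawled tables can be kicked off with a couple of lines of boto3; in this sketch the database, table, and results bucket names are placeholders:

import boto3

athena = boto3.client("athena", region_name="us-east-1")

query = athena.start_query_execution(
    QueryString="SELECT event_type, count(*) AS events FROM clickstream GROUP BY event_type",
    QueryExecutionContext={"Database": "igaming_raw"},                   # placeholder Glue database
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},   # placeholder bucket
)
print(query["QueryExecutionId"])  # poll get_query_execution until the query completes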

For our data analysts and data scientists, we’ve selected Amazon Redshift. Amazon Redshift has proven to be the right tool for us over and over again. It easily processes 20+ million transactions per day, regardless of the footprint of the tables and the type of analytics required by the business. Latency is low and query performance expectations have been more than met. We use Redshift Spectrum for long-term data retention, which enables me to extend the analytic power of Amazon Redshift beyond local data to anything stored in S3, and without requiring me to load any data. Redshift Spectrum gives me the freedom to store data where I want, in the format I want, and have it available for processing when I need it.

To load data directly into Amazon Redshift, I use AWS Data Pipeline to orchestrate data workflows. I create Amazon EMR clusters on an intra-day basis, which I can easily adjust to run more or less frequently as needed throughout the day. EMR clusters are used together with Amazon RDS, Apache Spark 2.0, and S3 storage. The data pipeline application loads ETL configurations from Spring RESTful services hosted on AWS Elastic Beanstalk. The application then loads data from S3 into memory, aggregates and cleans the data, and then writes the final version of the data to Amazon Redshift. This data is then ready to use for analysis. Spark on EMR also helps with recommendations and personalization use cases for various business users, and I find this easy to set up and deliver what users want. Finally, business users use Amazon QuickSight for self-service BI to slice, dice, and visualize the data depending on their requirements.

Each AWS service in this architecture plays its part in saving precious time that’s crucial for delivery and getting different departments in the business on board. I found the services easy to set up and use, and all have proven to be highly reliable in our production environments. When the architecture was in place, scaling out was either completely handled by the service or a matter of a simple API call, and crucially didn’t require me to change one line of code. Increasing shards for Kinesis can be done in a minute by editing a stream. Increasing capacity for Lambda functions can be accomplished by editing the megabytes allocated for processing, and concurrency is handled automatically. EMR cluster capacity can easily be increased by changing the master and slave node types in Data Pipeline, or by using Auto Scaling. Lastly, RDS and Amazon Redshift can be easily upgraded without any major tasks to be performed by our team (again, me).
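
The Kinesis resharding mentioned above, for example, is a single API call; here is a sketch with boto3 (the stream name and target shard count are placeholders):

import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")

# Scale the stream to four shards; Kinesis handles splitting and merging behind the scenes.
kinesis.update_shard_count(
    StreamName="clickstream",          # placeholder stream name
    TargetShardCount=4,
    ScalingType="UNIFORM_SCALING",
)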

In the end, using AWS services including Kinesis, Lambda, Data Pipeline, and Amazon Redshift allows me to keep my team lean and highly productive. I eliminated the cost and delays of capital infrastructure, as well as the late night and weekend calls for support. I can now give maximum value to the business while keeping operational costs down. My team pushed out an agile and highly responsive data warehouse solution in record time and we can handle changing business requirements rapidly, and quickly adapt to new data and new user requests.


Additional Reading

If you found this post useful, be sure to check out Deploy a Data Warehouse Quickly with Amazon Redshift, Amazon RDS for PostgreSQL and Tableau Server and Top 8 Best Practices for High-Performance ETL Processing Using Amazon Redshift.


About the Author

Stephen Borg is the Head of Big Data and BI at Cerberus Technologies. He has a background in platform software engineering, and first became involved in data warehousing using the typical RDBMS, SQL, ETL, and BI tools. He quickly became passionate about providing insight to help others optimize the business and add personalization to products. He is now the Head of Big Data and BI at Cerberus Technologies.

The Early Days of Mass Internet Piracy Were Awesome Yet Awful

Post Syndicated from Andy original https://torrentfreak.com/the-early-days-of-mass-internet-piracy-were-awesome-yet-awful-180211/

While Napster certainly put the digital cats among the pigeons in 1999, the organized chaos of mass Internet file-sharing couldn’t be truly appreciated until the advent of decentralized P2P networks a year or so later.

In the blink of an eye, everyone with a “shared folder” client became both a consumer and publisher, sucking in files from strangers and sharing them with like-minded individuals all around the planet. While today’s piracy narrative is all about theft and danger, in the early 2000s the sharing community felt more like distant friends who hadn’t met, quietly trading cards together.

Satisfying to millions, those who really engaged found shared folder sharing a real adrenaline buzz, as English comedian Seann Walsh noted on Conan this week.

“Click. 20th Century Fox comes up. No pixels. No shaky cam. No silhouettes of heads at the bottom of the screen, people coming in five minutes late. None of that,” Walsh said, recalling his experience of downloading X-Men 2 (X2) from LimeWire.

“We thought: ‘We’ve done it!!’ This was incredible! We were going to have to go to the cinema. We weren’t going to have to wait for the film to come out on video. We weren’t going to have to WALK to blockbuster!”

But while the nostalgia has an air of magic about it, Walsh’s take on the piracy experience is bittersweet. While obtaining X2 without having to trudge to a video store was a revelation, there were plenty of drawbacks too.

Downloading the pirate copy took a week, which pre-BitTorrent wasn’t a completely bad result but still a considerable commitment. There were also serious problems with quality control.

“20th Century fades, X Men 2 comes up. We’ve done it! We’re not taking it for granted – we’re actually hugging. Yes! Yes! We’ve done it! This is the future! We look at the screen, Wolverine turns round…,” …..and Walsh launches into a broadside of pseudo-German babble, mimicking the unexpectedly-dubbed superhero.

After a week of downloading, and with a quality picture on launch, the German dub is a punch in the gut, to say the least. No less than a pirate deserves, some will argue, but a fat lip nonetheless, and one many a pirate has suffered over the years. Nevertheless, as Walsh notes, it’s a pain that kids in 2018 simply cannot comprehend.

“Children today are living the childhood I dreamed of. If they want to hear a song – touch – they stream it. They’ve got it now. Bang. Instantly. They don’t know the pain of LimeWire.

“Start downloading a song, go to school, come back. HOPE that it’d finished! That download bar messing with you. Four minutes left…..nine HOURS and 28 minutes left? Thirty seconds left…..52 hours and 38 minutes left? JUST TELL ME THE TRUTH!!!!!” Walsh pleaded.

While this might sound comical now, this was the reality of people downloading from clients such as LimeWire and Kazaa. While X2 in German would’ve been torture for a non-German speaker, the misery of watching an English language copy of 28 Days Later somehow crammed into a 30Mb file is right up there too.

Mislabeled music with microscopic bitrates? That was pretty much standard.

But against the odds, these frankly second-rate experiences still managed to capture the hearts and minds of the digitally minded. People were prepared to put up with nonsense and regular disappointment in order to consume content in a way fit for the 21st century. Yet somehow the combined might of the entertainment industries couldn’t come up with anything substantially better for a number of years.

Of course, broadband availability and penetration played their part, but looking back, something could have been done. Not only did the Internet’s popularity come as no surprise, but people’s expectations were dramatically lower than they are today. In any event, beating the pirates should have been child’s play. After all, it was just regular people sharing files in a Windows folder.

Any fool could do it – and millions did. Surprisingly, they have proven unstoppable.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN discounts, offers and coupons

Cloudflare Hit With Piracy Lawsuit After Abuse Form ‘Fails’

Post Syndicated from Ernesto original https://torrentfreak.com/cloudflare-hit-with-piracy-lawsuit-after-abuse-form-fails-180210/

Seattle-based artist Christopher Boffoli is no stranger when it comes to suing tech companies for aiding copyright infringement of his work.

Boffoli has filed lawsuits against Imgur, Twitter, Pinterest, Google, and others, which were dismissed and/or settled out of court under undisclosed terms.

This month he filed a new case against another intermediary, Cloudflare, which has had its fair share of piracy allegations in recent years.

In common with other companies, Cloudflare is accused of contributing to copyright infringements of Boffoli’s “Big Appetites” miniatures series. In this case, several Cloudflare customers allegedly posted these photos on their sites which were then reproduced on the servers of the CDN provider.

The lawsuit mentions that the infringing copies were posted on unique-landscape.com and baklol.com. This was also pointed out to Cloudflare by Boffoli, who sent the company DMCA takedown notices in October and November of last year.

While the photographer received an automated response, the photos in question remained online. Through the lawsuit, Boffoli hopes this will change.

“CloudFlare induced, caused, or materially contributed to the Infringing Websites’ publication,” the complaint reads. “CloudFlare had actual knowledge of the Infringing Content. Boffoli provided notice to CloudFlare in compliance with the DMCA, and CloudFlare failed to disable access to or remove the Infringing Websites.”

The photographer is asking the court to order an injunction preventing Cloudflare from making his work available. In addition, the complaint asks for actual and statutory damages for willful copyright infringement. With at least four photos in the lawsuit, the potential damages are more than half a million dollars.

While it’s not mentioned in the complaint, the email communication between Boffoli and Cloudflare goes further than just an automated response. Court records show that the photographer initially didn’t ask Cloudflare to remove the infringing photos. Instead, he asked the CDN provider to forward them to the ISP or site owner.

“I would be grateful if you would forward this DMCA takedown request to the website owner and ISP so these infringing links can immediately be removed,” it read.

Part of the email communication

From then on things escalated a bit. The emails reveal that Boffoli had trouble reporting the infringing photos through the required form.

When the photographer pointed this out in a direct email, Cloudflare urged him to try the form again as that was the only way to send the DMCA request to the designated copyright agent.

“The DMCA doesn’t require us to process reports not sent to our registered agent as per our registration with the US Copyright Office. Our registered copyright agent is the form located at cloudflare.com/abuse/form and you may proceed via that avenue,” Cloudflare wrote.

If the case moves forward, Cloudflare may use this to argue that it never received a proper DMCA takedown notice. However, Boffoli wasn’t planning on trying again and instead threatened a lawsuit, unless Cloudflare took immediate action.

“As I have said, your form did not work for me despite repeated attempts to use it. And it is insulting for you to suggest that it’s working fine when it is not. So again, this is absolutely my last attempt to get you to respond to this infringement for which you are impeding the removal,” Boffoli wrote.

“If you take no action now I will forward this to my legal team this week. It is more than enough of a burden to have to waste countless hours policing my own copyrights without organizations like Cloudflare running interference for copyright infringers. I am not averse to asking a federal judge to compel you to deal with these copyright infringements. And I will seek statutory damages for contributory infringement at that time.”

As it turns out, that was not an idle threat.

—-

A copy of the complaint is available here (pdf) and the email exhibits can be found here (pdf).

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN discounts, offers and coupons

Server vs Endpoint Backup — Which is Best?

Post Syndicated from Roderick Bauer original https://www.backblaze.com/blog/endpoint-backup-for-distributed-computing/

server and computer backup to the cloud

How common are these statements in your organization?

  • “I know I saved that file. The application must have put it somewhere outside of my documents folder.” — Mike in Marketing
  • “I was on the road and couldn’t get a reliable VPN connection. I guess that’s why my laptop wasn’t backed up.” — Sally in Sales
  • “I try to follow file policies, but I had a deadline this week and didn’t have time to copy my files to the server.” — Felicia in Finance
  • “I just did a commit of my code changes and that was when the coffee mug was knocked over onto the laptop.” — Erin in Engineering
  • “If you need a file restored from backup, contact the help desk at [email protected]. The IT department will get back to you.” — XYZ corporate intranet
  • “Why don’t employees save files on the network drive like they’re supposed to?” — Isaac in IT

If these statements are familiar, most likely you rely on file server backups to safeguard your valuable endpoint data.

The problem is, the workplace has changed. Where server backups might have fit how offices worked at one time in the past, relying solely on server backups today means you could be missing valuable endpoint data from your backups. On top of that, you likely are unnecessarily expending valuable user and IT time in attempting to secure and restore endpoint data.

Times Have Changed, and so have Effective Enterprise Backup Strategies

The ways we use computers and handle files today are vastly different from just five or ten years ago. Employees are mobile, and we no longer are limited to monolithic PC and Mac-based office suites. Cloud applications are everywhere. Company-mandated network drive policies are difficult to enforce as office practices change, devices proliferate, and organizational culture evolves. Besides, your IT staff has other things to do than babysit your employees to make sure they follow your organization’s policies for managing files.

Server Backup has its Place, but Does it Support How People Work Today?

Many organizations still rely on server backup. If your organization works primarily in centralized offices with all endpoints — likely desktops — connected directly to your network, and you maintain tight control of how employees manage their files, it still might work for you.

Your IT department probably has set network drive policies that require employees to save files in standard places that are regularly backed up to your file server. Turns out, though, that even standard applications don’t always save files where IT would like them to be. They could be in a directory or folder that’s not regularly backed up.

As employees have become more mobile, they have adopted practices that enable them to access files from different places, but these practices might not fit in with your organization’s server policies. An employee saving a file to Dropbox might be planning to copy it to an “official” location later, but whether that ever happens could be doubtful. Often people don’t realize until it’s too late that accidentally deleting a file in one sync service directory means that all copies in all locations — even the cloud — are also deleted.

Employees are under increasing demands to produce, which means that network drive policies aren’t always followed; time constraints and deadlines can cause best practices to go out the window. Users will attempt to comply with policies as best they can — and you might get 70% or even 75% effective compliance — but getting even to that level requires training, monitoring, and repeatedly reminding employees of policies they need to follow — none of which leads to a good work environment.

Even if you get to 75% compliance with network file policies, what happens if the critical file needed to close out an end-of-year financial summary isn’t one of the files backed up? The effort required for IT to get from 70% to 80% or 90% of an endpoint’s files effectively backed up could require multiple hours from your IT department, and you still might not have backed up the one critical file you need later.

Your Organization Operates on its Data — And Today That Data Exists in Multiple Locations

Users are no longer tied to one endpoint, and may use different computers in the office, at home, or traveling. The greater the number of endpoints used, the greater the chance of an accidental or malicious device loss or data corruption. The loss of the Sales VP’s laptop at the airport on her way back from meeting with major customers can affect an entire organization and require weeks to resolve.

Even with the best intentions and efforts, following policies when out of the office can be difficult or impossible. Connecting to your private network when remote most likely requires a VPN, and VPN connectivity can be challenging from the lobby Wi-Fi at the Radisson. Server restores require time from the IT staff, which can mean taking resources away from other IT priorities and a growing backlog of requests from users to need their files as soon as possible. When users are dependent on IT to get back files critical to their work, employee productivity and often deadlines are affected.

Managing Finite Server Storage Is an Ongoing Challenge

Network drive backup usually requires on-premises data storage for endpoint backups. Since it is a finite resource, allocating that storage is another burden on your IT staff. To make sure that storage isn’t exceeded, IT departments often ration storage by department and/or user — another oversight duty for IT, and even more choices required by your IT department and department heads who have to decide which files to prioritize for backing up.

Adding Backblaze Endpoint Backup Improves Business Continuity and Productivity

Having an endpoint backup strategy in place can mitigate these problems and improve user productivity, as well. A good endpoint backup service, such as Backblaze Cloud Backup, will ensure that all devices are backed up securely, automatically, without requiring any action by the user or by your IT department.

For 99% of users, no configuration is required for Backblaze Backup. Everything on the endpoint is encrypted and securely backed up to the cloud, including program configuration files and files outside of standard document folders. Even temp files are backed up, which can prove invaluable when recovering a file after a crash or other program interruption. Cloud storage is unlimited with Backblaze Backup, so there are no worries about running out of storage or rationing file backups.

The Backblaze client can be silently and remotely installed to both Macintosh and Windows clients with no user interaction. And, with Backblaze Groups, your IT staff has complete visibility into when files were last backed up. IT staff can recover any backed up file, folder, or entire computer from the admin panel, and even give file restore capability to the user, if desired, which reduces dependency on IT and time spent waiting for restores.

With over 500 petabytes of customer data stored and one million files restored every hour of every day by Backblaze customers, you know that Backblaze Backup works for its users.

You Need Data Security That Matches the Way People Work Today

Both file server and endpoint backup have their places in an organization’s data security plan, but their use and value differ. If you already are using file server backup, adding endpoint backup will make a valuable contribution to your organization by reducing workload, improving productivity, and increasing confidence that all critical files are backed up.

By guaranteeing fast and automatic backup of all endpoint data, and matching the current way organizations and people work with data, Backblaze Backup will enable you to effectively and affordably meet the data security demands of your organization.

The post Server vs Endpoint Backup — Which is Best? appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Migrating Your Amazon ECS Containers to AWS Fargate

Post Syndicated from Tiffany Jernigan original https://aws.amazon.com/blogs/compute/migrating-your-amazon-ecs-containers-to-aws-fargate/

AWS Fargate is a new technology that works with Amazon Elastic Container Service (ECS) to run containers without having to manage servers or clusters. What does this mean? With Fargate, you no longer need to provision or manage a single virtual machine; you can just create tasks and run them directly!

Fargate uses the same API actions as ECS, so you can use the ECS console, the AWS CLI, or the ECS CLI. I recommend running through the first-run experience for Fargate even if you’re familiar with ECS. It creates all of the one-time setup requirements, such as the necessary IAM roles. If you’re using a CLI, make sure to upgrade to the latest version.

In this blog, you will see how to migrate ECS containers from running on Amazon EC2 to Fargate.

Getting started

Note: Anything with code blocks is a change in the task definition file. Screen captures are from the console. Additionally, Fargate is currently available in the us-east-1 (N. Virginia) region.

Launch type

When you create tasks (grouping of containers) and clusters (grouping of tasks), you now have two launch type options: EC2 and Fargate. The default launch type, EC2, is ECS as you knew it before the announcement of Fargate. You need to specify Fargate as the launch type when running a Fargate task.

Even though Fargate abstracts away virtual machines, tasks still must be launched into a cluster. With Fargate, clusters are a logical infrastructure and permissions boundary that allow you to isolate and manage groups of tasks. ECS also supports heterogeneous clusters that are made up of tasks running on both EC2 and Fargate launch types.

The new, optional requiresCompatibilities parameter ensures that your task definition only passes validation if you include Fargate-compatible parameters; set FARGATE in the field as shown below. Tasks can be flagged as compatible with EC2, Fargate, or both.

"requiresCompatibilities": [
    "FARGATE"
]

Networking

"networkMode": "awsvpc"

In November, we announced the addition of task networking with the network mode awsvpc. By default, ECS uses the bridge network mode. Fargate requires using the awsvpc network mode.

In bridge mode, all of your tasks running on the same instance share the instance’s elastic network interface, which is a virtual network interface, IP address, and security groups.

The awsvpc mode provides this networking support to your tasks natively. You now get the same VPC networking and security controls at the task level that were previously only available with EC2 instances. Each task gets its own elastic network interface and IP address so that multiple applications or copies of a single application can run on the same port number without any conflicts.

The awsvpc mode also provides a separation of responsibility for tasks. You can get complete control of task placement within your own VPCs, subnets, and the security policies associated with them, even though the underlying infrastructure is managed by Fargate. Also, you can assign different security groups to each task, which gives you more fine-grained security. You can give an application only the permissions it needs.

"portMappings": [
    {
        "containerPort": "3000"
    }
 ]

What else has to change? First, you only specify a containerPort value, not a hostPort value, as there is no host to manage. Your container port is the port that you access on your elastic network interface IP address. Therefore, your container ports in a single task definition file need to be unique.

"environment": [
    {
        "name": "WORDPRESS_DB_HOST",
        "value": "127.0.0.1:3306"
    }
 ]

Additionally, links are not allowed as they are a property of the “bridge” network mode (and are now a legacy feature of Docker). Instead, containers share a network namespace and communicate with each other over the localhost interface. They can be referenced using the following:

localhost/127.0.0.1:<some_port_number>

CPU and memory

"memory": "1024",
 "cpu": "256"

"memory": "1gb",
 "cpu": ".25vcpu"

When launching a task with the EC2 launch type, task performance is influenced by the instance types that you select for your cluster combined with your task definition. If you pick larger instances, your applications make use of the extra resources if there is no contention.

In Fargate, you needed a way to get additional resource information so we created task-level resources. Task-level resources define the maximum amount of memory and cpu that your task can consume.

  • memory can be defined in MB with just the number, or in GB, for example, “1024” or “1gb”.
  • cpu can be defined as the number or in vCPUs, for example, “256” or “.25vcpu”.
    • vCPUs are virtual CPUs. You can look at the memory and vCPUs for instance types to get an idea of what you may have used before.

The memory and CPU options available with Fargate are:

CPU Memory
256 (.25 vCPU) 0.5GB, 1GB, 2GB
512 (.5 vCPU) 1GB, 2GB, 3GB, 4GB
1024 (1 vCPU) 2GB, 3GB, 4GB, 5GB, 6GB, 7GB, 8GB
2048 (2 vCPU) Between 4GB and 16GB in 1GB increments
4096 (4 vCPU) Between 8GB and 30GB in 1GB increments

IAM roles

Because Fargate uses awsvpc mode, you need an Amazon ECS service-linked IAM role named AWSServiceRoleForECS. It provides Fargate with the needed permissions, such as the permission to attach an elastic network interface to your task. After you create your service-linked IAM role, you can delete the remaining roles in your services.

"executionRoleArn": "arn:aws:iam::<your_account_id>:role/ecsTaskExecutionRole"

With the EC2 launch type, an instance role gives the ECS agent the ability to pull images, publish logs, communicate with ECS, and so on. With Fargate, the task execution IAM role is only needed if you’re pulling from Amazon ECR or publishing data to Amazon CloudWatch Logs.

The Fargate first-run experience tutorial in the console automatically creates these roles for you.
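
If you prefer to create the task execution role yourself, the following AWS CLI sketch is one way to do it. The trust policy file name is a placeholder, and the managed policy shown covers the ECR pull and CloudWatch Logs permissions mentioned above:

# ecs-tasks-trust-policy.json (trust policy that lets ECS tasks assume the role):
# {
#   "Version": "2012-10-17",
#   "Statement": [
#     {
#       "Effect": "Allow",
#       "Principal": { "Service": "ecs-tasks.amazonaws.com" },
#       "Action": "sts:AssumeRole"
#     }
#   ]
# }
aws iam create-role --role-name ecsTaskExecutionRole \
--assume-role-policy-document file://ecs-tasks-trust-policy.json
aws iam attach-role-policy --role-name ecsTaskExecutionRole \
--policy-arn arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy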

Volumes

Fargate currently supports non-persistent, empty data volumes for containers. When you define your container, you no longer use the host field and only specify a name.
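
As a sketch, a task-scoped scratch volume and the mount point that references it might look like the following; the volumes array sits at the task level, mountPoints goes inside the container definition, and the volume name and container path are placeholders:

"volumes": [
    {
        "name": "scratch"
    }
]

"mountPoints": [
    {
        "sourceVolume": "scratch",
        "containerPath": "/var/scratch"
    }
]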

Load balancers

For awsvpc mode, and therefore for Fargate, use the IP target type instead of the instance target type. You define this in the Amazon EC2 console when you create the target group for your load balancer.

If you’re using a Classic Load Balancer, change it to an Application Load Balancer or a Network Load Balancer.

Tip: If you are using an Application Load Balancer, make sure that your tasks are launched in the same VPC and Availability Zones as your load balancer.
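
For example, a target group with the IP target type can be created with the AWS CLI as shown in this sketch; the target group name, port, and VPC ID are placeholders:

aws elbv2 create-target-group --name fargate-targets \
--protocol HTTP --port 80 \
--vpc-id vpc-0123456789abcdef0 \
--target-type ip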

Let’s migrate a task definition!

Here is an example NGINX task definition. This type of task definition is what you’re used to if you created one before Fargate was announced. It’s what you would run now with the EC2 launch type.

{
    "containerDefinitions": [
        {
            "name": "nginx",
            "image": "nginx",
            "memory": 512,
            "cpu": 100,
            "essential": true,
            "portMappings": [
                {
                    "hostPort": 80,
                    "containerPort": 80,
                    "protocol": "tcp"
                }
            ],
            "logConfiguration": {
                "logDriver": "awslogs",
                "options": {
                    "awslogs-group": "/ecs/",
                    "awslogs-region": "us-east-1",
                    "awslogs-stream-prefix": "ecs"
                }
            }
        }
    ],
    "family": "nginx-ec2"
}

OK, so now what do you need to do to change it to run with the Fargate launch type?

  • Add FARGATE for requiresCompatibilities (not required, but a good safety check for your task definition).
  • Use awsvpc as the network mode.
  • Just specify the containerPort (the hostPort value is the same).
  • Add a task executionRoleArn value to allow logging to CloudWatch.
  • Provide cpu and memory limits for the task.
{
    "requiresCompatibilities": [
        "FARGATE"
    ],
    "containerDefinitions": [
        {
            "name": "nginx",
            "image": "nginx",
            "memory": 512,
            "cpu": 100,
            "essential": true,
            "portMappings": [
                {
                    "containerPort": 80,
                    "protocol": "tcp"
                }
            ],
            "logConfiguration": {
                "logDriver": "awslogs",
                "options": {
                    "awslogs-group": "/ecs/",
                    "awslogs-region": "us-east-1",
                    "awslogs-stream-prefix": "ecs"
                }
            }
        }
    ],
    "networkMode": "awsvpc",
    "executionRoleArn": "arn:aws:iam::<your_account_id>:role/ecsTaskExecutionRole",
    "family": "nginx-fargate",
    "memory": "512",
    "cpu": "256"
}
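
To try the migrated definition out, you could register it and put it behind the IP-type target group discussed earlier. This is a sketch rather than a complete walkthrough; the file name, cluster name, subnet, security group, and target group ARN are placeholders, while the container name and port match the definition above:

aws ecs register-task-definition --cli-input-json file://nginx-fargate.json
aws ecs create-service --cluster my-cluster \
--service-name nginx \
--task-definition nginx-fargate \
--desired-count 2 \
--launch-type FARGATE \
--network-configuration "awsvpcConfiguration={subnets=[subnet-0123456789abcdef0],securityGroups=[sg-0123456789abcdef0],assignPublicIp=ENABLED}" \
--load-balancers targetGroupArn=<your_target_group_arn>,containerName=nginx,containerPort=80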

Are there more examples?

Yep! Head to the AWS Samples GitHub repo. We have several sample task definitions you can try for both the EC2 and Fargate launch types. Contributions are very welcome too :).

 

tiffany jernigan
@tiffanyfayj

RIAA: Cox Ruling Shows that Grande Can Be Liable for Piracy Too

Post Syndicated from Ernesto original https://torrentfreak.com/riaa-cox-ruling-shows-that-grande-can-be-liable-for-piracy-too-180207/

Regular Internet providers are being put under increasing pressure for not doing enough to curb copyright infringement.

Last year several major record labels, represented by the RIAA, filed a lawsuit in a Texas District Court, accusing ISP Grande Communications of turning a blind eye to its pirating subscribers.

“Despite their knowledge of repeat infringements, Defendants have permitted repeat infringers to use the Grande service to continue to infringe Plaintiffs’ copyrights without consequence,” the RIAA’s complaint read.

Grande disagreed with this assertion and filed a motion to dismiss the case. The ISP argued that it doesn’t encourage any of its customers to download copyrighted works, and that it has no control over the content subscribers access.

The Internet provider didn’t deny that it received millions of takedown notices through the piracy tracking company Rightscorp. However, it believed that these notices were flawed and not worth acting upon.

The case shows a lot of similarities with the legal battle between BMG and Cox Communications, in which the Fourth Circuit Court of Appeals issued an important verdict last week.

The appeals court overturned the $25 million piracy damages verdict against Cox due to an erroneous jury instruction but held that the ISP lost its safe harbor protection because it failed to implement a meaningful repeat infringer policy.

This week, the RIAA used the Fourth Circuit ruling as further evidence that Grande’s motion to dismiss should be denied.

The RIAA points out that both Cox and Grande used similar arguments in their defense, some of which were rejected by the appeals court. The Fourth Circuit held, for example, that an ISP’s substantial non-infringing uses do not immunize it from liability for contributory copyright infringement.

In addition, the appeals court also clarified that if an ISP wilfully blinds itself to copyright infringements, that is sufficient to satisfy the knowledge requirement for contributory copyright infringement.

According to the RIAA’s filing at a Texas District Court this week, Grande has already admitted that it willingly ‘ignored’ takedown notices that were submitted on behalf of third-party copyright holders.

“Grande has already admitted that it received notices from Rightscorp and, to use Grande’s own phrase, did not ‘meaningfully investigate’ them,” the RIAA writes.

“Thus, even if this Court were to apply the Fourth Circuit’s ‘willful blindness’ standard, the level of knowledge that Grande has effectively admitted exceeds the level of knowledge that the Fourth Circuit held was ‘powerful evidence’ sufficient to establish liability for contributory infringement.”

As such, the motion to dismiss the case should be denied, the RIAA argues.

What’s not mentioned in the RIAA’s filing, however, is why Grande chose not to act upon these takedown notices. In its defense, the ISP previously explained that Rightscorp’s notices lacked specificity and were incapable of detecting actual infringements.

Grande argued that if it acted on these notices without additional proof, its subscribers could lose their Internet access even though they were using it for legal purposes. The ISP may, therefore, counter that it wasn’t willfully blind, as it saw no solid proof for the alleged infringements to begin with.

“To merely treat these allegations as true without investigation would be a disservice to Grande’s subscribers, who would run the risk of having their Internet service permanently terminated despite using Grande’s services for completely legitimate purposes,” Grande previously wrote.

This brings up a tricky issue. The Fourth Circuit made it clear last week that ISPs require a meaningful policy against repeat infringers in response to takedown notices from copyright holders. But what are the requirements for a proper takedown notice? Do any and all notices count?

Grande clearly has no faith in the accuracy of Rightscorp’s technology, but if its case goes in the same direction as Cox’s, that might not make much of a difference.

A copy of the RIAA’s summary of supplemental authority is available here (pdf).


Build a Multi-Tenant Amazon EMR Cluster with Kerberos, Microsoft Active Directory Integration and EMRFS Authorization

Post Syndicated from Songzhi Liu original https://aws.amazon.com/blogs/big-data/build-a-multi-tenant-amazon-emr-cluster-with-kerberos-microsoft-active-directory-integration-and-emrfs-authorization/

One of the challenges faced by our customers—especially those in highly regulated industries—is balancing the need for security with flexibility. In this post, we cover how to enable multi-tenancy and increase security by using EMRFS (EMR File System) authorization, the Amazon S3 storage-level authorization on Amazon EMR.

Amazon EMR is an easy, fast, and scalable analytics platform enabling large-scale data processing. EMRFS authorization provides Amazon S3 storage-level authorization by configuring EMRFS with multiple IAM roles. With this functionality enabled, different users and groups can share the same cluster and assume their own IAM roles respectively.

Simply put, on Amazon EMR, we can now have an Amazon EC2 role per user assumed at run time instead of one general EC2 role at the cluster level. When a user tries to access Amazon S3 resources, Amazon EMR evaluates a predefined mapping list in the EMRFS configuration and picks the right role for that user.

In this post, we will discuss what EMRFS authorization is (Amazon S3 storage-level access control) and show how to configure the role mappings with detailed examples. You will then have the desired permissions in a multi-tenant environment. We also demo Amazon S3 access from HDFS command line, Apache Hive on Hue, and Apache Spark.

EMRFS authorization for Amazon S3

There are two prerequisites for using this feature:

  1. Users must be authenticated, because EMRFS needs to map the current user/group/prefix to a predefined user/group/prefix. There are several authentication options. In this post, we launch a Kerberos-enabled cluster that manages the Key Distribution Center (KDC) on the master node, and enable a one-way trust from the KDC to a Microsoft Active Directory domain.
  2. The application must support accessing Amazon S3 via EMRFS. Applications that have their own S3FileSystem APIs (for example, Presto) are not supported at this time.

EMRFS supports three types of mapping entries: user, group, and Amazon S3 prefix. Let’s use an example to show how this works.

Assume that you have the following three identities in your organization, and they are defined in the Active Directory: an admin user (admin1), a data engineering group (grp_data_engineering, which includes the user dataeng1), and a data science group (grp_data_science, which includes the user datascientist1).

To enable all these groups and users to share the EMR cluster, you need to define the following IAM roles: a restricted base EC2 role for the cluster (EMR_EC2_RestrictedRole), a user role for the admin (emrfs_auth_user_role_admin_user), group roles for data engineering and data science (emrfs_auth_group_role_data_eng and emrfs_auth_group_role_data_sci), and a prefix role for a shared default bucket (emrfs_auth_prefix_role_default_s3_prefix).

In this case, you create a separate Amazon EC2 role that doesn’t give any permission to Amazon S3. Let’s call this the base role (the EC2 role attached to the EMR cluster), which in this example is named EMR_EC2_RestrictedRole. Then, you define all the Amazon S3 permissions for each specific user or group in their own roles. The restricted role serves as the fallback role when the user doesn’t match any user or group mapping and isn’t trying to access any of the Amazon S3 prefixes defined on the list.

Important: For all other roles, like emrfs_auth_group_role_data_eng, you need to add the base role (EMR_EC2_RestrictedRole) as the trusted entity so that it can assume other roles. See the following example:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "ec2.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    },
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::511586466501:role/EMR_EC2_RestrictedRole"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

The following is an example policy for the admin user role (emrfs_auth_user_role_admin_user):

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "s3:*",
            "Resource": "*"
        }
    ]
}

We are assuming the admin user has access to all buckets in this example.

The following is an example policy for the data science group role (emrfs_auth_group_role_data_sci):

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Resource": [
                "arn:aws:s3:::emrfs-auth-data-science-bucket-demo/*",
                "arn:aws:s3:::emrfs-auth-data-science-bucket-demo"
            ],
            "Action": [
                "s3:*"
            ]
        }
    ]
}

This role grants all Amazon S3 permissions to the emrfs-auth-data-science-bucket-demo bucket and all the objects in it. Similarly, the policy for the role emrfs_auth_group_role_data_eng is shown below:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Resource": [
                "arn:aws:s3:::emrfs-auth-data-engineering-bucket-demo/*",
                "arn:aws:s3:::emrfs-auth-data-engineering-bucket-demo"
            ],
            "Action": [
                "s3:*"
            ]
        }
    ]
}

Example role mappings configuration

To configure EMRFS authorization, you use an EMR security configuration; the role mapping configuration that we use in this post is shown later, when we create the security configuration.

Consider the following scenario.

First, the admin user admin1 tries to log in and run a command to access Amazon S3 data through EMRFS. The first role emrfs_auth_user_role_admin_user on the mapping list, which is a user role, is mapped and picked up. Then admin1 has access to the Amazon S3 locations that are defined in this role.

Then a user from the data engineer group (grp_data_engineering) tries to access a data bucket to run some jobs. When EMRFS sees that the user is a member of the grp_data_engineering group, the group role emrfs_auth_group_role_data_eng is assumed, and the user has proper access to Amazon S3 that is defined in the emrfs_auth_group_role_data_eng role.

Next comes a third user, who is not an admin and doesn’t belong to any of the groups. After failing evaluation of the top three entries, EMRFS evaluates whether the user is trying to access a certain Amazon S3 prefix defined in the last mapping entry. This type of mapping entry is called the prefix type. If the user is trying to access s3://emrfs-auth-default-bucket-demo/, then the prefix mapping is in effect, and the prefix role emrfs_auth_prefix_role_default_s3_prefix is assumed.

If the user is not trying to access any of the Amazon S3 paths that are defined on the list (which means the user failed the evaluation of all the entries), the user only has the permissions defined in the EMR_EC2_RestrictedRole. This role is assumed by the EC2 instances in the cluster.

Throughout this process, the mappings are evaluated in the order in which they are defined; the first role that matches is assumed, and the rest of the list is skipped.

Setting up an EMR cluster and mapping Active Directory users and groups

Now that we know how EMRFS authorization role mapping works, the next thing we need to think about is how we can use this feature in an easy and manageable way.

Active Directory setup

Many customers manage their users and groups using Microsoft Active Directory or other tools like OpenLDAP. In this post, we create the Active Directory on an Amazon EC2 instance running Windows Server and create the users and groups we will be using in the example below. After setting up Active Directory, we use the Amazon EMR Kerberos auto-join capability to establish a one-way trust from the KDC running on the EMR master node to the Active Directory domain on the EC2 instance. You can use your own directory service as long as it supports LDAP (Lightweight Directory Access Protocol).

To create and join Active Directory to Amazon EMR, follow the steps in the blog post Use Kerberos Authentication to Integrate Amazon EMR with Microsoft Active Directory.

After configuring Active Directory, you can create all the users and groups using the Active Directory tools and add users to the appropriate groups. In this example, we created the users admin1, dataeng1, and datascientist1 and the groups grp_data_engineering and grp_data_science, and then added the users to the right groups.

Join the EMR cluster to an Active Directory domain

For clusters with Kerberos, Amazon EMR now supports automated Active Directory domain joins. You can use the security configuration to configure the one-way trust from the KDC to the Active Directory domain. You also configure the EMRFS role mappings in the same security configuration.

The following is an example of the EMR security configuration with a trusted Active Directory domain EMRKRB.TEST.COM and the EMRFS role mappings as we discussed earlier:

The EMRFS role mapping configuration is shown in this example:
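
As a sketch of what that security configuration could look like in JSON form, the following combines the cross-realm trust with the role mappings; the account ID, realm, domain, role names, and bucket follow the examples used earlier in this post, and the ticket lifetime is an assumption:

{
  "AuthenticationConfiguration": {
    "KerberosConfiguration": {
      "Provider": "ClusterDedicatedKdc",
      "ClusterDedicatedKdcConfiguration": {
        "TicketLifetimeInHours": 24,
        "CrossRealmTrustConfiguration": {
          "Realm": "EMRKRB.TEST.COM",
          "Domain": "emrkrb.test.com",
          "AdminServer": "emrkrb.test.com",
          "KdcServer": "emrkrb.test.com"
        }
      }
    }
  },
  "AuthorizationConfiguration": {
    "EmrFsConfiguration": {
      "RoleMappings": [
        {
          "Role": "arn:aws:iam::511586466501:role/emrfs_auth_user_role_admin_user",
          "IdentifierType": "User",
          "Identifiers": [ "admin1" ]
        },
        {
          "Role": "arn:aws:iam::511586466501:role/emrfs_auth_group_role_data_sci",
          "IdentifierType": "Group",
          "Identifiers": [ "grp_data_science" ]
        },
        {
          "Role": "arn:aws:iam::511586466501:role/emrfs_auth_group_role_data_eng",
          "IdentifierType": "Group",
          "Identifiers": [ "grp_data_engineering" ]
        },
        {
          "Role": "arn:aws:iam::511586466501:role/emrfs_auth_prefix_role_default_s3_prefix",
          "IdentifierType": "Prefix",
          "Identifiers": [ "s3://emrfs-auth-default-bucket-demo/" ]
        }
      ]
    }
  }
}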

We will also provide an example AWS CLI command that you can run.

Launching the EMR cluster and running the tests

Now you have configured Kerberos and EMRFS authorization for Amazon S3.

Additionally, you need to configure Hue with Active Directory using the Amazon EMR configuration API in order to log in using the AD users created before. The following is an example of Hue AD configuration.

[
  {
    "Classification":"hue-ini",
    "Properties":{

    },
    "Configurations":[
      {
        "Classification":"desktop",
        "Properties":{

        },
        "Configurations":[
          {
            "Classification":"ldap",
            "Properties":{

            },
            "Configurations":[
              {
                "Classification":"ldap_servers",
                "Properties":{

                },
                "Configurations":[
                  {
                    "Classification":"AWS",
                    "Properties":{
                      "base_dn":"DC=emrkrb,DC=test,DC=com",
                      "ldap_url":"ldap://emrkrb.test.com",
                      "search_bind_authentication":"false",
                      "bind_dn":"CN=adjoiner,CN=users,DC=emrkrb,DC=test,DC=com",
                      "bind_password":"Abc123456",
                      "create_users_on_login":"true",
                      "nt_domain":"emrkrb.test.com"
                    },
                    "Configurations":[

                    ]
                  }
                ]
              }
            ]
          },
          {
            "Classification":"auth",
            "Properties":{
              "backend":"desktop.auth.backend.LdapBackend"
            },
            "Configurations":[

            ]
          }
        ]
      }
    ]
  }
]

Note: In the preceding configuration JSON file, change the values as required before pasting it into the software setting section in the Amazon EMR console.

Now let’s use this configuration and the security configuration you created before to launch the cluster.

In the Amazon EMR console, choose Create cluster. Then choose Go to advanced options. On the Step 1: Software and Steps page, under Edit software settings (optional), paste the configuration in the box.

The rest of the setup is the same as an ordinary cluster setup, except in the Security Options section. In Step 4: Security, under Permissions, choose Custom, and then choose the EMR_EC2_RestrictedRole that you created before.

Choose the appropriate subnets (these should meet the base requirements for a successful Active Directory join—see the Amazon EMR Management Guide for more details), and choose the appropriate security groups so that the cluster can talk to the Active Directory domain controller. Choose a key pair so that you can log in and configure the cluster.

Most importantly, choose the security configuration that you created earlier to enable Kerberos and EMRFS authorization for Amazon S3.

You can use the following AWS CLI command to create a cluster.

aws emr create-cluster --name "TestEMRFSAuthorization" \
--release-label emr-5.10.0 \
--instance-type m3.xlarge \
--instance-count 3 \
--ec2-attributes InstanceProfile=EMR_EC2_DefaultRole,KeyName=MyEC2KeyPair \
--service-role EMR_DefaultRole \
--security-configuration MyKerberosConfig \
--configurations file://hue-config.json \
--applications Name=Hadoop Name=Hive Name=Hue Name=Spark \
--kerberos-attributes Realm=EC2.INTERNAL,KdcAdminPassword=<YourClusterKDCAdminPassword>,ADDomainJoinUser=<YourADUserLogonName>,ADDomainJoinPassword=<YourADUserPassword>,CrossRealmTrustPrincipalPassword=<MatchADTrustPwd>

Note: If you create the cluster using CLI, you need to save the JSON configuration for Hue into a file named hue-config.json and place it on the server where you run the CLI command.

After the cluster gets into the Waiting state, try to connect to the cluster over SSH using an Active Directory user name and password.

ssh -l <ad_user_name>@emrkrb.test.com <EMR IP or DNS name>

Quickly run two commands to show that the Active Directory join is successful:

  1. id [user name] shows the mapped AD users and groups in Linux.
  2. hdfs groups [user name] shows the mapped group in Hadoop.

Both should return the current Active Directory user and group information if the setup is correct.

Now, you can test the user mapping first. Log in with the admin1 user, and run a Hadoop list directory command:

hadoop fs -ls s3://emrfs-auth-data-science-bucket-demo/

Now switch to a user from the data engineer group.

Retry the previous command, which lists the data science group’s bucket. This time it should throw an Amazon S3 Access Denied exception, because the data engineering role doesn’t grant access to that bucket.

When you list an Amazon S3 bucket that the data engineering group does have access to, the group mapping is triggered.

hadoop fs -ls s3://emrfs-auth-data-engineering-bucket-demo/

It successfully returns the listing results. Next we will test Apache Hive and then Apache Spark.

 

To run jobs successfully, you need to create a home directory for every user in HDFS for staging data under /user/<username>. Users can configure a step to create a home directory at cluster launch time for every user who has access to the cluster. In this example, you use Hue since Hue will create the home directory in HDFS for the user at the first login. Here Hue also needs to be integrated with the same Active Directory as explained in the example configuration described earlier.
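
If you want to create the home directories yourself instead, for example in a step at cluster launch or from an SSH session as the hadoop user, a sketch of the commands for one user looks like this; the user and group names follow the Active Directory examples above:

hdfs dfs -mkdir -p /user/datascientist1
hdfs dfs -chown datascientist1:grp_data_science /user/datascientist1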

First, log in to Hue as a data engineer user, and open a Hive Notebook in Hue. Then run a query to create a new table pointing to the data engineer bucket, s3://emrfs-auth-data-engineering-bucket-demo/table1_data_eng/.
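
The exact DDL isn’t shown here, but a query along these lines illustrates the idea; the column schema is an assumption, while the table name and location match the bucket path above:

CREATE EXTERNAL TABLE table1_data_eng (
  id INT,
  name STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 's3://emrfs-auth-data-engineering-bucket-demo/table1_data_eng/';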

You can see that the table was created successfully. Now try to create another table pointing to the data science group’s bucket, where the data engineer group doesn’t have access.

It failed and threw an Amazon S3 Access Denied error.

Now insert one line of data into the successfully created table.

Next, log out, switch to a data science group user, and create another table, test2_datasci_tb.

The creation is successful.

The last task is to test Spark (it requires the user directory, but Hue created one in the previous step).

Now let’s come back to the command line and run some Spark commands.

Log in to the master node as the datascientist1 user:

Start the SparkSQL interactive shell by typing spark-sql, and run the show tables command. It should list the tables that you created using Hive.

As a data science group user, try a SELECT on both tables. You will find that you can only select from the table defined in the location that your group has access to.
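
For example, a session as datascientist1 might go roughly like this; the table names come from the earlier steps, and the exact error message will vary:

spark-sql> show tables;
spark-sql> select * from test2_datasci_tb limit 10;   -- succeeds: the data science role grants access to this bucket
spark-sql> select * from table1_data_eng limit 10;    -- fails with an Amazon S3 Access Denied error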

Conclusion

EMRFS authorization for Amazon S3 enables you to have multiple roles on the same cluster, providing flexibility to configure a shared cluster for different teams to achieve better efficiency. The Active Directory integration and group mapping make it much easier for you to manage your users and groups, and provides better auditability in a multi-tenant environment.


Additional Reading

If you found this post useful, be sure to check out Use Kerberos Authentication to Integrate Amazon EMR with Microsoft Active Directory and Launching and Running an Amazon EMR Cluster inside a VPC.


About the Authors

Songzhi Liu is a Big Data Consultant with AWS Professional Services. He works closely with AWS customers to provide them Big Data & Machine Learning solutions and best practices on the Amazon cloud.

Nextcloud 13 is out

Post Syndicated from ris original https://lwn.net/Articles/746710/rss

Nextcloud 13 has been released. “This release brings improvements to the core File Sync and Share like easier moving of files and a tech preview of our end-to-end encryption for the ultimate protection of your data. It also introduces collaboration and communication capabilities, like auto-complete of comments and integrated real-time chat and video communication. Last but not least, Nextcloud was optimized and tuned to deliver up to 80% faster LDAP, much faster object storage and Windows Network Drive performance and a smoother user interface.”