Tag Archives: talk

Steal This Show S03E13: The Tao of The DAO

Post Syndicated from Ernesto original https://torrentfreak.com/steal-show-s03e13-tao-dao/

stslogo180If you enjoy this episode, consider becoming a patron and getting involved with the show. Check out Steal This Show’s Patreon campaign: support us and get all kinds of fantastic benefits!

In this episode, we meet Chris Beams, founder of the decentralized cryptocurrency exchange Bisq. We discuss the concept of DAOs (Decentralised Autonomous Organisations) and whether The Pirate Bay was an early example; how the start of Bitcoin parallels the start of the Internet itself; and why the meretricious Bitcoin Cash fork of Bitcoin is based on a misunderstanding of Open Source development.

Finally, we get into Bisq itself, discussing the potential political importance of decentralized crypto exchanges in the context of any future attempts by the financial establishment to control cryptocurrency.

Steal This Show aims to release bi-weekly episodes featuring insiders discussing copyright and file-sharing news. It complements our regular reporting by adding more room for opinion, commentary, and analysis.

The guests for our news discussions will vary, and we’ll aim to introduce voices from different backgrounds and persuasions. In addition to news, STS will also produce features interviewing some of the great innovators and minds.

Host: Jamie King

Guest: Chris Beams

Produced by Jamie King
Edited & Mixed by Riley Byrne
Original Music by David Triana
Web Production by Siraje Amarniss

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN discounts, offers and coupons

Tech wishes for 2018

Post Syndicated from Eevee original https://eev.ee/blog/2018/02/18/tech-wishes-for-2018/

Anonymous asks, via money:

What would you like to see happen in tech in 2018?

(answer can be technical, social, political, combination, whatever)

Hmm.

Less of this

I’m not really qualified to speak in depth about either of these things, but let me put my foot in my mouth anyway:

The Blockchain™

Bitcoin was a neat idea. No, really! Decentralization is cool. Overhauling our terrible financial infrastructure is cool. Hash functions are cool.

Unfortunately, it seems to have devolved into mostly a get-rich-quick scheme for nerds, and by nearly any measure it’s turning into a spectacular catastrophe. Its “success” is measured in how much a bitcoin is worth in US dollars, which is pretty close to an admission from its own investors that its only value is in converting back to “real” money — all while that same “success” is making it less useful as a distinct currency.

Blah, blah, everyone already knows this.

What concerns me slightly more is the gold rush hype cycle, which is putting cryptocurrency and “blockchain” in the news and lending it all legitimacy. People have raked in millions of dollars on ICOs of novel coins I’ve never heard mentioned again. (Note: again, that value is measured in dollars.) Most likely, none of the investors will see any return whatsoever on that money. They can’t, really, unless a coin actually takes off as a currency, and that seems at odds with speculative investing since everyone either wants to hoard or ditch their coins. When the coins have no value themselves, the money can only come from other investors, and eventually the hype winds down and you run out of other investors.

I fear this will hurt a lot of people before it’s over, so I’d like for it to be over as soon as possible.


That said, the hype itself has gotten way out of hand too. First it was the obsession with “blockchain” like it’s a revolutionary technology, but hey, Git is a fucking blockchain. The novel part is the way it handles distributed consensus (which in Git is basically left for you to figure out), and that’s uniquely important to currency because you want to be pretty sure that money doesn’t get duplicated or lost when moved around.

But now we have startups trying to use blockchains for website backends and file storage and who knows what else? Why? What advantage does this have? When you say “blockchain”, I hear “single Git repository” — so when you say “email on the blockchain”, I have an aneurysm.

Bitcoin seems to have sparked imagination in large part because it’s decentralized, but I’d argue it’s actually a pretty bad example of a decentralized network, since people keep forking it. The ability to fork is a feature, sure, but the trouble here is that the Bitcoin family has no notion of federation — there is one canonical Bitcoin ledger and it has no notion of communication with any other. That’s what you want for currency, not necessarily other applications. (Bitcoin also incentivizes frivolous forking by giving the creator an initial pile of coins to keep and sell.)

And federation is much more interesting than decentralization! Federation gives us email and the web. Federation means I can set up my own instance with my own rules and still be able to meaningfully communicate with the rest of the network. Federation has some amount of tolerance for changes to the protocol, so such changes are more flexible and rely more heavily on consensus.

Federation is fantastic, and it feels like a massive tragedy that this rekindled interest in decentralization is mostly focused on peer-to-peer networks, which do little to address our current problems with centralized platforms.

And hey, you know what else is federated? Banks.

AI

Again, the tech is cool and all, but the marketing hype is getting way out of hand.

Maybe what I really want from 2018 is less marketing?

For one, I’ve seen a huge uptick in uncritically referring to any software that creates or classifies creative work as “AI”. Can we… can we not. It’s not AI. Yes, yes, nerds, I don’t care about the hair-splitting about the nature of intelligence — you know that when we hear “AI” we think of a human-like self-aware intelligence. But we’re applying it to stuff like a weird dog generator. Or to whatever neural network a website threw into production this week.

And this is dangerously misleading — we already had massive tech companies scapegoating The Algorithm™ for the poor behavior of their software, and now we’re talking about those algorithms as though they were self-aware, untouchable, untameable, unknowable entities of pure chaos whose decisions we are arbitrarily bound to. Ancient, powerful gods who exist just outside human comprehension or law.

It’s weird to see this stuff appear in consumer products so quickly, too. It feels quick, anyway. The latest iPhone can unlock via facial recognition, right? I’m sure a lot of effort was put into ensuring that the same person’s face would always be recognized… but how confident are we that other faces won’t be recognized? I admit I don’t follow all this super closely, so I may be imagining a non-problem, but I do know that humans are remarkably bad at checking for negative cases.

Hell, take the recurring problem of major platforms like Twitter and YouTube classifying anything mentioning “bisexual” as pornographic — because the word is also used as a porn genre, and someone threw a list of porn terms into a filter without thinking too hard about it. That’s just a word list, a fairly simple thing that any human can review; but suddenly we’re confident in opaque networks of inferred details?

I don’t know. “Traditional” classification and generation are much more comforting, since they’re a set of fairly abstract rules that can be examined and followed. Machine learning, as I understand it, is less about rules and much more about pattern-matching; it’s built out of the fingerprints of the stuff it’s trained on. Surely that’s just begging for tons of edge cases. They’re practically made of edge cases.


I’m reminded of a point I saw made a few days ago on Twitter, something I’d never thought about but should have. TurnItIn is a service for universities that checks whether students’ papers match any others, in order to detect cheating. But this is a paid service, one that fundamentally hinges on its corpus: a large collection of existing student papers. So students pay money to attend school, where they’re required to let their work be given to a third-party company, which then profits off of it? What kind of a goofy business model is this?

And my thoughts turn to machine learning, which is fundamentally different from an algorithm you can simply copy from a paper, because it’s all about the training data. And to get good results, you need a lot of training data. Where is that all coming from? How many for-profit companies are setting a neural network loose on the web — on millions of people’s work — and then turning around and selling the result as a product?

This is really a question of how intellectual property works in the internet era, and it continues our proud decades-long tradition of just kinda doing whatever we want without thinking about it too much. Nothing if not consistent.

More of this

A bit tougher, since computers are pretty alright now and everything continues to chug along. Maybe we should just quit while we’re ahead. There’s some real pie-in-the-sky stuff that would be nice, but it certainly won’t happen within a year, and may never happen except in some horrific Algorithmic™ form designed by people that don’t know anything about the problem space and only works 60% of the time but is treated as though it were bulletproof.

Federation

The giants are getting more giant. Maybe too giant? Granted, it could be much worse than Google and Amazon — it could be Apple!

Amazon has its own delivery service and brick-and-mortar stores now, as well as providing the plumbing for vast amounts of the web. They’re not doing anything particularly outrageous, but they kind of loom.

Ad company Google just put ad blocking in its majority-share browser — albeit for the ambiguously-noble goal of only blocking obnoxious ads so that people will be less inclined to install a blanket ad blocker.

Twitter is kind of a nightmare but no one wants to leave. I keep trying to use Mastodon as well, but I always forget about it after a day, whoops.

Facebook sounds like a total nightmare but no one wants to leave that either, because normies don’t use anything else, which is itself direly concerning.

IRC is rapidly bleeding mindshare to Slack and Discord, both of which are far better at the things IRC sadly never tried to do and absolutely terrible at the exact things IRC excels at.

The problem is the same as ever: there’s no incentive to interoperate. There’s no fundamental technical reason why Twitter and Tumblr and MySpace and Facebook can’t intermingle their posts; they just don’t, because why would they bother? It’s extra work that makes it easier for people to not use your ecosystem.

I don’t know what can be done about that, except that hope for a really big player to decide to play nice out of the kindness of their heart. The really big federated success stories — say, the web — mostly won out because they came along first. At this point, how does a federated social network take over? I don’t know.

Social progress

I… don’t really have a solid grasp on what’s happening in tech socially at the moment. I’ve drifted a bit away from the industry part, which is where that all tends to come up. I have the vague sense that things are improving, but that might just be because the Rust community is the one I hear the most about, and it puts a lot of effort into being inclusive and welcoming.

So… more projects should be like Rust? Do whatever Rust is doing? And not so much what Linus is doing.

Open source funding

I haven’t heard this brought up much lately, but it would still be nice to see. The Bay Area runs on open source and is raking in zillions of dollars on its back; pump some of that cash back into the ecosystem, somehow.

I’ve seen a couple open source projects on Patreon, which is fantastic, but feels like a very small solution given how much money is flowing through the commercial tech industry.

Ad blocking

Nice. Fuck ads.

One might wonder where the money to host a website comes from, then? I don’t know. Maybe we should loop this in with the above thing and find a more informal way to pay people for the stuff they make when we find it useful, without the financial and cognitive overhead of A Transaction or Giving Someone My Damn Credit Card Number. You know, something like Bitco— ah, fuck.

Year of the Linux Desktop

I don’t know. What are we working on at the moment? Wayland? Do Wayland, I guess. Oh, and hi-DPI, which I hear sucks. And please fix my sound drivers so PulseAudio stops blaming them when it fucks up.

Subtitle Heroes: Fansubbing Movie Criticized For Piracy Promotion

Post Syndicated from Andy original https://torrentfreak.com/subtitle-heroes-fansubbing-movie-criticized-for-piracy-promotion-180217/

With many thousands of movies and TV shows being made available illegally online every year, a significant number will be enjoyed by speakers of languages other than that presented in the original production.

When Hollywood blockbusters appear online, small armies of individuals around the world spring into action, translating the dialog into Chinese and Czech, Dutch and Danish, French and Farsi, Russian and Romanian, plus a dozen languages in between. TV shows, particularly those produced in the US, get the same immediate treatment.

For many years, subtitling (‘fansubbing’) communities have provided an incredible service to citizens around the globe, from those seeking to experience new culture and languages to the hard of hearing and profoundly deaf. Now, following in the footsteps of movies like TPB:AFK and Kim Dotcom: Caught in the Web, a new movie has premiered in Italy which celebrates this extraordinary movement.

Subs Heroes from writer and director Franco Dipietro hit cinemas at the end of January. It documents the contribution fansubbing has made to Italian culture in a country that under fascism in 1934 banned the use of foreign languages in films, books, newspapers and everyday speech.

The movie centers on the large subtitle site ItalianSubs.net. Founded by a group of teenagers in 2006, it is now run by a team of men and women who maintain their identities as regular citizens during the day but transform into “superheroes of fansubbing” at night.

Needless to say, not everyone is pleased with this depiction of the people behind the now-infamous 500,000 member site.

For many years, fansubbing attracted very little heat but over time anti-piracy groups have been turning up the pressure, accusing subtitling teams of fueling piracy. This notion is shared by local anti-piracy outfit FAPAV (Federation for the Protection of Audiovisual and Multimedia Content), which has accused Dipietro’s movie of glamorizing criminal activity.

In a statement following the release of Subs Heroes, FAPAV made its position crystal clear: sites like ItalianSubs do not contribute to the development of the audiovisual market in Italy.

“It is necessary to clarify: when a protected work is subtitled and there is no right to do so, a crime is committed,” the anti-piracy group says.

“[Italiansubs] translates and makes available subtitles of audiovisual works (films and television series) in many cases not yet distributed on the Italian market. All this without having requested the consent of the rights holders. Ergo the Italiansubs community is illegal.”

Italiansubs (note ad for movie, top right)

FAPAV General Secretary Federico Bagnoli Rossi says that the impact that fansubbers have on the market is significant, causing damage not only to companies distributing the content but also to those who invest in official translations.

The fact that fansubbers often translate content that is not yet available in the region only compounds matters, Rossi says, noting that unofficial translations can also have “direct consequences” on those who have language dubbing as an occupation.

“The audiovisual market today needs to be supported and the protection and fight against illicit behaviors are as fundamental as investments and creative ideas,” Rossi notes.

“Everyone must do their part, respecting the rules and with a competitive and global cultural vision. There are no ‘superheroes’ or noble goals behind piracy, but only great damage to the audiovisual sector and all its workers.”

Also piling on the criticism is the chief of the National Cinema Exhibitors’ Association, who wrote to all of the companies involved to remind them that unauthorized subtitling is a crime. According to local reports, there seems to be an underlying tone that people should avoid becoming associated with the movie.

This did not please director Franco Dipietro who is defending his right to document the fansubbing movement, whether the industry likes it or not.

“We invite those who perhaps think differently to deepen the discussion and maybe organize an event to talk about it together. The film is made to confront and talk about a phenomenon that, whether we like it or not, exists and we can not pretend that it is not there,” Dipietro concludes.



Subs Heroes Trailer 1 from Duel: on Vimeo.



Subs Heroes Trailer 2 from Duel: on Vimeo.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN discounts, offers and coupons

FOSS Project Spotlight: LinuxBoot (Linux Journal)

Post Syndicated from jake original https://lwn.net/Articles/747380/rss

Linux Journal takes a look at the newly announced LinuxBoot project. LWN covered a related talk back in November. “Modern firmware generally consists of two main parts: hardware initialization (early stages) and OS loading (late stages). These parts may be divided further depending on the implementation, but the overall flow is similar across boot firmware. The late stages have gained many capabilities over the years and often have an environment with drivers, utilities, a shell, a graphical menu (sometimes with 3D animations) and much more. Runtime components may remain resident and active after firmware exits. Firmware, which used to fit in an 8 KiB ROM, now contains an OS used to boot another OS and doesn’t always stop running after the OS boots. LinuxBoot replaces the late stages with a Linux kernel and initramfs, which are used to load and execute the next stage, whatever it may be and wherever it may come from. The Linux kernel included in LinuxBoot is called the ‘boot kernel’ to distinguish it from the ‘target kernel’ that is to be booted and may be something other than Linux.

HackSpace magazine 4: the wearables issue

Post Syndicated from Andrew Gregory original https://www.raspberrypi.org/blog/hackspace-4-wearables/

Big things are afoot in the world of HackSpace magazine! This month we’re running our first special issue, with wearables projects throughout the magazine. Moreover, we’re giving away our first subscription gift free to all 12-month print subscribers. Lastly, and most importantly, we’ve made the cover EXTRA SHINY!

HackSpace magazine issue 4 cover

Prepare your eyeballs — it’s HackSpace magazine issue 4!

Wearables

In this issue, we’re taking an in-depth look at wearable tech. Not Fitbits or Apple Watches — we’re talking stuff you can make yourself, from projects that take a couple of hours to put together, to the huge, inspiring builds that are bringing technology to the runway. If you like wearing clothes and you like using your brain to make things better, then you’ll love this feature.

We’re continuing our obsession with Nixie tubes, with the brilliant Time-To-Go-Clock – Trump edition. This ingenious bit of kit uses obsolete Russian electronics to count down the time until the end of the 45th president’s term in office. However, you can also program it to tell the time left to any predictable event, such as the deadline for your tax return or essay submission, or the date England gets knocked out of the World Cup.

HackSpace magazine page 08
HackSpace magazine page 70
HackSpace magazine issue 4 page 98

We’re also talking to Dr Lucy Rogers — NASA alumna, Robot Wars judge, and fellow of the Institution of Mechanical Engineers — about the difference between making as a hobby and as a job, and about why we need the Guild of Makers. Plus, issue 4 has a teeny boat, the most beautiful Raspberry Pi cases you’ve ever seen, and it explores the results of what happens when you put a bunch of hardware hackers together in a French chateau — sacré bleu!

Tutorials

As always, we’ve got more how-tos than you can shake a soldering iron at. Fittingly for the current climate here in the UK, there’s a hot water monitor, which shows you how long you have before your morning shower turns cold, and an Internet of Tea project to summon a cuppa from your kettle via the web. Perhaps not so fittingly, there’s also an ESP8266 project for monitoring a solar power station online. Readers in the southern hemisphere, we’ll leave that one for you — we haven’t seen the sun here for months!

And there’s more!

We’re super happy to say that all our 12-month print subscribers have been sent an Adafruit Circuit Playground Express with this new issue:

Adafruit Circuit Playground Express HackSpace

This gadget was developed primarily with wearables in mind and comes with all sorts of in-built functionality, so subscribers can get cracking with their latest wearable project today! If you’re not a 12-month print subscriber, you’ll miss out, so subscribe here to get your magazine and your device,  and let us know what you’ll make.

The post HackSpace magazine 4: the wearables issue appeared first on Raspberry Pi.

[$] DIY biology

Post Syndicated from jake original https://lwn.net/Articles/747187/rss

A scientist with a rather unusual name, Meow-Ludo Meow-Meow, gave a talk at
linux.conf.au 2018
about the current trends in “do it yourself” (DIY) biology or
“biohacking”. He is perhaps most famous for being
prosecuted for implanting an Opal card RFID chip
into his hand; the
Opal card is used for public transportation fares in Sydney. He gave more
details about his implant as well as describing some other biohacking
projects in an engaging presentation.

[$] A report from the Enigma conference

Post Syndicated from jake original https://lwn.net/Articles/747005/rss

The 2018 USENIX
Enigma conference
was held for the third time in January. Among
many interesting talks, three presentations dealing with human security
behaviors stood out. This article covers the key messages of these talks,
namely the finding that humans are social in their security
behaviors: their decision to adopt a good security practice is hardly ever
an isolated decision.

Subscribers can read on for the report by guest author Christian
Folini.

[$] Authentication and authorization in Samba 4

Post Syndicated from jake original https://lwn.net/Articles/747122/rss

Volker Lendecke is one of the first contributors to Samba,
having submitted his first patches in 1994. In addition to developing
other important file-sharing tools, he’s heavily involved in development of
the winbind service, which is implemented in winbindd. Although the core Active Directory (AD) domain controller
(DC) code was written by his colleague Stefan Metzmacher, winbind is a
crucial component of Samba’s AD functionality.
In his information-packed talk at FOSDEM
2018
, Lendecke
said he aimed to give a high-level
overview of what AD and Samba authentication is, and in particular the
communication pathways and trust relationships between the parts of
Samba that authenticate a Samba user in an AD environment.

[$] Two FOSDEM talks on Samba 4

Post Syndicated from jake original https://lwn.net/Articles/747098/rss

Much as some of us would love never to have to deal with Windows,
it exists. It wants to authenticate its users and share
resources like files and printers over the network. Although many
enterprises use Microsoft tools to do this, there is a free alternative,
in the form of Samba. While Samba 3 has been happily providing
authentication along with file and print sharing to Windows clients for
many years,
the Microsoft world has been slowly moving toward Active Directory (AD).
Meanwhile, Samba 4, which adds a free reimplementation of AD on Linux, has
been increasingly ready for deployment. Three short talks at FOSDEM 2018
provided three different views of Samba 4, also known as Samba-AD,
and left behind a pretty clear picture that Samba 4 is truly
ready for use.

Subscribers can read on for a report from guest author Tom Yates on the first two of those talks; stay tuned for another on the third soon.

Early Challenges: Making Critical Hires

Post Syndicated from Gleb Budman original https://www.backblaze.com/blog/early-challenges-making-critical-hires/

row of potential employee hires sitting waiting for an interview

In 2009, Google disclosed that they had 400 recruiters on staff working to hire nearly 10,000 people. Someday, that might be your challenge, but most companies in their early days are looking to hire a handful of people — the right people — each year. Assuming you are closer to startup stage than Google stage, let’s look at who you need to hire, when to hire them, where to find them (and how to help them find you), and how to get them to join your company.

Who Should Be Your First Hires

In later stage companies, the roles in the company have been well fleshed out, don’t change often, and each role can be segmented to focus on a specific area. A large company may have an entire department focused on just cubicle layout; at a smaller company you may not have a single person whose actual job encompasses all of facilities. At Backblaze, our CTO has a passion and knack for facilities and mostly led that charge. Also, the needs of a smaller company are quick to change. One of our first hires was a QA person, Sean, who ended up being 100% focused on data center infrastructure. In the early stage, things can shift quite a bit and you need people that are broadly capable, flexible, and most of all willing to pitch in where needed.

That said, there are times you may need an expert. At a previous company we hired Jon, a PhD in Bayesian statistics, because we needed algorithmic analysis for spam fighting. However, even that person was not only able and willing to do the math, but also code, and to not only focus on Bayesian statistics but explore a plethora of spam fighting options.

When To Hire

If you’ve raised a lot of cash and are willing to burn it with mistakes, you can guess at all the roles you might need and start hiring for them. No judgement: that’s a reasonable strategy if you’re cash-rich and time-poor.

If your cash is limited, try to see what you and your team are already doing and then hire people to take those jobs. It may sound counterintuitive, but if you’re already doing it presumably it needs to be done, you have a good sense of the type of skills required to do it, and you can bring someone on-board and get them up to speed quickly. That then frees you up to focus on tasks that can’t be done by someone else. At Backblaze, I ran marketing internally for years before hiring a VP of Marketing, making it easier for me to know what we needed. Once I was hiring, my primary goal was to find someone I could trust to take that role completely off of me so I could focus solely on my CEO duties

Where To Find the Right People

Finding great people is always difficult, particularly when the skillsets you’re looking for are highly in-demand by larger companies with lots of cash and cachet. You, however, have one massive advantage: you need to hire 5 people, not 5,000.

People You Worked With

The absolutely best people to hire are ones you’ve worked with before that you already know are good in a work situation. Consider your last job, the one before, and the one before that. A significant number of the people we recruited at Backblaze came from our previous startup MailFrontier. We knew what they could do and how they would fit into the culture, and they knew us and thus could quickly meld into the environment. If you didn’t have a previous job, consider people you went to school with or perhaps individuals with whom you’ve done projects previously.

People You Know

Hiring friends, family, and others can be risky, but should be considered. Sometimes a friend can be a “great buddy,” but is not able to do the job or isn’t a good fit for the organization. Having to let go of someone who is a friend or family member can be rough. Have the conversation up front with them about that possibility, so you have the ability to stay friends if the position doesn’t work out. Having said that, if you get along with someone as a friend, that’s one critical component of succeeding together at work. At Backblaze we’ve hired a number of people successfully that were friends of someone in the organization.

Friends Of People You Know

Your network is likely larger than you imagine. Your employees, investors, advisors, spouses, friends, and other folks all know people who might be a great fit for you. Make sure they know the roles you’re hiring for and ask them if they know anyone that would fit. Search LinkedIn for the titles you’re looking for and see who comes up; if they’re a 2nd degree connection, ask your connection for an introduction.

People You Know About

Sometimes the person you want isn’t someone anyone knows, but you may have read something they wrote, used a product they’ve built, or seen a video of a presentation they gave. Reach out. You may get a great hire: worst case, you’ll let them know they were appreciated, and make them aware of your organization.

Other Places to Find People

There are a million other places to find people, including job sites, community groups, Facebook/Twitter, GitHub, and more. Consider where the people you’re looking for are likely to congregate online and in person.

A Comment on Diversity

Hiring “People You Know” can often result in “Hiring People Like You” with the same workplace experiences, culture, background, and perceptions. Some studies have shown [1, 2, 3, 4] that homogeneous groups deliver faster, while heterogeneous groups are more creative. Also, “Hiring People Like You” often propagates the lack of women and minorities in tech and leadership positions in general. When looking for people you know, keep an eye to not discount people you know who don’t have the same cultural background as you.

Helping People To Find You

Reaching out proactively to people is the most direct way to find someone, but you want potential hires coming to you as well. To do this, they have to a) be aware of you, b) know you have a role they’re interested in, and c) think they would want to work there. Let’s tackle a) and b) first below.

Your Blog

I started writing our blog before we launched the product and talked about anything I found interesting related to our space. For several years now our team has owned the content on the blog and in 2017 over 1.5 million people read it. Each time we have a position open it’s published to the blog. If someone finds reading about backup and storage interesting, perhaps they’d want to dig in deeper from the inside. Many of the people we’ve recruited have mentioned reading the blog as either how they found us or as a factor in why they wanted to work here.
[BTW, this is Gleb’s 200th post on Backblaze’s blog. The first was in 2008. — Editor]

Your Email List

In addition to the emails our blog subscribers receive, we send regular emails to our customers, partners, and prospects. These are largely focused on content we think is directly useful or interesting for them. However, once every few months we include a small mention that we’re hiring, and the positions we’re looking for. Often a small blurb is all you need to capture people’s imaginations whether they might find the jobs interesting or can think of someone that might fit the bill.

Your Social Involvement

Whether it’s Twitter or Facebook, Hacker News or Slashdot, your potential hires are engaging in various communities. Being socially involved helps make people aware of you, reminds them of you when they’re considering a job, and paints a picture of what working with you and your company would be like. Adam was in a Reddit thread where we were discussing our Storage Pods, and that interaction was ultimately part of the reason he left Apple to come to Backblaze.

Convincing People To Join

Once you’ve found someone or they’ve found you, how do you convince them to join? They may be currently employed, have other offers, or have to relocate. Again, while the biggest companies have a number of advantages, you might have more unique advantages than you realize.

Why Should They Join You

Here are a set of items that you may be able to offer which larger organizations might not:

Role: Consider the strengths of the role. Perhaps it will have broader scope? More visibility at the executive level? No micromanagement? Ability to take risks? Option to create their own role?

Compensation: In addition to salary, will their options potentially be worth more since they’re getting in early? Can they trade-off salary for more options? Do they get option refreshes?

Benefits: In addition to healthcare, food, and 401(k) plans, are there unique benefits of your company? One company I knew took the entire team for a one-month working retreat abroad each year.

Location: Most people prefer to work close to home. If you’re located outside of the San Francisco Bay Area, you might be at a disadvantage for not being in the heart of tech. But if you find employees close to you you’ve got a huge advantage. Sometimes it’s micro; even in the Bay Area the difference of 5 miles can save 20 minutes each way every day. We located the Backblaze headquarters in San Mateo, a middle-ground that made it accessible to those coming from San Jose and San Francisco. We also chose a downtown location near a train, restaurants, and cafes: all to make it easier and more pleasant. Also, are you flexible in letting your employees work remotely? Our systems administrator Elliott is about to embark on a long-term cross-country journey working from an RV.

Environment: Open office, cubicle, cafe, work-from-home? Loud/quiet? Social or focused? 24×7 or work-life balance? Different environments appeal to different people.

Team: Who will they be working with? A company with 100,000 people might have 100 brilliant ones you’d want to work with, but ultimately we work with our core team. Who will your prospective hires be working with?

Market: Some people are passionate about gaming, others biotech, still others food. The market you’re targeting will get different people excited.

Product: Have an amazing product people love? Highlight that. If you’re lucky, your potential hire is already a fan.

Mission: Curing cancer, making people happy, and other company missions inspire people to strive to be part of the journey. Our mission is to make storing data astonishingly easy and low-cost. If you care about data, information, knowledge, and progress, our mission helps drive all of them.

Culture: I left this for last, but believe it’s the most important. What is the culture of your company? Finding people who want to work in the culture of your organization is critical. If they like the culture, they’ll fit and continue it. We’ve worked hard to build a culture that’s collaborative, friendly, supportive, and open; one in which people like coming to work. For example, the five founders started with (and still have) the same compensation and equity. That started a culture of “we’re all in this together.” Build a culture that will attract the people you want, and convey what the culture is.

Writing The Job Description

Most job descriptions focus on the all the requirements the candidate must meet. While important to communicate, the job description should first sell the job. Why would the appropriate candidate want the job? Then share some of the requirements you think are critical. Remember that people read not just what you say but how you say it. Try to write in a way that conveys what it is like to actually be at the company. Ahin, our VP of Marketing, said the job description itself was one of the things that attracted him to the company.

Orchestrating Interviews

Much can be said about interviewing well. I’m just going to say this: make sure that everyone who is interviewing knows that their job is not only to evaluate the candidate, but give them a sense of the culture, and sell them on the company. At Backblaze, we often have one person interview core prospects solely for company/culture fit.

Onboarding

Hiring success shouldn’t be defined by finding and hiring the right person, but instead by the right person being successful and happy within the organization. Ensure someone (usually their manager) provides them guidance on what they should be concentrating on doing during their first day, first week, and thereafter. Giving new employees opportunities and guidance so that they can achieve early wins and feel socially integrated into the company does wonders for bringing people on board smoothly

In Closing

Our Director of Production Systems, Chris, said to me the other day that he looks for companies where he can work on “interesting problems with nice people.” I’m hoping you’ll find your own version of that and find this post useful in looking for your early and critical hires.

Of course, I’d be remiss if I didn’t say, if you know of anyone looking for a place with “interesting problems with nice people,” Backblaze is hiring. 😉

The post Early Challenges: Making Critical Hires appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

How I built a data warehouse using Amazon Redshift and AWS services in record time

Post Syndicated from Stephen Borg original https://aws.amazon.com/blogs/big-data/how-i-built-a-data-warehouse-using-amazon-redshift-and-aws-services-in-record-time/

This is a customer post by Stephen Borg, the Head of Big Data and BI at Cerberus Technologies.

Cerberus Technologies, in their own words: Cerberus is a company founded in 2017 by a team of visionary iGaming veterans. Our mission is simple – to offer the best tech solutions through a data-driven and a customer-first approach, delivering innovative solutions that go against traditional forms of working and process. This mission is based on the solid foundations of reliability, flexibility and security, and we intend to fundamentally change the way iGaming and other industries interact with technology.

Over the years, I have developed and created a number of data warehouses from scratch. Recently, I built a data warehouse for the iGaming industry single-handedly. To do it, I used the power and flexibility of Amazon Redshift and the wider AWS data management ecosystem. In this post, I explain how I was able to build a robust and scalable data warehouse without the large team of experts typically needed.

In two of my recent projects, I ran into challenges when scaling our data warehouse using on-premises infrastructure. Data was growing at many tens of gigabytes per day, and query performance was suffering. Scaling required major capital investment for hardware and software licenses, and also significant operational costs for maintenance and technical staff to keep it running and performing well. Unfortunately, I couldn’t get the resources needed to scale the infrastructure with data growth, and these projects were abandoned. Thanks to cloud data warehousing, the bottleneck of infrastructure resources, capital expense, and operational costs have been significantly reduced or have totally gone away. There is no more excuse for allowing obstacles of the past to delay delivering timely insights to decision makers, no matter how much data you have.

With Amazon Redshift and AWS, I delivered a cloud data warehouse to the business very quickly, and with a small team: me. I didn’t have to order hardware or software, and I no longer needed to install, configure, tune, or keep up with patches and version updates. Instead, I easily set up a robust data processing pipeline and we were quickly ingesting and analyzing data. Now, my data warehouse team can be extremely lean, and focus more time on bringing in new data and delivering insights. In this post, I show you the AWS services and the architecture that I used.

Handling data feeds

I have several different data sources that provide everything needed to run the business. The data includes activity from our iGaming platform, social media posts, clickstream data, marketing and campaign performance, and customer support engagements.

To handle the diversity of data feeds, I developed abstract integration applications using Docker that run on Amazon EC2 Container Service (Amazon ECS) and feed data to Amazon Kinesis Data Streams. These data streams can be used for real time analytics. In my system, each record in Kinesis is preprocessed by an AWS Lambda function to cleanse and aggregate information. My system then routes it to be stored where I need on Amazon S3 by Amazon Kinesis Data Firehose. Suppose that you used an on-premises architecture to accomplish the same task. A team of data engineers would be required to maintain and monitor a Kafka cluster, develop applications to stream data, and maintain a Hadoop cluster and the infrastructure underneath it for data storage. With my stream processing architecture, there are no servers to manage, no disk drives to replace, and no service monitoring to write.

Setting up a Kinesis stream can be done with a few clicks, and the same for Kinesis Firehose. Firehose can be configured to automatically consume data from a Kinesis Data Stream, and then write compressed data every N minutes to Amazon S3. When I want to process a Kinesis data stream, it’s very easy to set up a Lambda function to be executed on each message received. I can just set a trigger from the AWS Lambda Management Console, as shown following.

I also monitor the duration of function execution using Amazon CloudWatch and AWS X-Ray.

Regardless of the format I receive the data from our partners, I can send it to Kinesis as JSON data using my own formatters. After Firehose writes this to Amazon S3, I have everything in nearly the same structure I received but compressed, encrypted, and optimized for reading.

This data is automatically crawled by AWS Glue and placed into the AWS Glue Data Catalog. This means that I can immediately query the data directly on S3 using Amazon Athena or through Amazon Redshift Spectrum. Previously, I used Amazon EMR and an Amazon RDS–based metastore in Apache Hive for catalog management. Now I can avoid the complexity of maintaining Hive Metastore catalogs. Glue takes care of high availability and the operations side so that I know that end users can always be productive.

Working with Amazon Athena and Amazon Redshift for analysis

I found Amazon Athena extremely useful out of the box for ad hoc analysis. Our engineers (me) use Athena to understand new datasets that we receive and to understand what transformations will be needed for long-term query efficiency.

For our data analysts and data scientists, we’ve selected Amazon Redshift. Amazon Redshift has proven to be the right tool for us over and over again. It easily processes 20+ million transactions per day, regardless of the footprint of the tables and the type of analytics required by the business. Latency is low and query performance expectations have been more than met. We use Redshift Spectrum for long-term data retention, which enables me to extend the analytic power of Amazon Redshift beyond local data to anything stored in S3, and without requiring me to load any data. Redshift Spectrum gives me the freedom to store data where I want, in the format I want, and have it available for processing when I need it.

To load data directly into Amazon Redshift, I use AWS Data Pipeline to orchestrate data workflows. I create Amazon EMR clusters on an intra-day basis, which I can easily adjust to run more or less frequently as needed throughout the day. EMR clusters are used together with Amazon RDS, Apache Spark 2.0, and S3 storage. The data pipeline application loads ETL configurations from Spring RESTful services hosted on AWS Elastic Beanstalk. The application then loads data from S3 into memory, aggregates and cleans the data, and then writes the final version of the data to Amazon Redshift. This data is then ready to use for analysis. Spark on EMR also helps with recommendations and personalization use cases for various business users, and I find this easy to set up and deliver what users want. Finally, business users use Amazon QuickSight for self-service BI to slice, dice, and visualize the data depending on their requirements.

Each AWS service in this architecture plays its part in saving precious time that’s crucial for delivery and getting different departments in the business on board. I found the services easy to set up and use, and all have proven to be highly reliable for our use as our production environments. When the architecture was in place, scaling out was either completely handled by the service, or a matter of a simple API call, and crucially doesn’t require me to change one line of code. Increasing shards for Kinesis can be done in a minute by editing a stream. Increasing capacity for Lambda functions can be accomplished by editing the megabytes allocated for processing, and concurrency is handled automatically. EMR cluster capacity can easily be increased by changing the master and slave node types in Data Pipeline, or by using Auto Scaling. Lastly, RDS and Amazon Redshift can be easily upgraded without any major tasks to be performed by our team (again, me).

In the end, using AWS services including Kinesis, Lambda, Data Pipeline, and Amazon Redshift allows me to keep my team lean and highly productive. I eliminated the cost and delays of capital infrastructure, as well as the late night and weekend calls for support. I can now give maximum value to the business while keeping operational costs down. My team pushed out an agile and highly responsive data warehouse solution in record time and we can handle changing business requirements rapidly, and quickly adapt to new data and new user requests.


Additional Reading

If you found this post useful, be sure to check out Deploy a Data Warehouse Quickly with Amazon Redshift, Amazon RDS for PostgreSQL and Tableau Server and Top 8 Best Practices for High-Performance ETL Processing Using Amazon Redshift.


About the Author

Stephen Borg is the Head of Big Data and BI at Cerberus Technologies. He has a background in platform software engineering, and first became involved in data warehousing using the typical RDBMS, SQL, ETL, and BI tools. He quickly became passionate about providing insight to help others optimize the business and add personalization to products. He is now the Head of Big Data and BI at Cerberus Technologies.

 

 

 

Integration With Zapier

Post Syndicated from Bozho original https://techblog.bozho.net/integration-with-zapier/

Integration is boring. And also inevitable. But I won’t be writing about enterprise integration patterns. Instead, I’ll explain how to create an app for integration with Zapier.

What is Zapier? It is a service that allows you tо connect two (or more) otherwise unconnected services via their APIs (or protocols). You can do stuff like “Create a Trello task from an Evernote note”, “publish new RSS items to Facebook”, “append new emails to a spreadsheet”, “post approaching calendar meeting to Slack”, “Save big email attachments to Dropbox”, “tweet all instagrams above a certain likes threshold”, and so on. In fact, it looks to cover mostly the same usecases as another famous service that I really like – IFTTT (if this then that), with my favourite use-case “Get a notification when the international space station passes over your house”. And all of those interactions can be configured via a UI.

Now that’s good for end users but what does it have to do with software development and integration? Zapier (unlike IFTTT, unfortunately), allows custom 3rd party services to be included. So if you have a service of your own, you can create an “app” and allow users to integrate your service with all the other 3rd party services. IFTTT offers a way to invoke web endpoints (including RESTful services), but it doesn’t allow setting headers, so that makes it quite limited for actual APIs.

In this post I’ll briefly explain how to write a custom Zapier app and then will discuss where services like Zapier stand from an architecture perspective.

The thing that I needed it for – to be able to integrate LogSentinel with any of the third parties available through Zapier, i.e. to store audit logs for events that happen in all those 3rd party systems. So how do I do that? There’s a tutorial that makes it look simple. And it is, with a few catches.

First, there are two tutorials – one in GitHub and one on Zapier’s website. And they differ slightly, which becomes tricky in some cases.

I initially followed the GitHub tutorial and had my build fail. It claimed the zapier platform dependency is missing. After I compared it with the example apps, I found out there’s a caret in front of the zapier platform dependency. Removing it just yielded another error – that my node version should be exactly 6.10.2. Why?

The Zapier CLI requires you have exactly version 6.10.2 installed. You’ll see errors and will be unable to proceed otherwise.

It appears that they are using AWS Lambda which is stuck on Node 6.10.2 (actually – it’s 6.10.3 when you check). The current major release is 8, so minus points for choosing … javascript for a command-line tool and for building sandboxed apps. Maybe other decisions had their downsides as well, I won’t be speculating. Maybe it’s just my dislike for dynamic languages.

So, after you make sure you have the correct old version on node, you call zapier init and make sure there are no carets, npm install and then zapier test. So far so good, you have a dummy app. Now how do you make a RESTful call to your service?

Zapier splits the programmable entities in two – “triggers” and “creates”. A trigger is the event that triggers the whole app, an a “create” is what happens as a result. In my case, my app doesn’t publish any triggers, it only accepts input, so I won’t be mentioning triggers (though they seem easy). You configure all of the elements in index.js (e.g. this one):

const log = require('./creates/log');
....
creates: {
    [log.key]: log,
}

The log.js file itself is the interesting bit – there you specify all the parameters that should be passed to your API call, as well as making the API call itself:

const log = (z, bundle) => {
  const responsePromise = z.request({
    method: 'POST',
    url: `https://api.logsentinel.com/api/log/${bundle.inputData.actorId}/${bundle.inputData.action}`,
    body: bundle.inputData.details,
	headers: {
		'Accept': 'application/json'
	}
  });
  return responsePromise
    .then(response => JSON.parse(response.content));
};

module.exports = {
  key: 'log-entry',
  noun: 'Log entry',

  display: {
    label: 'Log',
    description: 'Log an audit trail entry'
  },

  operation: {
    inputFields: [
      {key: 'actorId', label:'ActorID', required: true},
      {key: 'action', label:'Action', required: true},
      {key: 'details', label:'Details', required: false}
    ],
    perform: log
  }
};

You can pass the input parameters to your API call, and it’s as simple as that. The user can then specify which parameters from the source (“trigger”) should be mapped to each of your parameters. In an example zap, I used an email trigger and passed the sender as actorId, the sibject as “action” and the body of the email as details.

There’s one more thing – authentication. Authentication can be done in many ways. Some services offer OAuth, others – HTTP Basic or other custom forms of authentication. There is a section in the documentation about all the options. In my case it was (almost) an HTTP Basic auth. My initial thought was to just supply the credentials as parameters (which you just hardcode rather than map to trigger parameters). That may work, but it’s not the canonical way. You should configure “authentication”, as it triggers a friendly UI for the user.

You include authentication.js (which has the fields your authentication requires) and then pre-process requests by adding a header (in index.js):

const authentication = require('./authentication');

const includeAuthHeaders = (request, z, bundle) => {
  if (bundle.authData.organizationId) {
	request.headers = request.headers || {};
	request.headers['Application-Id'] = bundle.authData.applicationId
	const basicHash = Buffer(`${bundle.authData.organizationId}:${bundle.authData.apiSecret}`).toString('base64');
	request.headers['Authorization'] = `Basic ${basicHash}`;
  }
  return request;
};

const App = {
  // This is just shorthand to reference the installed dependencies you have. Zapier will
  // need to know these before we can upload
  version: require('./package.json').version,
  platformVersion: require('zapier-platform-core').version,
  authentication: authentication,
  
  // beforeRequest & afterResponse are optional hooks into the provided HTTP client
  beforeRequest: [
	includeAuthHeaders
  ]
...
}

And then you zapier push your app and you can test it. It doesn’t automatically go live, as you have to invite people to try it and use it first, but in many cases that’s sufficient (i.e. using Zapier when doing integration with a particular client)

Can Zapier can be used for any integration problem? Unlikely – it’s pretty limited and simple, but that’s also a strength. You can, in half a day, make your service integrate with thousands of others for the most typical use-cases. And not that although it’s meant for integrating public services rather than for enterprise integration (where you make multiple internal systems talk to each other), as an increasing number of systems rely on 3rd party services, it could find home in an enterprise system, replacing some functions of an ESB.

Effectively, such services (Zapier, IFTTT) are “Simple ESB-as-a-service”. You go to a UI, fill a bunch of fields, and you get systems talking to each other without touching the systems themselves. I’m not a big fan of ESBs, mostly because they become harder to support with time. But minimalist, external ones might be applicable in certain situations. And while such services are primarily aimed at end users, they could be a useful bit in an enterprise architecture that relies on 3rd party services.

Whether it could process the required load, whether an organization is willing to let its data flow through a 3rd party provider (which may store the intermediate parameters), is a question that should be answered in a case by cases basis. I wouldn’t recommend it as a general solution, but it’s certainly an option to consider.

The post Integration With Zapier appeared first on Bozho's tech blog.

Linux Plumbers Networking Track CFP

Post Syndicated from jake original https://lwn.net/Articles/747021/rss

Linux networking maintainer David Miller has put out a call for proposals for a two-day networking track at this year’s Linux Plumbers Conference (LPC). “We are seeking talks of 40 minutes in length, accompanied by papers
of 2 to 10 pages in length.
” The deadline for proposals is July 11. LPC will be held November 13-15 in Vancouver and the networking track will be held the first two days.

Calling Squid "Calamari" Makes It More Appetizing

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2018/02/calling_squid_c.html

Research shows that what a food is called affects how we think about it.

Research paper.

As usual, you can also use this squid post to talk about the security stories in the news that I haven’t covered.

Read my blog posting guidelines here.

Big Birthday Weekend 2018: find a Jam near you!

Post Syndicated from Ben Nuttall original https://www.raspberrypi.org/blog/big-birthday-weekend-2018-find-a-jam-near-you/

We’re just over three weeks away from the Raspberry Jam Big Birthday Weekend 2018, our community celebration of Raspberry Pi’s sixth birthday. Instead of an event in Cambridge, as we’ve held in the past, we’re coordinating Raspberry Jam events to take place around the world on 3–4 March, so that as many people as possible can join in. Well over 100 Jams have been confirmed so far.

Raspberry Pi Big Birthday Weekend Jam

Find a Jam near you

There are Jams planned in Argentina, Australia, Bolivia, Brazil, Bulgaria, Cameroon, Canada, Colombia, Dominican Republic, France, Germany, Greece, Hungary, India, Iran, Ireland, Italy, Japan, Kenya, Malaysia, Malta, Mexico, Netherlands, Norway, Papua New Guinea, Peru, Philippines, Poland, South Africa, Spain, Taiwan, Turkey, United Kingdom, United States, and Zimbabwe.

Take a look at the events map and the full list (including those who haven’t added their event to the map quite yet).

Raspberry Jam Big Birthday Weekend 2018 event map

We will have Raspberry Jams in 35 countries across six continents

Birthday kits

We had some special swag made especially for the birthday, including these T-shirts, which we’ve sent to Jam organisers:

Raspberry Jam Big Birthday Weekend 2018 T-shirt

There is also a poster with a list of participating Jams, which you can download:

Raspberry Jam Big Birthday Weekend 2018 list

Raspberry Jam photo booth

I created a Raspberry Jam photo booth that overlays photos with the Big Birthday Weekend logo and then tweets the picture from your Jam’s account — you’ll be seeing plenty of those if you follow the #PiParty hashtag on 3–4 March.

Check out the project on GitHub, and feel free to set up your own booth, or modify it to your own requirements. We’ve included text annotations in several languages, and more contributions are very welcome.

There’s still time…

If you can’t find a Jam near you, there’s still time to organise one for the Big Birthday Weekend. All you need to do is find a venue — a room in a school or library will do — and think about what you’d like to do at the event. Some Jams have Raspberry Pis set up for workshops and practical activities, some arrange tech talks, some put on show-and-tell — it’s up to you. To help you along, there’s the Raspberry Jam Guidebook full of advice and tips from Jam organisers.

Raspberry Pi on Twitter

The packed. And they packed. And they packed some more. Who’s expecting one of these #rjam kits for the Raspberry Jam Big Birthday Weekend?

Download the Raspberry Jam branding pack, and the special birthday branding pack, where you’ll find logos, graphical assets, flyer templates, worksheets, and more. When you’re ready to announce your event, create a webpage for it — you can use a site like Eventbrite or Meetup — and submit your Jam to us so it will appear on the Jam map!

We are six

We’re really looking forward to celebrating our birthday with thousands of people around the world. Over 48 hours, people of all ages will come together at more than 100 events to learn, share ideas, meet people, and make things during our Big Birthday Weekend.

Raspberry Jam Manchester
Raspberry Jam Manchester
Raspberry Jam Manchester

Since we released the first Raspberry Pi in 2012, we’ve sold 17 million of them. We’re also reaching almost 200000 children in 130 countries around the world through Code Club and CoderDojo, we’ve trained over 1500 Raspberry Pi Certified Educators, and we’ve sent code written by more than 6800 children into space. Our magazines are read by a quarter of a million people, and millions more use our free online learning resources. There’s plenty to celebrate and even more still to do: we really hope you’ll join us from a Jam near you on 3–4 March.

The post Big Birthday Weekend 2018: find a Jam near you! appeared first on Raspberry Pi.

Migrating Your Amazon ECS Containers to AWS Fargate

Post Syndicated from Tiffany Jernigan original https://aws.amazon.com/blogs/compute/migrating-your-amazon-ecs-containers-to-aws-fargate/

AWS Fargate is a new technology that works with Amazon Elastic Container Service (ECS) to run containers without having to manage servers or clusters. What does this mean? With Fargate, you no longer need to provision or manage a single virtual machine; you can just create tasks and run them directly!

Fargate uses the same API actions as ECS, so you can use the ECS console, the AWS CLI, or the ECS CLI. I recommend running through the first-run experience for Fargate even if you’re familiar with ECS. It creates all of the one-time setup requirements, such as the necessary IAM roles. If you’re using a CLI, make sure to upgrade to the latest version

In this blog, you will see how to migrate ECS containers from running on Amazon EC2 to Fargate.

Getting started

Note: Anything with code blocks is a change in the task definition file. Screen captures are from the console. Additionally, Fargate is currently available in the us-east-1 (N. Virginia) region.

Launch type

When you create tasks (grouping of containers) and clusters (grouping of tasks), you now have two launch type options: EC2 and Fargate. The default launch type, EC2, is ECS as you knew it before the announcement of Fargate. You need to specify Fargate as the launch type when running a Fargate task.

Even though Fargate abstracts away virtual machines, tasks still must be launched into a cluster. With Fargate, clusters are a logical infrastructure and permissions boundary that allow you to isolate and manage groups of tasks. ECS also supports heterogeneous clusters that are made up of tasks running on both EC2 and Fargate launch types.

The optional, new requiresCompatibilities parameter with FARGATE in the field ensures that your task definition only passes validation if you include Fargate-compatible parameters. Tasks can be flagged as compatible with EC2, Fargate, or both.

"requiresCompatibilities": [
    "FARGATE"
]

Networking

"networkMode": "awsvpc"

In November, we announced the addition of task networking with the network mode awsvpc. By default, ECS uses the bridge network mode. Fargate requires using the awsvpc network mode.

In bridge mode, all of your tasks running on the same instance share the instance’s elastic network interface, which is a virtual network interface, IP address, and security groups.

The awsvpc mode provides this networking support to your tasks natively. You now get the same VPC networking and security controls at the task level that were previously only available with EC2 instances. Each task gets its own elastic networking interface and IP address so that multiple applications or copies of a single application can run on the same port number without any conflicts.

The awsvpc mode also provides a separation of responsibility for tasks. You can get complete control of task placement within your own VPCs, subnets, and the security policies associated with them, even though the underlying infrastructure is managed by Fargate. Also, you can assign different security groups to each task, which gives you more fine-grained security. You can give an application only the permissions it needs.

"portMappings": [
    {
        "containerPort": "3000"
    }
 ]

What else has to change? First, you only specify a containerPort value, not a hostPort value, as there is no host to manage. Your container port is the port that you access on your elastic network interface IP address. Therefore, your container ports in a single task definition file need to be unique.

"environment": [
    {
        "name": "WORDPRESS_DB_HOST",
        "value": "127.0.0.1:3306"
    }
 ]

Additionally, links are not allowed as they are a property of the “bridge” network mode (and are now a legacy feature of Docker). Instead, containers share a network namespace and communicate with each other over the localhost interface. They can be referenced using the following:

localhost/127.0.0.1:<some_port_number>

CPU and memory

"memory": "1024",
 "cpu": "256"

"memory": "1gb",
 "cpu": ".25vcpu"

When launching a task with the EC2 launch type, task performance is influenced by the instance types that you select for your cluster combined with your task definition. If you pick larger instances, your applications make use of the extra resources if there is no contention.

In Fargate, you needed a way to get additional resource information so we created task-level resources. Task-level resources define the maximum amount of memory and cpu that your task can consume.

  • memory can be defined in MB with just the number, or in GB, for example, “1024” or “1gb”.
  • cpu can be defined as the number or in vCPUs, for example, “256” or “.25vcpu”.
    • vCPUs are virtual CPUs. You can look at the memory and vCPUs for instance types to get an idea of what you may have used before.

The memory and CPU options available with Fargate are:

CPU Memory
256 (.25 vCPU) 0.5GB, 1GB, 2GB
512 (.5 vCPU) 1GB, 2GB, 3GB, 4GB
1024 (1 vCPU) 2GB, 3GB, 4GB, 5GB, 6GB, 7GB, 8GB
2048 (2 vCPU) Between 4GB and 16GB in 1GB increments
4096 (4 vCPU) Between 8GB and 30GB in 1GB increments

IAM roles

Because Fargate uses awsvpc mode, you need an Amazon ECS service-linked IAM role named AWSServiceRoleForECS. It provides Fargate with the needed permissions, such as the permission to attach an elastic network interface to your task. After you create your service-linked IAM role, you can delete the remaining roles in your services.

"executionRoleArn": "arn:aws:iam::<your_account_id>:role/ecsTaskExecutionRole"

With the EC2 launch type, an instance role gives the agent the ability to pull, publish, talk to ECS, and so on. With Fargate, the task execution IAM role is only needed if you’re pulling from Amazon ECR or publishing data to Amazon CloudWatch Logs.

The Fargate first-run experience tutorial in the console automatically creates these roles for you.

Volumes

Fargate currently supports non-persistent, empty data volumes for containers. When you define your container, you no longer use the host field and only specify a name.

Load balancers

For awsvpc mode, and therefore for Fargate, use the IP target type instead of the instance target type. You define this in the Amazon EC2 service when creating a load balancer.

If you’re using a Classic Load Balancer, change it to an Application Load Balancer or a Network Load Balancer.

Tip: If you are using an Application Load Balancer, make sure that your tasks are launched in the same VPC and Availability Zones as your load balancer.

Let’s migrate a task definition!

Here is an example NGINX task definition. This type of task definition is what you’re used to if you created one before Fargate was announced. It’s what you would run now with the EC2 launch type.

{
    "containerDefinitions": [
        {
            "name": "nginx",
            "image": "nginx",
            "memory": "512",
            "cpu": "100",
            "essential": true,
            "portMappings": [
                {
                    "hostPort": "80",
                    "containerPort": "80",
                    "protocol": "tcp"
                }
            ],
            "logConfiguration": {
                "logDriver": "awslogs",
                "options": {
                    "awslogs-group": "/ecs/",
                    "awslogs-region": "us-east-1",
                    "awslogs-stream-prefix": "ecs"
                }
            }
        }
    ],
    "family": "nginx-ec2"
}

OK, so now what do you need to do to change it to run with the Fargate launch type?

  • Add FARGATE for requiredCompatibilities (not required, but a good safety check for your task definition).
  • Use awsvpc as the network mode.
  • Just specify the containerPort (the hostPortvalue is the same).
  • Add a task executionRoleARN value to allow logging to CloudWatch.
  • Provide cpu and memory limits for the task.
{
    "requiresCompatibilities": [
        "FARGATE"
    ],
    "containerDefinitions": [
        {
            "name": "nginx",
            "image": "nginx",
            "memory": "512",
            "cpu": "100",
            "essential": true,
            "portMappings": [
                {
                    "containerPort": "80",
                    "protocol": "tcp"
                }
            ],
            "logConfiguration": {
                "logDriver": "awslogs",
                "options": {
                    "awslogs-group": "/ecs/",
                    "awslogs-region": "us-east-1",
                    "awslogs-stream-prefix": "ecs"
                }
            }
        }
    ],
    "networkMode": "awsvpc",
    "executionRoleArn": "arn:aws:iam::<your_account_id>:role/ecsTaskExecutionRole",
    "family": "nginx-fargate",
    "memory": "512",
    "cpu": "256"
}

Are there more examples?

Yep! Head to the AWS Samples GitHub repo. We have several sample task definitions you can try for both the EC2 and Fargate launch types. Contributions are very welcome too :).

 

tiffany jernigan
@tiffanyfayj

[$] A cyborg’s journey

Post Syndicated from jake original https://lwn.net/Articles/745942/rss

Karen Sandler has been giving conference talks about free software and open
medical devices
for the better part of a decade at this point. LWN briefly covered a 2010 LinuxCon talk and a 2012 linux.conf.au (LCA) talk; her talk at
LCA 2012 was her first full-length keynote, she said. In this year’s
edition, she
reviewed her history (including her love for LCA based in part on that 2012
visit)
and gave an update on the status of the source code for the device she
has implanted on her heart.

Build a Multi-Tenant Amazon EMR Cluster with Kerberos, Microsoft Active Directory Integration and EMRFS Authorization

Post Syndicated from Songzhi Liu original https://aws.amazon.com/blogs/big-data/build-a-multi-tenant-amazon-emr-cluster-with-kerberos-microsoft-active-directory-integration-and-emrfs-authorization/

One of the challenges faced by our customers—especially those in highly regulated industries—is balancing the need for security with flexibility. In this post, we cover how to enable multi-tenancy and increase security by using EMRFS (EMR File System) authorization, the Amazon S3 storage-level authorization on Amazon EMR.

Amazon EMR is an easy, fast, and scalable analytics platform enabling large-scale data processing. EMRFS authorization provides Amazon S3 storage-level authorization by configuring EMRFS with multiple IAM roles. With this functionality enabled, different users and groups can share the same cluster and assume their own IAM roles respectively.

Simply put, on Amazon EMR, we can now have an Amazon EC2 role per user assumed at run time instead of one general EC2 role at the cluster level. When the user is trying to access Amazon S3 resources, Amazon EMR evaluates against a predefined mappings list in EMRFS configurations and picks up the right role for the user.

In this post, we will discuss what EMRFS authorization is (Amazon S3 storage-level access control) and show how to configure the role mappings with detailed examples. You will then have the desired permissions in a multi-tenant environment. We also demo Amazon S3 access from HDFS command line, Apache Hive on Hue, and Apache Spark.

EMRFS authorization for Amazon S3

There are two prerequisites for using this feature:

  1. Users must be authenticated, because EMRFS needs to map the current user/group/prefix to a predefined user/group/prefix. There are several authentication options. In this post, we launch a Kerberos-enabled cluster that manages the Key Distribution Center (KDC) on the master node, and enable a one-way trust from the KDC to a Microsoft Active Directory domain.
  2. The application must support accessing Amazon S3 via Applications that have their own S3FileSystem APIs (for example, Presto) are not supported at this time.

EMRFS supports three types of mapping entries: user, group, and Amazon S3 prefix. Let’s use an example to show how this works.

Assume that you have the following three identities in your organization, and they are defined in the Active Directory:

To enable all these groups and users to share the EMR cluster, you need to define the following IAM roles:

In this case, you create a separate Amazon EC2 role that doesn’t give any permission to Amazon S3. Let’s call the role the base role (the EC2 role attached to the EMR cluster), which in this example is named EMR_EC2_RestrictedRole. Then, you define all the Amazon S3 permissions for each specific user or group in their own roles. The restricted role serves as the fallback role when the user doesn’t belong to any user/group, nor does the user try to access any listed Amazon S3 prefixes defined on the list.

Important: For all other roles, like emrfs_auth_group_role_data_eng, you need to add the base role (EMR_EC2_RestrictedRole) as the trusted entity so that it can assume other roles. See the following example:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "ec2.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    },
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::511586466501:role/EMR_EC2_RestrictedRole"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

The following is an example policy for the admin user role (emrfs_auth_user_role_admin_user):

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "s3:*",
            "Resource": "*"
        }
    ]
}

We are assuming the admin user has access to all buckets in this example.

The following is an example policy for the data science group role (emrfs_auth_group_role_data_sci):

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Resource": [
                "arn:aws:s3:::emrfs-auth-data-science-bucket-demo/*",
                "arn:aws:s3:::emrfs-auth-data-science-bucket-demo"
            ],
            "Action": [
                "s3:*"
            ]
        }
    ]
}

This role grants all Amazon S3 permissions to the emrfs-auth-data-science-bucket-demo bucket and all the objects in it. Similarly, the policy for the role emrfs_auth_group_role_data_eng is shown below:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Resource": [
                "arn:aws:s3:::emrfs-auth-data-engineering-bucket-demo/*",
                "arn:aws:s3:::emrfs-auth-data-engineering-bucket-demo"
            ],
            "Action": [
                "s3:*"
            ]
        }
    ]
}

Example role mappings configuration

To configure EMRFS authorization, you use EMR security configuration. Here is the configuration we use in this post

Consider the following scenario.

First, the admin user admin1 tries to log in and run a command to access Amazon S3 data through EMRFS. The first role emrfs_auth_user_role_admin_user on the mapping list, which is a user role, is mapped and picked up. Then admin1 has access to the Amazon S3 locations that are defined in this role.

Then a user from the data engineer group (grp_data_engineering) tries to access a data bucket to run some jobs. When EMRFS sees that the user is a member of the grp_data_engineering group, the group role emrfs_auth_group_role_data_eng is assumed, and the user has proper access to Amazon S3 that is defined in the emrfs_auth_group_role_data_eng role.

Next, the third user comes, who is not an admin and doesn’t belong to any of the groups. After failing evaluation of the top three entries, EMRFS evaluates whether the user is trying to access a certain Amazon S3 prefix defined in the last mapping entry. This type of mapping entry is called the prefix type. If the user is trying to access s3://emrfs-auth-default-bucket-demo/, then the prefix mapping is in effect, and the prefix role emrfs_auth_prefix_role_default_s3_prefix is assumed.

If the user is not trying to access any of the Amazon S3 paths that are defined on the list—which means it failed the evaluation of all the entries—it only has the permissions defined in the EMR_EC2RestrictedRole. This role is assumed by the EC2 instances in the cluster.

In this process, all the mappings defined are evaluated in the defined order, and the first role that is mapped is assumed, and the rest of the list is skipped.

Setting up an EMR cluster and mapping Active Directory users and groups

Now that we know how EMRFS authorization role mapping works, the next thing we need to think about is how we can use this feature in an easy and manageable way.

Active Directory setup

Many customers manage their users and groups using Microsoft Active Directory or other tools like OpenLDAP. In this post, we create the Active Directory on an Amazon EC2 instance running Windows Server and create the users and groups we will be using in the example below. After setting up Active Directory, we use the Amazon EMR Kerberos auto-join capability to establish a one-way trust from the KDC running on the EMR master node to the Active Directory domain on the EC2 instance. You can use your own directory services as long as it talks to the LDAP (Lightweight Directory Access Protocol).

To create and join Active Directory to Amazon EMR, follow the steps in the blog post Use Kerberos Authentication to Integrate Amazon EMR with Microsoft Active Directory.

After configuring Active Directory, you can create all the users and groups using the Active Directory tools and add users to appropriate groups. In this example, we created users like admin1, dataeng1, datascientist1, grp_data_engineering, and grp_data_science, and then add the users to the right groups.

Join the EMR cluster to an Active Directory domain

For clusters with Kerberos, Amazon EMR now supports automated Active Directory domain joins. You can use the security configuration to configure the one-way trust from the KDC to the Active Directory domain. You also configure the EMRFS role mappings in the same security configuration.

The following is an example of the EMR security configuration with a trusted Active Directory domain EMRKRB.TEST.COM and the EMRFS role mappings as we discussed earlier:

The EMRFS role mapping configuration is shown in this example:

We will also provide an example AWS CLI command that you can run.

Launching the EMR cluster and running the tests

Now you have configured Kerberos and EMRFS authorization for Amazon S3.

Additionally, you need to configure Hue with Active Directory using the Amazon EMR configuration API in order to log in using the AD users created before. The following is an example of Hue AD configuration.

[
  {
    "Classification":"hue-ini",
    "Properties":{

    },
    "Configurations":[
      {
        "Classification":"desktop",
        "Properties":{

        },
        "Configurations":[
          {
            "Classification":"ldap",
            "Properties":{

            },
            "Configurations":[
              {
                "Classification":"ldap_servers",
                "Properties":{

                },
                "Configurations":[
                  {
                    "Classification":"AWS",
                    "Properties":{
                      "base_dn":"DC=emrkrb,DC=test,DC=com",
                      "ldap_url":"ldap://emrkrb.test.com",
                      "search_bind_authentication":"false",
                      "bind_dn":"CN=adjoiner,CN=users,DC=emrkrb,DC=test,DC=com",
                      "bind_password":"Abc123456",
                      "create_users_on_login":"true",
                      "nt_domain":"emrkrb.test.com"
                    },
                    "Configurations":[

                    ]
                  }
                ]
              }
            ]
          },
          {
            "Classification":"auth",
            "Properties":{
              "backend":"desktop.auth.backend.LdapBackend"
            },
            "Configurations":[

            ]
          }
        ]
      }
    ]
  }

Note: In the preceding configuration JSON file, change the values as required before pasting it into the software setting section in the Amazon EMR console.

Now let’s use this configuration and the security configuration you created before to launch the cluster.

In the Amazon EMR console, choose Create cluster. Then choose Go to advanced options. On the Step1: Software and Steps page, under Edit software settings (optional), paste the configuration in the box.

The rest of the setup is the same as an ordinary cluster setup, except in the Security Options section. In Step 4: Security, under Permissions, choose Custom, and then choose the RestrictedRole that you created before.

Choose the appropriate subnets (these should meet the base requirement in order for a successful Active Directory join—see the Amazon EMR Management Guide for more details), and choose the appropriate security groups to make sure it talks to the Active Directory. Choose a key so that you can log in and configure the cluster.

Most importantly, choose the security configuration that you created earlier to enable Kerberos and EMRFS authorization for Amazon S3.

You can use the following AWS CLI command to create a cluster.

aws emr create-cluster --name "TestEMRFSAuthorization" \ 
--release-label emr-5.10.0 \ --instance-type m3.xlarge \ 
--instance-count 3 \ 
--ec2-attributes InstanceProfile=EMR_EC2_DefaultRole,KeyName=MyEC2KeyPair \ --service-role EMR_DefaultRole \ 
--security-configuration MyKerberosConfig \ 
--configurations file://hue-config.json \
--applications Name=Hadoop Name=Hive Name=Hue Name=Spark \ 
--kerberos-attributes Realm=EC2.INTERNAL, \ KdcAdminPassword=<YourClusterKDCAdminPassword>, \ ADDomainJoinUser=<YourADUserLogonName>,ADDomainJoinPassword=<YourADUserPassword>, \ 
CrossRealmTrustPrincipalPassword=<MatchADTrustPwd>

Note: If you create the cluster using CLI, you need to save the JSON configuration for Hue into a file named hue-config.json and place it on the server where you run the CLI command.

After the cluster gets into the Waiting state, try to connect by using SSH into the cluster using the Active Directory user name and password.

ssh -l [email protected] <EMR IP or DNS name>

Quickly run two commands to show that the Active Directory join is successful:

  1. id [user name] shows the mapped AD users and groups in Linux.
  2. hdfs groups [user name] shows the mapped group in Hadoop.

Both should return the current Active Directory user and group information if the setup is correct.

Now, you can test the user mapping first. Log in with the admin1 user, and run a Hadoop list directory command:

hadoop fs -ls s3://emrfs-auth-data-science-bucket-demo/

Now switch to a user from the data engineer group.

Retry the previous command to access the admin’s bucket. It should throw an Amazon S3 Access Denied exception.

When you try listing the Amazon S3 bucket that a data engineer group member has accessed, it triggers the group mapping.

hadoop fs -ls s3://emrfs-auth-data-engineering-bucket-demo/

It successfully returns the listing results. Next we will test Apache Hive and then Apache Spark.

 

To run jobs successfully, you need to create a home directory for every user in HDFS for staging data under /user/<username>. Users can configure a step to create a home directory at cluster launch time for every user who has access to the cluster. In this example, you use Hue since Hue will create the home directory in HDFS for the user at the first login. Here Hue also needs to be integrated with the same Active Directory as explained in the example configuration described earlier.

First, log in to Hue as a data engineer user, and open a Hive Notebook in Hue. Then run a query to create a new table pointing to the data engineer bucket, s3://emrfs-auth-data-engineering-bucket-demo/table1_data_eng/.

You can see that the table was created successfully. Now try to create another table pointing to the data science group’s bucket, where the data engineer group doesn’t have access.

It failed and threw an Amazon S3 Access Denied error.

Now insert one line of data into the successfully create table.

Next, log out, switch to a data science group user, and create another table, test2_datasci_tb.

The creation is successful.

The last task is to test Spark (it requires the user directory, but Hue created one in the previous step).

Now let’s come back to the command line and run some Spark commands.

Login to the master node using the datascientist1 user:

Start the SparkSQL interactive shell by typing spark-sql, and run the show tables command. It should list the tables that you created using Hive.

As a data science group user, try select on both tables. You will find that you can only select the table defined in the location that your group has access to.

Conclusion

EMRFS authorization for Amazon S3 enables you to have multiple roles on the same cluster, providing flexibility to configure a shared cluster for different teams to achieve better efficiency. The Active Directory integration and group mapping make it much easier for you to manage your users and groups, and provides better auditability in a multi-tenant environment.


Additional Reading

If you found this post useful, be sure to check out Use Kerberos Authentication to Integrate Amazon EMR with Microsoft Active Directory and Launching and Running an Amazon EMR Cluster inside a VPC.


About the Authors

Songzhi Liu is a Big Data Consultant with AWS Professional Services. He works closely with AWS customers to provide them Big Data & Machine Learning solutions and best practices on the Amazon cloud.