Tag Archives: open source

BREIN Goes After Developers of ‘Pirate’ Kodi Builds

Post Syndicated from Ernesto original https://torrentfreak.com/brein-goes-after-developers-of-pirate-kodi-builds-170823/

A surge of cheap media players, which often use the open source Kodi software, has made it easy for people to stream video from the Internet directly to their TVs.

The media players themselves are perfectly legal, and the Kodi software is too, but when these are loaded with pirate add-ons, legal issues arise.

Earlier this year the European Court of Justice ruled that selling or using devices pre-configured to obtain copyright-infringing content is illegal. With this decision in hand, anti-piracy group BREIN has pressured dozens of vendors to halt their sales, but the action hasn’t stopped there.

Aside from going after sellers, BREIN is also targeting people who make “pirate” Kodi builds, which are prepackaged bundles of add-ons.

“We are also going after people who are involved in illegal builds, those with add-ons for unauthorized content,” BREIN director Tim Kuik confirmed to TorrentFreak without highlighting any specific targets.

Thus far, the group has focused on three ‘pirate’ builds and settled with ten people connected to them.

BREIN settlements generally include an agreement not to offer any infringing material in the future. This is also the case here. The developers face a penalty of 500 euros per infringing link per day.

Aside from the Filmspeler (Film Player) judgment of the EU Court of Justice, BREIN’s actions also use the Geenstijl ruling as a basis. This confirmed that merely linking to copyrighted works without permission can be seen as infringement, especially when it’s done with a profit motive.

In addition to targeting developers, BREIN previously announced that it had successfully halted the infringing activities of 200 sellers of ‘pirate’ media players.

Despite BREIN’s efforts, there are still plenty of infringing players, builds, and add-ons circulating in the wild, even on eBay. However, with pressure from various sides, it has become increasingly risky for the people involved, which is a dramatic change compared to a year ago.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

Oracle considers letting go of Java EE

Post Syndicated from corbet original https://lwn.net/Articles/731579/rss

Oracle has announced
that it is considering stepping back from management of the Java Enterprise
Edition. “We are discussing how we can improve the Java EE
development process following the delivery of Java EE 8. We believe that
moving Java EE technologies including reference implementations and test
compatibility kit to an open source foundation may be the right next step,
in order to adopt more agile processes, implement more flexible licensing,
and change the governance process. We plan on exploring this possibility
with the community, our licensees and several candidate foundations to see
if we can move Java EE forward in this direction.”

NoSQLMap – Automated NoSQL Exploitation Tool

Post Syndicated from Darknet original http://feedproxy.google.com/~r/darknethackers/~3/Y4RGC1J9G-U/

NoSQLMap is an open source Python-based automated NoSQL exploitation tool designed to audit for, as well as automate, injection attacks and exploit default configuration weaknesses in NoSQL databases. It is also intended to attack web applications using NoSQL in order to disclose data from the database. Presently the tool’s exploits are focused…

Read the full post at darknet.org.uk

Announcement: IPS code

Post Syndicated from Robert Graham original http://blog.erratasec.com/2017/08/announcement-ips-code.html

So after 20 years, IBM is killing off my BlackICE code created in April 1998. So it’s time that I rewrite it.

BlackICE was the first “inline” intrusion-detection system, a.k.a. an “intrusion prevention system” or IPS. ISS purchased my company in 2001 and replaced their RealSecure engine with it, and later renamed it Proventia. Then IBM purchased ISS in 2006. Now, they are formally canceling the project and moving customers onto Cisco’s products, which are based on Snort.

So now is a good time to write a replacement. The reason is that BlackICE worked fundamentally differently than Snort, using protocol analysis rather than pattern-matching. In this way, it worked more like Bro than Snort. The biggest benefit of protocol-analysis is speed, making it many times faster than Snort. The second benefit is better detection ability, as I describe in this post on Heartbleed.
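
To make that contrast concrete, here is a deliberately simplified sketch (in Python for brevity, not the C/Lua the project will actually use, and not code from BlackICE or Snort) of how protocol analysis can flag a Heartbleed-style request: parse the heartbeat message and compare the claimed payload length with what actually arrived, rather than matching byte signatures. TLS state tracking and padding rules are omitted.

import struct

def looks_like_heartbleed(record):
    """Flag a TLS heartbeat request whose claimed payload length exceeds
    the bytes actually present in the record (the Heartbleed tell)."""
    if len(record) < 5 or record[0] != 0x18:          # 0x18 = heartbeat record type
        return False
    record_len = struct.unpack("!H", record[3:5])[0]
    body = record[5:5 + record_len]
    if len(body) < 3 or body[0] != 0x01:              # 0x01 = heartbeat request
        return False
    claimed = struct.unpack("!H", body[1:3])[0]       # payload length field
    actual = len(body) - 3                            # bytes actually sent
    return claimed > actual

# A request claiming 16 KB of payload while sending none triggers the check.
evil = bytes([0x18, 0x03, 0x02]) + struct.pack("!H", 3) + bytes([0x01, 0x40, 0x00])
print(looks_like_heartbleed(evil))                    # True

A pattern-matching engine has to ship signatures for known exploit tools; a parse-and-compare check like this catches any request where the lengths disagree.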

So my plan is to create a new project. I’ll be checking the starter bits into GitHub starting a couple of weeks from now. I need to figure out a new name for the project, so I don’t have to rip off a name from William Gibson like I did last time :).

Some notes:

  • Yes, it’ll be GNU open source. I’m a capitalist, so I’ll earn money the way Snort/Nmap do, by dual-licensing it and charging companies who don’t want to open-source their add-ons. All capitalists GNU license their code.
  • C, not Rust. Sorry, I’m going for extreme scalability. We’ll re-visit this decision later when looking at building protocol parsers.
  • It’ll be 95% compatible with Snort signatures. Their language definition leaves so much ambiguous it’ll be hard to be 100% compatible.
  • It’ll support Snort output as well, though really, Snort’s events suck.
  • Protocol parsers in Lua, so you can use it as a replacement for Bro, writing parsers to extract data you are interested in.
  • Protocol state machine parsers in C, like you see in my Masscan project for X.509.
  • First version IDS only. These days, “inline” means also being able to MitM the SSL stack, so I’m going to have to think harder on that.
  • Multi-core worker threads off PF_RING/DPDK/netmap receive queues. Should handle 10gbps, tracking 10 million concurrent connections, with a quad-core CPU.
So if you want to contribute to the project, here’s what I need:
  • Requirements from people who work daily with IDS/IPS today. I need you to write up what your products do well that you really like. I need you to write up what they suck at that needs to be fixed. These need to be in some detail.
  • Testing environment to play with. This means having a small server plugged into a real-world link running at a minimum of several gigabits-per-second available for the next year. I’ll sign NDAs related to the data I might see on the network.
  • Coders. I’ll be doing the basic architecture, but protocol parsers, output plugins, etc. will need work. Code will be in C and Lua for the near term. Unfortunately, since I’m going to dual-license, I’ll need waivers before accepting pull requests.
Anyway, follow me on Twitter @erratarob if you want to contribute.

timeShift(GrafanaBuzz, 1w) Issue 9

Post Syndicated from Blogs on Grafana Labs Blog original https://grafana.com/blog/2017/08/18/timeshiftgrafanabuzz-1w-issue-9/

Matt from Grafana NYC spent the week visiting Stockholm to focus on v5.0 with Torkel. Despite warnings otherwise, the weather has been beautiful, making a nice backdrop for many UX discussions. Very, very excited to soon show what we’ve been working on.


Latest Release

Grafana v4.4.3 is available for download

To see the full changelog, head over to our community site.


Grafana <3 Prometheus

Our very own Carl Bergquist spoke at PromCon 2017 yesterday in Munich, highlighting recent Grafana features and enhancements.

We also used the opportunity to debut our upcoming Prometheus query editor with a load of new functionality. It seems the community approves; in fact, this is our most popular tweet ever!


From the Blogosphere

  • Wikimedia Metrics: A tweet this week reminded us of the public metrics Wikimedia exposes using Grafana. Exploring the performance stats in real time for the 5th most popular site on the internet is pretty fun.

  • Creating Grafana Annotations with InfluxDB: Nice short article by Max Chadwick showing how to quickly add InfluxDB as a source for Grafana annotations.


This week’s MVC (Most Valuable Contributor)

This week’s MVC highlights what is great about Open Source software.

ericslaw
ericslaw submitted his first PR to a public project this past week. Speaking from personal experience, submitting a PR can feel daunting, and we were lucky that he chose Grafana. Even the smallest contributions, like Eric fixing a bogus link within our templating, have a big impact.


Tweet of the Week

We scour Twitter each week to find an interesting/beautiful dashboard and show it off! #monitoringLove

Seems the excitement about Prometheus and Grafana has also caught the attention of a certain superhero.



What do you think?

That wraps up another issue. Hope you’re finding these roundups valuable. Let us know how we’re doing! Submit a comment on this article below, or post something at our community forum. Help us make this better!

Follow us on Twitter, like us on Facebook, and join the Grafana Labs community.

Roku Gets Tough on Pirate Channels, Warns Users

Post Syndicated from Ernesto original https://torrentfreak.com/roku-gets-tough-on-pirate-channels-warns-users-170815/

In recent years it has become much easier to stream movies and TV-shows over the Internet.

Legal services such as Netflix and HBO are flourishing, but there’s also a darker side to this streaming epidemic. Millions of people are streaming from unauthorized sources, often paired with perfectly legal streaming platforms and devices.

Hollywood insiders have dubbed this trend “Piracy 3.0” and are actively working with stakeholders to address the threat. One of the companies rightsholders are working with is Roku, known for its easy-to-use media players.

Earlier this year Roku was harshly confronted with this new piracy crackdown when a Mexican court ordered local retailers to take its media player off the shelves. While this legal battle isn’t over yet, it was clear to Roku that misuse of its platform wasn’t without consequences.

While Roku never permitted any infringing content, it appears that the company has recently made some adjustments to better deal with the problem, or at least clarify its stance.

Pirate content generally doesn’t show up in the official Roku Channel Store but is directly loaded onto the device through third-party “private” channels. A few weeks ago, Roku renamed these “private” channels to “non-certified” channels, while making it very clear that copyright infringement is not allowed.

A “WARNING!” message that pops up during the installation of these third-party channels stresses that Roku has no control over the content. In addition, the company notes that these channels may be removed if they link to copyright-infringing content.

Roku Warning

“By continuing, you acknowledge you are accessing a non-certified channel that may include content that is offensive or inappropriate for some audiences,” Roku’s warning reads.

“Moreover, if Roku determines that this channel violates copyright, contains illegal content, or otherwise violates Roku’s terms and conditions, then ROKU MAY REMOVE THIS CHANNEL WITHOUT PRIOR NOTICE.”

TorrentFreak reached out to Roku to find out how they plan to enforce this policy, but we have yet to hear back. According to Cord Cutters News, several piracy channels have already been removed recently, with other developers opting to leave the platform.

Roku’s General Counsel Steve Kay previously informed us that the company is taking the piracy problem seriously. Together with various stakeholders, they are working hard to address the problem.

“We actively work to prevent third-parties from using our platform to distribute copyright infringing content. Moreover, we have been actively working with other industry stakeholders on a wide range of anti-piracy initiatives,” Kay said.

Roku is not the only platform dealing with the piracy epidemic; the popular media player software Kodi is in the same boat. Kodi has also taken an active anti-piracy stance, but its team is not banning any add-ons, believing that would be pointless due to the open source nature of the software.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

New – AWS SAM Local (Beta) – Build and Test Serverless Applications Locally

Post Syndicated from Randall Hunt original https://aws.amazon.com/blogs/aws/new-aws-sam-local-beta-build-and-test-serverless-applications-locally/

Today we’re releasing a beta of a new tool, SAM Local, that makes it easy to build and test your serverless applications locally. In this post we’ll use SAM Local to build, debug, and deploy a quick application that allows us to vote on tabs or spaces by curling an endpoint. AWS introduced the Serverless Application Model (SAM) last year to make it easier for developers to deploy serverless applications. If you’re not already familiar with SAM, my colleague Orr wrote a great post on how to use SAM that you can read in about 5 minutes. At its core, SAM is a powerful open source specification built on AWS CloudFormation that makes it easy to keep your serverless infrastructure as code – and they have the cutest mascot.

SAM Local takes all the good parts of SAM and brings them to your local machine.

There are a couple of ways to install SAM Local but the easiest is through NPM. A quick npm install -g aws-sam-local should get us going but if you want the latest version you can always install straight from the source: go get github.com/awslabs/aws-sam-local (this will create a binary named aws-sam-local, not sam).

I like to vote on things, so let’s write a quick SAM application to vote on Spaces versus Tabs. We’ll use a very simple, but powerful, architecture of API Gateway fronting a Lambda function, and we’ll store our results in DynamoDB. In the end, a user should be able to curl our API (curl https://SOMEURL/ -d '{"vote": "spaces"}') and get back the number of votes.

Let’s start by writing a simple SAM template.yaml:

AWSTemplateFormatVersion : '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Resources:
  VotesTable:
    Type: "AWS::Serverless::SimpleTable"
  VoteSpacesTabs:
    Type: "AWS::Serverless::Function"
    Properties:
      Runtime: python3.6
      Handler: lambda_function.lambda_handler
      Policies: AmazonDynamoDBFullAccess
      Environment:
        Variables:
          TABLE_NAME: !Ref VotesTable
      Events:
        Vote:
          Type: Api
          Properties:
            Path: /
            Method: post

So we create a DynamoDB table that we expose to our Lambda function through an environment variable called TABLE_NAME.

To test that this template is valid I’ll go ahead and call sam validate to make sure I haven’t fat-fingered anything. It returns Valid! so let’s go ahead and get to work on our Lambda function.

import os
import json
import boto3
votes_table = boto3.resource('dynamodb').Table(os.getenv('TABLE_NAME'))

def lambda_handler(event, context):
    print(event)
    if event['httpMethod'] == 'GET':
        resp = votes_table.scan()
        return {'body': json.dumps({item['id']: int(item['votes']) for item in resp['Items']})}
    elif event['httpMethod'] == 'POST':
        try:
            body = json.loads(event['body'])
        except:
            return {'statusCode': 400, 'body': 'malformed json input'}
        if 'vote' not in body:
            return {'statusCode': 400, 'body': 'missing vote in request body'}
        if body['vote'] not in ['spaces', 'tabs']:
            return {'statusCode': 400, 'body': 'vote value must be "spaces" or "tabs"'}

        resp = votes_table.update_item(
            Key={'id': body['vote']},
            UpdateExpression='ADD votes :incr',
            ExpressionAttributeValues={':incr': 1},
            ReturnValues='ALL_NEW'
        )
        return {'body': "{} now has {} votes".format(body['vote'], resp['Attributes']['votes'])}

So let’s test this locally. I’ll need to create a real DynamoDB database to talk to and I’ll need to provide the name of that database through the environment variable TABLE_NAME. I could do that with an env.json file or I can just pass it on the command line. First, I can call:
$ echo '{"httpMethod": "POST", "body": "{\"vote\": \"spaces\"}"}' |\
TABLE_NAME="vote-spaces-tabs" sam local invoke "VoteSpacesTabs"

to test the Lambda – it returns the number of votes for spaces, so theoretically everything is working. Typing all of that out is a pain, so I could generate a sample event with sam local generate-event api and pass that in to the local invocation. Far easier than all of that is just running our API locally. Let’s do that: sam local start-api. Now I can curl my local endpoints to test everything out.
I’ll run the command: $ curl -d '{"vote": "tabs"}' http://127.0.0.1:3000/ and it returns: “tabs now has 12 votes”. Now, of course I did not write this function perfectly on my first try. I edited and saved several times. One of the benefits of hot-reloading is that as I change the function I don’t have to do any additional work to test the new function. This makes iterative development vastly easier.
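
As an aside that is not in the original post, the handler’s input-validation branches can also be exercised with plain Python, since the 400 paths never touch DynamoDB. This assumes the function is saved as lambda_function.py (as the Handler setting implies); the region and table values are placeholders that only exist so the module-level boto3 call can be constructed.

import json
import os

# Placeholders so the module-level boto3.resource() call can be constructed;
# nothing is sent to AWS for the 400 paths exercised below.
os.environ.setdefault("AWS_DEFAULT_REGION", "us-east-1")
os.environ.setdefault("TABLE_NAME", "vote-spaces-tabs")

from lambda_function import lambda_handler

print(lambda_handler({"httpMethod": "POST", "body": "not json"}, None))          # 400: malformed json input
print(lambda_handler({"httpMethod": "POST", "body": json.dumps({})}, None))      # 400: missing vote
print(lambda_handler({"httpMethod": "POST", "body": json.dumps({"vote": "emacs"})}, None))  # 400: bad value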

Let’s say we don’t want to deal with accessing a real DynamoDB database over the network though. What are our options? Well we can download DynamoDB Local and launch it with java -Djava.library.path=./DynamoDBLocal_lib -jar DynamoDBLocal.jar -sharedDb. Then we can have our Lambda function use the AWS_SAM_LOCAL environment variable to make some decisions about how to behave. Let’s modify our function a bit:

import os
import json
import boto3
if os.getenv("AWS_SAM_LOCAL"):
    votes_table = boto3.resource(
        'dynamodb',
        endpoint_url="http://docker.for.mac.localhost:8000/"
    ).Table("spaces-tabs-votes")
else:
    votes_table = boto3.resource('dynamodb').Table(os.getenv('TABLE_NAME'))

Now we’re using a local endpoint to connect to our local database which makes working without wifi a little easier.
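
One assumption the snippet above makes about your local setup: DynamoDB Local starts out empty, so the spaces-tabs-votes table has to be created before the function can read or write it. A minimal boto3 sketch, run from the host machine while DynamoDB Local listens on port 8000:

import boto3

# DynamoDB Local accepts any credentials; these dummy values just satisfy boto3.
dynamodb = boto3.resource(
    "dynamodb",
    endpoint_url="http://localhost:8000/",
    region_name="us-east-1",
    aws_access_key_id="local",
    aws_secret_access_key="local",
)

table = dynamodb.create_table(
    TableName="spaces-tabs-votes",  # the name hard-coded in the handler above
    KeySchema=[{"AttributeName": "id", "KeyType": "HASH"}],
    AttributeDefinitions=[{"AttributeName": "id", "AttributeType": "S"}],
    ProvisionedThroughput={"ReadCapacityUnits": 1, "WriteCapacityUnits": 1},
)
table.wait_until_exists()
print("Created table:", table.table_name)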

SAM local even supports interactive debugging! In Java and Node.js I can just pass the -d flag and a port to immediately enable the debugger. For Python I could use a library like import epdb; epdb.serve() and connect that way. Then we can call sam local invoke -d 8080 "VoteSpacesTabs" and our function will pause execution waiting for you to step through with the debugger.
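
For the Python route, the change is just a couple of lines at the top of the handler shown earlier; this is only a sketch, and the details of attaching a client are left to epdb’s documentation:

import epdb

def lambda_handler(event, context):
    # Pause here and wait for a remote debugger client to attach before
    # continuing; remove these lines before packaging for deployment.
    epdb.serve()
    print(event)
    # ... rest of the handler unchanged from the version above ...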

Alright, I think we’ve got everything working so let’s deploy this!

First I’ll call the sam package command, which is just an alias for aws cloudformation package, and then I’ll use the result of that command with sam deploy.

$ sam package --template-file template.yaml --s3-bucket MYAWESOMEBUCKET --output-template-file package.yaml
Uploading to 144e47a4a08f8338faae894afe7563c3  90570 / 90570.0  (100.00%)
Successfully packaged artifacts and wrote output template to file package.yaml.
Execute the following command to deploy the packaged template
aws cloudformation deploy --template-file package.yaml --stack-name 
$ sam deploy --template-file package.yaml --stack-name VoteForSpaces --capabilities CAPABILITY_IAM
Waiting for changeset to be created..
Waiting for stack create/update to complete
Successfully created/updated stack - VoteForSpaces

Which brings us to our API.

I’m going to hop over into the production stage and add some rate limiting in case you guys start voting a lot – but otherwise we’ve taken our local work and deployed it to the cloud without much effort at all. I always enjoy it when things work on the first deploy!

You can vote now and watch the results live! http://spaces-or-tabs.s3-website-us-east-1.amazonaws.com/

We hope that SAM Local makes it easier for you to test, debug, and deploy your serverless apps. We have a CONTRIBUTING.md guide and we welcome pull requests. Please tweet at us to let us know what cool things you build. You can see our What’s New post here and the documentation is live here.

Randall

All Systems Go! 2017 Speakers

Post Syndicated from Lennart Poettering original http://0pointer.net/blog/all-systems-go-2017-speakers.html

The All Systems Go! 2017 Headline Speakers Announced!

Don’t forget to send in your submissions to the All Systems Go! 2017 CfP! Proposals are accepted until September 3rd!

A couple of headline speakers have been announced now:

  • Alban Crequy (Kinvolk)
  • Brian “Redbeard” Harrington (CoreOS)
  • Gianluca Borello (Sysdig)
  • Jon Boulle (NStack/CoreOS)
  • Martin Pitt (Debian)
  • Thomas Graf (covalent.io/Cilium)
  • Vincent Batts (Red Hat/OCI)
  • (and yours truly)

These folks will also review your submissions as part of the papers committee!

All Systems Go! is an Open Source community conference focused on the projects and technologies at the foundation of modern Linux systems — specifically low-level user-space technologies. Its goal is to provide a friendly and collaborative gathering place for individuals and communities working to push these technologies forward.

All Systems Go! 2017 takes place in Berlin, Germany on October 21st+22nd.

To submit your proposal now please visit our CFP submission web site.

For further information about All Systems Go! visit our conference web site.

Growing up alongside tech

Post Syndicated from Eevee original https://eev.ee/blog/2017/08/09/growing-up-alongside-tech/

IndustrialRobot asks… or, uh, asked last month:

industrialrobot: How has your views on tech changed as you’ve got older?

This is so open-ended that it’s actually stumped me for a solid month. I’ve had a surprisingly hard time figuring out where to even start.


It’s not that my views of tech have changed too much — it’s that they’ve changed very gradually. Teasing out and explaining any one particular change is tricky when it happened invisibly over the course of 10+ years.

I think a better framework for this is to consider how my relationship to tech has changed. It’s gone through three pretty distinct phases, each of which has strongly colored how I feel and talk about technology.

Act I

In which I start from nothing.

Nothing is an interesting starting point. You only really get to start there once.

Learning something on my own as a kid was something of a magical experience, in a way that I don’t think I could replicate as an adult. I liked computers; I liked toying with computers; so I did that.

I don’t know how universal this is, but when I was a kid, I couldn’t even conceive of how incredible things were made. Buildings? Cars? Paintings? Operating systems? Where does any of that come from? Obviously someone made them, but it’s not the sort of philosophical point I lingered on when I was 10, so in the back of my head they basically just appeared fully-formed from the æther.

That meant that when I started trying out programming, I had no aspirations. I couldn’t imagine how far I would go, because all the examples of how far I would go were completely disconnected from any idea of human achievement. I started out with BASIC on a toy computer; how could I possibly envision a connection between that and something like a mainstream video game? Every new thing felt like a new form of magic, so I couldn’t conceive that I was even in the same ballpark as whatever process produced real software. (Even seeing the source code for GORILLAS.BAS, it didn’t quite click. I didn’t think to try reading any of it until years after I’d first encountered the game.)

This isn’t to say I didn’t have goals. I invented goals constantly, as I’ve always done; as soon as I learned about a new thing, I’d imagine some ways to use it, then try to build them. I produced a lot of little weird goofy toys, some of which entertained my tiny friend group for a couple days, some of which never saw the light of day. But none of it felt like steps along the way to some mountain peak of mastery, because I didn’t realize the mountain peak was even a place that could be gone to. It was pure, unadulterated (!) playing.

I contrast this to my art career, which started only a couple years ago. I was already in my late 20s, so I’d already spent decades seeing a very broad spectrum of art: everything from quick sketches up to painted masterpieces. And I’d seen the people who create that art, sometimes seen them create it in real-time. I’m even in a relationship with one of them! And of course I’d already had the experience of advancing through tech stuff and discovering first-hand that even the most amazing software is still just code someone wrote.

So from the very beginning, from the moment I touched pencil to paper, I knew the possibilities. I knew that the goddamn Sistine Chapel was something I could learn to do, if I were willing to put enough time in — and I knew that I’m not, so I’d have to settle somewhere a ways before that. I knew that I’d have to put an awful lot of work in before I’d be producing anything very impressive.

I did it anyway (though perhaps waited longer than necessary to start), but those aren’t things I can un-know, and so I can never truly explore art from a place of pure ignorance. On the other hand, I’ve probably learned to draw much more quickly and efficiently than if I’d done it as a kid, precisely because I know those things. Now I can decide I want to do something far beyond my current abilities, then go figure out how to do it. When I was just playing, that kind of ambition was impossible.


So, I played.

How did this affect my views on tech? Well, I didn’t… have any. Learning by playing tends to teach you things in an outward sprawl without many abrupt jumps to new areas, so you don’t tend to run up against conflicting information. The whole point of opinions is that they’re your own resolution to a conflict; without conflict, I can’t meaningfully say I had any opinions. I just accepted whatever I encountered at face value, because I didn’t even know enough to suspect there could be alternatives yet.

Act II

That started to seriously change around, I suppose, the end of high school and beginning of college. I was becoming aware of this whole “open source” concept. I took classes that used languages I wouldn’t otherwise have given a second thought. (One of them was Python!) I started to contribute to other people’s projects. Eventually I even got a job, where I had to work with other people. It probably also helped that I’d had to maintain my own old code a few times.

Now I was faced with conflicting subjective ideas, and I had to form opinions about them! And so I did. With gusto. Over time, I developed an idea of what was Right based on experience I’d accrued. And then I set out to always do things Right.

That’s served me decently well with some individual problems, but it also led me to inflict a lot of unnecessary pain on myself. Several endeavors languished for no other reason than my dissatisfaction with the architecture, long before the basic functionality was done. I started a number of “pure” projects around this time, generic tools like imaging libraries that I had no direct need for. I built them for the sake of them, I guess because I felt like I was improving some niche… but of course I never finished any. It was always in areas I didn’t know that well in the first place, which is a fine way to learn if you have a specific concrete goal in mind — but it turns out that building a generic library for editing images means you have to know everything about images. Perhaps that ambition went a little haywire.

I’ve said before that this sort of (self-inflicted!) work was unfulfilling, in part because the best outcome would be that a few distant programmers’ lives are slightly easier. I do still think that, but I think there’s a deeper point here too.

In forgetting how to play, I’d stopped putting any of myself in most of the work I was doing. Yes, building an imaging library is kind of a slog that someone has to do, but… I assume the people who work on software like PIL and ImageMagick are actually interested in it. The few domains I tried to enter and revolutionize weren’t passions of mine; I just happened to walk through the neighborhood one day and decided I could obviously do it better.

Not coincidentally, this was the same era of my life that led me to write stuff like that PHP post, which you may notice I am conspicuously not even linking to. I don’t think I would write anything like it nowadays. I could see myself approaching the same subject, but purely from the point of view of language design, with more contrasts and tradeoffs and less going for volume. I certainly wouldn’t lead off with inflammatory puffery like “PHP is a community of amateurs”.

Act III

I think I’ve mellowed out a good bit in the last few years.

It turns out that being Right is much less important than being Not Wrong — i.e., rather than trying to make something perfect that can be adapted to any future case, just avoid as many pitfalls as possible. Code that does something useful has much more practical value than unfinished code with some pristine architecture.

Nowhere is this more apparent than in game development, where all code is doomed to be crap and the best you can hope for is to stem the tide. But there’s also a fixed goal that’s completely unrelated to how the code looks: does the game work, and is it fun to play? Yes? Ship the damn thing and forget about it.

Games are also nice because it’s very easy to pour my own feelings into them and evoke feelings in the people who play them. They’re mine, something with my fingerprints on them — even the games I’ve built with glip have plenty of my own hallmarks, little touches I added on a whim or attention to specific details that I care about.

Maybe a better example is the Doom map parser I started writing. It sounds like a “pure” problem again, except that I actually know an awful lot about the subject already! I also cleverly (accidentally) released some useful results of the work I’ve done thusfar — like statistics about Doom II maps and a few screenshots of flipped stock maps — even though I don’t think the parser itself is far enough along to release yet. The tool has served a purpose, one with my fingerprints on it, even without being released publicly. That keeps it fresh in my mind as something interesting I’d like to keep working on, eventually. (When I run into an architecture question, I step back for a while, or I do other work in the hopes that the solution will reveal itself.)

I also made two simple Pokémon ROM hacks this year, despite knowing nothing about Game Boy internals or assembly when I started. I just decided I wanted to do an open-ended thing beyond my reach, and I went to do it, not worrying about cleanliness and willing to accept a bumpy ride to get there. I played, but in a more experienced way, invoking the stuff I know (and the people I’ve met!) to help me get a running start in completely unfamiliar territory.


This feels like a really fine distinction that I’m not sure I’m doing justice. I don’t know if I could’ve appreciated it three or four years ago. But I missed making toys, and I’m glad I’m doing it again.

In short, I forgot how to have fun with programming for a little while, and I’ve finally started to figure it out again. And that’s far more important than whether you use PHP or not.

jSQL – Automatic SQL Injection Tool In Java

Post Syndicated from Darknet original http://feedproxy.google.com/~r/darknethackers/~3/vEsd_Exo0S0/

jSQL is an automatic SQL Injection tool written in Java; it’s lightweight and supports 23 kinds of databases. It is free, open source and cross-platform (Windows, Linux, Mac OS X) and is easily available in Kali, Pentest Box, Parrot Security OS, ArchStrike or BlackArch Linux. Features include automatic injection of 23 kinds of databases: Access, CockroachDB…

Read the full post at darknet.org.uk

Deploying an NGINX Reverse Proxy Sidecar Container on Amazon ECS

Post Syndicated from Nathan Peck original https://aws.amazon.com/blogs/compute/nginx-reverse-proxy-sidecar-container-on-amazon-ecs/

Reverse proxies are a powerful software architecture primitive for fetching resources from a server on behalf of a client. They serve a number of purposes, from protecting servers from unwanted traffic to offloading some of the heavy lifting of HTTP traffic processing.

This post explains the benefits of a reverse proxy, and explains how to use NGINX and Amazon EC2 Container Service (Amazon ECS) to easily implement and deploy a reverse proxy for your containerized application.

Components

NGINX is a high performance HTTP server that has achieved significant adoption because of its asynchronous event driven architecture. It can serve thousands of concurrent requests with a low memory footprint. This efficiency also makes it ideal as a reverse proxy.

Amazon ECS is a highly scalable, high performance container management service that supports Docker containers. It allows you to run applications easily on a managed cluster of Amazon EC2 instances. Amazon ECS helps you get your application components running on instances according to a specified configuration. It also helps scale out these components across an entire fleet of instances.

Sidecar containers are a common software pattern that has been embraced by engineering organizations. It’s a way to keep server side architecture easier to understand by building with smaller, modular containers that each serve a simple purpose. Just like an application can be powered by multiple microservices, each microservice can also be powered by multiple containers that work together. A sidecar container is simply a way to move part of the core responsibility of a service out into a containerized module that is deployed alongside a core application container.

The following diagram shows how an NGINX reverse proxy sidecar container operates alongside an application server container:

In this architecture, Amazon ECS has deployed two copies of an application stack that is made up of an NGINX reverse proxy sidecar container and an application container. Web traffic from the public goes to an Application Load Balancer, which then distributes the traffic to one of the NGINX reverse proxy sidecars. The NGINX reverse proxy then forwards the request to the application server and returns its response to the client via the load balancer.

Reverse proxy for security

Security is one reason for using a reverse proxy in front of an application container. Any web server that serves resources to the public can expect to receive lots of unwanted traffic every day. Some of this traffic is relatively benign scans by researchers and tools, such as Shodan or nmap:

[18/May/2017:15:10:10 +0000] "GET /YesThisIsAReallyLongRequestURLbutWeAreDoingItOnPurposeWeAreScanningForResearchPurposePleaseHaveALookAtTheUserAgentTHXYesThisIsAReallyLongRequestURLbutWeAreDoingItOnPurposeWeAreScanningForResearchPurposePleaseHaveALookAtTheUserAgentTHXYesThisIsAReallyLongRequestURLbutWeAreDoingItOnPurposeWeAreScanningForResearchPurposePleaseHaveALookAtTheUserAgentTHXYesThisIsAReallyLongRequestURLbutWeAreDoingItOnPurposeWeAreScanningForResearchPurposePleaseHaveALookAtTheUserAgentTHXYesThisIsAReallyLongRequestURLbutWeAreDoingItOnPurposeWeAreScanningForResearchPurposePleaseHaveALookAtTheUserAgentTHXYesThisIsAReallyLongRequestURLbutWeAreDoingItOnPurposeWeAreScanningForResearchPurposePleaseHaveALookAtTheUserAgentTHXYesThisIsAReallyLongRequestURLbutWeAreDoingItOnPurposeWeAreScanningForResearchPurposePleaseHaveALookAtTheUserAgentTHXYesThisIsAReallyLongRequestURLbutWeAreDoingItOnPurposeWeAreScanningForResearchPurposePleaseHaveALookAtTheUserAgentTHXYesThisIsAReallyLongRequestURLbutWeAreDoingItOnPurposeWeAreScann HTTP/1.1" 404 1389 - Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.86 Safari/537.36
[18/May/2017:18:19:51 +0000] "GET /clientaccesspolicy.xml HTTP/1.1" 404 322 - Cloud mapping experiment. Contact [email protected]

But other traffic is much more malicious. For example, here is what a web server sees while being scanned by the hacking tool ZmEu, which scans web servers trying to find PHPMyAdmin installations to exploit:

[18/May/2017:16:27:39 +0000] "GET /mysqladmin/scripts/setup.php HTTP/1.1" 404 391 - ZmEu
[18/May/2017:16:27:39 +0000] "GET /web/phpMyAdmin/scripts/setup.php HTTP/1.1" 404 394 - ZmEu
[18/May/2017:16:27:39 +0000] "GET /xampp/phpmyadmin/scripts/setup.php HTTP/1.1" 404 396 - ZmEu
[18/May/2017:16:27:40 +0000] "GET /apache-default/phpmyadmin/scripts/setup.php HTTP/1.1" 404 405 - ZmEu
[18/May/2017:16:27:40 +0000] "GET /phpMyAdmin-2.10.0.0/scripts/setup.php HTTP/1.1" 404 397 - ZmEu
[18/May/2017:16:27:40 +0000] "GET /mysql/scripts/setup.php HTTP/1.1" 404 386 - ZmEu
[18/May/2017:16:27:41 +0000] "GET /admin/scripts/setup.php HTTP/1.1" 404 386 - ZmEu
[18/May/2017:16:27:41 +0000] "GET /forum/phpmyadmin/scripts/setup.php HTTP/1.1" 404 396 - ZmEu
[18/May/2017:16:27:41 +0000] "GET /typo3/phpmyadmin/scripts/setup.php HTTP/1.1" 404 396 - ZmEu
[18/May/2017:16:27:42 +0000] "GET /phpMyAdmin-2.10.0.1/scripts/setup.php HTTP/1.1" 404 399 - ZmEu
[18/May/2017:16:27:44 +0000] "GET /administrator/components/com_joommyadmin/phpmyadmin/scripts/setup.php HTTP/1.1" 404 418 - ZmEu
[18/May/2017:18:34:45 +0000] "GET /phpmyadmin/scripts/setup.php HTTP/1.1" 404 390 - ZmEu
[18/May/2017:16:27:45 +0000] "GET /w00tw00t.at.blackhats.romanian.anti-sec:) HTTP/1.1" 404 401 - ZmEu

In addition, servers can also end up receiving unwanted web traffic that is intended for another server. In a cloud environment, an application may end up reusing an IP address that was formerly connected to another service. It’s common for misconfigured or misbehaving DNS servers to send traffic intended for a different host to an IP address now connected to your server.

It’s the responsibility of anyone running a web server to handle and reject potentially malicious traffic or unwanted traffic. Ideally, the web server can reject this traffic as early as possible, before it actually reaches the core application code. A reverse proxy is one way to provide this layer of protection for an application server. It can be configured to reject these requests before they reach the application server.

Reverse proxy for performance

Another advantage of using a reverse proxy such as NGINX is that it can be configured to offload some heavy lifting from your application container. For example, every HTTP server should support gzip. Whenever a client requests gzip encoding, the server compresses the response before sending it back to the client. This compression saves network bandwidth, which also improves speed for clients who now don’t have to wait as long for a response to fully download.

NGINX can be configured to accept a plaintext response from your application container and gzip encode it before sending it down to the client. This allows your application container to focus 100% of its CPU allotment on running business logic, while NGINX handles the encoding with its efficient gzip implementation.

An application may have security concerns that require SSL termination at the instance level instead of at the load balancer. NGINX can also be configured to terminate SSL before proxying the request to a local application container. Again, this also removes some CPU load from the application container, allowing it to focus on running business logic. It also gives you a cleaner way to patch any SSL vulnerabilities or update SSL certificates by updating the NGINX container without needing to change the application container.

NGINX configuration

An NGINX configuration that handles both traffic filtering and gzip encoding is shown below:

http {
  # NGINX will handle gzip compression of responses from the app server
  gzip on;
  gzip_proxied any;
  gzip_types text/plain application/json;
  gzip_min_length 1000;
 
  server {
    listen 80;
 
    # NGINX will reject anything not matching /api
    location /api {
      # Reject requests with unsupported HTTP method
      if ($request_method !~ ^(GET|POST|HEAD|OPTIONS|PUT|DELETE)$) {
        return 405;
      }
 
      # Only requests matching the whitelist expectations will
      # get sent to the application server
      proxy_pass http://app:3000;
      proxy_http_version 1.1;
      proxy_set_header Upgrade $http_upgrade;
      proxy_set_header Connection 'upgrade';
      proxy_set_header Host $host;
      proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
      proxy_cache_bypass $http_upgrade;
    }
  }
}

The above configuration only accepts traffic that matches the expression /api and has a recognized HTTP method. If the traffic matches, it is forwarded to a local application container accessible at the local hostname app. If the client requested gzip encoding, the plaintext response from that application container is gzip-encoded.
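
For completeness, the application container behind proxy_pass http://app:3000 can be anything that speaks HTTP on port 3000. The throwaway Python stand-in below (a hypothetical example, not part of the reference architecture) is enough to watch the whitelisting and gzip behavior end to end:

import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class ApiHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # NGINX only forwards /api paths, so anything else never reaches us.
        # The padding pushes the body over gzip_min_length so the proxy's gzip
        # handling is visible when the client sends Accept-Encoding: gzip.
        body = json.dumps({"message": "hello from the app container",
                           "padding": "x" * 2000}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    # Listen on all interfaces so the linked "app" hostname resolves to this process.
    HTTPServer(("0.0.0.0", 3000), ApiHandler).serve_forever()

With this running as the app container, curl http://<proxy-host>/api returns the JSON payload, while requests outside /api never reach the process at all.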

Amazon ECS configuration

Configuring ECS to run this NGINX container as a sidecar is also simple. ECS uses a core primitive called the task definition. Each task definition can include one or more containers, which can be linked to each other:

 {
  "containerDefinitions": [
     {
       "name": "nginx",
       "image": "<NGINX reverse proxy image URL here>",
       "memory": "256",
       "cpu": "256",
       "essential": true,
       "portMappings": [
         {
           "containerPort": "80",
           "protocol": "tcp"
         }
       ],
       "links": [
         "app"
       ]
     },
     {
       "name": "app",
       "image": "<app image URL here>",
       "memory": "256",
       "cpu": "256",
       "essential": true
     }
   ],
   "networkMode": "bridge",
   "family": "application-stack"
}

This task definition causes ECS to start both an NGINX container and an application container on the same instance. Then, the NGINX container is linked to the application container. This allows the NGINX container to send traffic to the application container using the hostname app.

The NGINX container has a port mapping that exposes port 80 on a publicly accessible port, but the application container does not. This means that the application container is not directly addressable. The only way to send it traffic is to send traffic to the NGINX container, which filters that traffic down. It only forwards to the application container if the traffic passes the whitelisted rules.
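
If you prefer to manage task definitions from code rather than the console or CLI, the same JSON can be registered with boto3. The sketch below assumes the definition above is saved locally as task-definition.json and that a default region and credentials are configured; both are assumptions, not part of the post:

import json
import boto3

ecs = boto3.client("ecs")  # uses your default region and credentials

with open("task-definition.json") as f:
    task_def = json.load(f)

response = ecs.register_task_definition(
    family=task_def["family"],
    networkMode=task_def["networkMode"],
    containerDefinitions=task_def["containerDefinitions"],
)
print("Registered revision", response["taskDefinition"]["revision"])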

Conclusion

Running a sidecar container such as NGINX can bring significant benefits by making it easier to provide protection for application containers. Sidecar containers also improve performance by freeing your application container from various CPU intensive tasks. Amazon ECS makes it easy to run sidecar containers, and automate their deployment across your cluster.

To see the full code for this NGINX sidecar reference, or to try it out yourself, you can check out the open source NGINX reverse proxy reference architecture on GitHub.

– Nathan
 @nathankpeck

Former Vuze Developers Launch BiglyBT, a ‘New’ Open Source Torrent Client

Post Syndicated from Ernesto original https://torrentfreak.com/former-vuze-developers-launch-biglybt-a-new-open-source-torrent-client-170803/

Back in the summer of 2003 a group of developers debuted a new torrent client, which they called Azureus.

BitTorrent itself was still a relatively new technology at the time and users were eager to find new tools to transfer their files. The feature-rich Azureus client, which later rebranded to Vuze, delivered just that.

In recent years, however, things have gone relatively quiet, up to a point where Vuze development appears to have stalled completely. Perhaps not surprising, as two of the core developers, parg and TuxPaper, have left the project and moved on to something new.

“We are no longer involved in Vuze or Azureus Software, Inc. We can not speak to what their intentions are with the development of their product,” they inform us.

The developers, who were also part of the original Azureus team, are not saying farewell to their code though. While they are no longer working on Vuze, the pair have started a new Azureus branch, one they will actively maintain.

“We have invested such a large amount of our lives in the endeavor that we feel the need to keep the open source project active, for both our and our users’ enjoyment!” parg and TuxPaper tell us.

BiglyBT, as they have named their new client, will continue where Vuze development stalled. In addition to optimizing the code and releasing new features, BiglyBT is determined to keep the open source project alive, without any commercial interests.

“Our main goals for BiglyBT is to keep it ad-free and open source, and to continue to develop it into an even better torrent client. We also hope that a community will form again around the product.”

BiglyBT main window (large)

People who try the new client will notice that it’s indeed very similar to Vuze, but without the ads and some other ‘cluttering’ features, such as DVD-burning.

While BiglyBT looks and operates in a similar manner to Vuze, in the future the developers will work on a new set of features, a new style, and various other changes that will set it apart from its older brother.

“Our first release is mostly a name change, but we have removed some of the things that we know users don’t particularly want or use, such as the content network, games promotions, DVD burning, the huge ad in the corner of the app, and the offers in the installer.”

While Vuze appears to have downsized its development efforts, BiglyBT promises to go full steam ahead. The new client will also stay true to its open source nature. Previously, some people complained that Vuze included proprietary code, resulting in more restrictive license terms. BiglyBT is purely GPL, and will remain so.

The client is currently available on all major desktop platforms, including Windows, MacOS and Linux. An open source Android app, forked from Vuze remote, will follow in a few weeks.

BiglyBT should appeal to a wide range of users, especially the more seasoned torrent user who wants a client they can configure to their liking.

“Our target users are people who love to delve into the world of torrenting. People who like to tinker and watch torrents do their thing. Hoarders who like to seed, automate, categorize and contribute back to the torrenting community,” the developers note.

People who are interested in giving BiglyBT a spin can download the latest version from the official site. The application is free and won’t install any other applications or adware. Instead, it’s solely supported by donations from the public.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

EFF: Bassel Khartabil, In Memoriam

Post Syndicated from ris original https://lwn.net/Articles/729644/rss

The Electronic Frontier Foundation reports
that Bassel Khartabil, Syrian open source developer, blogger,
entrepreneur, hackerspace founder, and free culture advocate, was executed
by the Syrian authorities. “Bassel was a central figure in the
global free culture movement, connecting it and promoting it to Syria’s
emerging tech community as it existed before the country was ransacked by
civil war. He co-founded Aiki Lab, Syria’s first hackerspace, in Damascus
in 2010. He was a contributor to Mozilla’s Firefox browser and the Syrian
lead for Creative Commons. His influence went beyond Syria, however: he was
a key attendee at the Middle East’s bloggers’ conferences, and played a
vital role in the negotiations in Doha in 2010 that led to a common
language for discussing fair use and copyright across the Arab-speaking
world.
” (Thanks to Paul Wise)

Email2git: Matching Linux Code with its Mailing List Discussions (Linux.com)

Post Syndicated from jake original https://lwn.net/Articles/729090/rss

Linux.com is carrying an article about email2git by its developer, Alexandre Courouble. Email2git is a way to match up commits and the email thread that discussed them. It currently targets the kernel and threads from the linux-kernel mailing list. There are two separate ways to use it: as an extension to cregit (at https://cregit.linuxsources.org/) that allows browsing changes at the token level, or via a search-by-commit-ID interface. “The Linux project’s email-based reviewing process is highly effective in filtering open source contributions on their way from mailing list discussions towards Linus Torvalds’ Git repository. However, once integrated, it can be difficult to link Git commits back to their review comments in mailing list discussions, especially when considering commits that underwent multiple versions (and hence review rounds), that belong to a multi-patch series, or that were cherry-picked.

As an answer to these and other issues, we created email2git, a patch retrieving system built for the Linux kernel. For a given commit, the tool is capable of finding the email patch as well as the email conversation that took place during the review process. We are currently improving the system with support for multi-patch series and cherry-picking.” The code for email2git is available on GitHub.

AWS Hot Startups – July 2017

Post Syndicated from Tina Barr original https://aws.amazon.com/blogs/aws/aws-hot-startups-july-2017/

Welcome back to another month of Hot Startups! Every day, startups are creating innovative and exciting businesses, applications, and products around the world. Each month we feature a handful of startups doing cool things using AWS.

July is all about learning! These companies are focused on providing access to tools and resources to expand knowledge and skills in different ways.

This month’s startups:

  • CodeHS – provides fun and accessible computer science curriculum for middle and high schools.
  • Insight – offers intensive fellowships to grow technical talent in Data Science.
  • iTranslate – enables people to read, write, and speak in over 90 languages, anywhere in the world.

CodeHS (San Francisco, CA)

In 2012, Stanford students Zach Galant and Jeremy Keeshin were computer science majors and TAs for introductory classes when they noticed a trend among their peers. Many wished that they had been exposed to computer science earlier in life. In their senior year, Zach and Jeremy launched CodeHS to give middle and high schools the opportunity to provide a fun, accessible computer science education to students everywhere. CodeHS is a web-based curriculum pathway complete with teacher resources, lesson plans, and professional development opportunities. The curriculum is supplemented with time-saving teacher tools to help with lesson planning, grading and reviewing student code, and managing their classroom.

CodeHS aspires to empower all students to meaningfully impact the future, and believes that coding is becoming a new foundational skill, along with reading and writing, that allows students to further explore any interest or area of study. At the time CodeHS was founded in 2012, only 10% of high schools in America offered a computer science course. Zach and Jeremy set out to change that by providing a solution that made it easy for schools and districts to get started. With CodeHS, thousands of teachers have been trained and are teaching hundreds of thousands of students all over the world. To use CodeHS, all that’s needed is the internet and a web browser. Students can write and run their code online, and teachers can immediately see what the students are working on and how they are doing.

Amazon EC2, Amazon RDS, Amazon ElastiCache, Amazon CloudFront, and Amazon S3 make it possible for CodeHS to scale their site to meet the needs of schools all over the world. CodeHS also relies on AWS to compile and run student code in the browser, which is extremely important when teaching server-side languages like Java, which powers the AP course. Since usage rises and falls based on school schedules, Amazon CloudWatch and ELBs are used to easily scale up when students are running code so they have a seamless experience.

Be sure to visit the CodeHS website, and to learn more about bringing computer science to your school, click here!

Insight (Palo Alto, CA)

Insight was founded in 2012 to create a new educational model, optimize hiring for data teams, and facilitate successful career transitions among data professionals. Over the last 5 years, Insight has kept ahead of market trends and launched a series of professional training fellowships including Data Science, Health Data Science, Data Engineering, and Artificial Intelligence. Finding individuals with the right skill set, background, and culture fit is a challenge for big companies and startups alike, and Insight is focused on developing top talent through intensive 7-week fellowships. To date, Insight has over 1,000 alumni at over 350 companies including Amazon, Google, Netflix, Twitter, and The New York Times.

The Data Engineering team at Insight is well-versed in the current ecosystem of open source tools and technologies and provides mentorship on the best practices in this space. The technical teams are continually working with external groups in a variety of data advisory and mentorship capacities, but the majority of Insight partners participate in professional sessions. Companies visit the Insight office to speak with fellows in an informal setting and provide details on the type of work they are doing and how their teams are growing. These sessions have proved invaluable as fellows experience a significantly better interview process and companies yield engaged and enthusiastic new team members.

An important aspect of Insight’s fellowships is the opportunity for hands-on work, focusing on everything from building big-data pipelines to contributing novel features to industry-standard open source efforts. Insight provides free AWS resources for all fellows to use, in addition to mentorships from the Data Engineering team. Fellows regularly utilize Amazon S3, Amazon EC2, Amazon Kinesis, Amazon EMR, AWS Lambda, Amazon Redshift, Amazon RDS, among other services. The experience with AWS gives fellows a solid skill set as they transition into the industry. Fellowships are currently being offered in Boston, New York, Seattle, and the Bay Area.

Check out the Insight blog for more information on trends in data infrastructure, artificial intelligence, and cutting-edge data products.

 

iTranslate (Austria)

When the App Store was introduced in 2008, the founders of iTranslate saw an opportunity to be part of something big. The group of four fully believed that the iPhone and apps were going to change the world, and together they brainstormed ideas for their own app. The combination of translation and mobile devices seemed a natural fit, and by 2009 iTranslate was born. iTranslate’s mission is to enable travelers, students, business professionals, employers, and medical staff to read, write, and speak in all languages, anywhere in the world. The app allows users to translate text, voice, websites and more into nearly 100 languages on various platforms. Today, iTranslate is the leading player for conversational translation and dictionary apps, with more than 60 million downloads and 6 million monthly active users.

iTranslate is breaking language barriers through disruptive technology and innovation, enabling people to translate in real time. The app has a variety of features designed to optimize productivity including offline translation, website and voice translation, and language auto detection. iTranslate also recently launched the world’s first ear translation device in collaboration with Bragi, a company focused on smart earphones. The Dash Pro allows people to communicate freely, while having a personal translator right in their ear.

iTranslate started using Amazon Polly soon after it was announced. CEO Alexander Marktl said, “As the leading translation and dictionary app, it is our mission at iTranslate to provide our users with the best possible tools to read, write, and speak in all languages across the globe. Amazon Polly provides us with the ability to efficiently produce and use high quality, natural sounding synthesized speech.” The stable and simple-to-use API, low latency, and free caching allow iTranslate to scale as they continue adding features to their app. Customers also enjoy the option to change speech rate and change between male and female voices. To assure quality, speed, and reliability of their products, iTranslate also uses Amazon EC2, Amazon S3, and Amazon Route 53.

To get started with iTranslate, visit their website here.

—–

Thanks for reading!

-Tina

Run Common Data Science Packages on Anaconda and Oozie with Amazon EMR

Post Syndicated from John Ohle original https://aws.amazon.com/blogs/big-data/run-common-data-science-packages-on-anaconda-and-oozie-with-amazon-emr/

In the world of data science, users often have to spend significant time on cluster setup before they can tackle complex use cases. Amazon EMR allows data scientists to spin up complex cluster configurations easily, and to be up and running with complex queries in a matter of minutes.

Data scientists often use scheduling applications such as Oozie to run jobs overnight. However, Oozie can be difficult to configure when you are trying to use popular Python packages (such as “pandas,” “numpy,” and “statsmodels”), which are not included by default.

One such popular platform that contains these types of packages (and more) is Anaconda. This post focuses on setting up the Anaconda platform on EMR so that its packages are available to jobs scheduled with Oozie.

Walkthrough

For this post, you walk through the following tasks:

  • Create an EMR cluster.
  • Download Anaconda on your master node.
  • Configure Oozie.
  • Test the steps.

Create an EMR cluster

Spin up an Amazon EMR cluster using the console or the AWS CLI. Use the latest release, and include Apache Hadoop, Apache Spark, Apache Hive, and Oozie.

To create a three-node cluster in the us-east-1 region, issue an AWS CLI command such as the following. The command is split across lines with backslash continuations for readability; a single-line version is included afterward for reference.

aws emr create-cluster \
  --release-label emr-5.7.0 \
  --name '<YOUR-CLUSTER-NAME>' \
  --applications Name=Hadoop Name=Oozie Name=Spark Name=Hive \
  --ec2-attributes '{"KeyName":"<YOUR-KEY-PAIR>","SubnetId":"<YOUR-SUBNET-ID>","EmrManagedSlaveSecurityGroup":"<YOUR-EMR-SLAVE-SECURITY-GROUP>","EmrManagedMasterSecurityGroup":"<YOUR-EMR-MASTER-SECURITY-GROUP>"}' \
  --use-default-roles \
  --instance-groups '[{"InstanceCount":1,"InstanceGroupType":"MASTER","InstanceType":"<YOUR-INSTANCE-TYPE>","Name":"Master - 1"},{"InstanceCount":<YOUR-CORE-INSTANCE-COUNT>,"InstanceGroupType":"CORE","InstanceType":"<YOUR-INSTANCE-TYPE>","Name":"Core - 2"}]'

One-line version for reference:

aws emr create-cluster --release-label emr-5.7.0 --name '<YOUR-CLUSTER-NAME>' --applications Name=Hadoop Name=Oozie Name=Spark Name=Hive --ec2-attributes '{"KeyName":"<YOUR-KEY-PAIR>","SubnetId":"<YOUR-SUBNET-ID>","EmrManagedSlaveSecurityGroup":"<YOUR-EMR-SLAVE-SECURITY-GROUP>","EmrManagedMasterSecurityGroup":"<YOUR-EMR-MASTER-SECURITY-GROUP>"}' --use-default-roles --instance-groups '[{"InstanceCount":1,"InstanceGroupType":"MASTER","InstanceType":"<YOUR-INSTANCE-TYPE>","Name":"Master - 1"},{"InstanceCount":<YOUR-CORE-INSTANCE-COUNT>,"InstanceGroupType":"CORE","InstanceType":"<YOUR-INSTANCE-TYPE>","Name":"Core - 2"}]'

Download Anaconda

SSH into your EMR master node instance and download the official Anaconda installer:

wget https://repo.continuum.io/archive/Anaconda2-4.4.0-Linux-x86_64.sh

At the time of publication, Anaconda 4.4 is the most current version available. To find the download link for the latest Python 2.7 version (Python 3.6 may encounter issues), see https://www.continuum.io/downloads. Open the context (right-click) menu for the Python 2.7 download link, choose Copy Link Location, and use that value in the previous wget command.

This post uses the Anaconda 4.4 installation. If you download a later version, the version number appears in the filename: “Anaconda2-<version number>-Linux-x86_64.sh”.

Run this downloaded script and follow the on-screen installer prompts.

chmod u+x Anaconda2-4.4.0-Linux-x86_64.sh
./Anaconda2-4.4.0-Linux-x86_64.sh

For an installation directory, select somewhere with enough space on your cluster, such as “/mnt/anaconda/”.

The process should take approximately 1–2 minutes to install. When prompted if you “wish the installer to prepend the Anaconda2 install location”, select the default option of [no].

After you are done, export the PATH to include this new Anaconda installation:

export PATH=/mnt/anaconda/bin:$PATH
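As an optional sanity check (not part of the original steps), confirm that the Anaconda interpreter now resolves first on the PATH and that its bundled packages import cleanly; the paths assume the /mnt/anaconda/ install location chosen above:

which python                      # should print /mnt/anaconda/bin/python
python -c "import numpy, pandas; print(numpy.__version__, pandas.__version__)"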

Zip up the Anaconda installation:

cd /mnt/anaconda/
zip -r anaconda.zip .

The zip process may take 4–5 minutes to complete.

(Optional) Upload this anaconda.zip file to your S3 bucket for easier inclusion into future EMR clusters. This removes the need to repeat the previous steps for future EMR clusters.
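A hedged example of that optional upload, assuming a bucket of your choosing and that the cluster’s EC2 instance profile has write access to it:

aws s3 cp /mnt/anaconda/anaconda.zip s3://<YOUR-BUCKET>/emr/anaconda.zip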

Configure Oozie

Next, you configure Oozie to use Pyspark and the Anaconda platform.

Get the location of your Oozie sharelibupdate folder. Issue the following command and take note of the “sharelibDirNew” value:

oozie admin -sharelibupdate

For this post, this value is “hdfs://ip-192-168-4-200.us-east-1.compute.internal:8020/user/oozie/share/lib/lib_20170616133136”.

Copy the required Pyspark files into the Oozie sharelib location identified above. The following files are required for Oozie to be able to run Pyspark commands:

  • pyspark.zip
  • py4j-0.10.4-src.zip

These are located on the EMR master instance in “/usr/lib/spark/python/lib/”, and must be put into the Oozie sharelib spark directory. This directory is the sharelibDirNew value (shown above) with “/spark/” appended, that is, “hdfs://ip-192-168-4-200.us-east-1.compute.internal:8020/user/oozie/share/lib/lib_20170616133136/spark/”.

To do this, issue the following commands:

hdfs dfs -put /usr/lib/spark/python/lib/py4j-0.10.4-src.zip hdfs://ip-192-168-4-200.us-east-1.compute.internal:8020/user/oozie/share/lib/lib_20170616133136/spark/
hdfs dfs -put /usr/lib/spark/python/lib/pyspark.zip hdfs://ip-192-168-4-200.us-east-1.compute.internal:8020/user/oozie/share/lib/lib_20170616133136/spark/

After you’re done, Oozie can use Pyspark in its processes.
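To double-check that both files landed in the spark sharelib directory, you can list it (an optional verification step, using the example sharelibDirNew value from above):

hdfs dfs -ls hdfs://ip-192-168-4-200.us-east-1.compute.internal:8020/user/oozie/share/lib/lib_20170616133136/spark/ | grep -E 'pyspark|py4j'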

Pass the anaconda.zip file into HDFS as follows:

hdfs dfs -put /mnt/anaconda/anaconda.zip /tmp/myLocation/anaconda.zip

(Optional) Verify that it was transferred successfully with the following command:

hdfs dfs -ls /tmp/myLocation/

On your master node, execute the following command:

export PYSPARK_PYTHON=/mnt/anaconda/bin/python

Set the PYSPARK_PYTHON environment variable on the executor nodes. Put the following configurations in your “spark-opts” values in your Oozie workflow.xml file:

--conf spark.executorEnv.PYSPARK_PYTHON=./anaconda_remote/bin/python
--conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=./anaconda_remote/bin/python

This is referenced from the Oozie job in the following line in your workflow.xml file, also included as part of your “spark-opts”:

--archives hdfs:///tmp/myLocation/anaconda.zip#anaconda_remote

Your Oozie workflow.xml file should now look something like the following:

<workflow-app name="spark-wf" xmlns="uri:oozie:workflow:0.5">
<start to="start_spark" />
<action name="start_spark">
    <spark xmlns="uri:oozie:spark-action:0.1">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <prepare>
            <delete path="/tmp/test/spark_oozie_test_out3"/>
        </prepare>
        <master>yarn-cluster</master>
        <mode>cluster</mode>
        <name>SparkJob</name>
        <class>clear</class>
        <jar>hdfs:///user/oozie/apps/myPysparkProgram.py</jar>
        <spark-opts>--queue default
            --conf spark.ui.view.acls=*
            --executor-memory 2G --num-executors 2 --executor-cores 2 --driver-memory 3g
            --conf spark.executorEnv.PYSPARK_PYTHON=./anaconda_remote/bin/python
            --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=./anaconda_remote/bin/python
            --archives hdfs:///tmp/myLocation/anaconda.zip#anaconda_remote
        </spark-opts>
    </spark>
    <ok to="end"/>
    <error to="kill"/>
</action>
        <kill name="kill">
                <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
        </kill>
        <end name="end"/>
</workflow-app>
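Before uploading the workflow, you can optionally ask Oozie to validate it against the workflow schema (not part of the original steps; depending on your Oozie version you may need to pass the server URL with -oozie):

oozie validate workflow.xml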

Test steps

To test this out, you can use the following job.properties and myPysparkProgram.py file, along with the following steps:

job.properties

masterNode ip-xxx-xxx-xxx-xxx.us-east-1.compute.internal
nameNode hdfs://${masterNode}:8020
jobTracker ${masterNode}:8032
master yarn
mode cluster
queueName default
oozie.libpath ${nameNode}/user/oozie/share/lib
oozie.use.system.libpath true
oozie.wf.application.path ${nameNode}/user/oozie/apps/

Note: You can get your master node IP address (denoted as “ip-xxx-xxx-xxx-xxx” here) from the value for the sharelibDirNew parameter noted earlier.
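Alternatively (this is an addition, not part of the original post), the default file system URI, which contains the master node host name and is effectively your nameNode value, can be read straight from the cluster configuration on the master node:

hdfs getconf -confKey fs.defaultFS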

myPysparkProgram.py

from pyspark import SparkContext, SparkConf
import numpy
import sys

conf = SparkConf().setAppName('myPysparkProgram')
sc = SparkContext(conf=conf)

rdd = sc.textFile("/user/hadoop/input.txt")

x = numpy.sum([3,4,5]) #total = 12

rdd = rdd.map(lambda line: line + str(x))
rdd.saveAsTextFile("/user/hadoop/output")

Put the “myPysparkProgram.py” into the location mentioned between the “<jar>xxxxx</jar>” tags in your workflow.xml. In this example, the location is “hdfs:///user/oozie/apps/”. Use the following command to move the “myPysparkProgram.py” file to the correct location:

hdfs dfs -put myPysparkProgram.py /user/oozie/apps/

Put the above workflow.xml file into the “/user/oozie/apps/” location in hdfs:

hdfs dfs -put workflow.xml /user/oozie/apps/

Note: The job.properties file is run locally from the EMR master node.

Create a sample input.txt file with some data in it. For example:

input.txt

This is a sentence.
So is this. 
This is also a sentence.

Put this file into hdfs:

hdfs dfs -put input.txt /user/hadoop/

Execute the job in Oozie with the following command. This creates an Oozie job ID.

oozie job -config job.properties -run

You can check the Oozie job state with the command:

oozie job -info <Oozie job ID>
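If the job fails, the YARN application logs usually contain the full Python traceback. As a hedged example, the placeholder application ID below can be found in the Oozie job info output or in the YARN ResourceManager UI:

yarn logs -applicationId <application-ID>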

When the job finishes successfully, the results are located at:

/user/hadoop/output/part-00000
/user/hadoop/output/part-00001

Run the following commands to view the output:

hdfs dfs -cat /user/hadoop/output/part-00000
hdfs dfs -cat /user/hadoop/output/part-00001

The output will be:

This is a sentence. 12
So is this 12
This is also a sentence 12

Summary

The myPysparkProgram.py script has successfully imported the numpy library from the Anaconda platform and produced output with it. If you tried to run the same script using the cluster’s default Python, which does not include numpy, the job would instead fail with an import error such as “ImportError: No module named numpy”.

Now when your Python job runs in Oozie, any packages imported by your Pyspark script are loaded directly from the Anaconda platform that ships with the job. Simple!
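If you later need a package that is not bundled with Anaconda, one reasonable pattern (an assumption on my part, not covered in the original walkthrough) is to install it into the same Anaconda directory on the master node, re-zip, and overwrite the archive in HDFS; the package name is a placeholder:

/mnt/anaconda/bin/conda install -y <package-name>    # install into the Anaconda tree
cd /mnt/anaconda/
zip -r anaconda.zip .                                # rebuild the archive
hdfs dfs -put -f anaconda.zip /tmp/myLocation/anaconda.zip   # -f overwrites the old copy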

If you have questions or suggestions, please leave a comment below.


Additional Reading

Learn how to use Apache Oozie workflows to automate Apache Spark jobs on Amazon EMR.

 


About the Author

John Ohle is an AWS BigData Cloud Support Engineer II for the BigData team in Dublin. He works to provide advice and solutions to our customers on their Big Data projects and workflows on AWS. In his spare time, he likes to play music, learn, develop tools, and write documentation to further help others, colleagues and customers alike.


Wanted: Automation Systems Administrator

Post Syndicated from Yev original https://www.backblaze.com/blog/wanted-automation-systems-administrator/

Are you an Automation Systems Administrator who is looking for a challenging and fast-paced working environment? Want to join our dynamic team and help Backblaze grow to new heights? Our Operations team is a distributed and collaborative group of individual contributors. We work closely together to build and maintain our home-grown cloud storage farm, carefully controlling costs by utilizing open source software and various brands of technology, as well as designing our own cloud storage servers. Members of Operations participate in the prioritization and decision-making process and make a difference every day. The environment is challenging, but we balance the challenges with rewards, and we are looking for clever and innovative people to join us.

Responsibilities:

  • Develop and deploy automated provisioning & updating of systems
  • Lead projects across a range of IT disciplines
  • Understand environment thoroughly enough to administer/debug any system
  • Participate in the 24×7 on-call rotation and respond to alerts as needed

Requirements:

  • Expert knowledge of automated provisioning
  • Expert knowledge of Linux administration (Debian preferred)
  • Scripting skills
  • Experience in automation/configuration management
  • Position based in the San Mateo, California Corporate Office

Required for all Backblaze Employees

  • Good attitude and willingness to do whatever it takes to get the job done.
  • Desire to learn and adapt to rapidly changing technologies and work environment.
  • Relentless attention to detail.
  • Excellent communication and problem solving skills.
  • Backblaze is an Equal Opportunity Employer and we offer competitive salary and benefits, including our “no policy” vacation policy.

Company Description:
Founded in 2007, Backblaze started with a mission to make backup software elegant and provide complete peace of mind. Over the course of almost a decade, we have become a pioneer in robust, scalable, low-cost cloud backup. Recently, we launched B2, robust and reliable object storage at just $0.005/GB per month. Part of our differentiation is being able to offer the lowest price of any of the big players while still being profitable.

We’ve managed to nurture a team oriented culture with amazingly low turnover. We value our people and their families. Don’t forget to check out our “About Us” page to learn more about the people and some of our perks.

We have built a profitable, high growth business. While we love our investors, we have maintained control over the business. That means our corporate goals are simple – grow sustainably and profitably.

Some Backblaze Perks:

  • Competitive healthcare plans
  • Competitive compensation and 401k
  • All employees receive Option grants
  • Unlimited vacation days
  • Strong coffee
  • Fully stocked Micro kitchen
  • Catered breakfast and lunches
  • Awesome people who work on awesome projects
  • Childcare bonus
  • Normal work hours
  • Get to bring your pets into the office
  • San Mateo Office – located near Caltrain and Highways 101 & 280.

If this sounds like you — follow these steps:

  1. Send an email to [email protected] with the position in the subject line.
  2. Include your resume.
  3. Tell us a bit about your experience and why you’re excited to work with Backblaze.

The post Wanted: Automation Systems Administrator appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Wanted: Site Reliability Engineer

Post Syndicated from Yev original https://www.backblaze.com/blog/wanted-site-reliability-engineer/

Are you a Site Reliability Engineer who is looking for a challenging and fast-paced working environment? Want to join our dynamic team and help Backblaze grow to new heights? Our Operations team is a distributed and collaborative group of individual contributors. We work closely together to build and maintain our home-grown cloud storage farm, carefully controlling costs by utilizing open source software and various brands of technology, as well as designing our own cloud storage servers. Members of Operations participate in the prioritization and decision-making process and make a difference every day. The environment is challenging, but we balance the challenges with rewards, and we are looking for clever and innovative people to join us.

Responsibilities:

  • Lead projects across a range of IT disciplines
  • Understand environment thoroughly enough to administer/debug any system
  • Collaborate on automated provisioning & updating of systems
  • Collaborate on network administration and security
  • Collaborate on database administration
  • Participate in the 24×7 on-call rotation and respond to alerts
    as needed

Requirements:

  • Expert knowledge of Linux administration (Debian preferred)
  • Scripting skills
  • Experience in automation/configuration management (Ansible preferred)
  • Position based in the San Mateo, California Corporate Office

Required for all Backblaze Employees

  • Good attitude and willingness to do whatever it takes to get the job done.
  • Desire to learn and adapt to rapidly changing technologies and work environment.
  • Relentless attention to detail.
  • Excellent communication and problem solving skills.
  • Backblaze is an Equal Opportunity Employer and we offer competitive salary and benefits, including our “no policy” vacation policy.

Company Description:
Founded in 2007, Backblaze started with a mission to make backup software elegant and provide complete peace of mind. Over the course of almost a decade, we have become a pioneer in robust, scalable, low-cost cloud backup. Recently, we launched B2, robust and reliable object storage at just $0.005/GB per month. Part of our differentiation is being able to offer the lowest price of any of the big players while still being profitable.

We’ve managed to nurture a team oriented culture with amazingly low turnover. We value our people and their families. Don’t forget to check out our “About Us” page to learn more about the people and some of our perks.

We have built a profitable, high growth business. While we love our investors, we have maintained control over the business. That means our corporate goals are simple – grow sustainably and profitably.

Some Backblaze Perks:

  • Competitive healthcare plans
  • Competitive compensation and 401k
  • All employees receive Option grants
  • Unlimited vacation days
  • Strong coffee
  • Fully stocked Micro kitchen
  • Catered breakfast and lunches
  • Awesome people who work on awesome projects
  • Childcare bonus
  • Normal work hours
  • Get to bring your pets into the office
  • San Mateo Office – located near Caltrain and Highways 101 & 280.

If this sounds like you — follow these steps:

  1. Send an email to [email protected] with the position in the subject line.
  2. Include your resume.
  3. Tell us a bit about your experience and why you’re excited to work with Backblaze.

The post Wanted: Site Reliability Engineer appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.