Tag Archives: education

Top 10 Most Pirated Movies of The Week – 07/25/16

Post Syndicated from Ernesto original https://torrentfreak.com/top-10-pirated-movies-week-072516/

This week we have three newcomers in our chart.

X-Men: Apocalypse, which came out as HDRip with Korean subtitles, is the most downloaded movie.

The data for our weekly download chart is estimated by TorrentFreak, and is for informational and educational reference only. All the movies in the list are Web-DL/Webrip/HDRip/BDrip/DVDrip unless stated otherwise.

RSS feed for the weekly movie download chart.

Ranking (last week) Movie IMDb Rating / Trailer
1 (10) X-Men: Apocalypse (Subbed HDRip) 7.7 / trailer
2 (1) Central Intelligence 6.9 / trailer
3 (…) Batman: The Killing Joke 7.3 / trailer
4 (2) Warcraft (subbed HDRip) 7.7 / trailer
5 (3) The Purge: Election Year (subbed HDRip) 6.3 / trailer
6 (…) Ghostbusters (TS) 5.3 / trailer
7 (5) Batman v Superman: Dawn of Justice 7.0 / trailer
8 (…) The Secret Life of Pets (HDTS) 6.8 / trailer
9 (4) The Legend of Tarzan (HDTS) 6.9 / trailer
10 (9) Independence Day: Resurgence (HDTS) 5.6 / trailer

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

Digital Citizens Slam Cloudflare For Enabling Piracy & Malware

Post Syndicated from Andy original https://torrentfreak.com/digital-citizens-slam-cloudflare-for-enabling-piracy-malware-160722/

For the past several years, one of the key educational strategies of entertainment industry companies has been to cast doubt on the credibility of so-called ‘pirate’ sites.

Previously there have been efforts to suggest that site operators make huge profits at the expense of artists who get nothing, but there are other recurring themes, mostly centered around fear.

One of the most prominent is that pirate sites are dangerous places to visit, with users finding themselves infected with viruses and malware while being subjected to phishing attacks.

This increasingly well-worn approach has just been revisited by consumer interest group Digital Citizens Alliance (DCA). In a new report titled ‘Enabling Malware’, the Hollywood-affiliated group calls out United States-based companies for helping pirate site operators “bait consumers and steal their personal information.”

“When you think of Internet crime, you probably imagine shadowy individuals operating in Eastern Europe, China or Russia who come up with devious plans to steal your identity, trick you into turning over financial information or peddling counterfeits or stolen content. And you would be right,” DCA begin.

“But while many online criminals are based overseas, and often beyond the reach of U.S. prosecutors, they are aided by North American technology companies that ensure that overseas operators’ lifeline to the public – their websites – are available.”

DCA has examined the malware issue on pirate sites on previous occasions but this time around their attention turns to local service providers, including hosting platform Hawk Host and CDN company Cloudflare who (in)directly provide services to pirate sites.

“Are these companies doing anything illegal? No more than the landlord of an apartment isn’t doing anything illegal by renting to a drug dealer who has sellers showing up day and night,” DCA writes.

“But just like that landlord, more often than not these companies either look the other way or just don’t want to know.”

Faced with an investigative dead-end when it comes to tracing the operators of pirate sites, DCA criticizes Cloudflare for providing a service which effectively shields the true location of such platforms.

“In order to utilize CloudFlare’s CDN, DNS, and other protection services customers have to run all of their website traffic through the CloudFlare network. The end result of doing so is masked hosting information,” DCA reports.

“Instead of the actual hosting provider, IP address, domain name server, etc., a Whois search provides the information for CloudFlare’s network.”

To illustrate its point, DCA points to a pirate domain which presents itself as the famous Putlocker site but is actually a third-party clone operating from the dubious URL, Putlockerr.ac.

“From websites such as putlockerr.ac consumers are tricked into downloading malware. For example, when a consumer clicks to watch a movie, they are sent to a new screen in which they are told their video player is out of date and they must update it. The update, Digital Citizens’ researchers found, is the malware delivery mechanism.”

There’s little doubt that some of these low-level sites are in the malware game so DCA’s research is almost certainly sound. However, just like their colleagues at the MPAA and RIAA who regularly shift responsibility to Google, DCA lays the blame on Cloudflare, a more easily pinpointed target than a pirate site operator.

Unsurprisingly, Cloudflare isn’t particularly interested in getting involved in the online content-policing business.

“CloudFlare’s service protects and accelerates websites and applications. Because CloudFlare is not a host, we cannot control or remove customer content from the Internet,” the company said in a response to the report.

In common with Google, Cloudflare also says it makes efforts to stop the spread of malware but due to the nature of its business it is unable to physically remove content from the Internet.

“CloudFlare leaves the removal of online content to law enforcement agencies and complies with any legal requests made by the authorities,” the company notes.

“If we believe that one of our customers’ websites is distributing malware, CloudFlare will post an interstitial page that warns site visitors and asks them if they would like to proceed despite the warning. This practice follows established industry norms.”

Finally, while DCA says it has the safety of Internet users at heart, its malware report misses a great opportunity. Aside from criticizing companies like Cloudflare for not doing enough, it offers zero practical anti-malware advice to consumers.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

Top 10 Most Pirated Movies of The Week – 07/18/16

Post Syndicated from Ernesto original https://torrentfreak.com/top-10-pirated-movies-week-071816/

This week we have two newcomers in our chart.

Central Intelligence is the most downloaded movie.

The data for our weekly download chart is estimated by TorrentFreak, and is for informational and educational reference only. All the movies in the list are Web-DL/Webrip/HDRip/BDrip/DVDrip unless stated otherwise.

RSS feed for the weekly movie download chart.

Ranking (last week) Movie IMDb Rating / Trailer
1 (8) Central Intelligence 6.9 / trailer
2 (1) Warcraft (subbed HDRip) 7.7 / trailer
3 (…) The Purge: Election Year (subbed HDRip) 6.3 / trailer
4 (5) The Legend of Tarzan (HDTS) 6.9 / trailer
5 (2) Batman v Superman: Dawn of Justice 7.0 / trailer
6 (…) Hardcore Henry 6.9 / trailer
7 (7) Finding Dory (HDTS) 8.1 / trailer
8 (3) Me Before You (Subbed Webrip) 7.7 / trailer
9 (4) Independence Day: Resurgence (HDTS) 5.6 / trailer
10 (9) X-Men: Apocalypse (HDCam/TC) 7.7 / trailer

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

Some stuff about color

Post Syndicated from Eevee original https://eev.ee/blog/2016/07/16/some-stuff-about-color/

I’ve been trying to paint more lately, which means I have to actually think about color. Like an artist, I mean. I’m okay at thinking about color as a huge nerd, but I’m still figuring out how to adapt that.

While I work on that, here is some stuff about color from the huge nerd perspective, which may or may not be useful or correct.

Hue

Hues are what we usually think of as “colors”, independent of how light or dim or pale they are: general categories like purple and orange and green.

Strictly speaking, a hue is a specific wavelength of light. I think it’s really weird to think about light as coming in a bunch of wavelengths, so I try not to think about the precise physical mechanism too much. Instead, here’s a rainbow.

rainbow spectrum

These are all the hues the human eye can see. (Well, the ones this image and its colorspace and your screen can express, anyway.) They form a nice spectrum, which wraps around so the two red ends touch.

(And here is the first weird implication of the physical interpretation: purple is not a real color, in the sense that there is no single wavelength of light that we see as purple. The actual spectrum runs from red to blue; when we see red and blue simultaneously, we interpret it as purple.)

The spectrum is divided by three sharp lines: yellow, cyan, and magenta. The areas between those lines are largely dominated by red, green, and blue. These are the two sets of primary colors, those hues from which any others can be mixed.

Red, green, and blue (RGB) make up the additive primary colors, so named because they add light on top of black. LCD screens work exactly this way: each pixel is made up of three small red, green, and blue rectangles. It’s also how the human eye works, which is fascinating but again a bit too physical.

Cyan, magenta, and yellow are the subtractive primary colors, which subtract light from white. This is how ink, paint, and other materials work. When you look at an object, you’re seeing the colors it reflects, which are the colors it doesn’t absorb. A red ink reflects red light, which means it absorbs green and blue light. Cyan ink only absorbs red, and yellow ink only absorbs blue; if you mix them, you’ll get ink that absorbs both red and blue, and thus will appear green. A pure black is often included to make CMYK; mixing all three colors would technically get you black, but it might be a bit muddy and would definitely use three times as much ink.

The great kindergarten lie

Okay, you probably knew all that. What confused me for the longest time was how no one ever mentioned the glaring contradiction with what every kid is taught in grade school art class: that the primary colors are red, blue, and yellow. Where did those come from, and where did they go?

I don’t have a canonical answer for that, but it does make some sense. Here’s a comparison: the first spectrum is a full rainbow, just like the one above. The second is the spectrum you get if you use red, blue, and yellow as primary colors.

a full spectrum of hues, labeled with color names that are roughly evenly distributed
a spectrum of hues made from red, blue, and yellow

The color names come from xkcd’s color survey, which asked a massive number of visitors to give freeform names to a variety of colors. One of the results was a map of names for all the fully-saturated colors, providing a rough consensus for how English speakers refer to them.

The first wheel is what you get if you start with red, green, and blue — but since we’re talking about art class here, it’s really what you get if you start with cyan, magenta, and yellow. The color names are spaced fairly evenly, save for blue and green, which almost entirely consume the bottom half.

The second wheel is what you get if you start with red, blue, and yellow. Red has replaced magenta, and blue has replaced cyan, so neither color appears on the wheel — red and blue are composites in the subtractive model, and you can’t make primary colors like cyan or magenta out of composite colors.

Look what this has done to the distribution of names. Pink and purple have shrunk considerably. Green is half its original size and somewhat duller. Red, orange, and yellow now consume a full half of the wheel.

There’s a really obvious advantage here, if you’re a painter: people are orange.

Yes, yes, we subdivide orange into a lot of more specific colors like “peach” and “brown”, but peach is just pale orange, and brown is just dark orange. Everyone, of every race, is approximately orange. Sunburn makes you redder; fear and sickness make you yellower.

People really like to paint other people, so it makes perfect sense to choose primary colors that easily mix to make people colors.

Meanwhile, cyan and magenta? When will you ever use those? Nothing in nature remotely resembles either of those colors. The true color wheel is incredibly, unnaturally bright. The reduced color wheel is much more subdued, with only one color that stands out as bright: yellow, the color of sunlight.

You may have noticed that I even cheated a little bit. The blue in the second wheel isn’t the same as the blue from the first wheel; it’s halfway between cyan and blue, a tertiary color I like to call azure. True pure blue is just as unnatural as true cyan; azure is closer to the color of the sky, which is reflected as the color of water.

People are orange. Sunlight is yellow. Dirt and rocks and wood are orange. Skies and oceans are blue. Blush and blood and sunburn are red. Sunsets are largely red and orange. Shadows are blue, the opposite of yellow. Plants are green, but in sun or shade they easily skew more blue or yellow.

All of these colors are much easier to mix if you start with red, blue, and yellow. It may not match how color actually works, but it’s a useful approximation for humans. (Anyway, where will you find dyes that are cyan or magenta? Blue is hard enough.)

I’ve actually done some painting since I first thought about this, and would you believe they sell paints in colors other than bright red, blue, and yellow? You can just pick whatever starting colors you want and the whole notion of “primary” goes a bit out the window. So maybe this is all a bit moot.

More on color names

The way we name colors fascinates me.

A “basic color term” is a single, unambiguous, very common name for a group of colors. English has eleven: red, orange, yellow, green, blue, purple, black, white, gray, pink, and brown.

Of these, orange is the only tertiary hue; brown is the only name for a specifically low-saturation color; pink and grey are the only names for specifically light shades. I can understand grey — it’s handy to have a midpoint between black and white — but the other exceptions are quite interesting.

Looking at the first color wheel again, “blue” and “green” together consume almost half of the spectrum. That seems reasonable, since they’re both primary colors, but “red” is relatively small; large chunks of it have been eaten up by its neighbors.

Orange is a tertiary color in either RGB or CMYK: it’s a mix of red and yellow, a primary and secondary color. Yet we ended up with a distinct name for it. I could understand if this were to give white folks’ skin tones their own category, similar to the reasons for the RBY art class model, but we don’t generally refer to white skin as “orange”. So where did this color come from?

Sometimes I imagine a parallel universe where we have common names for other tertiary colors. How much richer would the blue/green side of the color wheel be if “chartreuse” or “azure” were basic color terms? Can you even imagine treating those as distinct colors, not just variants of green or blue? That’s exactly how we treat orange, even though it’s just a variant of red.

I can’t speak to whether our vocabulary truly influences how we perceive or think (and that often-cited BBC report seems to have no real source). But for what it’s worth, I’ve been trying to think of “azure” as distinct for a few years now, and I’ve had a much easier time dealing with blues in art and design. Giving the cyan end of blue a distinct and common name has given me an anchor, something to arrange thoughts around.

Come to think of it, yellow is an interesting case as well. A decent chunk of the spectrum was ultimately called “yellow” in the xkcd map; here’s that chunk zoomed in a bit.

full range of xkcd yellows

How much of this range would you really call yellow, rather than green (or chartreuse!) or orange? Yellow is a remarkably specific color: mixing it even slightly with one of its neighbors loses some of its yellowness, and darkening it moves it swiftly towards brown.

I wonder why this is. When we see a yellowish-orange, are we inclined to think of it as orange because it looks like orange under yellow sunlight? Is it because yellow is between red and green, and the red and green receptors in the human eye pick up on colors that are very close together?


Most human languages develop their color terms in a similar order, with a split between blue and green often coming relatively late in a language’s development. Of particular interest to me is that orange and pink are listed as a common step towards the end — I’m really curious as to whether that happens universally and independently, or it’s just influence from Western color terms.

I’d love to see a list of the basic color terms in various languages, but such a thing is proving elusive. There’s a neat map of how many colors exist in various languages, but it doesn’t mention what the colors are. It’s easy enough to find a list of colors in various languages, like this one, but I have no idea whether they’re basic in each language. Note also that this chart only has columns for English’s eleven basic colors, even though Russian and several other languages have a twelfth basic term for azure. The page even mentions this, but doesn’t include a column for it, which seems ludicrous in an “omniglot” table.

The only language I know many color words in is Japanese, so I went delving into some of its color history. It turns out to be a fascinating example, because you can see how the color names developed right in the spelling of the words.

See, Japanese has a couple different types of words that function like adjectives. Many of the most common ones end in -i, like kawaii, and can be used like verbs — we would translate kawaii as “cute”, but it can function just as well as “to be cute”. I’m under the impression that -i adjectives trace back to Old Japanese, and new ones aren’t created any more.

That’s really interesting, because to my knowledge, only five Japanese color names are in this form: kuroi (black), shiroi (white), akai (red), aoi (blue), and kiiroi (yellow). So these are, necessarily, the first colors the language could describe. If you compare to the chart showing progression of color terms, this is the bottom cell in column IV: white, red, yellow, green/blue, and black.

A great many color names are compounds with iro, “color” — for example, chairo (brown) is cha (tea) + iro. Of the five basic terms above, kiiroi is almost of that form, but unusually still has the -i suffix. (You might think that shiroi contains iro, but shi is a single character distinct from i. kiiroi is actually written with the kanji for iro.) It’s possible, then, that yellow was the latest of these five words — and that would give Old Japanese words for white, red/yellow, green/blue, and black, matching the most common progression.

Skipping ahead some centuries, I was surprised to learn that midori, the word for green, was only promoted to a basic color fairly recently. It’s existed for a long time and originally referred to “greenery”, but it was considered to be a shade of blue (ao) until the Allied occupation after World War II, when teaching guidelines started to mention a blue/green distinction. (I would love to read more details about this, if you have any; the West’s coming in and adding a new color is a fascinating phenomenon, and I wonder what other substantial changes were made to education.)

Japanese still has a number of compound words that use ao (blue!) to mean what we would consider green: aoshingou is a green traffic light, aoao means “lush” in a natural sense, aonisai is a greenhorn (presumably from the color of unripe fruit), aojiru is a drink made from leafy vegetables, and so on.

This brings us to at least six basic colors, the fairly universal ones: black, white, red, yellow, blue, and green. What others does Japanese have?

From here, it’s a little harder to tell. I’m not exactly fluent and definitely not a native speaker, and resources aimed at native English speakers are more likely to list colors familiar to English speakers. (I mean, until this week, I never knew just how common it was for aoi to mean green, even though midori as a basic color is only about as old as my parents.)

I do know two curious standouts: pinku (pink) and orenji (orange), both English loanwords. I can’t be sure that they’re truly basic color terms, but they sure do come up a lot. The thing is, Japanese already has names for these colors: momoiro (the color of peach — flowers, not the fruit!) and daidaiiro (the color of, um, an orange). Why adopt loanwords for concepts that already exist?

I strongly suspect, but cannot remotely qualify, that pink and orange weren’t basic colors until Western culture introduced the idea that they could be — and so the language adopted the idea and the words simultaneously. (A similar thing happened with grey, natively haiiro and borrowed as guree, but in my limited experience even the loanword doesn’t seem to be very common.)

Based on the shape of the words and my own unqualified guesses of what counts as “basic”, the progression of basic colors in Japanese seems to be:

  1. black, white, red (+ yellow), blue (+ green) — Old Japanese
  2. yellow — later Old Japanese
  3. brown — sometime in the past millennium
  4. green — after WWII
  5. pink, orange — last few decades?

And in an effort to put a teeny bit more actual research into this, I searched the Leeds Japanese word frequency list (drawn from websites, so modern Japanese) for some color words. Here’s the rank of each. Word frequency is generally such that the actual frequency of a word is inversely proportional to its rank — so a word in rank 100 is twice as common as a word in rank 200. The five -i colors are split into both noun and adjective forms, so I’ve included an adjusted rank that you would see if they were counted as a single word, using ab / (a + b).

  • white: 1010 ≈ 1959 (as a noun) + 2083 (as an adjective)
  • red: 1198 ≈ 2101 (n) + 2790 (adj)
  • black: 1253 ≈ 2017 (n) + 3313 (adj)
  • blue: 1619 ≈ 2846 (n) + 3757 (adj)
  • green: 2710
  • yellow: 3316 ≈ 6088 (n) + 7284 (adj)
  • orange: 4732 (orenji), n/a (daidaiiro)
  • pink: 4887 (pinku), n/a (momoiro)
  • purple: 6502 (murasaki)
  • grey: 8472 (guree), 10848 (haiiro)
  • brown: 10622 (chairo)
  • gold: 12818 (kin’iro)
  • silver: n/a (gin’iro)
  • navy: n/a (kon)

“n/a” doesn’t mean the word is never used, only that it wasn’t in the top 15,000.
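The adjusted ranks above follow directly from treating frequency as inversely proportional to rank: a word appearing at ranks a and b has total frequency 1/a + 1/b, which corresponds to a single rank of ab / (a + b). A quick sketch to check the arithmetic:

```python
def combined_rank(a, b):
    """Combine noun and adjective ranks into one adjusted rank.

    Frequency is roughly 1/rank, so the combined frequency
    1/a + 1/b corresponds to a rank of ab / (a + b).
    """
    return a * b / (a + b)

# Noun + adjective ranks from the Leeds list for two of the -i colors:
print(round(combined_rank(1959, 2083)))  # white -> 1010
print(round(combined_rank(2101, 2790)))  # red   -> 1198
```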

I’m not sure where the cutoff is for “basic” color terms, but it’s interesting to see where the gaps lie. I’m especially surprised that yellow is so far down, and that purple (which I hadn’t even mentioned here) is as high as it is. Also, green is above yellow, despite having been a basic color for less than a century! Go, green.

For comparison, in American English:

  • black: 254
  • white: 302
  • red: 598
  • blue: 845
  • green: 893
  • yellow: 1675
  • brown: 1782
  • golden: 1835
  • gray: 1949
  • pink: 2512
  • orange: 3171
  • purple: 3931
  • silver: n/a
  • navy: n/a

Don’t read too much into the actual ranks; the languages and corpuses are both very different.

Color models

There are numerous ways to arrange and identify colors, much as there are numerous ways to identify points in 3D space. There are also benefits and drawbacks to each model, but I’m often most interested in how much sense the model makes to me as a squishy human.

RGB is the most familiar to anyone who does things with computers — it splits a color into its red, green, and blue channels, and measures the amount of each from “none” to “maximum”. (HTML sets this range as 0 to 255, but you could just as well call it 0 to 1, or -4 to 7600.)

RGB has a couple of interesting problems. Most notably, it’s kind of difficult to read and write by hand. You can sort of get used to how it works, though I’m still not particularly great at it. I keep in mind these rules:

  1. The largest channel is roughly how bright the color is.

    This follows pretty easily from the definition of RGB: it’s colored light added on top of black. The maximum amount of every color makes white, so less than the maximum must be darker, and of course none of any color stays black.

  2. The smallest channel is how pale (desaturated) the color is.

    Mixing equal amounts of red, green, and blue will produce grey. So if the smallest channel is green, you can imagine “splitting” the color between a grey (green, green, green), and the leftovers (red – green, 0, blue – green). Mixing grey with a color will of course make it paler — less saturated, closer to grey — so the bigger the smallest channel, the greyer the color.

  3. Whatever’s left over tells you the hue.

It might be time for an illustration. Consider the color (50%, 62.5%, 75%). The brightness is “capped” at 75%, the largest channel; the desaturation is 50%, the smallest channel. Here’s what that looks like.

illustration of the color (50%, 62.5%, 75%) split into three chunks of 50%, 25%, and 25%

Cutting out the grey and the darkness leaves a chunk in the middle of actual differences between the colors. Note that I’ve normalized it to (0%, 50%, 100%), which is the percentage of that small middle range. Removing the smallest and largest channels will always leave you with a middle chunk where at least one channel is 0% and at least one channel is 100%. (Or it’s grey, and there is no middle chunk.)

The odd one out is green at 50%, so the hue of this color is halfway between cyan (green + blue) and blue. That hue is… azure! So this color is a slightly darkened and fairly dull azure. (The actual amount of “greyness” is the smallest relative to the largest, so in this case it’s about ⅔ grey, or about ⅓ saturated.) Here’s that color.

a slightly darkened, fairly dull azure
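The three rules can be written out as a small sketch. This is just a rough Python rendering of the mental process above, not any standard routine:

```python
def split_rgb(r, g, b):
    """Split a color into greyness, brightness cap, and a hue chunk.

    Rule 1: the largest channel caps the brightness.
    Rule 2: the smallest channel is the grey (desaturated) part.
    Rule 3: the normalized leftover middle chunk gives the hue.
    """
    lo, hi = min(r, g, b), max(r, g, b)
    grey = lo / hi  # fraction of the color that is grey
    # Normalize the middle chunk so the range lo..hi becomes 0..1.
    chunk = tuple((c - lo) / (hi - lo) for c in (r, g, b))
    return grey, hi, chunk

grey, value, chunk = split_rgb(0.50, 0.625, 0.75)
print(grey)   # about 2/3 grey, i.e. about 1/3 saturated
print(value)  # brightness capped at 0.75
print(chunk)  # (0.0, 0.5, 1.0): halfway between cyan and blue -> azure
```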

This is a bit of a pain to do in your head all the time, so why not do it directly?

HSV is what you get when you directly represent colors as hue, saturation, and value. It’s often depicted as a cylinder, with hue represented as an angle around the color wheel: 0° for red, 120° for green, and 240° for blue. Saturation ranges from grey to a fully-saturated color, and value ranges from black to, er, the color. The azure above is (210°, ⅓, ¾) in HSV — 210° is halfway between 180° (cyan) and 240° (blue), ⅓ is the saturation measurement mentioned before, and ¾ is the largest channel.
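Python’s standard `colorsys` module does this conversion directly; note that it scales hue to the range 0–1 rather than degrees:

```python
import colorsys

# The azure from earlier: (50%, 62.5%, 75%)
h, s, v = colorsys.rgb_to_hsv(0.50, 0.625, 0.75)
print(h * 360)  # ~210 degrees: halfway between cyan (180) and blue (240)
print(s)        # ~1/3 saturated
print(v)        # 0.75, the largest channel
```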

It’s that hand-waved value bit that gives me trouble. I don’t really know how to intuitively explain what value is, which makes it hard to modify value to make the changes I want. I feel like I should have a better grasp of this after a year and a half of drawing, but alas.

I prefer HSL, which uses hue, saturation, and lightness. Lightness ranges from black to white, with the unperturbed color in the middle. Here’s lightness versus value for the azure color. (Its lightness is ⅝, the average of the smallest and largest channels.)

comparison of lightness and value for the azure color

The lightness just makes more sense to me. I can understand shifting a color towards white or black, and the color in the middle of that bar feels related to the azure I started with. Value looks almost arbitrary; I don’t know where the color at the far end comes from, and it just doesn’t seem to have anything to do with the original azure.

I’d hoped Wikipedia could clarify this for me. It tells me value is the same thing as brightness, but the mathematical definition on that page matches the definition of intensity from the little-used HSI model. I looked up lightness instead, and the first sentence says it’s also known as value. So lightness is value is brightness is intensity, but also they’re all completely different.

Wikipedia also says that HSV is sometimes known as HSB (where the “B” is for “brightness”), but I swear I’ve only ever seen HSB used as a synonym for HSL. I don’t know anything any more.

Oh, and in case you weren’t confused enough, the definition of “saturation” is different in HSV and HSL. Good luck!
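`colorsys` can demonstrate the mismatch. (It calls HSL “HLS” and orders the channels hue, lightness, saturation.) For a color like (25%, 50%, 75%), the two “saturations” disagree:

```python
import colorsys

r, g, b = 0.25, 0.50, 0.75
h_hsv, s_hsv, v = colorsys.rgb_to_hsv(r, g, b)
h_hls, lightness, s_hsl = colorsys.rgb_to_hls(r, g, b)

print(s_hsv)      # ~2/3: (max - min) / max
print(s_hsl)      # 1/2:  (max - min) / (max + min), when lightness <= 1/2
print(lightness)  # 1/2:  average of the largest and smallest channels
```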

Wikipedia does have some very nice illustrations of HSV and HSL, though, including depictions of them as a cone and double cone.

(Incidentally, you can use HSL directly in CSS now — there are hsl() and hsla() CSS3 functions which evaluate as colors. Combining these with Sass’s scale-color() function makes it fairly easy to come up with decent colors by hand, without having to go back and forth with an image editor. And I can even sort of read them later!)

An annoying problem with all of these models is that the idea of “lightness” is never quite consistent. Even in HSL, a yellow will appear much brighter than a blue with the same saturation and lightness. You may even have noticed in the RGB split diagram that I used dark red and green text, but light blue — the pure blue is so dark that a darker blue on top is hard to read! Yet all three colors have the same lightness in HSL, and the same value in HSV.

Clearly neither of these definitions of lightness or brightness or whatever is really working. There’s a thing called luminance, which is a weighted sum of the red, green, and blue channels that puts green as a whopping ten times brighter than blue. It tends to reflect how bright colors actually appear.
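With the Rec. 709 weights (the ones associated with sRGB; other standards use slightly different numbers), relative luminance is a plain weighted sum over linear — not gamma-encoded — channels:

```python
def luminance(r, g, b):
    """Relative luminance of a linear RGB color, Rec. 709 weights."""
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

print(luminance(0, 1, 0))  # pure green: 0.7152
print(luminance(0, 0, 1))  # pure blue:  0.0722, roughly ten times dimmer
print(luminance(1, 1, 0))  # yellow is brighter still, nearly white
```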

Unfortunately, luminance and related values are only used in fairly obscure color models, like YUV and Lab. I don’t mean “obscure” in the sense that nobody uses them, but rather that they’re very specialized and not often seen outside their particular niches: YUV is very common in video encoding, and Lab is useful for serious photo editing.

Lab is pretty interesting, since it’s intended to resemble how human vision works. It’s designed around the opponent process theory, which states that humans see color in three pairs of opposites: black/white, red/green, and yellow/blue. The idea is that we perceive color as somewhere along these axes, so a redder color necessarily appears less green — put another way, while it’s possible to see “yellowish green”, there’s no such thing as a “yellowish blue”.

(I wonder if that explains our affection for orange: we effectively perceive yellow as a fourth distinct primary color.)

Lab runs with this idea, making its three channels be lightness (but not the HSL lightness!), a (green to red), and b (blue to yellow). The neutral points for a and b are at zero, with green/blue extending in the negative direction and red/yellow extending in the positive direction.

Lab can express a whole bunch of colors beyond RGB, meaning they can’t be shown on a monitor, or even represented in most image formats. And you now have four primary colors in opposing pairs. That all makes it pretty weird, and I’ve actually never used it myself, but I vaguely aspire to do so someday.

I think those are all of the major ones. There’s also XYZ, which I think is some kind of master color model. Of course there’s CMYK, which is used for printing, but it’s effectively just the inverse of RGB.
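The “inverse of RGB” bit is almost literal: ignoring the K channel and the messy physics of real ink, CMY is just one-minus-RGB.

```python
def rgb_to_cmy(r, g, b):
    # Each ink subtracts its complementary light: cyan absorbs red,
    # magenta absorbs green, yellow absorbs blue.
    return 1 - r, 1 - g, 1 - b

print(rgb_to_cmy(1, 0, 0))  # red: no cyan, full magenta, full yellow
```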

With that out of the way, now we can get to the hard part!

Colorspaces

I called RGB a color model: a way to break colors into component parts.

Unfortunately, RGB alone can’t actually describe a color. You can tell me you have a color (0%, 50%, 100%), but what does that mean? 100% of what? What is “the most blue”? More importantly, how do you build a monitor that can display “the most blue” the same way as other monitors? Without some kind of absolute reference point, this is meaningless.

A color space is a color model plus enough information to map the model to absolute real-world colors. There are a lot of these. I’m looking at Krita’s list of built-in colorspaces and there are at least a hundred, most of them RGB.

I admit I’m bad at colorspaces and have basically done my best to not ever have to think about them, because they’re a big tangled mess and hard to reason about.

For example! The effective default RGB colorspace, the one almost everything will assume you’re using, is sRGB, specifically designed to be this kind of global default. Okay, great.

Now, sRGB has gamma built in. Gamma correction means slapping an exponent on color values to skew them towards or away from black. The color is assumed to be in the range 0–1, so any positive power will produce output from 0–1 as well. An exponent greater than 1 will skew towards black (because you’re multiplying a number less than 1 by itself), whereas an exponent less than 1 will skew away from black.

What this means is that halfway between black and white in sRGB isn’t (50%, 50%, 50%), but around (73%, 73%, 73%). Here’s a great example, borrowed from this post (with numbers out of 255):

alternating black and white lines alongside gray squares of 128 and 187

Which one looks more like the alternating bands of black and white lines? Surely the one you pick is the color that’s actually halfway between black and white.

And yet, in most software that displays or edits images, interpolating white and black will give you a 50% gray — much darker than the original looked. A quick test is to scale that image down by half and see whether the result looks closer to the top square or the bottom square. (Firefox, Chrome, and GIMP get it wrong; Krita gets it right.)

The right thing to do here is convert an image to a linear colorspace before modifying it, then convert it back for display. In a linear colorspace, halfway between white and black is still 50%, but it looks like the 73% grey. This is great fun: it involves a piecewise function and an exponent of 2.4.
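For the curious, that piecewise function is easy enough to write down. Here's a sketch of the standard sRGB transfer functions, with the constants from the sRGB spec, operating on single channels in 0–1:

```python
def srgb_to_linear(s):
    """Decode an sRGB-encoded channel (0-1) to linear light."""
    if s <= 0.04045:
        return s / 12.92
    return ((s + 0.055) / 1.055) ** 2.4

def linear_to_srgb(l):
    """Encode a linear-light channel (0-1) back to sRGB."""
    if l <= 0.0031308:
        return l * 12.92
    return 1.055 * l ** (1 / 2.4) - 0.055

# Halfway between black and white in *linear* light encodes to ~73% in sRGB:
print(round(linear_to_srgb(0.5), 3))  # 0.735
# ...while a 50% sRGB grey is only ~21% of the physical light of white:
print(round(srgb_to_linear(0.5), 3))  # 0.214
```

To modify an image "correctly", you'd decode every channel with `srgb_to_linear`, do the blending or scaling, then re-encode with `linear_to_srgb`.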

It’s really difficult to reason about this, for much the same reason that it’s hard to grasp text encoding problems in languages with only one string type. Ultimately you still have an RGB triplet at every stage, and it’s very easy to lose track of what kind of RGB that is. Then there’s the fact that most images don’t specify a colorspace in the first place, so you can’t be entirely sure whether a given image is sRGB, linear sRGB, or something else entirely; monitors can have their own color profiles; you may or may not be using a program that respects an embedded color profile; and so on. How can you ever tell what you’re actually looking at and whether it’s correct? I can barely keep track of what I mean by “50% grey”.

And then… what about transparency? Should a 50% transparent white atop solid black look like 50% grey, or 73% grey? Krita seems to leave it to the colorspace: sRGB gives the former, but linear sRGB gives the latter. Does this mean I should paint in a linear colorspace? I don’t know! (Maybe I’ll give it a try and see what happens.)

Something I genuinely can’t answer is what effect this has on HSV and HSL, which are defined in terms of RGB. Is there such a thing as linear HSL? Does anyone ever talk about this? Would it make lightness more sensible?

There is a good reason for this, at least: the human eye is better at distinguishing dark colors than light ones. I was surprised to learn that, but of course, it’s been hidden from me by sRGB, which is deliberately skewed to dedicate more space to darker colors. In a linear colorspace, a gradient from white to black would have a lot of indistinguishable light colors, but appear to have severe banding among the darks.

several different black to white gradients

All three of these are regular black-to-white gradients drawn in 8-bit color (i.e., channels range from 0 to 255). The top one is the naïve result if you draw such a gradient in sRGB: the midpoint is the too-dark 50% grey. The middle one is that same gradient, but drawn in a linear colorspace. Obviously, a lot of dark colors are “missing”, in the sense that we could see them but there’s no way to express them in linear color. The bottom gradient makes this more clear: it’s a gradient of all the greys expressible in linear sRGB.

This is the first time I’ve ever delved so deeply into exactly how sRGB works, and I admit it’s kind of blowing my mind a bit. Straightforward linear color is so much lighter, and this huge bias gives us a lot more to work with. Also, 73% being the midpoint certainly explains a few things about my problems with understanding brightness of colors.

There are other RGB colorspaces, of course, and I suppose they all make for an equivalent CMYK colorspace. YUV and Lab are families of colorspaces, though I think most people talking about Lab specifically mean CIELAB (or “L*a*b*”), and there aren’t really any competitors. HSL and HSV are defined in terms of RGB, and image data is rarely stored directly as either, so there aren’t really HSL or HSV colorspaces.

I think that exhausts all the things I know.

Real world color is also a lie

Just in case you thought these problems were somehow unique to computers. Surprise! Modelling color is hard because color is hard.

I’m sure you’ve seen the checker shadow illusion, possibly one of the most effective optical illusions, where the presence of a shadow makes a gray square look radically different than a nearby square of the same color.

Our eyes are very good at stripping away ambient light effects to tell what color something “really” is. Have you ever been outside in bright summer weather for a while, then come inside and everything is starkly blue? Lingering compensation for the yellow sunlight shifting everything to be slightly yellow; the opposite of yellow is blue.

Or, here, I like this. I’m sure there are more drastic examples floating around, but this is the best I could come up with. Here are some Pikachu I found via GIS.

photo of Pikachu plushes on a shelf

My question for you is: what color is Pikachu?

Would you believe… orange?

photo of Pikachu plushes on a shelf, overlaid with color swatches; the Pikachu in the background are orange

In each box, the bottom color is what I color-dropped, and the top color is the same hue with 100% saturation and 50% lightness. It’s the same spot, on the same plush, right next to each other — but the one in the background is orange, not yellow. At best, it’s brown.

What we see as “yellow in shadow” and interpret to be “yellow, but darker” turns out to be another color entirely. (The grey whistles are, likewise, slightly blue.)

Did you know that mirrors are green? You can see it in a mirror tunnel: the image gets slightly greener as it goes through the mirror over and over.

Distant mountains and other objects, of course, look bluer.

This all makes painting rather complicated, since it’s not actually about painting things the color that they “are”, but painting them in such a way that a human viewer will interpret them appropriately.

I, er, don’t know enough to really get very deep here. I really should, seeing as I keep trying to paint things, but I don’t have a great handle on it yet. I’ll have to defer to Mel’s color tutorial. (warning: big)

Blending modes

You know, those things in Photoshop.

I’ve always found these remarkably unintuitive. Most of them have names that don’t remotely describe what they do, the math doesn’t necessarily translate to useful understanding, and they’re incredibly poorly-documented. So I went hunting for some precise definitions, even if I had to read GIMP’s or Krita’s source code.

In the following, A is a starting image, and B is something being drawn on top with the given blending mode. (In the case of layers, B is the layer with the mode, and A is everything underneath.) Generally, the same operation is done on each of the RGB channels independently. Everything is scaled to 0–1, and results are generally clamped to that range.

I believe all of these treat layer alpha the same way: linear interpolation between A and the combination of A and B. If B has alpha t, and the blending mode is a function f, then the result is t × f(A, B) + (1 - t) × A.

If A and B themselves have alpha, the result is a little more complicated, and probably not that interesting. It tends to work how you’d expect. (If you’re really curious, look at the definition of BLEND() in GIMP’s developer docs.)
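As a sketch (the function and variable names here are mine, not from any particular program), that alpha rule for an opaque base looks like:

```python
def blend(a, b, t, f):
    """Composite blend mode f onto an opaque base, per channel.

    a: base channel, b: layer channel, t: the layer's alpha, all in 0-1.
    Linear interpolation between the untouched base and the fully
    blended result.
    """
    return t * f(a, b) + (1 - t) * a

multiply = lambda a, b: a * b

# A 50%-opaque Multiply layer lands halfway between "no effect" and full Multiply:
print(round(blend(0.8, 0.5, 0.5, multiply), 3))  # 0.6
```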

  • Normal: B. No blending is done; new pixels replace old pixels.

  • Multiply: A × B. As the name suggests, the channels are multiplied together. This is very common in digital painting for slapping on a basic shadow or tinting a whole image.

    I think the name has always thrown me off just a bit because “Multiply” sounds like it should make things bigger and thus brighter — but because we’re dealing with values from 0 to 1, Multiply can only ever make colors darker.

    Multiplying with black produces black. Multiplying with white leaves the other color unchanged. Multiplying with a gray is equivalent to blending with black. Multiplying a color with itself squares the color, which is similar to applying gamma correction.

    Multiply is commutative — if you swap A and B, you get the same result.

  • Screen: 1 - (1 - A)(1 - B). This is sort of an inverse of Multiply; it multiplies darkness rather than lightness. It’s defined as inverting both colors, multiplying, and inverting the result. Accordingly, Screen can only make colors lighter, and is also commutative. All the properties of Multiply apply to Screen, just inverted.

  • Hard Light: Equivalent to Multiply if B is dark (i.e., less than 0.5), or Screen if B is light. There’s an additional factor of 2 included to compensate for how the range of B is split in half: Hard Light with B = 0.4 is equivalent to Multiply with B = 0.8, since 0.4 is 0.8 of the way to 0.5. Right.

    This seems like a possibly useful way to apply basic highlights and shadows with a single layer? I may give it a try.

    The math is commutative, but since B is checked and A is not, Hard Light is itself not commutative.

  • Soft Light: Like Hard Light, but softer. No, really. There are several different versions of this, and they’re all a bit of a mess, not very helpful for understanding what’s going on.

    If you graphed the effect various values of B had on a color, you’d have a straight line from 0 up to 1 (at B = 0.5), and then it would abruptly change to a straight line back down to 0. Soft Light just seeks to get rid of that crease. Here’s Hard Light compared with GIMP’s Soft Light, where A is a black to white gradient from bottom to top, and B is a black to white gradient from left to right.

    graphs of combinations of all grays with Hard Light versus Soft Light

    You can clearly see the crease in the middle of Hard Light, where B = 0.5 and it transitions from Multiply to Screen.

  • Overlay: Equivalent to either Hard Light or Soft Light, depending on who you ask. In GIMP, it’s Soft Light; in Krita, it’s Hard Light except the check is done on A rather than B. Given the ambiguity, I think I’d rather just stick with Hard Light or Soft Light explicitly.

  • Difference: abs(A - B). Does what it says on the tin. I don’t know why you would use this? Difference with black causes no change; Difference with white inverts the colors. Commutative.

  • Addition and Subtract: A + B and A - B. I didn’t think much of these until I discovered that Krita has a built-in brush that uses Addition mode. It’s essentially just a soft spraypaint brush, but because it uses Addition, painting over the same area with a dark color will gradually turn the center white, while the fainter edges remain dark. The result is a fiery glow effect, which is pretty cool. I used it manually as a layer mode for a similar effect, to make a field of sparkles. I don’t know if there are more general applications.

    Addition is commutative, of course, but Subtract is not.

  • Divide: A ÷ B. Apparently this is the same as changing the white point to B. Accordingly, the result will blow out towards white very quickly as B gets darker.

  • Dodge and Burn: A ÷ (1 - B) and 1 - (1 - A) ÷ B. Inverses in the same way as Multiply and Screen. Similar to Divide, but with B inverted — so Dodge changes the white point to 1 - B, with similar caveats as Divide. I’ve never seen either of these effects not look horrendously gaudy, but I think photographers manage to use them, somehow.

  • Darken Only and Lighten Only: min(A, B) and max(A, B). Commutative.

  • Linear Light: (A + 2 × B) - 1. I think this is the same as Sai’s “Lumi and Shade” mode, which is very popular, at least in this house. It works very well for simple lighting effects, and shares the Soft/Hard Light property that darker colors darken and lighter colors lighten, but I don’t have a great grasp of it yet and don’t know quite how to explain what it does. So I made another graph:

    graph of Linear Light, with a diagonal band of shading going from upper left to bottom right

    Super weird! Half the graph is solid black or white; you have to stay in that sweet zone in the middle to get reasonable results.

    This is actually a combination of two other modes, Linear Dodge and Linear Burn, combined in much the same way as Hard Light. I’ve never encountered them used on their own, though.

  • Hue, Saturation, Value: Work like you might expect: each converts A to HSV and replaces its hue, saturation, or value with B’s.

  • Color: Uses HSL, unlike the above three. Combines B’s hue and saturation with A’s lightness.

  • Grain Extract and Grain Merge: A - B + 0.5 and A + B - 0.5. These are clearly related to film grain, somehow, but their exact use eludes me.

    I did find this example post where someone combines a photo with a blurred copy using Grain Extract and Grain Merge. Grain Extract picked out areas of sharp contrast, and Grain Merge emphasized them, which seems relevant enough to film grain. I might give these a try sometime.

Those are all the modes in GIMP (except Dissolve, which isn’t a real blend mode; also, GIMP doesn’t have Linear Light). Photoshop has a handful more. Krita has a preposterous number of other modes, no, really, it is absolutely ridiculous, you cannot even imagine.
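For reference, a few of these modes reduce to one-liners. A sketch assuming 0–1 channels, with clamping left out:

```python
def multiply(a, b):
    return a * b

def screen(a, b):
    # Invert both, multiply, invert the result.
    return 1 - (1 - a) * (1 - b)

def hard_light(a, b):
    # Multiply for dark B, Screen for light B; the factor of 2 compensates
    # for B's range being split in half.
    if b <= 0.5:
        return multiply(a, 2 * b)
    return screen(a, 2 * b - 1)

def difference(a, b):
    return abs(a - b)

def darken_only(a, b):
    return min(a, b)

def lighten_only(a, b):
    return max(a, b)

# Hard Light with B = 0.4 matches Multiply with B = 0.8:
print(hard_light(0.5, 0.4), multiply(0.5, 0.8))  # 0.4 0.4
```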

I may be out of things

There’s plenty more to say about color, both technically and design-wise — contrast and harmony, color blindness, relativity, dithering, etc. I don’t know if I can say any of it with any particular confidence, though, so perhaps it’s best I stop here.

I hope some of this was instructive, or at least interesting!

How SmartNews Built a Lambda Architecture on AWS to Analyze Customer Behavior and Recommend Content

Post Syndicated from SmartNews original https://blogs.aws.amazon.com/bigdata/post/Tx2V1BSKGITCMTU/How-SmartNews-Built-a-Lambda-Architecture-on-AWS-to-Analyze-Customer-Behavior-an

This is a guest post by Takumi Sakamoto, a software engineer at SmartNews. SmartNews in their own words: "SmartNews is a machine learning-based news discovery app that delivers the very best stories on the Web for more than 18 million users worldwide."

Data processing is one of the key technologies for SmartNews. Every team’s workload involves data processing for various purposes. The news team at SmartNews uses data as input to their machine learning algorithm for delivering the very best stories on the Web. The product team relies on data to run various A/B tests, to learn about how our customers consume news articles, and to make product decisions.

To meet the goals of both teams, we built a sustainable data platform based on the lambda architecture, which is a data-processing framework that handles a massive amount of data and integrates batch and real-time processing within a single framework.

Thanks to AWS services and OSS technologies, our data platform is highly scalable and reliable, and is flexible enough to satisfy various requirements with minimum cost and effort.

Our current system generates tens of GBs of data from multiple data sources, and runs daily aggregation queries or machine learning algorithms on datasets with hundreds of GBs. Some outputs of the machine learning algorithms are joined with data streams to gather user feedback in near real-time (e.g. the last 5 minutes). That lets us adapt our product for users with minimal latency. In this post, I’ll show you how we built the SmartNews data platform on AWS.

The image below depicts the platform. Please scroll to see the full architecture.

Design principles

Before I dive into how we built our data platform, it’s important to know the design principles behind the architecture.

When we started to discuss the data platform, most data was stored in a document database. Although that was a good fit at product launch, it became painful as we grew. For data platform maintainers, it was very expensive to store and serve data at scale. At that time, our system generated more than 10 GB of user activity records every day, and processing time increased linearly. For data platform users, it was hard to try anything new for data processing because of the database’s insufficient scalability and limited integration with the big data ecosystem. Obviously, this wasn’t sustainable for either group.

To make our data platform sustainable, we decided to completely separate the compute and storage layers. We adopted Amazon S3  for file storage and Amazon Kinesis Streams for stream storage. Both services replicate data into multiple Availability Zones and keep it available without high operation costs. We don’t have to pay much attention to the storage layer and we can focus on the computation layer that transforms raw data to a valuable output.

In addition, Amazon S3 and Amazon Kinesis Streams let us run multiple compute layers without complex negotiations. After data is stored, everyone can consume it in their own way. For example, if a team wants to try a new version of Spark, they can launch a new cluster and start to evaluate it immediately. That means every engineer in SmartNews can craft any solutions using whatever tools they feel are best suited to the task.

Input data

The first step is dispatching raw data to both the batch layer and the speed layer for processing. There are two types of data sources at SmartNews:

  • Groups of user activity logs generated from our mobile app
  • Various tables on Amazon RDS

User activity logs include more than 60 types of activities to understand user behavior such as which news articles are read. After we receive logs from the mobile app, all logs are passed to Fluentd, an OSS log collector, and forwarded to Amazon S3 and Amazon Kinesis Streams. If you are not familiar with Fluentd, see Store Apache Logs into Amazon S3 and Collect Log Files into Kinesis Stream in Real-Time to understand how Fluentd works.

Our recommended practice is adding the flush_at_shutdown parameter. If set to true, Fluentd waits for the buffer to flush at shutdown. Because our instances are scaled automatically, it’s important to store log files on Amazon S3 before terminating instances.
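As a sketch, a v0.12-era fluent-plugin-s3 section might look like the following (the bucket, tag, and paths are placeholders, not our real configuration):

```
<match activity.**>
  @type s3
  # bucket and paths below are placeholders
  s3_bucket my-log-bucket
  path logs/
  buffer_path /var/log/td-agent/buffer/s3
  time_slice_format %Y%m%d%H
  # flush buffered chunks to S3 before the (auto-scaled) instance terminates
  flush_at_shutdown true
</match>
```

In newer Fluentd versions (v0.14 and later), flush_at_shutdown is a buffer parameter and lives inside a nested &lt;buffer&gt; section instead.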

In addition, monitoring Fluentd status is important so that you know when bad things happen. We use Datadog and some Fluentd plugins. Because the Fluent-plugin-flowcounter counts incoming messages and bytes per second, we post these metrics to Dogstatsd via Fluent-plugin-dogstatsd. An example configuration is available in a GitHub Gist post.

After metrics are sent to Datadog, we can visualize aggregated metrics across any level that we choose. The following graph aggregates the number of records per data source.

Also, Datadog notifies us when things go wrong. The alerts in the figure below let us know that there have been no incoming records on an instance for the last 1 hour. We also monitor Fluentd’s buffer status by using Datadog’s Fluentd integration.

Various tables on Amazon RDS are dumped by Embulk, an OSS bulk data loader, and exported to Amazon S3. Its pluggable architecture lets us mask some fields that we don’t want to export to the data platform.

Batch layer

This layer is responsible for various ETL tasks such as transforming text files into columnar files (RCFile or ORCFile) for following consumers, generating machine learning features, and pre-computing the batch views.

We run multiple Amazon EMR clusters for each task. Amazon EMR lets us run multiple heterogeneous Hive and Spark clusters with a few clicks. Because all data is stored on Amazon S3, we can use Spot Instances for most tasks and adjust cluster capacity dynamically. It significantly reduces the cost of running our data processing system.

In addition to data processing itself, task management is very important for this layer. Although a cron scheduler is a good first solution, it becomes hard to maintain after increasing the number of ETL tasks.

When using a cron scheduler, a developer needs to write additional code to handle dependencies such as waiting until the previous task is done, or failure handling such as retrying failed tasks or specifying timeouts for long-running tasks. We use Airflow, an open-sourced task scheduler, to manage our ETL tasks. We can define ETL tasks and dependencies with Python scripts.

Because every task is described as code, we can introduce pull request–based review flows for modifying ETL tasks.
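To make the cron pain concrete, here is a toy sketch, in plain Python, of the dependency-ordering and retry bookkeeping that Airflow expresses declaratively (the task names are hypothetical, not our actual pipeline):

```python
# Hypothetical ETL tasks and the tasks each one depends on.
DEPENDENCIES = {
    "dump_rds":         [],
    "convert_to_orc":   ["dump_rds"],
    "compute_features": ["convert_to_orc"],
}

def run_all(tasks, run_task, max_retries=3):
    """Run tasks in dependency order, retrying failed tasks.

    This is the bookkeeping a cron-based setup makes you write by hand;
    a scheduler like Airflow lets you declare the same graph as code.
    """
    done = set()
    pending = dict(tasks)
    while pending:
        runnable = [t for t, deps in pending.items() if all(d in done for d in deps)]
        if not runnable:
            raise RuntimeError("dependency cycle among: %s" % sorted(pending))
        for t in runnable:
            for _ in range(max_retries):
                try:
                    run_task(t)
                    break
                except Exception:
                    continue  # a real scheduler would back off and alert here
            else:
                raise RuntimeError("task %r failed after %d retries" % (t, max_retries))
            done.add(t)
            del pending[t]
    return done

order = []
run_all(DEPENDENCIES, order.append)
print(order)  # ['dump_rds', 'convert_to_orc', 'compute_features']
```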

Serving layer

The serving layer indexes and exposes the views so that they can be queried.

We use Presto for this layer. Presto is an open source, distributed SQL query engine for running interactive queries against various data sources such as Hive tables on S3, MySQL on Amazon RDS, Amazon Redshift, and Amazon Kinesis Streams. Presto converts a SQL query into a series of task stages and processes each stage in parallel. Because all processing occurs in memory to reduce disk I/O, end-to-end latency is very low: ~30 seconds to scan billions of records.

With Presto, we can analyze the data from various perspectives. The following simplified query shows the result of A/B testing by user clusters.

```sql
-- Suppose that this table exists
DESC hive.default.user_activities;
user_id bigint
action  varchar
abtest  array(map(varchar, varchar))
url     varchar

-- Summarize page view per A/B Test identifier
--   for comparing two algorithms v1 & v2
SELECT
  dt,
  t['behaviorId'],
  count(*) as pv
FROM hive.default.user_activities CROSS JOIN UNNEST(abtest) AS t (t)
WHERE dt like '2016-01-%' AND action = 'viewArticle'
  AND t['definitionId'] = 163
GROUP BY dt, t['behaviorId'] ORDER BY dt
;

-- Output:
-- 2016-01-01 | algorithm_v1 | 40000
-- 2016-01-01 | algorithm_v2 | 62000
```

Speed layer

Like the batch layer, the speed layer computes views from the data it receives. The difference is latency: sometimes, low latency adds valuable outputs for the product.

For example, we need to detect current trending news by interest-based clusters to deliver the best stories for each user. For this purpose, we run Spark Streaming.

User feedback in Amazon Kinesis Streams is joined with the interest-based user cluster data calculated by offline machine learning, and metrics are then output for each news article. These metrics are used to rank news articles in a later phase. What Spark Streaming does in the above figure looks something like the following:

```scala
def main(args: Array[String]): Unit = {
  // ..... (prepare SparkContext)

  // Load user clusters that are generated by offline machine learning
  // (declared as a var so it can be refreshed when the offline output updates)
  var userClusterRDD: RDD[(Long, Int)] = null
  if (needToUpdate) {
    userClusterRDD = sqlContext.sql(
      "SELECT user_id, cluster_id FROM user_cluster"
    ).map( row => {
      (row.getLong(0), row.getInt(1))
    })
  }

  // Fetch and parse JSON records in Amazon Kinesis Streams
  val userPageviewStream: DStream[(Long, String)] = ssc.union(kinesisStreams)
    .map( byteArray => {
      val json = new String(byteArray)
      val userActivity = parse(json)
      (userActivity.user_id, userActivity.url)
    })

  // Join stream records with pre-calculated user clusters
  val clusterPageviewStream: DStream[(Int, String)] = userPageviewStream
    .transform( userPageviewStreamRDD => {
      userPageviewStreamRDD.join(userClusterRDD).map( data => {
        val (userId, (url, clusterId) ) = data
        (clusterId, url)
      })
    })

  // ..... (aggregates pageview by clusters and store to DynamoDB)
}
```

Because every EMR cluster uses the shared Hive metastore, Spark Streaming applications can load all tables created on the batch layer by using SQLContext. After the tables are loaded as an RDD (Resilient Distributed Dataset), we can join it to a Kinesis stream.

Spark Streaming is a great tool for empowering your machine learning–based application, but it can be overkill for simpler use cases such as monitoring. For these cases, we use AWS Lambda and PipelineDB (not covered here in detail).

Output data

Chartio is a commercial business intelligence (BI) service. Chartio enables every member (including non-engineers!) in the company to create, edit, and refine beautiful dashboards with minimal effort. This has saved us hours each week so we can spend our time improving our product, not reporting on it. Because Chartio supports various data sources such as Amazon RDS (MySQL, PostgreSQL), Presto, PipelineDB, Amazon Redshift, and Amazon Elasticsearch, you can start using it easily.

Summary

In this post, I’ve shown you how SmartNews uses AWS services and OSS technologies to create a data platform that is highly scalable and reliable, and is flexible enough to satisfy various requirements with minimum cost and effort. If you’re interested in our data platform, check out these two slides in our SlideShare: Building a Sustainable Data Platform on AWS  and Stream Processing in SmartNews.

If you have questions or suggestions, please leave a comment below.

Takumi Sakamoto is not an Amazon employee and does not represent Amazon.

———————————

Related

Building a Near Real-Time Discovery Platform with AWS

Want to learn more about Big Data or Streaming Data? Check out our Big Data and Streaming data educational pages.


The Scratch Olympics

Post Syndicated from Rik Cross original https://www.raspberrypi.org/blog/the-scratch-olympics-2/

Since the Raspberry Pi Foundation merged with Code Club, the newly enlarged Education Team has been working hard to put the power of digital making into the hands of people all over the world.

Among the work we’ve been doing, we’ve created a set of Scratch projects to celebrate the 2016 Olympic Games in Rio.

The initial inspiration for these projects was the games that we used to love as children, commonly referred to as ‘button mashers’. There was little skill required in these games: all you needed was the ability to smash two keys as fast as humanly possible. Examples of this genre include such classics as Geoff Capes Strongman and Daley Thompson’s Decathlon.

With the 2016 Olympics fast approaching, we began to reminisce about these old sports-themed games, and realised what fun today’s kids are missing out on. With that, the Scratch Olympics were born!

There are two resources available on the resources section of this site, the first of which is the Olympic Weightlifter project. With graphics conceived by Sam Alder and produced by Alex Carter, the project helps you create a button-masher masterpiece: your very own 1980s-style keyboard-killer that’s guaranteed to strike fear into the hearts of parents all over the world. Physical buttons are an optional extra for the faint of heart.

A pixellated weightlifter blows steam from his ears as he lifts a barbell above his head in an animated gif

The second game in the series is Olympics Hurdles, where you will make a hurdling game which requires the player to hit the keyboard rapidly to make the hurdler run, and use expert timing to make them jump over the hurdles at the right time.

Pixellated athletes approach, leap and clear a hurdle on an athletics track

You’ll also find three new projects over on the Code Club projects website. The first of these is Synchronised Swimming, where you’ll learn how to code a synchronised swimming routine for Scratch the cat, by using loops and creating clones.

Six copies of the Scratch cat against an aqua blue background form a hexagonal synchronised swimming formation

There’s also an Archery project, where you must overcome an archer’s shaky arm to shoot arrows as close to the bullseye as you can, and Sprint!, which uses a 3D perspective to make the player feel as though they’re running towards a finish line. This project can even be coded to work with a homemade running mat! These two projects are only available to registered Code Clubs, and require an ID and PIN to access.

An archery target overlaid with a crosshair
A straight running track converges towards a flat horizon, with a "FINISH" ribbon and "TIME" and "DISTANCE" counters

Creating new Olympics projects is just one of the ways in which the Raspberry Pi Foundation and Code Club are working together to create awesome new resources, and there’s much more to come!

The post The Scratch Olympics appeared first on Raspberry Pi.

Top 10 Most Pirated Movies of The Week – 07/11/16

Post Syndicated from Ernesto original https://torrentfreak.com/top-10-pirated-movies-week-071116/

This week we have three newcomers in our chart.

Warcraft, which came out as a subbed HDrip this week, is the most downloaded movie.

The data for our weekly download chart is estimated by TorrentFreak, and is for informational and educational reference only. All the movies in the list are Web-DL/Webrip/HDRip/BDrip/DVDrip unless stated otherwise.

RSS feed for the weekly movie download chart.

Ranking (last week) Movie IMDb Rating / Trailer
torrentfreak.com
1 (4) Warcraft (subbed HDRip) 7.7 / trailer
2 (1) Batman v Superman: Dawn of Justice 7.0 / trailer
3 (6) Me Before You (Subbed Webrip) 7.7 / trailer
4 (2) Independence Day: Resurgence (HDTS) 5.6 / trailer
5 (…) The Legend of Tarzan (HDTS) 6.9 / trailer
6 (…) The Nice Guys (Subbed HDRip) 7.8 / trailer
7 (3) Finding Dory (HDTS) 8.1 / trailer
8 (5) Central Intelligence 6.9 / trailer
9 (7) X-Men: Apocalypse (HDCam/TC) 7.7 / trailer
10 (…) Barbershop: The Next Cut 6.0 / trailer

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

Generating Recommendations at Amazon Scale with Apache Spark and Amazon DSSTNE

Post Syndicated from Kiuk Chung original https://blogs.aws.amazon.com/bigdata/post/TxGEL8IJ0CAXTK/Generating-Recommendations-at-Amazon-Scale-with-Apache-Spark-and-Amazon-DSSTNE

Kiuk Chung is a Software Development Engineer with the Amazon Personalization team

In Personalization at Amazon, we use neural networks to generate personalized product recommendations for our customers. Amazon’s product catalog is huge compared to the number of products that a customer has purchased, making our datasets extremely sparse. And with hundreds of millions of customers and products, our neural network models often have to be distributed across multiple GPUs to meet space and time constraints.

For this reason, we have created and open-sourced DSSTNE, the Deep Scalable Sparse Tensor Neural Engine, which runs entirely on the GPU. We use DSSTNE to train neural networks and generate recommendations that power various personalized experiences on the retail website and Amazon devices.

On the other hand, data for training and prediction tasks is processed and generated from Apache Spark on a CPU cluster. This presents a fundamental problem: data processing happens on CPU while training and prediction happen on GPU.

Data generation and analysis are often overlooked in favor of modeling. But for fluid and rapid model prototyping, we want to analyze data, and train and evaluate our models inside of a single tool, especially since at least as much time is spent preparing data as designing models. Moreover, while DSSTNE is optimized for sparseness and scalability, it can help to use other libraries, such as Keras, which includes features currently missing in DSSTNE such as recurrent network architectures.

Managing a hybrid cluster of both CPU and GPU instances poses challenges because cluster managers such as Yarn/Mesos do not natively support GPUs. And even if they did, open source deep learning libraries would have to be re-written to abide by the cluster manager API. The following two remedies do not work:

  • Run deep learning on a CPU cluster
  • Run data processing on a GPU cluster

At Amazon’s scale, the first approach takes too long to train on the billions of customer/product interactions in our data sets. The second approach is a poor choice because it requires data analytics code to be written in GPU programming languages such as CUDA.

In this post, I discuss an alternate solution; namely, running separate CPU and GPU clusters, and driving the end-to-end modeling process from Apache Spark.

Architecture overview

We wanted an architecture where tasks could be run on both CPU and GPU from a single tool, and deep learning libraries could be plugged in without the need to re-write the algorithms in a different language or API. Keeping Spark as the main entry point, we thought of the training and prediction of neural networks as coarse grained tasks that could be delegated to a separate cluster with specialized GPU hardware. This is different from a more traditional approach where a lower level task such as matrix multiplication is exposed as a task primitive.

The Spark (CPU) cluster runs on Amazon EMR and the GPU instances are managed by Amazon ECS. In other words, we treat ECS as our GPU master. ECS runs tasks on Docker containers that reside in Amazon ECR, hence a deep learning library can easily be plugged in by exporting its Docker image to ECR.

The following diagram shows the high-level architecture.

In this architecture, data analytics and processing (i.e., CPU jobs) are executed through vanilla Spark, where the job is broken up into tasks and runs on a Spark executor. The GPU job above refers to the training or prediction of neural networks. The partitioning of the dataset for these jobs is done in Spark, but the execution of these jobs is delegated to ECS and is run inside Docker containers on the GPU slaves. Data transfer between the two clusters is done through Amazon S3.

When a GPU job is run, it is broken down into one or more GPU tasks (see later sections for details). As in Spark, one GPU task is assigned to each partition of the data RDD. The Spark executors save their respective partitions to S3, then call ECS to run a task definition with container overrides that specify the S3 location of the input partition and the command to execute on the specified Docker image. They then long-poll ECS to monitor the status of the GPU tasks.
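The post doesn't show the hand-off itself, but the per-partition container overrides an executor would send to ECS can be sketched roughly like this (the container name, command, and S3 paths are all illustrative, not from the original system):

```python
def gpu_task_overrides(container_name, command, input_s3, output_s3):
    """ECS container overrides for one GPU task: fetch the partition from S3,
    run the command, and upload the result back to S3."""
    return {
        "containerOverrides": [{
            "name": container_name,
            "command": [
                "/bin/sh", "-c",
                f"aws s3 cp --recursive {input_s3} /data/in && "
                f"{command} && "
                f"aws s3 cp --recursive /data/out {output_s3}",
            ],
        }]
    }

# One GPU task per RDD partition, each with its own S3 prefix.
overrides = [
    gpu_task_overrides(
        "dsstne",                              # hypothetical container name
        "train -c /data/in/config.json",       # hypothetical DSSTNE invocation
        f"s3://my-bucket/job/in/part-{i}",
        f"s3://my-bucket/job/out/part-{i}",
    )
    for i in range(4)
]
```

Each Spark executor would pass one such payload to ECS RunTask, then poll DescribeTasks until the task reaches STOPPED.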

On the GPU node, each task does the following:

  1. Downloads its data partition from S3.
  2. Executes the specified command.
  3. Uploads the output of the command back to S3.

Because the training and predictions run in a Docker container, all you need to do to support a new library is create a Docker image, upload it to ECR, and create an ECS task definition that maps to the appropriate image.

In the next section, I dive into the details of what type of GPU tasks we run with DSSTNE.

Deep learning with DSSTNE

Our neural network models often have hundreds of thousands of nodes in the input and output layers (i.e. wide networks). At this scale, we can easily reach trillions of weights for a fully-connected network, even if it is shallow. Therefore, our models often do not fit in the memory of a single GPU.
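A quick back-of-the-envelope calculation shows why. The 500,000-node layer width below is a hypothetical figure in the range the post describes, and fp32 (4-byte) weights are assumed:

```python
wide = 500_000                  # hundreds of thousands of nodes per layer
weights = wide * wide           # one fully connected wide-to-wide connection
terabytes = weights * 4 / 1e12  # fp32 weights, 4 bytes each

# A single GPU of that era held on the order of 12 GB, so even one such
# connection is orders of magnitude too large for one card.
print(f"{weights:.1e} weights, {terabytes} TB")  # 2.5e+11 weights, 1.0 TB
```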

As mentioned above, we built DSSTNE to support large, sparse layers. One of the ways it does so is by supporting model parallel training. In model parallel training, the model is distributed across N GPUs – the dataset (e.g., RDD) is replicated to all GPU nodes. Contrast this with data parallel training where each GPU only trains on a subset of the data, then shares the weights with each other using synchronization techniques such as a parameter server.

After the model is trained, we generate predictions (e.g., recommendations) for each customer. This is an embarrassingly parallel task as each customer’s recommendations can be generated independently. Thus, we perform data parallel predictions, where each GPU handles the prediction of a batch of customers. This allows us to scale linearly simply by adding more GPUs. That is, doubling the number of partitions (GPUs) halves the amount of time to generate predictions. Thanks to Auto Scaling, we can scale our GPU cluster up and down based on our workload and SLA constraints.
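The linear-scaling claim can be illustrated with a toy throughput model (the customer count and per-GPU throughput below are invented for illustration):

```python
def prediction_hours(customers, customers_per_gpu_per_hour, gpus):
    """Wall-clock hours when each GPU independently handles an equal share
    of customers (embarrassingly parallel, so throughput adds linearly)."""
    return customers / (customers_per_gpu_per_hour * gpus)

t4 = prediction_hours(200_000_000, 5_000_000, gpus=4)
t8 = prediction_hours(200_000_000, 5_000_000, gpus=8)
assert t8 == t4 / 2   # doubling the GPUs halves the time
print(t4, t8)         # 10.0 5.0
```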

The following diagram depicts model parallel training and data parallel predictions.

Now that you have seen the types of GPU workloads, I’ll show an example of how we orchestrated an end-to-end modeling iteration from Spark.

Orchestration with Apache Spark

As explained earlier, the deep learning algorithms run inside Docker containers on a separate GPU cluster managed by ECS, which gives us programmatic access to the execution engine on a remote cluster. The entry point for the users is Spark. Leveraging notebooks such as Zeppelin or Jupyter, users can interactively pull and analyze data, train neural networks, and generate predictions, without ever having to leave the notebook.

In the next two sections, I discuss how to train and predict with a sample neural network, as described on DSSTNE’s GitHub page, from a Zeppelin notebook using the previously mentioned two-cluster setup.

Model parallel training

This section describes how to train a three-layer autoencoder using model parallel training across four GPUs on the MovieLens dataset. The data, model configurations, and model topology are defined in the Zeppelin notebook and propagated to the GPU slaves by using S3 as the filesystem. A list of commands is run on ECS to download data and configurations, kick off training, and upload the model artifacts to S3.

To delegate the GPU tasks to ECS, you must set up the ECS cluster and the task definition. The following steps summarize the ECS setup.

  1. Add AWS Java SDK for Amazon ECS in the Spark classpath.
  2. Ensure that the EC2 instance profile on the EMR cluster has permissions to call ECS.
  3. Create an ECS cluster with a GPU instance and install a DSSTNE compatible NVIDIA driver.
  4. Create an ECS task definition with a container that has the privileged flag set and points to DSSTNE’s Docker image with the AWS CLI.

On an EMR cluster with Spark and Zeppelin (Sandbox) installed, the %sh interpreter in Zeppelin is used to download the required files. The size and instance type of the EMR cluster depend on the size of your dataset. We recommend using two c3.2xlarge instances for the MovieLens dataset.

Switch to the spark interpreter to kick-start training. The runOnECS() method runs the given list of commands on the Docker container specified by the provided task definition. You can define this method in the Zeppelin notebook and implement it by instantiating an AmazonECSClient with InstanceProfileCredentialsProvider, and submitting a RunTaskRequest with the list of commands as container overrides.
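The post implements runOnECS() with the AWS Java SDK (AmazonECSClient, RunTaskRequest). As a rough illustration of the same submit-and-long-poll flow, here is an equivalent sketch in Python with boto3; the cluster, task definition, and container names are placeholders:

```python
def build_run_task_request(commands, cluster="gpu-cluster",
                           task_definition="dsstne-train", container="dsstne"):
    """Arguments for ECS RunTask, with the commands passed as container
    overrides (mirrors the RunTaskRequest the post describes)."""
    return {
        "cluster": cluster,
        "taskDefinition": task_definition,
        "count": 1,
        "overrides": {"containerOverrides": [
            {"name": container, "command": commands},
        ]},
    }

def run_on_ecs(commands, poll_seconds=15, **kwargs):
    """Submit the commands to ECS and long-poll until the task stops.
    Relies on instance-profile credentials, as on the EMR cluster."""
    import time
    import boto3  # imported here so the pure helper above stays dependency-free
    ecs = boto3.client("ecs")
    response = ecs.run_task(**build_run_task_request(commands, **kwargs))
    task_arn = response["tasks"][0]["taskArn"]
    while True:
        task = ecs.describe_tasks(cluster=kwargs.get("cluster", "gpu-cluster"),
                                  tasks=[task_arn])["tasks"][0]
        if task["lastStatus"] == "STOPPED":
            return task
        time.sleep(poll_seconds)
```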

The following graphic demonstrates model parallel training driven from a Zeppelin notebook.

Data parallel prediction

Using the network trained above, we run data parallel predictions and generate the top 100 recommendations for each customer. Each partition of the predict_input RDD is processed in parallel on separate GPUs. After generating the predictions, you can evaluate the precision of the model by comparing against a test dataset. If you are not satisfied with the performance of the model, then you can go back to training the network with different slices of data, model parameters, or network topology and iterate quickly inside the same notebook.
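Comparing the top-100 list against a held-out test dataset reduces to a precision-at-k computation per customer. A minimal sketch (not the team's actual evaluation code, and using k=4 for readability):

```python
def precision_at_k(recommended, held_out_purchases, k=100):
    """Fraction of the top-k recommendations the customer actually bought."""
    hits = len(set(recommended[:k]) & set(held_out_purchases))
    return hits / k

recs = ["p1", "p2", "p3", "p4", "p5"]   # ranked recommendations
bought = {"p2", "p4", "p9"}             # held-out test purchases
print(precision_at_k(recs, bought, k=4))  # 0.5
```

Averaging this value over all customers gives a single precision figure to compare across training runs.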

The following graphic demonstrates data parallel predictions and plotting the precision metrics.

Conclusion and future work

In this post, I have explained how the Amazon Personalization team uses Spark and DSSTNE to iterate quickly on large deep learning problems and generate recommendations for millions of customers daily.

Both Spark and DSSTNE allow you to scale to large amounts of data by exploiting parallelism wherever it exists. By using ECS to run mini-batch jobs on GPU hardware, you can circumvent the complexity of managing a heterogeneous cluster with both CPU and GPU nodes where multiple deep learning library installations coexist. Furthermore, deep learning libraries can be plugged into the platform with ease, giving scientists and engineers the freedom to choose the library most appropriate to their problems.

By using EMR to provision Spark clusters, you can easily scale the size and number of CPU clusters based on your workload. However, we currently share a single GPU cluster in ECS, which lacks the ability to queue tasks at this time. Consequently, a user may suffer from GPU starvation. To alleviate this, we have implemented a Mesos scheduler on top of ECS, based on the ECS Mesos Scheduler Driver prototype.

Orchestrating the entire machine learning life-cycle from Spark allows you to stack Spark and deep learning libraries to build creative applications. For example, you can leverage Spark Streaming to feed streaming data into the GPU prediction tasks and update customers’ recommendations in near real-time. Models supported by MLlib can be used as baselines for the neural networks, or joined together to create ensembles. You can take advantage of Spark’s parallel execution engine to run parallel hyper-parameter optimizations. And the list goes on….

The management and setup of this stack is simple, repeatable, and elastic when you run on AWS. We are excited to see the applications that you will build on these tools!

Please leave a comment below to tell us about your apps and the problems you can solve.

Passionate about deep learning and recommendations at scale? Check out Personalization’s careers page.

———————————

Related

Sharpen your Skill Set with Apache Spark on the AWS Big Data Blog

Want to learn more about Big Data or Streaming Data? Check out our Big Data and Streaming data educational pages.

How Una Got Her Stolen Laptop Back

Post Syndicated from Andy Klein original https://www.backblaze.com/blog/how-una-found-her-stolen-laptop/

Lost Laptop World Map

Reading Peter’s post on getting your data ready for vacation travels reminded me of a story we recently received from a Backblaze customer. Una’s laptop was stolen and then traveled over multiple continents during the following year. Here’s Una’s story, in her own words, on how she got her laptop back. Enjoy.

Pulse Incident Number 10028192
(or: How Playing Computer Games Can Help You In Adulthood)

One day when I was eleven, my father arrived home with an object that looked like a briefcase made out of beige plastic. Upon lifting it, one realized it had the weight of, oh, around two elephants. It was an Ericsson ‘portable’ computer, one of the earliest prototypes of laptop. All my classmates had really cool and fashionable computer game consoles with amazing names like “Atari” and “Commodore”, beautifully vibrant colour displays, and joysticks. Our Ericsson had a display with two colours (orange and … dark orange), it used floppy discs that were actually floppy (remember those?), ran on DOS and had no hard drive (you had to load the operating system every single time you turned on the computer. Took around 10 minutes). I dearly loved this machine, however, and played each of the 6 games on it incessantly. One of these was “Where In The World Is Carmen Sandiego?” an educational game where a detective has to chase an archvillain around the world, using geographical and cultural references as clues to get to the next destination. Fast forward twenty years and…

It’s June 2013, I’m thirty years old, and I still love laptops. I live in Galway, Ireland; I’m a self-employed musician who works in a non-profit music school so the cash is tight, but I’ve splashed out on a Macbook Pro and I LOVE IT. I’m on a flight from Dublin to Dubai with a transfer in Turkey. I talk to the guy next to me, who has an Australian accent and mentions he’s going to Asia to research natural energy. A total hippy, I’m interested; we chat until the convo dwindles, I do some work on my laptop, and then I fall asleep.

At 11pm the plane lands in Turkey and we’re called off to transfer to a different flight. Groggy, I pick up my stuff and stumble down the stairs onto the tarmac. In the half-light beside the plane, in the queue for the bus to the terminal, I suddenly realize that I don’t have my laptop in my bag. Panicking, I immediately seek out the nearest staff member. “Please! I’ve left my laptop on the plane – I have to go back and get it!”

The guy says: “No. It’s not allowed. You must get on the bus, madam. The cabin crew will find it and put it in “Lost and Found” and send it to you.” I protest but I can tell he’s immovable. So I get on the bus, go into the terminal, get on another plane and fly to Dubai. The second I land I ring Turkish Air to confirm they’ve found my laptop. They haven’t. I pretty much stalk Turkish Air for the next two weeks to see if the laptop turns up, but to no avail. I travel back via the same airport (Ataturk International), and go around all three Lost and Found offices in the airport, but my laptop isn’t there amongst the hundreds of Kindles and iPads. I don’t understand.

As time drags on, the laptop doesn’t turn up. I report the theft in my local Garda station. The young Garda on duty is really lovely to me and gives me lots of empathy, but the fact that the laptop was stolen in airspace, in a foreign, non-EU country, does not bode well. I continue to stalk Turkish Airlines; they continue to stonewall me, so I get in touch with the Turkish Department for Consumer Affairs. I find a champion amongst them called Ece, who contacts Turkish Airlines and pleads on my behalf. Unfortunately they seem to have more stone walls in Turkey than there are in the entirety of Co. Galway, and his pleas fall on deaf ears. Ece advises me I’ll have to bring Turkish Airlines to court to get any compensation, which I suspect will cost more time and money than the laptop is realistically worth. In a first-world way, I’m devastated – this object was a massive financial outlay for me, a really valuable tool for my work. I try to appreciate the good things – Ece and the Garda Sharon have done their absolute best to help me, my pal Jerry has loaned me a laptop to tide me over the interim – and then I suck it up, say goodbye to the last of my savings, and buy a new computer.

I start installing the applications and files I need for my business. I subscribe to an online backup service, Backblaze, whereby every time I’m online my files are uploaded to the cloud. I’m logging in to Backblaze to recover all my files when I see a button I’ve never noticed before labelled “Locate My Computer”. I catch a breath. Not even daring to hope, I click on it… and it tells me that Backblaze keeps a record of my computer’s location every time it’s online, and can give me the IP address my laptop has been using to get online. The records show my laptop has been online since the theft!! Not only that, but Backblaze has continued to back up files, so I can see all files the thief has created on my computer. My laptop has last been online in, of all the places, Thailand. And when I look at the new files saved on my computer, I find Word documents about solar power. It all clicks. It was the plane passenger beside me who had stolen my laptop, and he is so clueless he’s continued to use it under my login, not realizing this makes him trackable every time he connects to the internet.

I keep the ‘Locate My Computer” function turned on, so I’m consistently monitoring the thief’s whereabouts, and start the chapter of my life titled “The Sleep Deprivation and The Phonebill”. I try ringing the police service in Thailand (GMT +7 hours) multiple times. To say this is ineffective is an understatement; the language barrier is insurmountable. I contact the Irish embassy in Bangkok – oh, wait, that doesn’t exist. I try a consulate, who is lovely but has very limited powers, and while waiting for them to get back to me I email two Malaysian buddies asking them if they know anyone who can help me navigate the language barrier. I’m just put in touch with this lovely pal-of-a-pal called Tupps who’s going to help me when… I check Backblaze and find out that my laptop had started going online in East Timor. Bye bye, Thailand.

I’m so wrecked trying to communicate with the Thai bureaucracy I decide to play the waiting game for a while. I suspect East Timor will be even more of an international diplomacy challenge, so let’s see if the thief is going to stay there for a while before I attempt a move, right? I check Backblaze around once a week for a month, but then the thief stops all activity – I’m worried. I think he’s realized I can track him and has stopped using my login, or has just thrown the laptop away. Reason kicks in, and I begin to talk myself into stopping my crazy international stalking project. But then, when I least expect it, I strike informational GOLD. In December, the thief checks in for a flight from Bali to Perth and saves his online check-in to the computer desktop. I get his name, address, phone number, and email address, plus flight number and flight time and date.

I have numerous fantasies about my next move. How about I ring up the police in Australia, they immediately believe my story and do my every bidding, and then the thief is met at Arrivals by the police, put into handcuffs and marched immediately to jail? Or maybe I should somehow use the media to tell the truth about this guy’s behaviour and give him a good dose of public humiliation? Should I try my own version of restorative justice, contact the thief directly and appeal to his better nature? Or, the most tempting of all, should I get my Australian-dwelling cousin to call on him and bash his face in? … This last option, to be honest, is the outcome I want the most, but Emmett’s actually on the other side of the Australian continent, so it’s a big ask, not to mention the ever-so-slightly scary consequences for both Emmett and myself if we’re convicted… ! (And, my conscience cries weakly from the depths, it’s just the teensiest bit immoral.) Christmas is nuts, and I’m just so torn and ignorant about which course of action to take that I … do nothing.

One morning in the grey light of early February I finally decide what to do. Although it’s the longest shot in the history of long shots, I will ring the Australian police force about a laptop belonging to a girl from the other side of the world, which was stolen in airspace, in yet another country in the world. I use Google to figure out the nearest Australian police station to the thief’s address. I set my alarm for 4am Irish time, I ring Rockhampton Station, Queensland, and explain the situation to a lovely lady called Danielle. Danielle is very kind and understanding but, unsurprisingly, doesn’t hold out much hope that they can do anything. I’m not Australian, the crime didn’t happen in Australia, there’s questions of jurisdiction, etc. etc. I follow up, out of sheer irrational compulsion rather than with the real hope of an answer, with an email 6 weeks later. There’s no response. I finally admit to myself the laptop is gone. Ever since he’s gone to Australia the thief has copped on and stopped using my login, anyway. I unsubscribe my stolen laptop from Backblaze and try to console myself with the thought that at least I did my best.

And then, completely out of the blue, on May 28th 2014, I get an email from a Senior Constable called Kain Brown. Kain tells me that he has executed a search warrant at a residence in Rockhampton and has my laptop!! He has found it!!! I am stunned. He quickly gets to brass tacks and explains my two options: I can press charges, but it’s extremely unlikely to result in a conviction, and even if it did, the thief would probably only be charged with a $200 fine – and in this situation, it could take years to get my laptop back. If I don’t press charges, the laptop will be kept for 3 months as unclaimed property, and then returned to me. It’s a no-brainer; I decide not to press charges. I wait, and wait, and three months later, on the 22nd September 2014, I get an email from Kain telling me that he can finally release the laptop to me.

Naively, I think my tale is at the “Happy Ever After” stage. I dance a jig around the kitchen table, and read my subsequent email from a “Property Officer” of Rockhampton Station, John Broszat. He has researched how to send the laptop back to me … and my jig is suddenly halted. My particular model of laptop has a lithium battery built into the casing which can only be removed by an expert, and it’s illegal to transport a lithium battery by air freight. So the only option for getting the laptop back, whole and functioning, is via “Sea Mail” – which takes three to four months to get to Ireland. This blows my mind. I can’t quite believe that in this day and age, we can send people to space, a media file across the world in an instant, but that transporting a physical object from one side of the globe to another still takes … a third of a year! It’s been almost a year and a half since my laptop was stolen. I shudder to think of what will happen on its final journey via Sea Mail – knowing my luck, the ship will probably be blown off course and it’ll arrive in the Bahamas.

Fortunately, John is empathetic, and willing to think outside the box. Do I know anyone who will be travelling from Australia to Ireland via plane who would take my laptop in their hand luggage? Well, there’s one tiny silver lining to the recession: half of Craughwell village has a child living in Australia. I ask around on Facebook and find out that my neighbour’s daughter is living in Australia and coming home for Christmas. John Broszat is wonderfully cooperative and mails my laptop to Maroubra Police Station for collection by the gorgeous Laura Gibbons. Laura collects it and brings it home in her flight hand luggage, and finally, FINALLY, on the 23rd of December 2014, 19 months after it’s been stolen, I get my hands on my precious laptop again.

I gingerly take the laptop out of the fashionable paper carrier bag in which Laura has transported it. I set the laptop on the table, and examine it. The casing is slightly more dented than it was, but except for that it’s in one piece. Hoping against hope, I open up the screen, press the ‘on’ button and… the lights flash and the computer turns on!!! The casing is dented, there’s a couple of insalubrious pictures on the hard drive I won’t mention, but it has been dragged from Turkey to Thailand to East Timor to Indonesia to Australia, and IT STILL WORKS. It even still has the original charger accompanying it. Still in shock that this machine is on, I begin to go through the hard drive. Of course, it’s radically different – the thief has deleted all my files, changed the display picture, downloaded his own files and applications. I’m curious: What sort of person steals other people’s laptops? How do they think, organize their lives, what’s going through their minds? I’ve seen most of the thief’s files before from stalking him via the Backblaze back-up service, and they’re not particularly interesting or informative about the guy on a personal level. But then I see a file I haven’t seen before, “free ebook.pdf”. I click on it, and it opens. I shake my head in disbelief. The one new file that the thief has downloaded onto my computer is the book “How To Win Friends And Influence People”.

A few weeks later, a new friend and I kiss for the first time. He’s a graphic designer from London. Five months later, he moves over to Ireland to be with me. We’re talking about what stuff he needs to bring when he’s moving and he says “I’m really worried; my desktop computer is huge. I mean, I have no idea how I’m going to bring it over.” Smiling, I say “I have a spare laptop that might suit you…”

[Editor: The moral of the story is make sure your data is backed up before you go on vacation.]

The post How Una Got Her Stolen Laptop Back appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Top 10 Most Pirated Movies of The Week – 07/04/16

Post Syndicated from Ernesto original https://torrentfreak.com/top-10-pirated-movies-week-070416/

batsupsThis week we have three newcomers in our chart.

Batman v Superman: Dawn of Justice is the most downloaded movie for the second week in a row.

The data for our weekly download chart is estimated by TorrentFreak, and is for informational and educational reference only. All the movies in the list are Web-DL/Webrip/HDRip/BDrip/DVDrip unless stated otherwise.

RSS feed for the weekly movie download chart.

Ranking (last week) Movie IMDb Rating / Trailer
torrentfreak.com
1 (1) Batman v Superman: Dawn of Justice 7.0 / trailer
2 (…) Independence Day: Resurgence (HDTS) 5.6 / trailer
3 (3) Finding Dory (HDTS) 8.1 / trailer
4 (2) Warcraft (TS/TC) 7.7 / trailer
5 (…) Central Intelligence 6.9 / trailer
6 (…) Me Before You (Subbed Webrip) trailer
7 (7) X-Men: Apocalypse (HDCam/TC) 7.7 / trailer
8 (9) Allegiant 5.9 / trailer
9 (4) The Huntsman: Winter’s War 6.2 / trailer
10 (5) Whiskey Tango Foxtrot 6.8 / trailer

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

Month in Review: June 2016

Post Syndicated from Andy Werth original https://blogs.aws.amazon.com/bigdata/post/Tx2ZWGEI8MNGY51/Month-in-Review-June-2016

Lots to see on the Big Data Blog in June! Please take a look at the summaries below for something that catches your interest.

Use Sqoop to Transfer Data from Amazon EMR to Amazon RDS
Customers commonly process and transform vast amounts of data with EMR and then transfer and store summaries or aggregates of that data in relational databases such as MySQL or Oracle. In this post, learn how to transfer data using Apache Sqoop, a tool designed to transfer data between Hadoop and relational databases.

Analyze Realtime Data from Amazon Kinesis Streams Using Zeppelin and Spark Streaming
Streaming data is everywhere. This includes clickstream data, data from sensors, data emitted from billions of IoT devices, and more. Not surprisingly, data scientists want to analyze and explore these data streams in real time. This post shows you how you can use Spark Streaming to process data coming from Amazon Kinesis streams, build some graphs using Zeppelin, and then store the Zeppelin notebook in Amazon S3.

Processing Amazon DynamoDB Streams Using the Amazon Kinesis Client Library
This post demystifies the KCL by explaining some of its important configurable properties and estimating its resource consumption.

Apache Tez Now Available with Amazon EMR
Amazon EMR has added Apache Tez version 0.8.3 as a supported application in release 4.7.0. Tez is an extensible framework for building batch and interactive data processing applications on top of Hadoop YARN. This post helps you get started.

Use Apache Oozie Workflows to Automate Apache Spark Jobs (and more!) on Amazon EMR
In this post, learn how to use Apache Oozie, a workflow scheduler for Hadoop, to automate Apache Spark jobs and other workloads on an Amazon EMR cluster.

JOIN Amazon Redshift AND Amazon RDS PostgreSQL WITH dblink
In this post, you’ll learn how to use the PostgreSQL dblink extension to connect Amazon RDS PostgreSQL to Amazon Redshift, so you can join live Redshift data with your PostgreSQL tables in a single query.

FROM THE ARCHIVE

Running R on AWS (July 2015) Learn how to install and run R, RStudio Server, and Shiny Server on AWS.

———————————————–

Want to learn more about Big Data or Streaming Data? Check out our Big Data and Streaming data educational pages.

Leave a comment below to let us know what big data topics you’d like to see next on the AWS Big Data Blog.

Ricky Gervais: Don’t Pirate My Film, I Stand to Make Millions

Post Syndicated from Ernesto original https://torrentfreak.com/ricky-gervais-psa-160701/

Anti-piracy PSAs come in all shapes and sizes, but nearly all of them fail to appeal to the public they’re intended for.

Today, as part of the Industry Trust’s “Moments Worth Paying For” campaign, British comedian Ricky Gervais gives it a shot with a special anti-piracy message of his own.

The PSA is for his upcoming movie “David Brent: Life on the Road,” in which he brings the iconic character from The Office back to life. The movie premieres later this summer and Gervais hopes that pirates will go to see it in the cinema, instead of heading to a nearby torrent site.

“We’re basically asking you not to pirate movies. The quality is bad, and it’s a lot of people’s livelihoods,” Gervais says.

Pretty classic language for a PSA, but Gervais then adds another dimension.

“For example my new movie David Brent: Life On The Road. If it does well I stand to make millions. If you pirate it, sure, you’ll save a few quid. But millions…”

Of course, the entire PSA is tongue-in-cheek, but by using one of the classic pirate excuses combined with more traditional anti-piracy language, Gervais creates more food for discussion than more traditional PSAs.

The short video confuses both pirates and creators, and actually provides a starting ground for a decent discussion.

Amusingly, the Industry Trust stresses that Gervais himself wrote the anti-piracy message, as if they need an excuse. To compensate, they are quick to stress that piracy seriously hurts the UK movie industry.

“The Industry Trust handed artistic control of the trailer to Gervais, who tore up the rule book and took the trailer spectacularly off-message as only he can,” they write.

“While the trailer is a light-hearted take on piracy, the reality is that in 2015, the top 20 titles made up a total of 41% of the total UK box office, meaning that a high percentage of films weren’t seen by a lot of people.”

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

Creative computing at Eastwood Academy

Post Syndicated from Oliver Quinlan original https://www.raspberrypi.org/blog/creative-computing-at-eastwood-academy/

It’s nearly two years since Computing became a subject for all children in England to study, and we’re now seeing some amazing work to bring opportunities for digital making into schools. Recently I visited Eastwood Academy in Southend-on-Sea, where teacher Lucas Abbot has created a digital making room, and built a community of young programmers and makers there.

Lucas trained as a physics teacher and got hold of a Raspberry Pi for projects at home back in 2012. His head teacher heard about his hobby, and when the move towards all children learning programming started, Lucas was approached to take up the challenge of developing the new subject of Computing in the school. With the help of friends at the local Raspberry Jam, Linux user group, and other programming meetups, he taught himself the new curriculum and set about creating an environment in which young people could take a similarly empowered approach.

In Year 7, students start by developing an understanding of what a computer is; it’s a journey that takes them down memory lane with their parents, discussing the retro technology of their own childhoods. Newly informed of what they’re working with, they then move on to programming with the Flowol language, moving to Scratch, Kodu and the BBC micro:bit. In Year 8 they get to move on to the Raspberry Pi, firing up the fifteen units Lucas has set up in collaborative workstations in the middle of the room. By the time the students choose their GCSE subjects at the end of Year 8, they have experienced programming a variety of HATs, hacking Minecraft to run games they have invented, and managing a Linux system themselves.

Fifteen Raspberry Pi computers have been set up in the centre of the room, at stations specifically designed to promote collaboration. While the traditional PCs around the edges of the room are still used, it was the Pi stations where pupils were most active, connecting things for their projects, and making together. A clever use of ceiling-mounted sockets, and some chains for health and safety reasons, has allowed these new stations to be set up at a low cost.

The teaching is based on building a firm foundation in each area studied, before giving students the chance to invent, build, and hack their own projects. I spent a whole day at the school; I found the environment to be entirely hands-on, and filled with engaged and excited young people learning through making. In one fabulous project two girls were setting up a paper rocket system, propelled using compressed air with a computer-based countdown system. Problem-solving and learning through failure are part of the environment too. One group spent a session trying to troubleshoot a HAT-mounted display that wasn’t quite behaving as they wanted it to.
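For a flavour of what a project like that involves, here is a minimal, hypothetical sketch of a launch countdown in Python (the students' actual code isn't published; `fire_rocket` stands in for whatever GPIO call would open the compressed-air valve on their Raspberry Pi build):

```python
import time


def fire_rocket():
    # Placeholder: on the students' build this would drive a GPIO pin
    # that opens the compressed-air valve.
    print("Launch!")


def countdown(seconds, launch=fire_rocket, delay=1.0, tick=print):
    """Announce each second from `seconds` down to 1, then call `launch`.

    Returns the list of announced numbers, which makes the sequence
    easy to check in a test.
    """
    announced = []
    for t in range(seconds, 0, -1):
        announced.append(t)
        tick(t)        # show the current count (printed by default)
        time.sleep(delay)
    launch()
    return announced


if __name__ == "__main__":
    # A quick three-second demonstration countdown.
    countdown(3, delay=0.2)
```

Keeping the hardware trigger behind a plain function like this also lets pupils test the countdown logic on any machine before wiring it to the real valve.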

Lessons were impressive, but even more so was the lunchtime making club which happens every single day. About 30 young people rushed into the room at lunchtime and got started with projects ranging from figuring out how to program a robot Mr Abbot had brought in, to creating the IKEA coffee table arcade machines from a recent MagPi tutorial.

I had a great conversation with one female student who told me how she had persuaded her father to buy a Raspberry Pi, and then taught him how to use it. Together, they got inspired to create a wood-engraving machine using a laser. Lunchtime clubs are often a place for socialising, but there was a real sense of purpose here too, of students coming together to achieve something for themselves.

Since 2014 most schools in England have had lessons in computing, but Eastwood Academy has also been building a community of young digital makers. They’re linking their ambitious lessons with their own interests and aspirations, building cool projects, learning lots, and having fun along the way. We’d love to hear from other schools that are taking such an ambitious approach to computing and digital making.

The post Creative computing at Eastwood Academy appeared first on Raspberry Pi.

Facebook Using Physical Location to Suggest Friends

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2016/06/facebook_using_.html

This could go badly:

“People You May Know are people on Facebook that you might know,” a Facebook spokesperson said. “We show you people based on mutual friends, work and education information, networks you’re part of, contacts you’ve imported and many other factors.”

One of those factors is smartphone location. A Facebook spokesperson said, though, that shared location alone would not result in a friend suggestion: the two parents must have had something else in common, such as overlapping networks.

“Location information by itself doesn’t indicate that two people might be friends,” said the Facebook spokesperson. “That’s why location is only one of the factors we use to suggest people you may know.”

The article goes on to describe situations where you don’t want Facebook to do this: Alcoholics Anonymous meetings, singles bars, some Tinder dates, and so on. But this is part of Facebook’s aggressive use of location data in many of its services.

BoingBoing post.

EDITED TO ADD: Facebook backtracks.

Get to know the Raspberry Pi Foundation

Post Syndicated from Philip Colligan original https://www.raspberrypi.org/blog/get-know-raspberry-pi-foundation/

One of the best things about the Raspberry Pi Foundation is our awesome community. Anything we achieve is only possible because of the growing movement of makers, educators, programmers, volunteers and young people all over the world who share our mission. We work really hard to celebrate that community on this blog, across social media, in our magazine, and pretty much every other opportunity we get.


But how much do you know about the Raspberry Pi Foundation as an organisation? What kind of organisation are we? Who works here? What do they do?

Trustees

Our founders set Raspberry Pi up as an educational charity. That means we are an organisation that exists for the public benefit and, like all charities in the UK, we are governed by a board of trustees who are responsible for making sure that we use our resources effectively to achieve our charitable goals. It’s not an easy gig being trustee of a charity. There’s a lot of legal and other responsibility; endless paperwork, meetings and decisions; and you don’t get paid for any of it.

We’re insanely lucky to have a fantastic board of trustees, which includes several of our co-founders. In all sorts of different ways they add huge value to our work and we are very grateful to the whole board for their time and expertise.


Pete Lomas: founder, trustee and hardware designer of the first-gen Raspberry Pi

The board of trustees is chaired by David Cleevely, who is a successful technology entrepreneur, angel investor, founder of charities, adviser to governments, and much, much more besides. If the role of a trustee can be tough, then the role of the chair is an order of magnitude more so. David makes it look effortless, but he puts a huge amount of his personal time and energy into the Foundation, and we simply wouldn’t be where we are today without him.


David Cleevely and some friends

Members

Charities in the UK also have members: if the trustees are like the board of directors of a commercial company, the members are like its shareholders (except without the shares). At the end of last year, we expanded the membership of the Foundation, appointing 20 outstanding individuals who share our mission and who can help us deliver on it. It’s a seriously impressive group already and, over the next few years, we want to expand the membership further, making it even more diverse and international. It’s important we get this right: in future, the trustees of the Foundation will be selected from, and elected by, the membership.

You can now find a full list of our members and trustees on the Foundation’s website.


A few of our members and trustees – click through to see the rest.

Trading

Our commercial activity (selling Raspberry Pi computers and other things) is done through a wholly-owned trading subsidiary (Raspberry Pi Trading Limited), which is led by Eben Upton. Any profits we make from our trading activity are invested in our charitable mission. So, every time you buy a Raspberry Pi computer you’re helping young people get involved in computing and digital making.


Eben Upton, Founder and CEO of Raspberry Pi Trading

Like any company, Raspberry Pi Trading Limited has a board of directors, including a mix of executives, trustees of the Foundation and independent non-executives.

We’re delighted to have recently appointed David Gammon as a non-executive director on the board of Raspberry Pi Trading Limited. David has widespread experience in developing and building technology-based businesses. He is the non-executive chairman of Frontier Developments and the founding CEO of investment firm Rockspring. He’s only been with us for a couple of weeks and is already making an impact.

Reading

We’ve also added a new section to the website which makes it easier for you to find the key documents that describe what we do, including our strategy, annual reviews from 2014 and 2015, and our Trustees’ report and financial statements for the past few years.


Click through to read our Annual Review, reports, strategy document, and more.

Team

The final part of our new and improved About Us section is an introduction to our fabulous team.


A few of our team members – we’re working on getting pictures of the people who are currently ghosts!

The Foundation has grown quite a lot over the past year, not least as a result of the merger with Code Club last autumn. Altogether we now have 65 people beavering away at Pi Towers (and other locations), designing awesome products and software, delivering educational programmes, supporting Code Clubs around the world, producing magazines, books and educational resources, training educators and lots more.

It’s a fantastically diverse and creative bunch of programmers, educators and makers. We love talking to members of the community, so please do look out for us at events, on the forums, on Twitter, and elsewhere.

The post Get to know the Raspberry Pi Foundation appeared first on Raspberry Pi.

Now Open – AWS Asia Pacific (Mumbai) Region

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/now-open-aws-asia-pacific-mumbai-region/

We are expanding the AWS footprint again, this time with a new region in Mumbai, India. AWS customers in the area can use the new Asia Pacific (Mumbai) Region to better serve end users in India.

New Region
The new Mumbai region has two Availability Zones, raising the global total to 35. It supports Amazon Elastic Compute Cloud (EC2) (C4, M4, T2, D2, I2, and R3 instances are available) and related services including Amazon Elastic Block Store (EBS), Amazon Virtual Private Cloud, Auto Scaling, and Elastic Load Balancing.

It also supports the following services:

There are now three edge locations (Mumbai, Chennai, and New Delhi) in India. The locations support Amazon Route 53, Amazon CloudFront, and S3 Transfer Acceleration. AWS Direct Connect support is available via our Direct Connect Partners (listed below).

This is our thirteenth region (see the AWS Global Infrastructure map for more information). As usual, you can see the list of regions in the region menu of the Console.

Customers
There are over 75,000 active AWS customers in India, representing a diverse base of industries. In the time leading up to today’s launch, we have provided some of these customers with access to the new region in preview form. Two of them (Ola Cabs and NDTV) were kind enough to share some of their experience and observations with us:

Ola Cabs’ mobile app leverages AWS to redefine point-to-point transportation in more than 100 cities across India. AWS allows Ola to innovate faster, adding new features and services for its customers without compromising on the availability or customer experience of its service. Ankit Bhati (CTO and Co-Founder) told us:

We are using technology to create mobility for a billion Indians, by giving them convenience and access to transportation of their choice. Technology is a key enabler, where we use AWS to drive supreme customer experience, and innovate faster on new features & services for our customers. This has helped us reach 100+ cities & 550K driver partners across India. We do petabyte scale analytics using various AWS big data services and deep learning techniques, allowing us to bring our driver-partners close to our customers when they need them. AWS allows us to make 30+ changes a day to our highly scalable micro-services based platform consisting of 100s of low latency APIs, serving millions of requests a day. We have tried the AWS India region. It is great and should help us further enhance the experience for our customers.


NDTV, India’s leading media house, is watched by millions of people across the world. NDTV has been using AWS since 2009 to run its video platform and all of its web properties. During the Indian general elections in May 2014, NDTV fielded an unprecedented amount of web traffic that scaled 26X, from 500 million hits per day to 13 billion hits on Election Day (regularly peaking at 400K hits per second), all running on AWS. According to Kawaljit Singh Bedi (CTO of NDTV Convergence):

NDTV is pleased to report very promising results in terms of reliability and stability of AWS’ infrastructure in India in our preview tests. Based on tests that our technical teams have run in India, we have determined that the network latency from the AWS India infrastructure Region are far superior compared to other alternatives. Our web and mobile traffic has jumped by over 30% in the last year and as we expand to new territories like eCommerce and platform-integration we are very excited on the new AWS India region launch. With the portfolio of services AWS will offer at launch, low latency, great reliability, and the ability to meet regulatory requirements within India, NDTV has decided to move these critical applications and IT infrastructure all-in to the AWS India region from our current set-up.

 


Here are some of our other customers in the region:

Tata Motors Limited, a leading Indian multinational automotive manufacturer, runs its telematics systems on AWS. Fleet owners use this solution to monitor all vehicles in their fleet on a real-time basis. AWS has helped Tata Motors become more agile and has increased its speed of experimentation and innovation.

redBus is India’s leading bus ticketing platform, selling tickets via web, mobile, and bus agents. It now covers over 67K routes in India with over 1,800 bus operators. redBus has scaled to sell more than 40 million bus tickets annually, up from just 2 million in 2010. At peak season, there are over 100 bus ticketing transactions every minute. The company also recently developed a new SaaS app on AWS that gives bus operators the option of handling their own ticketing and managing seat inventories. redBus has gone global, expanding to new geographic locations such as Singapore and Peru using AWS.

Hotstar is India’s largest premium streaming platform with more than 85K hours of drama and movies and coverage of every major global sporting event. Launched in February 2015, Hotstar quickly became one of the fastest adopted new apps anywhere in the world. It has now been downloaded by more than 68M users and has attracted followers on the back of a highly evolved video streaming technology and high attention to quality of experience across devices and platforms.

Macmillan India has provided publishing services to the education market in India for more than 120 years. The company moved its core enterprise applications — Business Intelligence (BI), Sales and Distribution, Materials Management, Financial Accounting and Controlling, Human Resources, and a customer relationship management (CRM) system — from an existing data center in Chennai to AWS. By moving to AWS, Macmillan India has boosted SAP system availability to almost 100 percent and reduced the time it takes to provision infrastructure from 6 weeks to 30 minutes.

Partners
We are pleased to be working with a broad selection of partners in India. Here’s a sampling:

  • AWS Premier Consulting Partners – BlazeClan Technologies Pvt. Limited, Minjar Cloud Solutions Pvt Ltd, and Wipro.
  • AWS Consulting Partners – Accenture, BluePi, Cloudcover, Frontier, HCL, Powerupcloud, TCS, and Wipro.
  • AWS Technology Partners – Freshdesk, Druva, Indusface, Leadsquared, Manthan, Mithi, Nucleus Software, Newgen, Ramco Systems, Sanovi, and Vinculum.
  • AWS Managed Service Providers – Progressive Infotech and Spruha Technologies.
  • AWS Direct Connect Partners – AirTel, Colt Technology Services, Global Cloud Xchange, GPX, Hutchison Global Communications, Sify, and Tata Communications.

Amazon Offices in India
We have opened six offices in India since 2011 – Delhi, Mumbai, Hyderabad, Bengaluru, Pune, and Chennai. These offices support our diverse customer base in India including enterprises, government agencies, academic institutions, small-to-mid-size companies, startups, and developers.

Support
The full range of AWS Support options (Basic, Developer, Business, and Enterprise) is also available for the Mumbai Region. All AWS support plans include an unlimited number of account and billing support cases, with no long-term contracts.

Compliance
Every AWS region is designed and built to meet rigorous compliance standards including ISO 27001, ISO 9001, ISO 27017, ISO 27018, SOC 1, SOC 2, and PCI DSS Level 1 (to name a few). AWS implements an Information Security Management System (ISMS) that is independently assessed by qualified third parties. These assessments address a wide variety of requirements, which are communicated to customers by making certifications and audit reports available, either on our public-facing website or upon request.

To learn more, take a look at the AWS Cloud Compliance page and our Data Privacy FAQ.

Use it Now
This new region is now open for business and you can start using it today! You can find additional information about the new region, documentation on how to migrate, customer use cases, information on training and other events, and a list of AWS Partners in India on the AWS site.
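As a practical note (not part of the announcement itself): the API identifier for the Asia Pacific (Mumbai) Region is ap-south-1, so existing AWS CLI and SDK setups can target it with a one-line configuration change, for example in `~/.aws/config`:

```ini
# ~/.aws/config: select the Asia Pacific (Mumbai) Region by default
[default]
region = ap-south-1
```

Individual commands can also override this per invocation with a `--region ap-south-1` flag rather than changing the default.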

We have set up a seller of record in India (known as AISPL); please see the AISPL customer agreement for details.


Jeff;

 

Top 10 Most Pirated Movies of The Week – 06/27/16

Post Syndicated from Ernesto original https://torrentfreak.com/top-10-pirated-movies-week-062716/

This week we have three newcomers in our chart.

Batman v Superman: Dawn of Justice is the most downloaded movie.

The data for our weekly download chart is estimated by TorrentFreak, and is for informational and educational reference only. All the movies in the list are Web-DL/Webrip/HDRip/BDrip/DVDrip unless stated otherwise.

RSS feed for the weekly movie download chart.

Ranking (last week) Movie IMDb Rating / Trailer
1 (…) Batman v Superman: Dawn of Justice 7.0 / trailer
2 (1) Warcraft (TS/TC) 7.7 / trailer
3 (…) Finding Dory (HDTS) 8.1 / trailer
4 (2) The Huntsman: Winter’s War 6.2 / trailer
5 (4) Whiskey Tango Foxtrot 6.8 / trailer
6 (…) Hardcore Henry 6.9 / trailer
7 (3) X-Men: Apocalypse (HDCam/TC) 7.7 / trailer
8 (5) Eye In The Sky 7.6 / trailer
9 (…) Allegiant 5.9 / trailer
10 (6) Zootopia 8.3 / trailer

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

Astro Pi: Goodnight, Mr Tim

Post Syndicated from Helen Lynn original https://www.raspberrypi.org/blog/astro-pi-goodnight-mr-tim/

On Saturday, British ESA astronaut Tim Peake returned to Earth after six months on the International Space Station. During his time in orbit, he did a huge amount of work to share the excitement of his trip with young people and support education across the curriculum: as part of this, he used our two Astro Pi computers, Izzy and Ed, to run UK school students’ code and play their music in space. But what lies ahead for the pair now Tim’s mission, Principia, is complete?

Watch Part 4 of the Story of Astro Pi!

The Story of Astro Pi – Part 4: Goodnight, Mr Tim

As British ESA astronaut Tim Peake’s mission comes to an end, what will become of Ed and Izzy, our courageous Astro Pis? Find out more at astro-pi.org/about/mission/ Narration by Fran Scott: franscott.co.uk

Ed and Izzy will remain on the International Space Station until 2022, and they have exciting work ahead of them. Keep an eye on this blog and on our official magazine, The MagPi, for news!

The post Astro Pi: Goodnight, Mr Tim appeared first on Raspberry Pi.

Top 10 Most Pirated Movies of The Week – 06/20/16

Post Syndicated from Ernesto original https://torrentfreak.com/top-10-pirated-movies-week-062016/

This week we have three newcomers in our chart.

Warcraft is the most downloaded movie.

The data for our weekly download chart is estimated by TorrentFreak, and is for informational and educational reference only. All the movies in the list are BD/DVDrips unless stated otherwise.

RSS feed for the weekly movie download chart.

Ranking (last week) Movie IMDb Rating / Trailer
1 (2) Warcraft 7.7 / trailer
2 (…) The Huntsman: Winter’s War 6.2 / trailer
3 (1) X-Men: Apocalypse (HDCam/TC) 7.7 / trailer
4 (…) Whiskey Tango Foxtrot 6.8 / trailer
5 (3) Eye In The Sky 7.6 / trailer
6 (4) Zootopia 8.3 / trailer
7 (7) London Has Fallen 5.9 / trailer
8 (6) 13 Hours: The Secret Soldiers of Benghazi 7.4 / trailer
9 (…) The Last Heist 3.7 / trailer
10 (5) Midnight Special 6.9 / trailer

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.