Tag Archives: election

Digital painter rundown

Post Syndicated from Eevee original https://eev.ee/blog/2017/06/17/digital-painter-rundown/

Another patron post! IndustrialRobot asks:

You should totally write about drawing/image manipulation programs! (Inspired by https://eev.ee/blog/2015/05/31/text-editor-rundown/)

This is a little trickier than a text editor comparison — while most text editors are cross-platform, quite a few digital art programs are not. So I’m effectively unable to even try a decent chunk of the offerings. I’m also still a relatively new artist, and image editors are much harder to briefly compare than text editors…

Right, now that your expectations have been suitably lowered:

Krita

I do all of my digital art in Krita. It’s pretty alright.

Okay so Krita grew out of Calligra, which used to be KOffice, which was an office suite designed for KDE (a Linux desktop environment). I bring this up because KDE has a certain… reputation. With KDE, there are at least three completely different ways to do anything, each of those ways has ludicrous amounts of customization and settings, and somehow it still can’t do what you want.

Krita inherits this aesthetic by attempting to do literally everything. It has 17 different brush engines, more than 70 layer blending modes, seven color picker dockers, and an ungodly number of colorspaces. It’s clearly intended primarily for drawing, but it also supports animation and vector layers and a pretty decent spread of raster editing tools. I just right now discovered that it has Photoshop-like “layer styles” (e.g. drop shadow), after a year and a half of using it.

In fairness, Krita manages all of this stuff well enough, and (apparently!) it manages to stay out of your way if you’re not using it. In less fairness, they managed to break erasing with a Wacom tablet pen for three months?

I don’t want to rag on it too hard; it’s an impressive piece of work, and I enjoy using it! The emotion it evokes isn’t so much frustration as… mystified bewilderment.

I once filed a ticket suggesting the addition of a brush size palette — a panel showing a grid of fixed brush sizes that makes it easy to switch between known sizes with a tablet pen (and increases the chances that you’ll be able to get a brush back to the right size again). It’s a prominent feature of Paint Tool SAI and Clip Studio Paint, and while I’ve never used either of those myself, I’ve seen a good few artists swear by it.

The developer response was that I could emulate the behavior by creating brush presets. But that’s flat-out wrong: getting the same effect would require creating a ton of brush presets for every brush I have, plus giving them all distinct icons so the size is obvious at a glance. Even then, it would be much more tedious to use and fill my presets with junk.

And that sort of response is what’s so mysterious to me. I’ve never even been able to use this feature myself, but a year of amateur painting with Krita has convinced me that it would be pretty useful. But a developer didn’t see the use and suggested an incredibly tedious alternative that only half-solves the problem and creates new ones. Meanwhile, of the 28 existing dockable panels, a quarter of them are different ways to choose colors.

What is Krita trying to be, then? What does Krita think it is? Who precisely is the target audience? I have no idea.


Anyway, I enjoy drawing in Krita well enough. It ships with a respectable set of brushes, and there are plenty more floating around. It has canvas rotation, canvas mirroring, perspective guide tools, and other art goodies. It doesn’t colordrop on right click by default, which is arguably a grave sin (it shows a customizable radial menu instead), but that’s easy to rebind. It understands having a background color beneath a bottom transparent layer, which is very nice. You can also toggle any brush between painting and erasing with the press of a button, and that turns out to be very useful.

It doesn’t support infinite canvases, though it does offer a one-click button to extend the canvas in a given direction. I’ve never used it (and didn’t even know what it did until just now), but would totally use an infinite canvas.

I haven’t used the animation support too much, but it’s pretty nice to have. Granted, the only other animation software I’ve used is Aseprite, so I don’t have many points of reference here. It’s a relatively new addition, too, so I assume it’ll improve over time.

The one annoyance I remember with animation was really an interaction with a larger annoyance, which is: working with selections kind of sucks. You can’t drag a selection around with the selection tool; you have to switch to the move tool. That would be fine if you could at least drag the selection ring around with the selection tool, but you can’t do that either; dragging just creates a new selection.

If you want to copy a selection, you have to explicitly copy it to the clipboard and paste it, which creates a new layer. Ctrl-drag with the move tool doesn’t work. So then you have to merge that layer down, which I think is where the problem with animation comes in: a new layer is non-animated by default, meaning it effectively appears in any frame, so simply merging it down with merge it onto every single frame of the layer below. And you won’t even notice until you switch frames or play back the animation. Not ideal.

This is another thing that makes me wonder about Krita’s sense of identity. It has a lot of fancy general-purpose raster editing features that even GIMP is still struggling to implement, like high color depth support and non-destructive filters, yet something as basic as working with selections is clumsy. (In fairness, GIMP is a bit clumsy here too, but it has a consistent notion of “floating selection” that’s easy enough to work with.)

I don’t know how well Krita would work as a general-purpose raster editor; I’ve never tried to use it that way. I can’t think of anything obvious that’s missing. The only real gotcha is that some things you might expect to be tools, like smudge or clone, are just types of brush in Krita.

GIMP

Ah, GIMP — open source’s answer to Photoshop.

It’s very obviously intended for raster editing, and I’m pretty familiar with it after half a lifetime of only using Linux. I even wrote a little Scheme script for it ages ago to automate some simple edits to a couple hundred files, back before I was aware of ImageMagick. I don’t know what to say about it, specifically; it’s fairly powerful and does a wide variety of things.

In fact I’d say it’s almost frustratingly intended for raster editing. I used GIMP in my first attempts at digital painting, before I’d heard of Krita. It was okay, but so much of it felt clunky and awkward. Painting is split between a pencil tool, a paintbrush tool, and an airbrush tool; I don’t really know why. The default brushes are largely uninteresting. Instead of brush presets, there are tool presets that can be saved for any tool; it’s a neat idea, but doesn’t feel like a real substitute for brush presets.

Much of the same functionality as Krita is there, but it’s all somehow more clunky. I’m sure it’s possible to fiddle with the interface to get something friendlier for painting, but I never really figured out how.

And then there’s the surprising stuff that’s missing. There’s no canvas rotation, for example. There’s only one type of brush, and it just stamps the same pattern along a path. I don’t think it’s possible to smear or blend or pick up color while painting. The only way to change the brush size is via the very sensitive slider on the tool options panel, which I remember being a little annoying with a tablet pen. Also, you have to specifically enable tablet support? It’s not difficult or anything, but I have no idea why the default is to ignore tablet pressure and treat it like a regular mouse cursor.

As I mentioned above, there’s also no support for high color depth or non-destructive editing, which is honestly a little embarrassing. Those are the major things Serious Professionals™ have been asking for for ages, and GIMP has been trying to provide them, but it’s taking a very long time. The first signs of GEGL, a new library intended to provide these features, appeared in GIMP 2.6… in 2008. The last major release was in 2012. GIMP has been working on this new plumbing for almost as long as Krita’s entire development history. (To be fair, Krita has also raised almost €90,000 from three Kickstarters to fund its development; I don’t know that GIMP is funded at all.)

I don’t know what’s up with GIMP nowadays. It’s still under active development, but the exact status and roadmap are a little unclear. I still use it for some general-purpose editing, but I don’t see any reason to use it to draw.

I do know that canvas rotation will be in the next release, and there was some experimentation with embedding MyPaint’s brush engine (though when I tried it it was basically unusable), so maybe GIMP is interested in wooing artists? I guess we’ll see.

MyPaint

Ah, MyPaint. I gave it a try once. Once.

It’s a shame, really. It sounds pretty great: specifically built for drawing, has very powerful brushes, supports an infinite canvas, supports canvas rotation, has a simple UI that gets out of your way. Perfect.

Or so it seems. But in MyPaint’s eagerness to shed unnecessary raster editing tools, it forgot a few of the more useful ones. Like selections.

MyPaint has no notion of a selection, nor of copy/paste. If you want to move a head to align better to a body, for example, the sanctioned approach is to duplicate the layer, erase the head from the old layer, erase everything but the head from the new layer, then move the new layer.

I can’t find anything that resembles HSL adjustment, either. I guess the workaround for that is to create H/S/L layers and floodfill them with different colors until you get what you want.

I can’t work seriously without these basic editing tools. I could see myself doodling in MyPaint, but Krita works just as well for doodling as for serious painting, so I’ve never gone back to it.

Drawpile

Drawpile is the modern equivalent to OpenCanvas, I suppose? It lets multiple people draw on the same canvas simultaneously. (I would not recommend it as a general-purpose raster editor.)

It’s a little clunky in places — I sometimes have bugs where keyboard focus gets stuck in the chat, or my tablet cursor becomes invisible — but the collaborative part works surprisingly well. It’s not a brush powerhouse or anything, and I don’t think it allows textured brushes, but it supports tablet pressure and canvas rotation and locked alpha and selections and whatnot.

I’ve used it a couple times, and it’s worked well enough that… well, other people made pretty decent drawings with it? I’m not sure I’ve managed yet. And I wouldn’t use it single-player. Still, it’s fun.

Aseprite

Aseprite is for pixel art so it doesn’t really belong here at all. But it’s very good at that and I like it a lot.

That’s all

I can’t name any other serious contender that exists for Linux.

I’m dimly aware of a thing called “Photo Shop” that’s more intended for photos but functions as a passable painter. More artists seem to swear by Paint Tool SAI and Clip Studio Paint. Also there’s Paint.NET, but I have no idea how well it’s actually suited for painting.

And that’s it! That’s all I’ve got. Krita for drawing, GIMP for editing, Drawpile for collaborative doodling.

Fake News As A Service (FNaaS?) – $400k To Rig An Election

Post Syndicated from Darknet original http://feedproxy.google.com/~r/darknethackers/~3/UqEqmi9y3oY/

This is pretty interesting, the prices for Fake News as a Service have come out after some research by Trend Micro, imagine that you can create a fake celebrity with 300,000 followers for only $2,600. Now we all know this Fake News thing has been going on for a while, and of course, if it’s […]

The post Fake News As A Service (FNaaS?)…

Read the full post at darknet.org.uk

Latency Distribution Graph in AWS X-Ray

Post Syndicated from Randall Hunt original https://aws.amazon.com/blogs/aws/latency-distribution-graph-in-aws-x-ray/

We’re continuing to iterate on the AWS X-Ray service based on customer feedback and today we’re excited to release a set of tools to help you quickly dive deep on latencies in your applications. Visual Node and Edge latency distribution graphs are shown in a handy new “Service Details” side bar in your X-Ray Service Map.

The X-Ray service graph gives you a visual representation of services and their interactions over a period of time that you select. The nodes represent services and the edges between the nodes represent calls between the services. The nodes and edges each have a set of statistics associated with them. While the visualizations provided in the service map are useful for estimating the average latency in an application they don’t help you to dive deep on specific issues. Most of the time issues occur at statistical outliers. To alleviate this X-Ray computes histograms like the one above help you solve those 99th percentile bugs.

To see a Response Distribution for a Node just click on it in the service graph. You can also click on the edges between the nodes to see the Response Distribution from the viewpoint of the calling service.

The team had a few interesting problems to solve while building out this feature and I wanted to share a bit of that with you now! Given the large number of traces an app can produce it’s not a great idea (for your browser) to plot every single trace client side. Instead most plotting libraries, when dealing with many points, use approximations and bucketing to get a network and performance friendly histogram. If you’ve used monitoring software in the past you’ve probably seen as you zoom in on the data you get higher fidelity. The interesting thing about the latencies coming in from X-Ray is that they vary by several orders of magnitude.

If the latencies were distributed between strictly 0s and 1s you could easily just create 10 buckets of 100 milliseconds. If your apps are anything like mine there’s a lot of interesting stuff happening in the outliers, so it’s beneficial to have more fidelity at 1% and 99% than it is at 50%. The problem with fixed bucket sizes is that they’re not necessarily giving you an accurate summary of data. So X-Ray, for now, uses dynamic bucket sizing based on the t-digests algorithm by Ted Dunning and Otmar Ertl. One of the distinct advantages of this algorithm over other approximation algorithms is its accuracy and precision at extremes (where most errors typically are).

An additional advantage of X-Ray over other monitoring software is the ability to measure two perspectives of latency simultaneously. Developers almost always have some view into the server side latency from their application logs but with X-Ray you can examine latency from the view of each of the clients, services, and microservices that you’re interacting with. You can even dive deeper by adding additional restrictions and queries on your selection. You can identify the specific users and clients that are having issues at that 99th percentile.

This info has already been available in API calls to GetServiceGraph as ResponseTimeHistogram but now we’re exposing it in the console as well to make it easier for customers to consume. For more information check out the documentation here.

Randall

NSA Document Outlining Russian Attempts to Hack Voter Rolls

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2017/06/nsa_document_ou.html

This week brought new public evidence about Russian interference in the 2016 election. On Monday, the Intercept published a top-secret National Security Agency document describing Russian hacking attempts against the US election system. While the attacks seem more exploratory than operational ­– and there’s no evidence that they had any actual effect ­– they further illustrate the real threats and vulnerabilities facing our elections, and they point to solutions.

The document describes how the GRU, Russia’s military intelligence agency, attacked a company called VR Systems that, according to its website, provides software to manage voter rolls in eight states. The August 2016 attack was successful, and the attackers used the information they stole from the company’s network to launch targeted attacks against 122 local election officials on October 27, 12 days before the election.

That is where the NSA’s analysis ends. We don’t know whether those 122 targeted attacks were successful, or what their effects were if so. We don’t know whether other election software companies besides VR Systems were targeted, or what the GRU’s overall plan was — if it had one. Certainly, there are ways to disrupt voting by interfering with the voter registration process or voter rolls. But there was no indication on Election Day that people found their names removed from the system, or their address changed, or anything else that would have had an effect — anywhere in the country, let alone in the eight states where VR Systems is deployed. (There were Election Day problems with the voting rolls in Durham, NC ­– one of the states that VR Systems supports ­– but they seem like conventional errors and not malicious action.)

And 12 days before the election (with early voting already well underway in many jurisdictions) seems far too late to start an operation like that. That is why these attacks feel exploratory to me, rather than part of an operational attack. The Russians were seeing how far they could get, and keeping those accesses in their pocket for potential future use.

Presumably, this document was intended for the Justice Department, including the FBI, which would be the proper agency to continue looking into these hacks. We don’t know what happened next, if anything. VR Systems isn’t commenting, and the names of the local election officials targeted did not appear in the NSA document.

So while this document isn’t much of a smoking gun, it’s yet more evidence of widespread Russian attempts to interfere last year.

The document was, allegedly, sent to the Intercept anonymously. An NSA contractor, Reality Leigh Winner, was arrested Saturday and charged with mishandling classified information. The speed with which the government identified her serves as a caution to anyone wanting to leak official US secrets.

The Intercept sent a scan of the document to another source during its reporting. That scan showed a crease in the original document, which implied that someone had printed the document and then carried it out of some secure location. The second source, according to the FBI’s affidavit against Winner, passed it on to the NSA. From there, NSA investigators were able to look at their records and determine that only six people had printed out the document. (The government may also have been able to track the printout through secret dots that identified the printer.) Winner was the only one of those six who had been in e-mail contact with the Intercept. It is unclear whether the e-mail evidence was from Winner’s NSA account or her personal account, but in either case, it’s incredibly sloppy tradecraft.

With President Trump’s election, the issue of Russian interference in last year’s campaign has become highly politicized. Reports like the one from the Office of the Director of National Intelligence in January have been criticized by partisan supporters of the White House. It’s interesting that this document was reported by the Intercept, which has been historically skeptical about claims of Russian interference. (I was quoted in their story, and they showed me a copy of the NSA document before it was published.) The leaker was even praised by WikiLeaks founder Julian Assange, who up until now has been traditionally critical of allegations of Russian election interference.

This demonstrates the power of source documents. It’s easy to discount a Justice Department official or a summary report. A detailed NSA document is much more convincing. Right now, there’s a federal suit to force the ODNI to release the entire January report, not just the unclassified summary. These efforts are vital.

This hack will certainly come up at the Senate hearing where former FBI director James B. Comey is scheduled to testify Thursday. Last year, there were several stories about voter databases being targeted by Russia. Last August, the FBI confirmed that the Russians successfully hacked voter databases in Illinois and Arizona. And a month later, an unnamed Department of Homeland Security official said that the Russians targeted voter databases in 20 states. Again, we don’t know of anything that came of these hacks, but expect Comey to be asked about them. Unfortunately, any details he does know are almost certainly classified, and won’t be revealed in open testimony.

But more important than any of this, we need to better secure our election systems going forward. We have significant vulnerabilities in our voting machines, our voter rolls and registration process, and the vote tabulation systems after the polls close. In January, DHS designated our voting systems as critical national infrastructure, but so far that has been entirely for show. In the United States, we don’t have a single integrated election. We have 50-plus individual elections, each with its own rules and its own regulatory authorities. Federal standards that mandate voter-verified paper ballots and post-election auditing would go a long way to secure our voting system. These attacks demonstrate that we need to secure the voter rolls, as well.

Democratic elections serve two purposes. The first is to elect the winner. But the second is to convince the loser. After the votes are all counted, everyone needs to trust that the election was fair and the results accurate. Attacks against our election system, even if they are ultimately ineffective, undermine that trust and ­– by extension ­– our democracy. Yes, fixing this will be expensive. Yes, it will require federal action in what’s historically been state-run systems. But as a country, we have no other option.

This essay previously appeared in the Washington Post.

An Open Letter To Microsoft: A 64-bit OS is Better Than a 32-bit OS

Post Syndicated from Brian Wilson original https://www.backblaze.com/blog/64-bit-os-vs-32-bit-os/

Windows 32 Bit vs. 64 Bit

Editor’s Note: Our co-founder & CTO, Brian Wilson, was working on a few minor performance enhancements and bug fixes (Inherit Backup State is a lot faster now). We got a version of this note from him late one night and thought it was worth sharing.

There are a few absolutes in life – death, taxes, and that a 64-bit OS is better than a 32-bit OS. Moving over to a 64-bit OS allows your laptop to run BOTH the old compatible 32-bit processes and also the new 64-bit processes. In other words, there is zero downside (and there are gigantic upsides).

32-Bit vs. 64-Bit

The main gigantic upside of a 64-bit process is the ability to support more than 2 GBytes of RAM (pedantic people will say “4 GBytes”… but there are technicalities I don’t want to get into here). Since only 1.6% of Backblaze customers have 2 GBytes or less of RAM, the other 98.4% desperately need 64-bit support, period, end of story. And remember, there is no downside.

Because there is zero downside, the first time it could, Apple shipped with 64-bit OS support. Apple did not give customers the option of “turning off all 64-bit programs.” Apple first shipped 64-bit support in OS X 10.6 Tiger in 2009 (which also had 32-bit support, so there was zero downside to the decision).

This was so successful that Apple shipped all future Operating Systems configured to support both 64-bit and 32-bit processes. All of them. Customers no longer had an option to turn off 64-bit support.

As a result, less than 2/10ths of 1% of Backblaze Mac customers are running a computer that is so old that it can only run 32-bit programs. Despite those microscopic numbers we still loyally support this segment of our customers by providing a 32-bit only version of Backblaze’s backup client.

Apple vs. Microsoft

But let’s contrast the Apple approach with that of Microsoft. Microsoft offers a 64-bit OS in Windows 10 that runs all 64-bit and all 32-bit programs. This is a valid choice of an Operating System. The problem is Microsoft ALSO gives customers the option to install 32-bit Windows 10 which will not run 64-bit programs. That’s crazy.

Another advantage of the 64-bit version of Windows is security. There are a variety of security features such as ASLR (Address Space Layout Randomization) that work best in 64-bits. The 32-bit version is inherently less secure.

By choosing 32-bit Windows 10 a customer is literally choosing a lower performance, LOWER SECURITY, Operating System that is artificially hobbled to not run all software.

When one of our customers running 32-bit Windows 10 contacts Backblaze support, it is almost always a customer that did not realize the choice they were making when they installed 32-bit Windows 10. They did not have the information to understand what they are giving up. For example, we have seen customers that have purchased 8 GB of RAM, yet they had installed 32-bit Windows 10. Simply by their OS “choice”, they disabled about 3/4ths of the RAM that they paid for!

Let’s put some numbers around it: Approximately 4.3% of Backblaze customers with Windows machines are running a 32-bit version of Windows compared with just 2/10ths of 1% of our Apple customers. The Apple customers did not choose incorrectly, they just have not upgraded their operating system in the last 9 years. If we assume the same rate of “legitimate older computers not upgraded yet” for Microsoft users that means 4.1% of the Microsoft users made a fairly large mistake when they choose their Microsoft Operating System version.

Now some people would blame the customer because after all they made the OS selection. Microsoft offers the correct choice, which is 64-bit Windows 10. In fact, 95.7% of Backblaze customers running Windows made the correct choice. My issue is that Microsoft shouldn’t offer the 32-bit version at all.

And again, for the fifth time, you will not lose any 32-bit capabilities as the 64-bit operating system runs BOTH 32-bit applications and 64-bit applications. You only lose capabilities if you choose the 32-bit only Operating System.

This is how bad it is -> When Microsoft released Windows Vista in 2007 it was 64-bit and also ran all 32-bit programs flawlessly. So at that time I was baffled why Microsoft ALSO released Windows Vista in 32-bit only mode – a version that refused to run any 64-bit binaries. Then, again in Windows 7, they did the same thing and I thought I was losing my mind. And again with Windows 8! By Windows 10, I realized Microsoft may never stop doing this. No matter how much damage they cause, no matter what happens.

You might be asking -> why do I care? Why does Brian want Microsoft to stop shipping an Operating System that is likely only chosen by mistake? My problem is this: Backblaze, like any good technology vendor, wants to be easy to use and friendly. In this case, that means we need to quietly, invisibly, continue to support BOTH the 32-bit and the 64-bit versions of every Microsoft OS they release. And we’ll probably need to do this for at least 5 years AFTER Microsoft officially retires the 32-bit only version of their operating system.

Supporting both versions is complicated. The more data our customers have, the more momentarily RAM intensive some functions (like inheriting backup state) can be. The more data you have the bigger the problem. Backblaze customers who accidentally chose to disable 64-bit operations are then going to have problems. It means we have to explain to some customers that their operating system is the root cause of many performance issues in their technical lives. This is never a pleasant conversation.

I know this will probably fall on deaf ears, but Microsoft, for the sake of your customers and third party application developers like Backblaze, please stop shipping Operating Systems that disable 64-bit support. It is causing all of us a bunch of headaches we do not need.

The post An Open Letter To Microsoft: A 64-bit OS is Better Than a 32-bit OS appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

[$] Language summit lightning talks

Post Syndicated from jake original https://lwn.net/Articles/723823/rss

Over the course of the day, the 2017 Python Language Summit hosted a
handful of lightning talks, several of which were worked into the dynamic
schedule when an opportunity presented itself. They ranged from the
traditional “less than five minutes” format to some that strayed well
outside of that time frame—some generated a fair amount of discussion as
well. Topics were all over the map: board elections, beta releases,
Python as a security vulnerability, Jython, and more.

How The Intercept Outed Reality Winner

Post Syndicated from Robert Graham original http://blog.erratasec.com/2017/06/how-intercept-outed-reality-winner.html

Today, The Intercept released documents on election tampering from an NSA leaker. Later, the arrest warrant request for an NSA contractor named “Reality Winner” was published, showing how they tracked her down because she had printed out the documents and sent them to The Intercept. The document posted by the Intercept isn’t the original PDF file, but a PDF containing the pictures of the printed version that was then later scanned in.

As the warrant says, she confessed while interviewed by the FBI. Had she not confessed, the documents still contained enough evidence to convict her: the printed document was digitally watermarked.

The problem is that most new printers print nearly invisibly yellow dots that track down exactly when and where documents, any document, is printed. Because the NSA logs all printing jobs on its printers, it can use this to match up precisely who printed the document.

In this post, I show how.

You can download the document from the original article here. You can then open it in a PDF viewer, such as the normal “Preview” app on macOS. Zoom into some whitespace on the document, and take a screenshot of this. On macOS, hit [Command-Shift-3] to take a screenshot of a window. There are yellow dots in this image, but you can barely see them, especially if your screen is dirty.

We need to highlight the yellow dots. Open the screenshot in an image editor, such as the “Paintbrush” program built into macOS. Now use the option to “Invert Colors” in the image, to get something like this. You should see a roughly rectangular pattern checkerboard in the whitespace.

It’s upside down, so we need to rotate it 180 degrees, or flip-horizontal and flip-vertical:

Now we go to the EFF page and manually click on the pattern so that their tool can decode the meaning:

This produces the following result:

The document leaked by the Intercept was from a printer with model number 54, serial number 29535218. The document was printed on May 9, 2017 at 6:20. The NSA almost certainly has a record of who used the printer at that time.

The situation is similar to how Vice outed the location of John McAfee, by publishing JPEG photographs of him with the EXIF GPS coordinates still hidden in the file. Or it’s how PDFs are often redacted by adding a black bar on top of image, leaving the underlying contents still in the file for people to read, such as in this NYTime accident with a Snowden document. Or how opening a Microsoft Office document, then accidentally saving it, leaves fingerprints identifying you behind, as repeatedly happened with the Wikileaks election leaks. These sorts of failures are common with leaks. To fix this yellow-dot problem, use a black-and-white printer, black-and-white scanner, or convert to black-and-white with an image editor.

Copiers/printers have two features put in there by the government to be evil to you. The first is that scanners/copiers (when using scanner feature) recognize a barely visible pattern on currency, so that they can’t be used to counterfeit money, as shown on this $20 below:

The second is that when they print things out, they includes these invisible dots, so documents can be tracked. In other words, those dots on bills prevent them from being scanned in, and the dots produced by printers help the government track what was printed out.

Yes, this code the government forces into our printers is a violation of our 3rd Amendment rights.


While I was writing up this post, these tweets appeared first:


Comments:
https://news.ycombinator.com/item?id=14494818

Make with Minecraft Pi in The MagPi 58

Post Syndicated from Rob Zwetsloot original https://www.raspberrypi.org/blog/magpi-58/

Hey folks, Rob here! What a busy month it’s been at The MagPi HQ. While we’ve been replying to your tweets, answering questions on YouTube and fiddling with our AIY Voice Project kits, we’ve managed to put together a whole new magazine for you, with issue 58 of the official Raspberry Pi magazine out in stores today.

The front cover of The MagPi 58

The MagPi 58 features our latest Minecraft Pi hacks!

Minecraft Pi

The MagPi 58 is all about making with Minecraft Pi. We’ve got cool projects and hacks that let you take a selfie and display it in the Minecraft world, play music with Steve jumping on a giant piano, and use special cards to switch skins in an instant. It’s the perfect supplement to our Hacking and Making in Minecraft book!

AIY Voice Projects

It’s been great to see everyone getting excited over the last issue of the magazine, and we love seeing your pictures and videos of your AIY Voice projects. In this issue we’ve included loads of ideas to keep you going with the AIY Projects kit. Don’t forget to send us what you’ve made on Twitter!

Issue 57 of The MagPi, showing the Google AIY Voice Projects Kit

Show us what you’ve made with your AIY Voice Projects Kit

The best of the rest in The MagPi 58

We’ve also got our usual selection of reviews, tutorials, and projects. This includes guides to making file servers and electronic instruments, along with our review of Adafruit’s Joy Bonnet handheld gaming kit.

A page from The MagPi 58 showing information on 'Getting Started with GUIs'

You can get started with GUIs in The MagPi 58

You can grab the latest issue in stores in the UK right now, from WHSmith, Sainsburys, Asda, and Tesco. Copies will be arriving very soon in US stores, including Barnes & Noble and Micro Center. You can also get a copy online from our store, or digitally via our Android or iOS app. Don’t forget, there’s always the free PDF as well.

We hope you enjoy the issue! Now if you’ll excuse us, we need a nap after all the excitement!

The post Make with Minecraft Pi in The MagPi 58 appeared first on Raspberry Pi.

Now Anyone Can Embed a Pirate Movie in a Website

Post Syndicated from Andy original https://torrentfreak.com/now-anyone-can-embed-a-pirate-movie-in-a-website-170522/

While torrents are still the go-to source for millions of users seeking free online media, people are increasingly seeking the immediacy and convenience of web-based streaming.

As a result, hundreds of websites have appeared in recent years, offering Netflix-inspired interfaces that provide an enhanced user experience over the predominantly text-based approach utilized by most torrent sites.

While there hasn’t been a huge amount of innovation in either field recently, a service that raised its head during recent weeks is offering something new and potentially significant, if it continues to deliver on its promises without turning evil.

Vodlocker.to is the latest in a long list of sites using the Vodlocker name, which is bound to cause some level of confusion. However, what this Vodlocker variant offers is a convenient way for users to not only search for and find movies hosted on the Internet, but stream them instantly – with a twist.

After entering a movie’s IMDb code (the one starting ‘tt’) in a box on the page, Vodlocker quickly searches for the movie on various online hosting services, including Google Drive.

Entering the IMDb code

“We believe the complexity of uploading a video has become unnecessary, so we have created much like Google, an automated crawler that visits millions of pages every day to find all videos on the internet,” the site explains.

As shown in the image above, the site takes the iMDb number and generates code. That allows the user to embed an HTML5 video player in their own website, which plays the movie in question. We tested around a dozen movies with a 100% success rate, with search times from a couple of seconds to around 20 seconds maximum.

A demo on the site shows exactly how the embed code currently performs, with the video player offering the usual controls such as play and pause, with a selector for quality and volume levels. The usual ‘full screen’ button sits in the bottom right corner.

The player can be embedded anywhere

Near the top of the window are options for selecting different sources for the video, should it become unplayable or if a better quality version is required. Interestingly, should one of those sources be Google Video, Vodlocker says its player offers Chromecast and subtitle support.

“Built-in chromecast plugin streams free HD movies/tv shows from your website to your TV via Google Chromecast. Built-in opensubtitles.org plugin finds subtitles in all languages and auto-selects your language,” the site reports.

In addition to a link-checker that aims to exclude broken links (missing sources), the service also pulls movie-related artwork from IMDb, to display while the selected movie is being prepared for streaming.

The site is already boasting a “massive database” of movies, which will make it of immediate use to thousands of websites that might want to embed movies or TV shows in their web pages.

As long as Vodlocker can cope with the load, this could effectively spawn a thousand new ‘pirate’ websites overnight but the service generally seems more suited to smaller, blog-like sites that might want to display a smaller selection of titles.

That being said, it’s questionable whether a site would seek to become entirely reliant on a service like this. While the videos it indexes are more decentralized, the service itself could be shut down in the blink of an eye, at which point every link stops working.

It’s also worth noting that the service uses IFrame tags, which some webmasters might feel uncomfortable about deploying on their sites due to security concerns.

The New Vodlocker API demo can be found here, for as long as it lasts.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

Private Torrent Tracker Freshon.tv Shuts Down

Post Syndicated from Ernesto original https://torrentfreak.com/private-torrent-tracker-freshon-tv-shuts-down-170519/

This isn’t a particularly good week for avid torrent users. After the public ExtraTorrent index shut down, the private community must now say goodbye to Freshon.tv (TvT).

Sysops of the private tracker, which specialized in TV releases, have decided to move on and focus on their personal lives instead.

“We are shutting the website down due to the lack of interest,” TvT staff wrote in a message yesterday.

“This is no longer a fun activity and the old members simply have other things to focus on, like work and families. We like to thank each member that was present in the comunity we built and maintained for a decade.”

Soon after, the site went dark, removing the option to login and showing a brief shutdown message instead.

Poor hamsters

The decsion came as a surprise to the site’s users. Several staffers were also unaware of the decision the sysops had taken, according to a statement by a member on Reddit.

“We, the staff, didn’t know of this until they made the public announcement, so this all comes a bit rushed,” Liberateus writes, adding that fellow torrent tracker Morethan.tv is willing to take on refugees.

“Morethan.tv is a rather young but fast growing tracker focused on TV and movie content, with a nice community, dedicated staff and a huge selection of content,” he writes.

The Reddit thread also includes offers from other trackers that are willing to take people in, so it appears that most users will find a new home.

Freshon, or ‘TV Torrents RO,’ was one of the larger specialized TV trackers on the Internet. The site had close to 20,000 users who together shared tens of thousands of torrents.

The site was particularly popular popular in Romania, from where the site originated. The specialized TV tracker first saw the light of day more than a decade a ago, reportedly with help from several staffers of the defunct torrents.ro tracker.

Better times for TvT (image via Opentrackers)

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

Some notes on Trump’s cybersecurity Executive Order

Post Syndicated from Robert Graham original http://blog.erratasec.com/2017/05/some-notes-on-trumps-cybersecurity.html

President Trump has finally signed an executive order on “cybersecurity”. The first draft during his first weeks in power were hilariously ignorant. The current draft, though, is pretty reasonable as such things go. I’m just reading the plain language of the draft as a cybersecurity expert, picking out the bits that interest me. In reality, there’s probably all sorts of politics in the background that I’m missing, so I may be wildly off-base.

Holding managers accountable

This is a great idea in theory. But government heads are rarely accountable for anything, so it’s hard to see if they’ll have the nerve to implement this in practice. When the next breech happens, we’ll see if anybody gets fired.
“antiquated and difficult to defend Information Technology”

The government uses laughably old computers sometimes. Forces in government wants to upgrade them. This won’t work. Instead of replacing old computers, the budget will simply be used to add new computers. The old computers will still stick around.
“Legacy” is a problem that money can’t solve. Programmers know how to build small things, but not big things. Everything starts out small, then becomes big gradually over time through constant small additions. What you have now is big legacy systems. Attempts to replace a big system with a built-from-scratch big system will fail, because engineers don’t know how to build big systems. This will suck down any amount of budget you have with failed multi-million dollar projects.
It’s not the antiquated systems that are usually the problem, but more modern systems. Antiquated systems can usually be protected by simply sticking a firewall or proxy in front of them.

“address immediate unmet budgetary needs necessary to manage risk”

Nobody cares about cybersecurity. Instead, it’s a thing people exploit in order to increase their budget. Instead of doing the best security with the budget they have, they insist they can’t secure the network without more money.

An alternate way to address gaps in cybersecurity is instead to do less. Reduce exposure to the web, provide fewer services, reduce functionality of desktop computers, and so on. Insisting that more money is the only way to address unmet needs is the strategy of the incompetent.

Use the NIST framework
Probably the biggest thing in the EO is that it forces everyone to use the NIST cybersecurity framework.
The NIST Framework simply documents all the things that organizations commonly do to secure themselves, such run intrusion-detection systems or impose rules for good passwords.
There are two problems with the NIST Framework. The first is that no organization does all the things listed. The second is that many organizations don’t do the things well.
Password rules are a good example. Organizations typically had bad rules, such as frequent changes and complexity standards. So the NIST Framework documented them. But cybersecurity experts have long opposed those complex rules, so have been fighting NIST on them.

Another good example is intrusion-detection. These days, I scan the entire Internet, setting off everyone’s intrusion-detection systems. I can see first hand that they are doing intrusion-detection wrong. But the NIST Framework recommends they do it, because many organizations do it, but the NIST Framework doesn’t demand they do it well.
When this EO forces everyone to follow the NIST Framework, then, it’s likely just going to increase the amount of money spent on cybersecurity without increasing effectiveness. That’s not necessarily a bad thing: while probably ineffective or counterproductive in the short run, there might be long-term benefit aligning everyone to thinking about the problem the same way.
Note that “following” the NIST Framework doesn’t mean “doing” everything. Instead, it means documented how you do everything, a reason why you aren’t doing anything, or (most often) your plan to eventually do the thing.
preference for shared IT services for email, cloud, and cybersecurity
Different departments are hostile toward each other, with each doing things their own way. Obviously, the thinking goes, that if more departments shared resources, they could cut costs with economies of scale. Also obviously, it’ll stop the many home-grown wrong solutions that individual departments come up with.
In other words, there should be a single government GMail-type service that does e-mail both securely and reliably.
But it won’t turn out this way. Government does not have “economies of scale” but “incompetence at scale”. It means a single GMail-like service that is expensive, unreliable, and in the end, probably insecure. It means we can look forward to government breaches that instead of affecting one department affecting all departments.

Yes, you can point to individual organizations that do things poorly, but what you are ignoring is the organizations that do it well. When you make them all share a solution, it’s going to be the average of all these things — meaning those who do something well are going to move to a worse solution.

I suppose this was inserted in there so that big government cybersecurity companies can now walk into agencies, point to where they are deficient on the NIST Framework, and say “sign here to do this with our shared cybersecurity service”.
“identify authorities and capabilities that agencies could employ to support the cybersecurity efforts of critical infrastructure entities”
What this means is “how can we help secure the power grid?”.
What it means in practice is that fiasco in the Vermont power grid. The DHS produced a report containing IoCs (“indicators of compromise”) of Russian hackers in the DNC hack. Among the things it identified was that the hackers used Yahoo! email. They pushed these IoCs out as signatures in their “Einstein” intrusion-detection system located at many power grid locations. The next person that logged into their Yahoo! email was then flagged as a Russian hacker, causing all sorts of hilarity to ensue, such as still uncorrected stories by the Washington Post how the Russians hacked our power-grid.
The upshot is that federal government help is also going to include much government hindrance. They really are this stupid sometimes and there is no way to fix this stupid. (Seriously, the DHS still insists it did the right thing pushing out the Yahoo IoCs).
Resilience Against Botnets and Other Automated, Distributed Threats

The government wants to address botnets because it’s just the sort of problem they love, mass outages across the entire Internet caused by a million machines.

But frankly, botnets don’t even make the top 10 list of problems they should be addressing. Number #1 is clearly “phishing” — you know, the attack that’s been getting into the DNC and Podesta e-mails, influencing the election. You know, the attack that Gizmodo recently showed the Trump administration is partially vulnerable to. You know, the attack that most people blame as what probably led to that huge OPM hack. Replace the entire Executive Order with “stop phishing”, and you’d go further fixing federal government security.

But solving phishing is tough. To begin with, it requires a rethink how the government does email, and how how desktop systems should be managed. So the government avoids complex problems it can’t understand to focus on the simple things it can — botnets.

Dealing with “prolonged power outage associated with a significant cyber incident”

The government has had the hots for this since 2001, even though there’s really been no attack on the American grid. After the Russian attacks against the Ukraine power grid, the issue is heating up.

Nation-wide attacks aren’t really a threat, yet, in America. We have 10,000 different companies involved with different systems throughout the country. Trying to hack them all at once is unlikely. What’s funny is that it’s the government’s attempts to standardize everything that’s likely to be our downfall, such as sticking Einstein sensors everywhere.

What they should be doing is instead of trying to make the grid unhackable, they should be trying to lessen the reliance upon the grid. They should be encouraging things like Tesla PowerWalls, solar panels on roofs, backup generators, and so on. Indeed, rather than industrial system blackout, industry backup power generation should be considered as a source of grid backup. Factories and even ships were used to supplant the electric power grid in Japan after the 2011 tsunami, for example. The less we rely on the grid, the less a blackout will hurt us.

“cybersecurity risks facing the defense industrial base, including its supply chain”

So “supply chain” cybersecurity is increasingly becoming a thing. Almost anything electronic comes with millions of lines of code, silicon chips, and other things that affect the security of the system. In this context, they may be worried about intentional subversion of systems, such as that recent article worried about Kaspersky anti-virus in government systems. However, the bigger concern is the zillions of accidental vulnerabilities waiting to be discovered. It’s impractical for a vendor to secure a product, because it’s built from so many components the vendor doesn’t understand.

“strategic options for deterring adversaries and better protecting the American people from cyber threats”

Deterrence is a funny word.

Rumor has it that we forced China to backoff on hacking by impressing them with our own hacking ability, such as reaching into China and blowing stuff up. This works because the Chinese governments remains in power because things are going well in China. If there’s a hiccup in economic growth, there will be mass actions against the government.

But for our other cyber adversaries (Russian, Iran, North Korea), things already suck in their countries. It’s hard to see how we can make things worse by hacking them. They also have a strangle hold on the media, so hacking in and publicizing their leader’s weird sex fetishes and offshore accounts isn’t going to work either.

Also, deterrence relies upon “attribution”, which is hard. While news stories claim last year’s expulsion of Russian diplomats was due to election hacking, that wasn’t the stated reason. Instead, the claimed reason was Russia’s interference with diplomats in Europe, such as breaking into diplomat’s homes and pooping on their dining room table. We know it’s them when they are brazen (as was the case with Chinese hacking), but other hacks are harder to attribute.

Deterrence of nation states ignores the reality that much of the hacking against our government comes from non-state actors. It’s not clear how much of all this Russian hacking is actually directed by the government. Deterrence polices may be better directed at individuals, such as the recent arrest of a Russian hacker while they were traveling in Spain. We can’t get Russian or Chinese hackers in their own countries, so we have to wait until they leave.

Anyway, “deterrence” is one of those real-world concepts that hard to shoe-horn into a cyber (“cyber-deterrence”) equivalent. It encourages lots of bad thinking, such as export controls on “cyber-weapons” to deter foreign countries from using them.

“educate and train the American cybersecurity workforce of the future”

The problem isn’t that we lack CISSPs. Such blanket certifications devalue the technical expertise of the real experts. The solution is to empower the technical experts we already have.

In other words, mandate that whoever is the “cyberczar” is a technical expert, like how the Surgeon General must be a medical expert, or how an economic adviser must be an economic expert. For over 15 years, we’ve had a parade of non-technical people named “cyberczar” who haven’t been experts.

Once you tell people technical expertise is valued, then by nature more students will become technical experts.

BTW, the best technical experts are software engineers and sysadmins. The best cybersecurity for Windows is already built into Windows, whose sysadmins need to be empowered to use those solutions. Instead, they are often overridden by a clueless cybersecurity consultant who insists on making the organization buy a third-party product instead that does a poorer job. We need more technical expertise in our organizations, sure, but not necessarily more cybersecurity professionals.

Conclusion

This is really a government document, and government people will be able to explain it better than I. These are just how I see it as a technical-expert who is a government-outsider.

My guess is the most lasting consequential thing will be making everyone following the NIST Framework, and the rest will just be a lot of aspirational stuff that’ll be ignored.

Securing Elections

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2017/05/securing_electi.html

Technology can do a lot more to make our elections more secure and reliable, and to ensure that participation in the democratic process is available to all. There are three parts to this process.

First, the voter registration process can be improved. The whole process can be streamlined. People should be able to register online, just as they can register for other government services. The voter rolls need to be protected from tampering, as that’s one of the major ways hackers can disrupt the election.

Second, the voting process can be significantly improved. Voting machines need to be made more secure. There are a lot of technical details best left to the voting-security experts who can deal with them, but such machines must include a paper ballot that provides a record verifiable by voters. The simplest and most reliable way to do that is already practiced in 37 states: optical-scan paper ballots, marked by the voters and counted by computer, but recountable by hand.

We need national security standards for voting machines, and funding for states to procure machines that comply with those standards.

This means no Internet voting. While that seems attractive, and certainly a way technology can improve voting, we don’t know how to do it securely. We simply can’t build an Internet voting system that is secure against hacking because of the requirement for a secret ballot. This makes voting different from banking and anything else we do on the Internet, and it makes security much harder. Even allegations of vote hacking would be enough to undermine confidence in the system, and we simply cannot afford that. We need a system of pre-election and post-election security audits of these voting machines to increase confidence in the system.

The third part of the voting process we need to secure is the tabulation system. After the polls close, we aggregate votes — ­from individual machines, to polling places, to precincts, and finally to totals. This system is insecure as well, and we can do a lot more to make it reliable. Similarly, our system of recounts can be made more secure and efficient.

We have the technology to do all of this. The problem is political will. We have to decide that the goal of our election system is for the most people to be able to vote with the least amount of effort. If we continue to enact voter suppression measures like ID requirements, barriers to voter registration, limitations on early voting, reduced polling place hours, and faulty machines, then we are harming democracy more than we are by allowing our voting machines to be hacked.

We have already declared our election system to be critical national infrastructure. This is largely symbolic, but it demonstrates a commitment to secure elections and makes funding and other resources available to states. We can do much more. We owe it to democracy to do it.

This essay previously appeared on TheAtlantic.com.

Some notes on #MacronLeak

Post Syndicated from Robert Graham original http://blog.erratasec.com/2017/05/some-notes-on-macronleak.html

Tonight (Friday May 5 2017) hackers dumped emails (and docs) related to French presidential candidate Emmanuel Macron. He’s the anti-Putin candidate running against the pro-Putin Marin Le Pen. I thought I’d write up some notes.

Are they Macron’s emails?

No. They are e-mails from members of his staff/supporters, namely Alain Tourret, Pierre Person, Cedric O??, Anne-Christine Lang, and Quentin Lafay.
There are some documents labeled “Macron” which may have been taken from his computer, cloud drive — his own, or an assistant.

Who done it?
Obviously, everyone assumes that Russian hackers did it, but there’s nothing (so far) that points to anybody in particular.
It appears to be the most basic of phishing attacks, which means anyone could’ve done it, including your neighbor’s pimply faced teenager.

Update: Several people [*] have pointed out Trend Micro reporting that Russian/APT28 hackers were targeting Macron back on April 24. Coincidentally, this is also the latest that emails appear in the dump.

What’s the hacker’s evil plan?
Everyone is proposing theories about the hacker’s plan, but the most likely answer is they don’t have one. Hacking is opportunistic. They likely targeted everyone in the campaign, and these were the only victims they could hack. It’s probably not the outcome they were hoping for.
But since they’ve gone through all the work, it’d be a shame to waste it. Thus, they are likely releasing the dump not because they believe it will do any good, but because it’ll do them no harm. It’s a shame to waste all the work they put into it.
If there’s any plan, it’s probably a long range one, serving notice that any political candidate that goes against Putin will have to deal with Russian hackers dumping email.
Why now? Why not leak bits over time like with Clinton?

France has a campaign blackout starting tonight at midnight until the election on Sunday. Thus, it’s the perfect time to leak the files. Anything salacious, or even rumors of something bad, will spread viraly through Facebook and Twitter, without the candidate or the media having a good chance to rebut the allegations.
The last emails in the logs appear to be from April 24, the day after the first round vote (Sunday’s vote is the second, runoff, round). Thus, the hackers could’ve leaked this dump any time in the last couple weeks. They chose now to do it.
Are the emails verified?
Yes and no.
Yes, we have DKIM signatures between people’s accounts, so we know for certain that hackers successfully breached these accounts. DKIM is an anti-spam method that cryptographically signs emails by the sending domain (e.g. @gmail.com), and thus, can also verify the email hasn’t been altered or forged.
But no, when a salacious email or document is found in the dump, it’ll likely not have such a signature (most emails don’t), and thus, we probably won’t be able to verify the scandal. In other words, the hackers could have altered or forged something that becomes newsworthy.
What are the most salacious emails/files?

I don’t know. Before this dump, hackers on 4chan were already making allegations that Macron had secret offshore accounts (debunked). Presumably we need to log in to 4chan tomorrow for them to point out salacious emails/files from this dump.

Another email going around seems to indicate that Alain Tourret, a member of the French legislature, had his assistant @FrancoisMachado buy drugs online with Bitcoin and had them sent to his office in the legislature building. The drugs in question, 3-MMC, is a variant of meth that might be legal in France. The emails point to a tracking number which looks legitimate, at least, that a package was indeed shipped to that area of Paris. There is a bitcoin transaction that matches the address, time, and amount specified in the emails. Some claim these drug emails are fake, but so far, I haven’t seen any emails explaining why they should be fake. On the other hand, there’s nothing proving they are true (no DKIM sig), either.

Some salacious emails might be obvious, but some may take people with more expertise to find. For example, one email is a receipt from Uber (with proper DKIM validation) that shows the route that “Quenten” took on the night of the first round election. Somebody clued into the French political scene might be able to figure out he’s visiting his mistress, or something. (This is hypothetical — in reality, he’s probably going from one campaign rally to the next).

What’s the Macron camp’s response?

They have just the sort of response you’d expect.
They claim some of the documents/email are fake, without getting into specifics. They claim that information is needed to be understand in context. They claim that this was a “massive coordinated attack”, even though it’s something that any pimply faced teenager can do. They claim it’s an attempt to destabilize democracy. They call upon journalists to be “responsible”.

RFD: the alien abduction prophecy protocol

Post Syndicated from Michal Zalewski original http://lcamtuf.blogspot.com/2017/05/rfd-alien-abduction-prophecy-protocol.html

“It’s tough to make predictions, especially about the future.”
– variously attributed to Yogi Berra and Niels Bohr

Right. So let’s say you are visited by transdimensional space aliens from outer space. There’s some old-fashioned probing, but eventually, they get to the point. They outline a series of apocalyptic prophecies, beginning with the surprise 2032 election of Dwayne Elizondo Mountain Dew Herbert Camacho as the President of the United States, followed by a limited-scale nuclear exchange with the Grand Duchy of Ruritania in 2036, and culminating with the extinction of all life due to a series of cascading Y2K38 failures that start at an Ohio pretzel reprocessing plan. Long story short, if you want to save mankind, you have to warn others of what’s to come.

But there’s a snag: when you wake up in a roadside ditch in Alabama, you realize that nobody is going to believe your story! If you come forward, your professional and social reputation will be instantly destroyed. If you’re lucky, the vindication of your claims will come fifteen years later; if not, it might turn out that you were pranked by some space alien frat boys who just wanted to have some cheap space laughs. The bottom line is, you need to be certain before you make your move. You figure this means staying mum until the Election Day of 2032.

But wait, this plan is also not very good! After all, how could your future self convince others that you knew about President Camacho all along? Well… if you work in information security, you are probably familiar with a neat solution: write down your account of events in a text file, calculate a cryptographic hash of this file, and publish the resulting value somewhere permanent. Fifteen years later, reveal the contents of your file and point people to your old announcement. Explain that you must have been in the possession of this very file back in 2017; otherwise, you would not have known its hash. Voila – a commitment scheme!

Although elegant, this approach can be risky: historically, the usable life of cryptographic hash functions seemed to hover at somewhere around 15 years – so even if you pick a very modern algorithm, there is a real risk that future advances in cryptanalysis could severely undermine the strength of your proof. No biggie, though! For extra safety, you could combine several independent hashing functions, or increase the computational complexity of the hash by running it in a loop. There are also some less-known hash functions, such as SPHINCS, that are designed with different trade-offs in mind and may offer longer-term security guarantees.

Of course, the computation of the hash is not enough; it needs to become an immutable part of the public record and remain easy to look up for years to come. There is no guarantee that any particular online publishing outlet is going to stay afloat that long and continue to operate in its current form. The survivability of more specialized and experimental platforms, such as blockchain-based notaries, seems even less clear. Thankfully, you can resort to another kludge: if you publish the hash through a large number of independent online venues, there is a good chance that at least one of them will be around in 2032.

(Offline notarization – whether of the pen-and-paper or the PKI-based variety – offers an interesting alternative. That said, in the absence of an immutable, public ledger, accusations of forgery or collusion would be very easy to make – especially if the fate of the entire planet is at stake.)

Even with this out of the way, there is yet another profound problem with the plan: a current-day scam artist could conceivably generate hundreds or thousands of political predictions, publish the hashes, and then simply discard or delete the ones that do not come true by 2032 – thus creating an illusion of prescience. To convince skeptics that you are not doing just that, you could incorporate a cryptographic proof of work into your approach, attaching a particular CPU time “price tag” to every hash. The future you could then claim that it would have been prohibitively expensive for the former you to attempt the “prediction spam” attack. But this argument seems iffy: a $1,000 proof may already be too costly for a lower middle class abductee, while a determined tech billionaire could easily spend $100,000 to pull off an elaborate prank on the entire world. Not to mention, massive CPU resources can be commandeered with little or no effort by the operators of large botnets and many other actors of this sort.

In the end, my best idea is to rely on an inherently low-bandwidth publication medium, rather than a high-cost one. For example, although a determined hoaxer could place thousands of hash-bearing classifieds in some of the largest-circulation newspapers, such sleigh-of-hand would be trivial for future sleuths to spot (at least compared to combing through the entire Internet for an abandoned hash). Or, as per an anonymous suggestion relayed by Thomas Ptacek: just tattoo the signature on your body, then post some post some pics; there are only so many places for a tattoo to go.

Still, what was supposed to be a nice, scientific proof devolved into a bunch of hand-wavy arguments and poorly-quantified probabilities. For the sake of future abductees: is there a better way?

Get a free AIY Projects Voice Kit with The MagPi 57!

Post Syndicated from Rob Zwetsloot original https://www.raspberrypi.org/blog/free-aiy-projects-voice-kit-magpi-57/

We’re extremely excited to share with you the latest issue of The MagPi, the official Raspberry Pi magazine. It’s a very special issue bundled with an exclusive project kit from Google.

Called AIY Projects, the free hardware kit enables you to add voice interaction to your Raspberry Pi projects. The first AIY Projects kits are bundled free with the print edition of The MagPi 57.

Photo of the free AIY Projects kit bundled with The MagPi 57: HAT accessory boards, wires, button and custom cardboard housing

What you’ll find inside

Inside the magazine, you’ll find a Google Voice Hardware Attached on Top (HAT) accessory board, a stereo microphone Voice HAT board, a large arcade button, and a selection of wires. Last but not least, you’ll find a custom cardboard case to house it all in.

All you need to add is a Raspberry Pi 3. Then, after some software setup, you’ll have access to the Google Assistant SDK and Google Cloud Speech API.

AIY Projects adds natural human interaction to your Raspberry Pi

Check out the exclusive Google AIY Projects Kit that comes free with The MagPi 57! Grab yourself a copy in stores or online now: http://magpi.cc/2pI6IiQ This first AIY Projects kit taps into the Google Assistant SDK and Cloud Speech API using the AIY Projects Voice HAT (Hardware Accessory on Top) board, stereo microphone, and speaker (included free with the magazine).

We’ve got a full breakdown of how to set it all up and get it working inside the magazine. The folks at Google, along with us at The MagPi, are really excited to see what projects you can create (or enhance) with this kit, whether you’re creating a voice-controlled robot or a voice interface that answers all your questions. Some Raspberry Pi owners have been building AIY Projects in secret at Hackster, and we have their best voice interaction ideas in the magazine.

On top of this incredible bundle we also have our usual selection of excellent tutorials – such as an introduction to programming with Minecraft Pi, and hacking an Amazon Dash button – along with reviews, project showcases, and our guide to building the ultimate makers’ toolbox.

Two-page spread from The MagPi, titled "Makers' Toolkit"

Create the ultimate makers’ toolkit and much more with issue 57 of The MagPi

Subscribers should be getting their copies tomorrow, and you can also buy a copy in UK stores including WHSmith, Tesco, Sainsbury’s, and Asda. Copies have been shipped to North America, and are available at Barnes & Noble and other stores. Otherwise, you can get a copy online from The PiHut. Digital versions (without the AIY Projects kit) are available in our Android and iOS app. Finally, as always, there’s the free PDF download.

We really hope you enjoy this issue and make some amazing things with your AIY Projects kit. Let us know what you plan to make on social media, using the hashtag #AIYProjects, or on the Raspberry Pi forums.

The post Get a free AIY Projects Voice Kit with The MagPi 57! appeared first on Raspberry Pi.

Canada and Switzerland Remain on US ‘Pirate Watchlist’ Under President Trump

Post Syndicated from Ernesto original https://torrentfreak.com/canada-and-switzerland-remain-on-us-pirate-watchlist-under-president-trump-170501/

ustrEvery year the Office of the United States Trade Representative (USTR) publishes its Special 301 Report highlighting countries that aren’t doing enough to protect U.S. intellectual property rights.

The format remains the same as in previous years and lists roughly two dozen countries that, for different reasons, threaten the intellectual property rights of US companies.

The latest report, which just came out, is the first under the administration of President Trump and continues where Obama left off. China, Russia, Ukraine, and India are listed among the priority threats, and Canada and Switzerland remain on the general Watch List.

“One of the top trade priorities for the Trump Administration is to use all possible sources of leverage to encourage other countries to open their markets to U.S. exports of goods and services, and provide adequate and effective protection and enforcement of U.S. intellectual property (IP) rights,” the USTR writes.

One of the main problems the US has with Canada is that it doesn’t allow border protection officials to seize or destroy pirated and counterfeit goods that are passing through.

In addition, the US is fiercely against Canada’s fair dealing rules, which adds educational use to the list of copyright infringement exceptions. According to the US, the language used in the law is too broad, damaging the rights of educational publishers.

“The United States also remains deeply troubled by the broad interpretation of an ambiguous education-related exception to copyright that has significantly damaged the market for educational publishers and authors.”

In the past, Canada has also been called out for offering a safe haven to pirate sites, but there is no mention of this in the 2017 report (pdf).

That said, pirate site hosting remains a problem in many other countries including Switzerland, with the USTR noting that the country has become an “increasingly popular host country for websites offering infringing content” since 2010.

While the Swiss Government is taking steps to address these concerns, another enforcement problem also requires attention. One of the key issues the United States has with Switzerland originates from the so-called ‘Logistep Decision.‘

In 2010 the Swiss Federal Supreme Court barred anti-piracy outfit Logistep from harvesting the IP addresses of file-sharers. The Court ruled that IP addresses amount to private data, and outlawed the tracking of file-sharers in Switzerland.

According to the US, this ruling prevents copyright holders from enforcing their rights, and they call on the Swiss Government to address this concern.

“Switzerland remains on the Watch List this year due to U.S. concerns regarding specific difficulties in Switzerland’s system of online copyright protection and enforcement,” the USTR writes.

“Seven years have elapsed since the issuance of a decision by the Swiss Federal Supreme Court, which has been implemented to essentially deprive copyright holders in Switzerland of the means to enforce their rights against online infringers. Enforcement is a critical element of providing meaningful IP protection.”

The above points are merely a selection of the many complaints the United States has about a variety of countries. As is often the case, the allegations are in large part based on reports from copyright-heavy industries, in some cases demanding measures that are not even in effect in the US itself.

By calling out foreign governments, the USTR hopes to elicit change. However, not all countries are receptive to this kind of diplomatic pressure. Canada, for one, said it does’t recognize the Special 301 Report and plans to follow its own path.

“Canada does not recognize the validity of the Special 301 and considers the process and the Report to be flawed,” the Government wrote in a previous memo regarding last year’s report.

“The Report fails to employ a clear methodology and the findings tend to rely on industry allegations rather than empirical evidence and objective analysis,” it added.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

Near Zero Downtime Migration from MySQL to DynamoDB

Post Syndicated from YongSeong Lee original https://aws.amazon.com/blogs/big-data/near-zero-downtime-migration-from-mysql-to-dynamodb/

Many companies consider migrating from relational databases like MySQL to Amazon DynamoDB, a fully managed, fast, highly scalable, and flexible NoSQL database service. For example, DynamoDB can increase or decrease capacity based on traffic, in accordance with business needs. The total cost of servicing can be optimized more easily than for the typical media-based RDBMS.

However, migrations can have two common issues:

  • Service outage due to downtime, especially when customer service must be seamlessly available 24/7/365
  • Different key design between RDBMS and DynamoDB

This post introduces two methods of seamlessly migrating data from MySQL to DynamoDB, minimizing downtime and converting the MySQL key design into one more suitable for NoSQL.

AWS services

I’ve included sample code that uses the following AWS services:

  • AWS Database Migration Service (AWS DMS) can migrate your data to and from most widely used commercial and open-source databases. It supports homogeneous and heterogeneous migrations between different database platforms.
  • Amazon EMR is a managed Hadoop framework that helps you process vast amounts of data quickly. Build EMR clusters easily with preconfigured software stacks that include Hive and other business software.
  • Amazon Kinesis can continuously capture and retain a vast amount of data such as transaction, IT logs, or clickstreams for up to 7 days.
  • AWS Lambda helps you run your code without provisioning or managing servers. Your code can be automatically triggered by other AWS services such Amazon Kinesis Streams.

Migration solutions

Here are the two options I describe in this post:

  1. Use AWS DMS

AWS DMS supports migration to a DynamoDB table as a target. You can use object mapping to restructure original data to the desired structure of the data in DynamoDB during migration.

  1. Use EMR, Amazon Kinesis, and Lambda with custom scripts

Consider this method when more complex conversion processes and flexibility are required. Fine-grained user control is needed for grouping MySQL records into fewer DynamoDB items, determining attribute names dynamically, adding business logic programmatically during migration, supporting more data types, or adding parallel control for one big table.

After the initial load/bulk-puts are finished, and the most recent real-time data is caught up by the CDC (change data capture) process, you can change the application endpoint to DynamoDB.

The method of capturing changed data in option 2 is covered in the AWS Database post Streaming Changes in a Database with Amazon Kinesis. All code in this post is available in the big-data-blog GitHub repo, including test codes.

Solution architecture

The following diagram shows the overall architecture of both options.

Option 1:  Use AWS DMS

This section discusses how to connect to MySQL, read the source data, and then format the data for consumption by the target DynamoDB database using DMS.

Create the replication instance and source and target endpoints

Create a replication instance that has sufficient storage and processing power to perform the migration job, as mentioned in the AWS Database Migration Service Best Practices whitepaper. For example, if your migration involves a large number of tables, or if you intend to run multiple concurrent replication tasks, consider using one of the larger instances. The service consumes a fair amount of memory and CPU.

As the MySQL user, connect to MySQL and retrieve data from the database with the privileges of SUPER, REPLICATION CLIENT. Enable the binary log and set the binlog_format parameter to ROW for CDC in the MySQL configuration. For more information about how to use DMS, see Getting Started  in the AWS Database Migration Service User Guide.

mysql> CREATE USER 'repl'@'%' IDENTIFIED BY 'welcome1';
mysql> GRANT all ON <database name>.* TO 'repl'@'%';
mysql> GRANT SUPER,REPLICATION CLIENT  ON *.* TO 'repl'@'%';

Before you begin to work with a DynamoDB database as a target for DMS, make sure that you create an IAM role for DMS to assume, and grant access to the DynamoDB target tables. Two endpoints must be created to connect the source and target. The following screenshot shows sample endpoints.

The following screenshot shows the details for one of the endpoints, source-mysql.

Create a task with an object mapping rule

In this example, assume that the MySQL table has a composite primary key (customerid + orderid + productid). You are going to restructure the key to the desired structure of the data in DynamoDB, using an object mapping rule.

In this case, the DynamoDB table has the hash key that is a combination of the customerid and orderid columns, and the sort key is the productid column. However, the partition key should be decided by the user in an actual migration, based on data ingestion and access pattern. You would usually use high-cardinality attributes. For more information about how to choose the right DynamoDB partition key, see the Choosing the Right DynamoDB Partition Key AWS Database blog post.

DMS automatically creates a corresponding attribute on the target DynamoDB table for the quantity column from the source table because rule-action is set to map-record-to-record and the column is not listed in the exclude-columns attribute list. For more information about map-record-to-record and map-record-to-document, see Using an Amazon DynamoDB Database as a Target for AWS Database Migration Service.

Migration starts immediately after the task is created, unless you clear the Start task on create option. I recommend enabling logging to make sure that you are informed about what is going on with the migration task in the background.

The following screenshot shows the task creation page.

You can use the console to specify the individual database tables to migrate and the schema to use for the migration, including transformations. On the Guided tab, use the Where section to specify the schema, table, and action (include or exclude). Use the Filter section to specify the column name in a table and the conditions to apply.

Table mappings also can be created in JSON format. On the JSON tab, check Enable JSON editing.

Here’s an example of an object mapping rule that determines where the source data is located in the target. If you copy the code, replace the values of the following attributes. For more examples, see Using an Amazon DynamoDB Database as a Target for AWS Database Migration Service.

  • schema-name
  • table-name
  • target-table-name
  • mapping-parameters
  • attribute-mappings
{
  "rules": [
   {
      "rule-type": "selection",
      "rule-id": "1",
      "rule-name": "1",
      "object-locator": {
        "schema-name": "mydatabase",
        "table-name": "purchase"
      },
      "rule-action": "include"
    },
    {
      "rule-type": "object-mapping",
      "rule-id": "2",
      "rule-name": "2",
      "rule-action": "map-record-to-record",
      "object-locator": {
        "schema-name": "mydatabase",
        "table-name": "purchase"
 
      },
      "target-table-name": "purchase",
      "mapping-parameters": {
        "partition-key-name": "customer_orderid",
        "sort-key-name": "productid",
        "exclude-columns": [
          "customerid",
          "orderid"           
        ],
        "attribute-mappings": [
          {
            "target-attribute-name": "customer_orderid",
            "attribute-type": "scalar",
            "attribute-sub-type": "string",
            "value": "${customerid}|${orderid}"
          },
          {
            "target-attribute-name": "productid",
            "attribute-type": "scalar",
            "attribute-sub-type": "string",
            "value": "${productid}"
          }
        ]
      }
    }
  ]
}

Start the migration task

If the target table specified in the target-table-name property does not exist in DynamoDB, DMS creates the table according to data type conversion rules for source and target data types. There are many metrics to monitor the progress of migration. For more information, see Monitoring AWS Database Migration Service Tasks.

The following screenshot shows example events and errors recorded by CloudWatch Logs.

DMS replication instances that you used for the migration should be deleted once all migration processes are completed. Any CloudWatch logs data older than the retention period is automatically deleted.

Option 2: Use EMR, Amazon Kinesis, and Lambda

This section discusses an alternative option using EMR, Amazon Kinesis, and Lambda to provide more flexibility and precise control. If you have a MySQL replica in your environment, it would be better to dump data from the replica.

Change the key design

When you decide to change your database from RDMBS to NoSQL, you need to find a more suitable key design for NoSQL, for performance as well as cost-effectiveness.

Similar to option #1, assume that the MySQL source has a composite primary key (customerid + orderid + productid). However, for this option, group the MySQL records into fewer DynamoDB items by customerid (hash key) and orderid (sort key). Also, remove the last column (productid) of the composite key by converting the record values productid column in MySQL to the attribute name in DynamoDB, and setting the attribute value as quantity.

This conversion method reduces the number of items. You can retrieve the same amount of information with fewer read capacity units, resulting in cost savings and better performance. For more information about how to calculate read/write capacity units, see Provisioned Throughput.

Migration steps

Option 2 has two paths for migration, performed at the same time:

  • Batch-puts: Export MySQL data, upload it to Amazon S3, and import into DynamoDB.
  • Real-time puts: Capture changed data in MySQL, send the insert/update/delete transaction to Amazon Kinesis Streams, and trigger the Lambda function to put data into DynamoDB.

To keep the data consistency and integrity, capturing and feeding data to Amazon Kinesis Streams should be started before the batch-puts process. The Lambda function should stand by and Streams should retain the captured data in the stream until the batch-puts process on EMR finishes. Here’s the order:

  1. Start real-time puts to Amazon Kinesis Streams.
  2. As soon as real-time puts commences, start batch-puts.
  3. After batch-puts finishes, trigger the Lambda function to execute put_item from Amazon Kinesis Streams to DynamoDB.
  4. Change the application endpoints from MySQL to DynamoDB.

Step 1:  Capture changing data and put into Amazon Kinesis Streams

Firstly, create an Amazon Kinesis stream to retain transaction data from MySQL. Set the Data retention period value based on your estimate for the batch-puts migration process. For data integrity, the retention period should be enough to hold all transactions until batch-puts migration finishes. However you do not necessarily need to select the maximum retention period. It depends on the amount of data to migrate.

In the MySQL configuration, set binlog_format to ROW to capture transactions by using the BinLogStreamReader module. The log_bin parameter must be set as well to enable the binlog. For more information, see the Streaming Changes in a Database with Amazon Kinesis AWS Database blog post.

 

[mysqld]
secure-file-priv = ""
log_bin=/data/binlog/binlog
binlog_format=ROW
server-id = 1
tmpdir=/data/tmp

The following sample code is a Python example that captures transactions and sends them to Amazon Kinesis Streams.

 

#!/usr/bin/env python
from pymysqlreplication import BinLogStreamReader
from pymysqlreplication.row_event import (
  DeleteRowsEvent,
  UpdateRowsEvent,
  WriteRowsEvent,
)

def main():
  kinesis = boto3.client("kinesis")

  stream = BinLogStreamReader(
    connection_settings= {
      "host": "<host IP address>",
      "port": <port number>,
      "user": "<user name>",
      "passwd": "<password>"},
    server_id=100,
    blocking=True,
    resume_stream=True,
    only_events=[DeleteRowsEvent, WriteRowsEvent, UpdateRowsEvent])

  for binlogevent in stream:
    for row in binlogevent.rows:
      event = {"schema": binlogevent.schema,
      "table": binlogevent.table,
      "type": type(binlogevent).__name__,
      "row": row
      }

      kinesis.put_record(StreamName="<Amazon Kinesis stream name>", Data=json.dumps(event), PartitionKey="default")
      print json.dumps(event)

if __name__ == "__main__":
main()

The following code is sample JSON data generated by the Python script. The type attribute defines the transaction recorded by that JSON record:

  • WriteRowsEvent = INSERT
  • UpdateRowsEvent = UPDATE
  • DeleteRowsEvent = DELETE
{"table": "purchase_temp", "row": {"values": {"orderid": "orderidA1", "quantity": 100, "customerid": "customeridA74187", "productid": "productid1"}}, "type": "WriteRowsEvent", "schema": "test"}
{"table": "purchase_temp", "row": {"before_values": {"orderid": "orderid1", "quantity": 1, "customerid": "customerid74187", "productid": "productid1"}, "after_values": {"orderid": "orderid1", "quantity": 99, "customerid": "customerid74187", "productid": "productid1"}}, "type": "UpdateRowsEvent", "schema": "test"}
{"table": "purchase_temp", "row": {"values": {"orderid": "orderid100", "quantity": 1, "customerid": "customerid74187", "productid": "productid1"}}, "type": "DeleteRowsEvent", "schema": "test"}

Step 2. Dump data from MySQL to DynamoDB

The easiest way is to use DMS, which recently added Amazon S3 as a migration target. For an S3 target, both full load and CDC data is written to CSV format. However, CDC is not a good fit as UPDATE and DELETE statements are not supported. For more information, see Using Amazon S3 as a Target for AWS Database Migration Service.

Another way to upload data to Amazon S3 is to use the INTO OUTFILE SQL clause and aws s3 sync CLI command in parallel with your own script. The degree of parallelism depends on your server capacity and local network bandwidth. You might find a third-party tool useful, such as pt-archiver (part of the Percona Toolkit see the appendix for details).

SELECT * FROM purchase WHERE <condition_1>
INTO OUTFILE '/data/export/purchase/1.csv' FIELDS TERMINATED BY ',' ESCAPED BY '\\' LINES TERMINATED BY '\n';
SELECT * FROM purchase WHERE <condition_2>
INTO OUTFILE '/data/export/purchase/2.csv' FIELDS TERMINATED BY ',' ESCAPED BY '\\' LINES TERMINATED BY '\n';
...
SELECT * FROM purchase WHERE <condition_n>
INTO OUTFILE '/data/export/purchase/n.csv' FIELDS TERMINATED BY ',' ESCAPED BY '\\' LINES TERMINATED BY '\n';

I recommend the aws s3 sync command for this use case. This command works internally with the S3 multipart upload feature. Pattern matching can exclude or include particular files. In addition, if the sync process crashes in the middle of processing, you do not need to upload the same files again. The sync command compares the size and modified time of files between local and S3 versions, and synchronizes only local files whose size and modified time are different from those in S3. For more information, see the sync command in the S3 section of the AWS CLI Command Reference.

$ aws s3 sync /data/export/purchase/ s3://<your bucket name>/purchase/ 
$ aws s3 sync /data/export/<other path_1>/ s3://<your bucket name>/<other path_1>/
...
$ aws s3 sync /data/export/<other path_n>/ s3://<your bucket name>/<other path_n>/ 

After all data is uploaded to S3, put it into DynamoDB. There are two ways to do this:

  • Use Hive with an external table
  • Write MapReduce code

Hive with an external table

Create a Hive external table against the data on S3 and insert it into another external table against the DynamoDB table, using the org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler property. To improve productivity and the scalability, consider using Brickhouse, which is a collection of UDFs for Hive.

The following sample code assumes that the Hive table for DynamoDB is created with the products column, which is of type ARRAY<STRING >.  The productid and quantity columns are aggregated, grouping by customerid and orderid, and inserted into the products column with the CollectUDAF columns provided by Brickhouse.

hive> DROP TABLE purchase_ext_s3; 
--- To read data from S3 
hive> CREATE EXTERNAL TABLE purchase_ext_s3 (
customerid string,
orderid    string,
productid  string,
quantity   string) 
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' 
LOCATION 's3://<your bucket name>/purchase/';

Hive> drop table purchase_ext_dynamodb ; 
--- To connect to DynamoDB table  
Hive> CREATE EXTERNAL TABLE purchase_ext_dynamodb (
      customerid STRING, orderid STRING, products ARRAY<STRING>)
      STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler' 
      TBLPROPERTIES ("dynamodb.table.name" = "purchase", 
      "dynamodb.column.mapping" = "customerid:customerid,orderid:orderid,products:products");

--- Batch-puts to DynamoDB using Brickhouse 
hive> add jar /<jar file path>/brickhouse-0.7.1-SNAPSHOT.jar ; 
hive> create temporary function collect as 'brickhouse.udf.collect.CollectUDAF';
hive> INSERT INTO purchase_ext_dynamodb 
select customerid as customerid , orderid as orderid
       ,collect(concat(productid,':' ,quantity)) as products
      from purchase_ext_s3
      group by customerid, orderid; 

Unfortunately, the MAP, LIST, BOOLEAN, and NULL data types are not supported by the  DynamoDBStorageHandler class, so the ARRAY<STRING> data type has been chosen. The products column of ARRAY<STRING> data type in Hive is matched to the StringSet type attribute in DynamoDB. The sample code mostly shows how Brickhouse works, and only for those who want to aggregate multiple records into one StringSet type attribute in DynamoDB.

Python MapReduce with Hadoop Streaming

A mapper task reads each record from the input data on S3, and maps input key-value pairs to intermediate key-value pairs. It divides source data from S3 into two parts (key part and value part) delimited by a TAB character (“\t”). Mapper data is sorted in order by their intermediate key (customerid and orderid) and sent to the reducer. Records are put into DynamoDB in the reducer step.

#!/usr/bin/env python
import sys
 
# get all lines from stdin
for line in sys.stdin:
    line = line.strip()
    cols = line.split(',')
# divide source data into Key and attribute part.
# example output : “cusotmer1,order1	product1,10”
    print '%s,%s\t%s,%s' % (cols[0],cols[1],cols[2],cols[3] )

Generally, the reduce task receives the output produced after map processing (which is key/list-of-values pairs) and then performs an operation on the list of values against each key.

In this case, the reducer is written in Python and is based on STDIN/STDOUT/hadoop streaming. The enumeration data type is not available. The reducer receives data sorted and ordered by the intermediate key set in the mapper, customerid and orderid (cols[0],cols[1]) in this case, and stores all attributes for the specific key in the item_data dictionary. The attributes in the item_data dictionary are put, or flushed, into DynamoDB every time a new intermediate key comes from sys.stdin.

#!/usr/bin/env python
import sys
import boto.dynamodb
 
# create connection to DynamoDB
current_keys = None
conn = boto.dynamodb.connect_to_region( '<region>', aws_access_key_id='<access key id>', aws_secret_access_key='<secret access key>')
table = conn.get_table('<dynamodb table name>')
item_data = {}

# input comes from STDIN emitted by Mapper
for line in sys.stdin:
    line = line.strip()
    dickeys, items  = line.split('\t')
    products = items.split(',')
    if current_keys == dickeys:
       item_data[products[0]]=products[1]  
    else:
        if current_keys:
          try:
              mykeys = current_keys.split(',') 
              item = table.new_item(hash_key=mykeys[0],range_key=mykeys[1], attrs=item_data )
              item.put() 
          except Exception ,e:
              print 'Exception occurred! :', e.message,'==> Data:' , mykeys
        item_data = {}
        item_data[products[0]]=products[1]
        current_keys = dickeys

# put last data
if current_keys == dickeys:
   print 'Last one:' , current_keys #, item_data
   try:
       mykeys = dickeys.split(',')
       item = table.new_item(hash_key=mykeys[0] , range_key=mykeys[1], attrs=item_data )
       item.put()
   except Exception ,e:
print 'Exception occurred! :', e.message, '==> Data:' , mykeys

To run the MapReduce job, connect to the EMR master node and run a Hadoop streaming job. The hadoop-streaming.jar file location or name could be different, depending on your EMR version. Exception messages that occur while reducers run are stored at the directory assigned as the –output option. Hash key and range key values are also logged to identify which data causes exceptions or errors.

$ hadoop fs -rm -r s3://<bucket name>/<output path>
$ hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar \
           -input s3://<bucket name>/<input path> -output s3://<bucket name>/<output path>\
           -file /<local path>/mapper.py -mapper /<local path>/mapper.py \
           -file /<local path>/reducer.py -reducer /<local path>/reducer.py

In my migration experiment using the above scripts, with self-generated test data, I found the following results, including database size and the time taken to complete the migration.

Server MySQL instance m4.2xlarge
EMR cluster

master : 1 x m3.xlarge

core  : 2 x m4.4xlarge

DynamoDB 2000 write capacity unit
Data Number of records 1,000,000,000
Database file size (.ibc) 100.6 GB
CSV files size 37 GB
Performance (time) Export to CSV 6 min 10 sec
Upload to S3 (sync) 3 min 30 sec
Import to DynamoDB depending on write capacity unit

 

The following screenshot shows the performance results by write capacity.

Note that the performance result is flexible and can vary depending on the server capacity, network bandwidth, degree of parallelism, conversion logic, program language, and other conditions. All provisioned write capacity units are consumed by the MapReduce job for data import, so the more you increase the size of the EMR cluster and write capacity units of DynamoDB table, the less time it takes to complete. Java-based MapReduce code would be more flexible for function and MapReduce framework.

Step 3: Amazon Lambda function updates DynamoDB by reading data from Amazon Kinesis

In the Lambda console, choose Create a Lambda function and the kinesis-process-record-python blueprint. Next, in the Configure triggers page, select the stream that you just created.

The Lambda function must have an IAM role with permissions to read from Amazon Kinesis and put items into DynamoDB.

The Lambda function can recognize the transaction type of the record by looking up the type attribute. The transaction type determines the method for conversion and update.

For example, when a JSON record is passed to the function, the function looks up the type attribute. It also checks whether an existing item in the DynamoDB table has the same key with the incoming record. If so, the existing item must be retrieved and saved in a dictionary variable (item, in this case). Apply a new update information command to the item dictionary before it is put back into DynamoDB table. This prevents the existing item from being overwritten by the incoming record.

from __future__ import print_function

import base64
import json
import boto3

print('Loading function')
client = boto3.client('dynamodb')

def lambda_handler(event, context):
    #print("Received event: " + json.dumps(event, indent=2))
    for record in event['Records']:
        # Amazon Kinesis data is base64-encoded so decode here
        payload = base64.b64decode(record['kinesis']['data'])
        print("Decoded payload: " + payload)
        data = json.loads(payload)
        
        # user logic for data triggered by WriteRowsEvent
        if data["type"] == "WriteRowsEvent":
            my_table = data["table"]
            my_hashkey = data["row"]["values"]["customerid"]
            my_rangekey = data["row"]["values"]["orderid"]
            my_productid = data["row"]["values"]["productid"]
            my_quantity = str( data["row"]["values"]["quantity"] )
            try:
                response = client.get_item( Key={'customerid':{'S':my_hashkey} , 'orderid':{'S':my_rangekey}} ,TableName = my_table )
                if 'Item' in response:
                    item = response['Item']
                    item[data["row"]["values"]["productid"]] = {"S":my_quantity}
                    result1 = client.put_item(Item = item , TableName = my_table )
                else:
                    item = { 'customerid':{'S':my_hashkey} , 'orderid':{'S':my_rangekey} , my_productid :{"S":my_quantity}  }
                    result2 = client.put_item( Item = item , TableName = my_table )
            except Exception, e:
                print( 'WriteRowsEvent Exception ! :', e.message  , '==> Data:' ,data["row"]["values"]["customerid"]  , data["row"]["values"]["orderid"] )
        
        # user logic for data triggered by UpdateRowsEvent
        if data["type"] == "UpdateRowsEvent":
            my_table = data["table"]
            
        # user logic for data triggered by DeleteRowsEvent    
        if data["type"] == "DeleteRowsEvent":
            my_table = data["table"]
            
            
    return 'Successfully processed {} records.'.format(len(event['Records']))

Step 4:  Switch the application endpoint to DynamoDB

Application codes need to be refactored when you change from MySQL to DynamoDB. The following simple Java code snippets focus on the connection and query part because it is difficult to cover all cases for all applications. For more information, see Programming with DynamoDB and the AWS SDKs.

Query to MySQL

The following sample code shows a common way to connect to MySQL and retrieve data.

import java.sql.* ;
...
try {
    Connection conn =  DriverManager.getConnection("jdbc:mysql://<host name>/<database name>" , "<user>" , "<password>");
    stmt = conn.createStatement();
    String sql = "SELECT quantity as quantity FROM purchase WHERE customerid = '<customerid>' and orderid = '<orderid>' and productid = '<productid>'";
    ResultSet rs = stmt.executeQuery(sql);

    while(rs.next()){ 
       int quantity  = rs.getString("quantity");   //Retrieve by column name 
       System.out.print("quantity: " + quantity);  //Display values 
       }
} catch (SQLException ex) {
    // handle any errors
    System.out.println("SQLException: " + ex.getMessage());}
...
==== Output ====
quantity:1
Query to DynamoDB

To retrieve items from DynamoDB, follow these steps:

  1. Create an instance of the DynamoDB
  2. Create an instance of the Table
  3. Add the withHashKey and withRangeKeyCondition methods to an instance of the QuerySpec
  4. Execute the query method with the querySpec instance previously created. Items are retrieved as JSON format, so use the getJSON method to look up a specific attribute in an item.
...
DynamoDB dynamoDB = new DynamoDB( new AmazonDynamoDBClient(new ProfileCredentialsProvider()));

Table table = dynamoDB.getTable("purchase");

QuerySpec querySpec = new QuerySpec()
        .withHashKey("customerid" , "customer1")  // hashkey name and its value 
        .withRangeKeyCondition(new RangeKeyCondition("orderid").eq("order1") ) ; // Ranage key and its condition value 

ItemCollection<QueryOutcome> items = table.query(querySpec); 

Iterator<Item> iterator = items.iterator();          
while (iterator.hasNext()) {
Item item = iterator.next();
System.out.println(("quantity: " + item.getJSON("product1"));   // 
}
...
==== Output ====
quantity:1

Conclusion

In this post, I introduced two options for seamlessly migrating data from MySQL to DynamoDB and minimizing downtime during the migration. Option #1 used DMS, and option #2 combined EMR, Amazon Kinesis, and Lambda. I also showed you how to convert the key design in accordance with database characteristics to improve read/write performance and reduce costs. Each option has advantages and disadvantages, so the best option depends on your business requirements.

The sample code in this post is not enough for a complete, efficient, and reliable data migration code base to be reused across many different environments. Use it to get started, but design for other variables in your actual migration.

I hope this post helps you plan and implement your migration and minimizes service outages. If you have questions or suggestions, please leave a comment below.

Appendix

To install the Percona Toolkit:

# Install Percona Toolkit

$ wget https://www.percona.com/downloads/percona-toolkit/3.0.2/binary/redhat/6/x86_64/percona-toolkit-3.0.2-1.el6.x86_64.rpm

$ yum install perl-IO-Socket-SSL

$ yum install perl-TermReadKey

$ rpm -Uvh percona-toolkit-3.0.2-1.el6.x86_64.rpm

# run pt-archiver

Example command:

$ pt-archiver –source h=localhost,D=blog,t=purchase –file ‘/data/export/%Y-%m-%d-%D.%t’  –where “1=1” –limit 10000 –commit-each

 


About the Author

Yong Seong Lee is a Cloud Support Engineer for AWS Big Data Services. He is interested in every technology related to data/databases and helping customers who have difficulties in using AWS services. His motto is “Enjoy life, be curious and have maximum experience.”

 

 

 


Converging Data Silos to Amazon Redshift Using AWS DMS