
A kindly lesson for you non-techies about encryption

Post Syndicated from Robert Graham original http://blog.erratasec.com/2017/06/a-kindly-lesson-for-you-non-techies.html

The following tweets need to be debunked:

The answer to John Schindler’s question is:

every expert in cryptography doesn’t know this

Oh, sure, you can find a fringe wacko who knows crypto and agrees with you, but the sane members of the security community will not.

Telegram is not trustworthy because it’s partially closed-source. We can’t see how it works. We don’t know if they’ve made accidental mistakes that can be hacked. We don’t know if they’ve been bribed by the NSA or Russia to put backdoors in their program. In contrast, PGP and Signal are open-source. We can read exactly what the software does. Indeed, thousands of people have been reviewing their software looking for mistakes and backdoors. Being open-source doesn’t automatically make software better, but it does make hiding secret backdoors much harder.

Telegram is not trustworthy because we aren’t certain the crypto is done properly. Signal, and especially PGP, are done properly.

The thing about encryption is that when done properly, it works. Neither the NSA nor the Russians can break properly encrypted content. There’s no such thing as “military grade” encryption that is better than consumer grade. There’s only encryption that nobody can hack vs. encryption that your neighbor’s teenage kid can easily hack. Those scenes in TV/movies about breaking encryption are as realistic as sound in space: good for dramatic presentation, but not how things work in the real world.

In particular, end-to-end encryption works. Sure, in the past, such apps only encrypted as far as the server, so whoever ran the server could read your messages. Modern chat apps, though, are end-to-end: the servers have absolutely no ability to decrypt what’s on them, unless they can get the decryption keys from the phones. But some tasks, like encrypted messages to a group of people, can be hard to do properly.

Thus, in contrast to what John Schindler says, while we techies have doubts about Telegram, we don’t have doubts about Signal and PGP: Russian authorities do not have access to properly encrypted Signal and PGP messages.

Snowden hatred has become the anti-vax of crypto. Sure, there’s no particular reason to trust Snowden — people should really stop treating him as some sort of privacy-Jesus. But there’s no particular reason to distrust him, either. His bland statements on crypto are indistinguishable from any other crypto-enthusiast statements. If he’s a Russian pawn, then so too is the bulk of the crypto community.

With all this said, using Signal doesn’t make you perfectly safe. The person you are chatting with could be a secret agent — especially in group chat. There could be cameras/microphones in the room where you are using the app. The Russians can also hack into your phone, and likewise eavesdrop on everything you do with the phone, regardless of which app you use. And they probably have hacked specific people’s phones. On the other hand, if the NSA or Russians were widely hacking phones, we’d detect that this was happening. We haven’t.

Signal is therefore not a guarantee of safety, because nothing is, and if your life depends on it, you can’t trust any simple advice like “use Signal”. But, for the bulk of us, it’s pretty damn secure, and I trust neither the Russians nor the NSA are reading my Signal or PGP messages.

At first blush, this @20committee tweet appears to be non-experts opining on things outside their expertise. But in reality, it’s just obtuse partisanship, where truth and expertise don’t matter. Nothing you or I say can change some people’s minds on this matter, no matter how much our expertise gives weight to our words. This post is instead for bystanders, who don’t know enough to judge whether these crazy statements have merit.


Bonus:

So let’s talk about “every crypto expert”. It’s, of course, impossible to speak for every crypto expert. It’s like saying that the consensus among climate scientists is that mankind is warming the globe, while at the same time ignoring the widespread disagreement over how much warming there will be.

The same is true here. You’ll get a wide range of different responses from experts about the above tweet. Some, for example, will stress my point at the bottom that hacking the endpoint (the phone) breaks all the apps, and thus justify the above tweet from that point of view. Others will point out that all software has bugs, and it’s quite possible that Signal has some unknown bug that the Russians are exploiting.

So I’m not attempting to speak for everything experts might say on the subject, or for every long lecture they could give. I am, though, pointing out the basics that virtually everyone agrees on: the consensus that open-source, properly implemented crypto works.

NSA Insider Security Post-Snowden

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2017/06/nsa_insider_sec.html

According to a recently declassified report obtained under FOIA, the NSA’s attempts to protect itself against insider attacks aren’t going very well:

The N.S.A. failed to consistently lock racks of servers storing highly classified data and to secure data center machine rooms, according to the report, an investigation by the Defense Department’s inspector general completed in 2016.

[…]

The agency also failed to meaningfully reduce the number of officials and contractors who were empowered to download and transfer data classified as top secret, as well as the number of “privileged” users, who have greater power to access the N.S.A.’s most sensitive computer systems. And it did not fully implement software to monitor what those users were doing.

In all, the report concluded, while the post-Snowden initiative — called “Secure the Net” by the N.S.A. — had some successes, it “did not fully meet the intent of decreasing the risk of insider threats to N.S.A. operations and the ability of insiders to exfiltrate data.”

Marcy Wheeler comments:

The IG report examined seven of the most important out of 40 “Secure the Net” initiatives rolled out since Snowden began leaking classified information. Two of the initiatives aspired to reduce the number of people who had the kind of access Snowden did: those who have privileged access to maintain, configure, and operate the NSA’s computer systems (what the report calls PRIVACs), and those who are authorized to use removable media to transfer data to or from an NSA system (what the report calls DTAs).

But when DOD’s inspectors went to assess whether NSA had succeeded in doing this, they found something disturbing. In both cases, the NSA did not have solid documentation about how many such users existed at the time of the Snowden leak. With respect to PRIVACs, in June 2013 (the start of the Snowden leak), “NSA officials stated that they used a manually kept spreadsheet, which they no longer had, to identify the initial number of privileged users.” The report offered no explanation for how NSA came to no longer have that spreadsheet just as an investigation into the biggest breach thus far at NSA started. With respect to DTAs, “NSA did not know how many DTAs it had because the manually kept list was corrupted during the months leading up to the security breach.”

There seem to be two possible explanations for the fact that the NSA couldn’t track who had the same kind of access that Snowden exploited to steal so many documents. Either the dog ate their homework: Someone at NSA made the documents unavailable (or they never really existed). Or someone fed the dog their homework: Some adversary made these lists unusable. The former would suggest the NSA had something to hide as it prepared to explain why Snowden had been able to walk away with NSA’s crown jewels. The latter would suggest that someone deliberately obscured who else in the building might walk away with the crown jewels. Obscuring that list would be of particular value if you were a foreign adversary planning on walking away with a bunch of files, such as the set of hacking tools the Shadow Brokers have since released, which are believed to have originated at NSA.

Read the whole thing. Securing against insiders, especially those with technical access, is difficult, but I had assumed the NSA did more post-Snowden.

BPI Breaks Record After Sending 310 Million Google Takedowns

Post Syndicated from Andy original https://torrentfreak.com/bpi-breaks-record-after-sending-310-million-google-takedowns-170619/

A little over a year ago during March 2016, music industry group BPI reached an important milestone. After years of sending takedown notices to Google, the group burst through the 200 million URL barrier.

The fact that it took BPI several years to reach its 200 million milestone made the surpassing of the quarter billion milestone a few months later even more remarkable. In October 2016, the group sent its 250 millionth takedown to Google, a figure that nearly doubled when accounting for notices sent to Microsoft’s Bing.

But despite the volumes, the battle hadn’t been won, let alone the war. The BPI’s takedown machine continued to run at a remarkable rate, churning out millions more notices per week.

As a result, yet another new milestone was reached this month when the BPI smashed through the 300 million URL barrier. Then, days later, a further 10 million were added, with the latter couple of million added during the time it took to put this piece together.

BPI takedown notices, as reported by Google

While demanding that Google place greater emphasis on its de-ranking of ‘pirate’ sites, the BPI has called again and again for a “notice and stay down” regime, to ensure that content taken down by the search engine doesn’t simply reappear under a new URL. It’s a position BPI maintains today.

“The battle would be a whole lot easier if intermediaries played fair,” a BPI spokesperson informs TF.

“They need to take more proactive responsibility to reduce infringing content that appears on their platform, and, where we expressly notify infringing content to them, to ensure that they do not only take it down, but also keep it down.”

The long-standing suggestion is that the volume of takedown notices sent would reduce if a “take down, stay down” regime was implemented. The BPI says it’s difficult to present a precise figure but infringing content has a tendency to reappear, both in search engines and on hosting sites.

“Google rejects repeat notices for the same URL. But illegal content reappears as it is re-indexed by Google. As to the sites that actually host the content, the vast majority of notices sent to them could be avoided if they implemented take-down & stay-down,” BPI says.

The fact that the BPI has added 60 million more takedowns since the quarter billion milestone a few months ago is quite remarkable, particularly since there appears to be little slowdown from month to month. However, the numbers have grown so huge that 310 million now feels a lot like 250 million, with just a few added on top for good measure.

That an extra 60 million takedowns can almost be dismissed as a handful is an indication of just how massive the issue is online. While pirates always welcome an abundance of links to juicy content, it’s no surprise that groups like the BPI are seeking more comprehensive and sustainable solutions.

Previously, it was hoped that the Digital Economy Bill would provide some relief via government intervention and the imposition of a search engine Code of Practice. In the event, however, all pressure on search engines was removed from the legislation after a separate voluntary agreement was reached.

All parties agreed that the voluntary code should come into effect two weeks ago on June 1 so it seems likely that some effects should be noticeable in the near future. But the BPI says it’s still early days and there’s more work to be done.

“BPI has been working productively with search engines since the voluntary code was agreed to understand how search engines approach the problem, but also what changes can and have been made and how results can be improved,” the group explains.

“The first stage is to benchmark where we are and to assess the impact of the changes search engines have made so far. This will hopefully be completed soon, then we will have better information of the current picture and from that we hope to work together to continue to improve search for rights owners and consumers.”

With more takedown notices in the pipeline not yet publicly reported by Google, the BPI informs TF that it has now notified the search giant of 315 million links to illegal content.

“That’s an astonishing number. More than 1 in 10 of the entire world’s notices to Google come from BPI. This year alone, one in every three notices sent to Google from BPI is for independent record label repertoire,” BPI concludes.

While groups like BPI have clearly developed systems to cope with the huge numbers of takedown notices required in today’s environment, few rightsholders are happy with the status quo. With that in mind, the fight will continue until search engines are forced into compromise. Considering the implications, that compromise could be on a very distant horizon.


How to Deploy Local Administrator Password Solution with AWS Microsoft AD

Post Syndicated from Dragos Madarasan original https://aws.amazon.com/blogs/security/how-to-deploy-local-administrator-password-solution-with-aws-microsoft-ad/

Local Administrator Password Solution (LAPS) from Microsoft simplifies password management by allowing organizations to use Active Directory (AD) to store unique passwords for computers. Typically, an organization might reuse the same local administrator password across the computers in an AD domain. However, this approach represents a security risk because it can be exploited during lateral escalation attacks. LAPS solves this problem by creating unique, randomized passwords for the Administrator account on each computer and storing them encrypted in AD.

Deploying LAPS with AWS Microsoft AD requires the following steps:

  1. Install the LAPS binaries on instances joined to your AWS Microsoft AD domain. The binaries add additional client-side extension (CSE) functionality to the Group Policy client.
  2. Extend the AWS Microsoft AD schema. LAPS requires new AD attributes to store an encrypted password and its expiration time.
  3. Configure AD permissions and delegate the ability to retrieve the local administrator password for IT staff in your organization.
  4. Configure Group Policy on instances joined to your AWS Microsoft AD domain to enable LAPS. This configures the Group Policy client to process LAPS settings and uses the binaries installed in Step 1.

The following diagram illustrates the setup that I will be using throughout this post and the associated tasks to set up LAPS. Note that the AWS Directory Service directory is deployed across multiple Availability Zones, and monitoring automatically detects and replaces domain controllers that fail.

Diagram illustrating this blog post's solution

In this blog post, I explain the prerequisites to set up Local Administrator Password Solution, demonstrate the steps involved to update the AD schema on your AWS Microsoft AD domain, show how to delegate permissions to IT staff and configure LAPS via Group Policy, and demonstrate how to retrieve the password using the graphical user interface or with Windows PowerShell.

This post assumes you are familiar with Lightweight Directory Access Protocol Data Interchange Format (LDIF) files and AWS Microsoft AD. If you need more of an introduction to Directory Service and AWS Microsoft AD, see How to Move More Custom Applications to the AWS Cloud with AWS Directory Service, which introduces working with schema changes in AWS Microsoft AD.

Prerequisites

In order to implement LAPS, you must use AWS Directory Service for Microsoft Active Directory (Enterprise Edition), also known as AWS Microsoft AD. Any instance on which you want to configure LAPS must be joined to your AWS Microsoft AD domain. You also need a Management instance on which you install the LAPS management tools.

In this post, I use an AWS Microsoft AD domain called example.com that I have launched in the EU (London) region. To see the regions in which Directory Service is available, see AWS Regions and Endpoints.

Screenshot showing the AWS Microsoft AD domain example.com used in this blog post

In addition, you must have at least two instances launched in the same region as the AWS Microsoft AD domain. To join the instances to your AWS Microsoft AD domain, you have two options:

  1. Use the Amazon EC2 Systems Manager (SSM) domain join feature. To learn more about how to set up domain join for EC2 instances, see joining a Windows Instance to an AWS Directory Service Domain.
  2. Manually configure the DNS server addresses in the Internet Protocol version 4 (TCP/IPv4) settings of the network card to use the AWS Microsoft AD DNS addresses (172.31.9.64 and 172.31.16.191, for this blog post) and perform a manual domain join.

For the purpose of this post, my two instances are:

  1. A Management instance, tagged as Management, on which I will install the LAPS management tools.
  2. A Web Server instance, tagged as Web Server, on which I will deploy the LAPS binaries.

Screenshot showing the two EC2 instances used in this post

Implementing the solution

 

1. Install the LAPS binaries on instances joined to your AWS Microsoft AD domain by using EC2 Run Command

LAPS binaries come in the form of an MSI installer and can be downloaded from the Microsoft Download Center. You can install the LAPS binaries manually, with an automation service such as EC2 Run Command, or with your existing software deployment solution.

For this post, I will deploy the LAPS binaries on my Web Server instance (i-0b7563d0f89d3453a) by using EC2 Run Command:

  1. While signed in to the AWS Management Console, choose EC2. In the Systems Manager Services section of the navigation pane, choose Run Command.
  2. Choose Run a command, and from the Command document list, choose AWS-InstallApplication.
  3. From Target instances, choose the instance on which you want to deploy the LAPS binaries. In my case, I will be selecting the instance tagged as Web Server. If you do not see any instances listed, make sure you have met the prerequisites for Amazon EC2 Systems Manager (SSM) by reviewing the Systems Manager Prerequisites.
  4. For Action, choose Install, and then stipulate the following values:
    • Parameters: /quiet
    • Source: https://download.microsoft.com/download/C/7/A/C7AAD914-A8A6-4904-88A1-29E657445D03/LAPS.x64.msi
    • Source Hash: f63ebbc45e2d080630bd62a195cd225de734131a56bb7b453c84336e37abd766
    • Comment: LAPS deployment

Leave the other options with the default values and choose Run. The AWS Management Console will return a Command ID, which will initially have a status of In Progress. It should take less than 5 minutes to download and install the binaries, after which the Command ID will update its status to Success.
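The same installation can also be scripted with the AWS CLI instead of the console. The following is a minimal sketch only: the parameter names mirror the console fields above and should be verified against the AWS-InstallApplication document available in your Region, and the instance ID is the Web Server instance used in this post.

aws ssm send-command \
  --document-name "AWS-InstallApplication" \
  --instance-ids "i-0b7563d0f89d3453a" \
  --comment "LAPS deployment" \
  --parameters '{"action":["Install"],"parameters":["/quiet"],"source":["https://download.microsoft.com/download/C/7/A/C7AAD914-A8A6-4904-88A1-29E657445D03/LAPS.x64.msi"],"sourceHash":["f63ebbc45e2d080630bd62a195cd225de734131a56bb7b453c84336e37abd766"]}'

You can then poll the returned command ID with aws ssm list-command-invocations --command-id <your-command-id> --details and wait for a status of Success, just as the console does.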

Status showing the binaries have been installed successfully

If the Command ID runs for more than 5 minutes or returns an error, it might indicate a problem with the installer. To troubleshoot, review the steps in Troubleshooting Systems Manager Run Command.

To verify the binaries have been installed successfully, open Control Panel and review the recently installed applications in Programs and Features.

Screenshot of Control Panel that confirms LAPS has been installed successfully

You should see an entry for Local Administrator Password Solution with a version of 6.2.0.0 or newer.

2. Extend the AWS Microsoft AD schema

In the previous section, I used EC2 Run Command to install the LAPS binaries on an EC2 instance. Now, I am ready to extend the schema in an AWS Microsoft AD domain. Extending the schema is a requirement because LAPS relies on new AD attributes to store the encrypted password and its expiration time.

In an on-premises AD environment, you would update the schema by running the Update-AdmPwdADSchema Windows PowerShell cmdlet with schema administrator credentials. Because AWS Microsoft AD is a managed service, I do not have permissions to update the schema directly. Instead, I will update the AD schema from the Directory Service console by importing an LDIF file. If you are unfamiliar with schema updates or LDIF files, see How to Move More Custom Applications to the AWS Cloud with AWS Directory Service.

To make things easier for you, I am providing you with a sample LDIF file that contains the required AD schema changes. Using Notepad or a similar text editor, open the SchemaChanges-0517.ldif file and update the values of dc=example,dc=com with your own AWS Microsoft AD domain and suffix.

After I update the LDIF file with my AWS Microsoft AD details, I import it by using the AWS Management Console:

  1. On the Directory Service console, select the Microsoft AD directory from the list of directories by choosing its identifier (it will look something like d-534373570ea).
  2. On the Directory details page, choose the Schema extensions tab and choose Upload and update schema.
    Screenshot showing the "Upload and update schema" option
  3. When prompted for the LDIF file that contains the changes, choose the sample LDIF file.
  4. In the background, the LDIF file is validated for errors and a backup of the directory is created for recovery purposes. Updating the schema might take a few minutes and the status will change to Updating Schema.

Screenshot showing the schema updates in progress
When the process has completed, the status of Completed will be displayed, as shown in the following screenshot.

Screenshot showing the process has completed

If the LDIF file contains errors or the schema extension fails, the Directory Service console will generate an error code and additional debug information. To help troubleshoot error messages, see Schema Extension Errors.

The sample LDIF file triggers AWS Microsoft AD to perform the following actions:

  1. Create the ms-Mcs-AdmPwd attribute, which stores the encrypted password.
  2. Create the ms-Mcs-AdmPwdExpirationTime attribute, which stores the time of the password’s expiration.
  3. Add both attributes to the Computer class.
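After the schema extension reports Completed, you can optionally confirm from the Management instance that the two new attributes exist. This is a minimal sketch that assumes the Active Directory PowerShell module (part of the RSAT tools) is installed on that instance; it simply queries the schema partition for attribute definitions whose names start with ms-Mcs-AdmPwd.

# List the LAPS attribute definitions in the schema partition
Import-Module ActiveDirectory
Get-ADObject -SearchBase (Get-ADRootDSE).schemaNamingContext -Filter 'name -like "ms-Mcs-AdmPwd*"' | Select-Object Name

If the extension worked, both ms-Mcs-AdmPwd and ms-Mcs-AdmPwdExpirationTime should be returned.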

3. Configure AD permissions

In the previous section, I updated the AWS Microsoft AD schema with the required attributes for LAPS. I am now ready to configure the permissions for administrators to retrieve the password and for computer accounts to update their password attribute.

As part of configuring AD permissions, I grant computers the ability to update their own password attribute and specify which security groups have permissions to retrieve the password from AD. As part of this process, I run Windows PowerShell cmdlets that are not installed by default on Windows Server.

Note: To learn more about Windows PowerShell and the concept of a cmdlet (pronounced “command-let”), go to Getting Started with Windows PowerShell.

Before getting started, I need to set up the required tools for LAPS on my Management instance, which must be joined to the AWS Microsoft AD domain. I will be using the same LAPS installer that I downloaded from the Microsoft LAPS website. In my Management instance, I have manually run the installer by clicking the LAPS.x64.msi file. On the Custom Setup page of the installer, under Management Tools, for each option I have selected Install on local hard drive.

Screenshot showing the required management tools

In the preceding screenshot, the features are:

  • The fat client UI – A simple user interface for retrieving the password (I will use it at the end of this post).
  • The Windows PowerShell module – Needed to run the commands in the next sections.
  • The GPO Editor templates – Used to configure Group Policy objects.

The next step is to grant computers in the Computers OU the permission to update their own attributes. While connected to my Management instance, I go to the Start menu and type PowerShell. In the list of results, right-click Windows PowerShell and choose Run as administrator and then Yes when prompted by User Account Control.

In the Windows PowerShell prompt, I type the following command.

Import-Module AdmPwd.PS

Set-AdmPwdComputerSelfPermission -OrgUnit "OU=Computers,OU=MyMicrosoftAD,DC=example,DC=com"

To grant the administrator group called Admins the permission to retrieve the computer password, I run the following command in the Windows PowerShell prompt I previously started.

Import-Module AdmPwd.PS

Set-AdmPwdReadPasswordPermission -OrgUnit "OU=Computers,OU=MyMicrosoftAD,DC=example,DC=com" -AllowedPrincipals "Admins"
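To verify the delegation before moving on, the AdmPwd.PS module includes a cmdlet that lists which principals hold the extended right to read the password attribute on a given OU. A quick check against the same OU used above (the exact output format may vary by module version):

Find-AdmPwdExtendedRights -Identity "OU=Computers,OU=MyMicrosoftAD,DC=example,DC=com" | Format-List

The Admins group should appear among the holders of the extended rights.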

4. Configure Group Policy to enable LAPS

In the previous section, I deployed the LAPS management tools on my management instance, granted the computer accounts the permission to self-update their local administrator password attribute, and granted my Admins group permissions to retrieve the password.

Note: The following section addresses the Group Policy Management Console and Group Policy objects. If you are unfamiliar with or wish to learn more about these concepts, go to Get Started Using the GPMC and Group Policy for Beginners.

I am now ready to enable LAPS via Group Policy:

  1. On my Management instance (i-03b2c5d5b1113c7ac), I have installed the Group Policy Management Console (GPMC) by running the following command in Windows PowerShell.
Install-WindowsFeature -Name GPMC
  2. Next, I have opened the GPMC and created a new Group Policy object (GPO) called LAPS GPO.
  3. In the Group Policy Management Editor, I navigate to Computer Configuration > Policies > Administrative Templates > LAPS. I have configured the settings using the following values.

  • Password Settings – State: Enabled; Options: Complexity: large letters, small letters, numbers, specials
  • Do not allow password expiration time longer than required by policy – State: Enabled; Options: N/A
  • Enable local admin password management – State: Enabled; Options: N/A

  4. Next, I need to link the GPO to an organizational unit (OU) in which my machine accounts sit. In your environment, I recommend testing the new settings on a test OU and then deploying the GPO to production OUs.

Note: If you choose to create a new test organizational unit, you must create it in the OU that AWS Microsoft AD delegates to you to manage. For example, if your AWS Microsoft AD directory name were example.com, the test OU path would be example.com/example/Computers/Test.

  5. To test that LAPS works, I need to make sure the computer has received the new policy by forcing a Group Policy update. While connected to the Web Server instance (i-0b7563d0f89d3453a) using Remote Desktop, I open an elevated administrative command prompt and run the following command: gpupdate /force. I can check whether the policy has been applied by running the command: gpresult /r | findstr /c:"LAPS GPO", where LAPS GPO is the name of the GPO created in the second step.
  6. Back on my Management instance, I can then launch the LAPS interface from the Start menu and use it to retrieve the password (as shown in the following screenshot). Alternatively, I can run the Get-ADComputer Windows PowerShell cmdlet to retrieve the password.
Get-ADComputer [YourComputerName] -Properties ms-Mcs-AdmPwd | select name, ms-Mcs-AdmPwd

Screenshot of the LAPS UI, which you can use to retrieve the password
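The AdmPwd.PS module also provides its own retrieval cmdlet, which returns the password expiration timestamp alongside the password itself. A short sketch, where the computer name is illustrative and should be replaced with the hostname of your Web Server instance:

Get-AdmPwdPassword -ComputerName "WEBSERVER01" | Select-Object ComputerName, Password, ExpirationTimestamp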

Summary

In this blog post, I demonstrated how you can deploy LAPS with an AWS Microsoft AD directory. I then showed how to install the LAPS binaries by using EC2 Run Command. Using the sample LDIF file I provided, I showed you how to extend the schema, which is a requirement because LAPS relies on new AD attributes to store the encrypted password and its expiration time. Finally, I showed how to complete the LAPS setup by configuring the necessary AD permissions and creating the GPO that starts the LAPS password change.

If you have comments about this post, submit them in the “Comments” section below. If you have questions about or issues implementing this solution, please start a new thread on the Directory Service forum.

– Dragos

What about other leaked printed documents?

Post Syndicated from Robert Graham original http://blog.erratasec.com/2017/06/what-about-other-leaked-printed.html

So nat-sec pundit/expert Marcy Wheeler (@emptywheel) asks about those DIOG docs leaked last year. They were leaked in printed form, then scanned in and published by The Intercept. Did they have these nasty yellow dots that track the source? If not, why not?

The answer is that the scanned images of the DIOG doc don’t have dots. I don’t know why. One reason might be that the scanner didn’t pick them up, as it’s much lower quality than the scanner for the Russian hacking docs. Another reason is that the printer used may not have printed them — while most printers do print such dots, some printers don’t. A third possibility is that somebody used a tool to strip the dots from scanned images. I don’t think such a tool exists, but it wouldn’t be hard to write.

Scanner quality

The printed docs are here. They are full of whitespace where it should be easy to see these dots, but they appear not to be there. If we reverse the image, we see something like the following from the first page of the DIOG doc:

Compare this to the first page of the Russian hacking doc which shows the blue dots:

The difference is that the scan of the Russian doc is much better. We see that in the background, which is much noisier and picks up small things like the blue dots. In contrast, the DIOG scan is worse: we don’t see much detail in the background.

Looking closer, we can see the lack of detail. We also see banding, which indicates other defects of the scanner.

Thus, one theory is that the scanner just didn’t pick up the dots from the page.

Not all printers

The EFF has a page where they document which printers produce these dots. Samsung and Okidata printers don’t; virtually all the other printers do.

The person who printed these might’ve gotten lucky. Or, they may have carefully chosen a printer that does not produce these dots.

The reason Reality Winner exfiltrated these documents by printing them is that the NSA had probably clamped down on USB thumb drives for secure facilities. Walking through the metal detector with a chip hidden in a Rubik’s Cube (as shown in the Snowden movie) will not work anymore.

But, presumably, the FBI is not so strict, and a person would be able to exfiltrate the digital docs from FBI facilities, and print elsewhere.

Conclusion

By the odds, those DIOG docs should’ve had visible tracking dots. Either the person leaking the docs knew about this and avoided it, or they got lucky.

How The Intercept Outed Reality Winner

Post Syndicated from Robert Graham original http://blog.erratasec.com/2017/06/how-intercept-outed-reality-winner.html

Today, The Intercept released documents on election tampering from an NSA leaker. Later, the arrest warrant request for an NSA contractor named “Reality Winner” was published, showing how they tracked her down because she had printed out the documents and sent them to The Intercept. The document posted by The Intercept isn’t the original PDF file, but a PDF containing pictures of the printed version that were later scanned in.

As the warrant says, she confessed while interviewed by the FBI. Had she not confessed, the documents still contained enough evidence to convict her: the printed document was digitally watermarked.

The problem is that most new printers print nearly invisible yellow dots that record exactly when and where any document is printed. Because the NSA logs all printing jobs on its printers, it can use this to match up precisely who printed the document.

In this post, I show how.

You can download the document from the original article here. You can then open it in a PDF viewer, such as the normal “Preview” app on macOS. Zoom into some whitespace on the document, and take a screenshot of it. On macOS, hit [Command-Shift-3] to capture the whole screen, or [Command-Shift-4] to capture a selection. There are yellow dots in this image, but you can barely see them, especially if your screen is dirty.

We need to highlight the yellow dots. Open the screenshot in an image editor, such as the “Paintbrush” app for macOS. Now use the option to “Invert Colors” in the image, to get something like this. You should see a roughly rectangular checkerboard pattern in the whitespace.

It’s upside down, so we need to rotate it 180 degrees, or flip-horizontal and flip-vertical:
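If you prefer the command line to a GUI editor, ImageMagick can do the inversion and rotation in one step. A minimal sketch, assuming ImageMagick is installed and using illustrative file names:

convert screenshot.png -negate -rotate 180 dots-visible.png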

Now we go to the EFF page and manually click on the pattern so that their tool can decode the meaning:

This produces the following result:

The document leaked by the Intercept was from a printer with model number 54, serial number 29535218. The document was printed on May 9, 2017 at 6:20. The NSA almost certainly has a record of who used the printer at that time.

The situation is similar to how Vice outed the location of John McAfee, by publishing JPEG photographs of him with the EXIF GPS coordinates still hidden in the file. Or it’s how PDFs are often redacted by adding a black bar on top of the image, leaving the underlying contents still in the file for people to read, such as in this NYTimes accident with a Snowden document. Or how opening a Microsoft Office document, then accidentally saving it, leaves fingerprints identifying you behind, as repeatedly happened with the Wikileaks election leaks. These sorts of failures are common with leaks. To fix this yellow-dot problem, use a black-and-white printer, black-and-white scanner, or convert to black-and-white with an image editor.
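For that conversion step, one command-line approach (again assuming ImageMagick, with illustrative file names) is to flatten the scan to grayscale and then threshold it to pure black and white, which leaves no room for faint yellow dots to survive:

convert scanned-page.png -colorspace Gray -threshold 50% clean-page.png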

Copiers/printers have two features put in there by the government to be evil to you. The first is that scanners/copiers (when using the scanner feature) recognize a barely visible pattern on currency, so that they can’t be used to counterfeit money, as shown on this $20 below:

The second is that when they print things out, they include these invisible dots, so documents can be tracked. In other words, those dots on bills prevent them from being scanned in, and the dots produced by printers help the government track what was printed out.

Yes, this code the government forces into our printers is a violation of our 3rd Amendment rights.


While I was writing up this post, these tweets appeared first:


Comments:
https://news.ycombinator.com/item?id=14494818

Seven Tips for Using S3DistCp on Amazon EMR to Move Data Efficiently Between HDFS and Amazon S3

Post Syndicated from Illya Yalovyy original https://aws.amazon.com/blogs/big-data/seven-tips-for-using-s3distcp-on-amazon-emr-to-move-data-efficiently-between-hdfs-and-amazon-s3/

Have you ever needed to move a large amount of data between Amazon S3 and Hadoop Distributed File System (HDFS) but found that the data set was too large for a simple copy operation? EMR can help you with this. In addition to processing and analyzing petabytes of data, EMR can move large amounts of data.

In the Hadoop ecosystem, DistCp is often used to move data. DistCp provides a distributed copy capability built on top of a MapReduce framework. S3DistCp is an extension to DistCp that is optimized to work with S3 and that adds several useful features. In addition to moving data between HDFS and S3, S3DistCp is also a Swiss Army knife of file manipulations. In this post we’ll cover the following tips for using S3DistCp, starting with basic use cases and then moving to more advanced scenarios:

1. Copy or move files without transformation
2. Copy and change file compression on the fly
3. Copy files incrementally
4. Copy multiple folders in one job
5. Aggregate files based on a pattern
6. Upload files larger than 1 TB in size
7. Submit an S3DistCp step to an EMR cluster

1. Copy or move files without transformation

We’ve observed that customers often use S3DistCp to copy data from one storage location to another, whether S3 or HDFS. Syntax for this operation is simple and straightforward:

$ s3-dist-cp --src /data/incoming/hourly_table --dest s3://my-tables/incoming/hourly_table

The source location may contain extra files that we don’t necessarily want to copy. Here, we can use filters based on regular expressions to do things such as copying files with the .log extension only.

Each subfolder has the following files:

$ hadoop fs -ls /data/incoming/hourly_table/2017-02-01/03
Found 8 items
-rw-r--r--   1 hadoop hadoop     197850 2017-02-19 03:41 /data/incoming/hourly_table/2017-02-01/03/2017-02-01.03.25845.log
-rw-r--r--   1 hadoop hadoop     484006 2017-02-19 03:41 /data/incoming/hourly_table/2017-02-01/03/2017-02-01.03.32953.log
-rw-r--r--   1 hadoop hadoop     868522 2017-02-19 03:41 /data/incoming/hourly_table/2017-02-01/03/2017-02-01.03.62649.log
-rw-r--r--   1 hadoop hadoop     408072 2017-02-19 03:41 /data/incoming/hourly_table/2017-02-01/03/2017-02-01.03.64637.log
-rw-r--r--   1 hadoop hadoop    1031949 2017-02-19 03:41 /data/incoming/hourly_table/2017-02-01/03/2017-02-01.03.70767.log
-rw-r--r--   1 hadoop hadoop     368240 2017-02-19 03:41 /data/incoming/hourly_table/2017-02-01/03/2017-02-01.03.89910.log
-rw-r--r--   1 hadoop hadoop     437348 2017-02-19 03:41 /data/incoming/hourly_table/2017-02-01/03/2017-02-01.03.96053.log
-rw-r--r--   1 hadoop hadoop        800 2017-02-19 03:41 /data/incoming/hourly_table/2017-02-01/03/processing.meta

To copy only the required files, let’s use the --srcPattern option:

$ s3-dist-cp --src /data/incoming/hourly_table --dest s3://my-tables/incoming/hourly_table_filtered --srcPattern .*\.log

After the upload has finished successfully, let’s check the folder contents in the destination location to confirm only the files ending in .log were copied:

$ hadoop fs -ls s3://my-tables/incoming/hourly_table_filtered/2017-02-01/03
-rw-rw-rw-   1     197850 2017-02-19 22:56 s3://my-tables/incoming/hourly_table_filtered/2017-02-01/03/2017-02-01.03.25845.log
-rw-rw-rw-   1     484006 2017-02-19 22:56 s3://my-tables/incoming/hourly_table_filtered/2017-02-01/03/2017-02-01.03.32953.log
-rw-rw-rw-   1     868522 2017-02-19 22:56 s3://my-tables/incoming/hourly_table_filtered/2017-02-01/03/2017-02-01.03.62649.log
-rw-rw-rw-   1     408072 2017-02-19 22:56 s3://my-tables/incoming/hourly_table_filtered/2017-02-01/03/2017-02-01.03.64637.log
-rw-rw-rw-   1    1031949 2017-02-19 22:56 s3://my-tables/incoming/hourly_table_filtered/2017-02-01/03/2017-02-01.03.70767.log
-rw-rw-rw-   1     368240 2017-02-19 22:56 s3://my-tables/incoming/hourly_table_filtered/2017-02-01/03/2017-02-01.03.89910.log
-rw-rw-rw-   1     437348 2017-02-19 22:56 s3://my-tables/incoming/hourly_table_filtered/2017-02-01/03/2017-02-01.03.96053.log

Sometimes, data needs to be moved instead of copied. In this case, we can use the --deleteOnSuccess option. This option is similar to aws s3 mv, which you might have used previously with the AWS CLI. The files are first copied and then deleted from the source:

$ s3-dist-cp --src s3://my-tables/incoming/hourly_table --dest s3://my-tables/incoming/hourly_table_archive --deleteOnSuccess

After the preceding operation, the source location has only empty folders, and the target location contains all files.

$ hadoop fs -ls -R s3://my-tables/incoming/hourly_table/2017-02-01/
drwxrwxrwx   -          0 1970-01-01 00:00 s3://my-tables/incoming/hourly_table/2017-02-01/00
drwxrwxrwx   -          0 1970-01-01 00:00 s3://my-tables/incoming/hourly_table/2017-02-01/01
...
drwxrwxrwx   -          0 1970-01-01 00:00 s3://my-tables/incoming/hourly_table/2017-02-01/21
drwxrwxrwx   -          0 1970-01-01 00:00 s3://my-tables/incoming/hourly_table/2017-02-01/22


$ hadoop fs -ls s3://my-tables/incoming/hourly_table_archive/2017-02-01/01
-rw-rw-rw-   1     676756 2017-02-19 23:27 s3://my-tables/incoming/hourly_table_archive/2017-02-01/01/2017-02-01.01.27047.log
-rw-rw-rw-   1     780197 2017-02-19 23:27 s3://my-tables/incoming/hourly_table_archive/2017-02-01/01/2017-02-01.01.59789.log
-rw-rw-rw-   1    1041789 2017-02-19 23:27 s3://my-tables/incoming/hourly_table_archive/2017-02-01/01/2017-02-01.01.82293.log
-rw-rw-rw-   1        400 2017-02-19 23:27 s3://my-tables/incoming/hourly_table_archive/2017-02-01/01/processing.meta

The important things to remember here are that S3DistCp deletes only files with the --deleteOnSuccess flag and that it doesn’t delete parent folders, even when they are empty.
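If you also want those now-empty source folders gone after a move, that has to be done as a separate step, for example with the Hadoop shell (using the same path as in the listing above):

$ hadoop fs -rm -r s3://my-tables/incoming/hourly_table/2017-02-01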

2. Copy and change file compression on the fly

Raw files often land in S3 or HDFS in an uncompressed text format. This format is suboptimal both for the cost of storage and for running analytics on that data. S3DistCp can help you efficiently store data and compress files on the fly with the --outputCodec option:

$ s3-dist-cp --src s3://my-tables/incoming/hourly_table_filtered --dest s3://my-tables/incoming/hourly_table_gz --outputCodec=gz

The current version of S3DistCp supports the codecs gzip, gz, lzo, lzop, and snappy, and the keywords none and keep (the default). These keywords have the following meaning:

  • “none” – Save files uncompressed. If the files are compressed, then S3DistCp decompresses them.
  • “keep” – Don’t change the compression of the files but copy them as-is.

Let’s check the files in the target folder, which have now been compressed with the gz codec:

$ hadoop fs -ls s3://my-tables/incoming/hourly_table_gz/2017-02-01/01/
Found 3 items
-rw-rw-rw-   1     78756 2017-02-20 00:07 s3://my-tables/incoming/hourly_table_gz/2017-02-01/01/2017-02-01.01.27047.log.gz
-rw-rw-rw-   1     80197 2017-02-20 00:07 s3://my-tables/incoming/hourly_table_gz/2017-02-01/01/2017-02-01.01.59789.log.gz
-rw-rw-rw-   1    121178 2017-02-20 00:07 s3://my-tables/incoming/hourly_table_gz/2017-02-01/01/2017-02-01.01.82293.log.gz
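The same option works in the other direction as well. For example, the following sketch would copy the gzipped files back to HDFS and store them uncompressed; the HDFS destination path is illustrative:

$ s3-dist-cp --src s3://my-tables/incoming/hourly_table_gz --dest /data/uncompressed/hourly_table --outputCodec=none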

3. Copy files incrementally

In real life, the upstream process drops files in some cadence. For instance, new files might get created every hour, or every minute. The downstream process can be configured to pick them up on a different schedule.

Let’s say data lands on S3 and we want to process it on HDFS daily. Copying all files every time doesn’t scale very well. Fortunately, S3DistCp has a built-in solution for that.

For this solution, we use a manifest file. That file allows S3DistCp to keep track of copied files. Following is an example of the command:

$ s3-dist-cp --src s3://my-tables/incoming/hourly_table --dest s3://my-tables/processing/hourly_table --srcPattern .*\.log --outputManifest=manifest-2017-02-25.gz --previousManifest=s3://my-tables/processing/hourly_table/manifest-2017-02-24.gz

The command takes two manifest files as parameters, --outputManifest and --previousManifest. The first one contains a list of all copied files (old and new), and the second contains a list of files copied previously. This way, we can recreate the full history of operations and see what files were copied during each run:

$ hadoop fs -text s3://my-tables/processing/hourly_table/manifest-2017-02-24.gz > previous.lst
$ hadoop fs -text s3://my-tables/processing/hourly_table/manifest-2017-02-25.gz > current.lst
$ diff previous.lst current.lst
2548a2549,2550
> {"path":"s3://my-tables/processing/hourly_table/2017-02-25/00/2017-02-15.00.50958.log","baseName":"2017-02-25/00/2017-02-15.00.50958.log","srcDir":"s3://my-tables/processing/hourly_table","size":610308}
> {"path":"s3://my-tables/processing/hourly_table/2017-02-25/00/2017-02-25.00.93423.log","baseName":"2017-02-25/00/2017-02-25.00.93423.log","srcDir":"s3://my-tables/processing/hourly_table","size":178928}

S3DistCp first creates the manifest file in the local file system using the provided path (in this example, /tmp/mymanifest.gz). When the copy operation finishes, it moves that manifest to the destination location.
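A related option is --copyFromManifest, which reverses the role of --previousManifest: instead of treating the manifest as a list of files to skip, S3DistCp uses it as the list of files to copy. That makes it possible to replay an earlier run, for example to rebuild a destination from the manifest shown above. A sketch, with an illustrative destination path:

$ s3-dist-cp --src s3://my-tables/incoming/hourly_table --dest /data/replay/hourly_table --copyFromManifest --previousManifest=s3://my-tables/processing/hourly_table/manifest-2017-02-24.gz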

4. Copy multiple folders in one job

Imagine that we need to copy several folders. Usually, we run as many copy jobs as there are folders that need to be copied. With S3DistCp, the copy can be done in one go. All we need is to prepare a file with a list of prefixes and use it as a parameter for the tool:

$ s3-dist-cp --src s3://my-tables/incoming/hourly_table_filtered --dest s3://my-tables/processing/sample_table --srcPrefixesFile file://${PWD}/folders.lst

In this case, the folders.lst file contains the following prefixes:

$ cat folders.lst
s3://my-tables/incoming/hourly_table_filtered/2017-02-10/11
s3://my-tables/incoming/hourly_table_filtered/2017-02-19/02
s3://my-tables/incoming/hourly_table_filtered/2017-02-23

As a result, the target location has only the requested subfolders:

$ hadoop fs -ls -R s3://my-tables/processing/sample_table
drwxrwxrwx   -          0 1970-01-01 00:00 s3://my-tables/processing/sample_table/2017-02-10
drwxrwxrwx   -          0 1970-01-01 00:00 s3://my-tables/processing/sample_table/2017-02-10/11
-rw-rw-rw-   1     139200 2017-02-24 05:59 s3://my-tables/processing/sample_table/2017-02-10/11/2017-02-10.11.12980.log
...
drwxrwxrwx   -          0 1970-01-01 00:00 s3://my-tables/processing/sample_table/2017-02-19
drwxrwxrwx   -          0 1970-01-01 00:00 s3://my-tables/processing/sample_table/2017-02-19/02
-rw-rw-rw-   1     702058 2017-02-24 05:59 s3://my-tables/processing/sample_table/2017-02-19/02/2017-02-19.02.19497.log
-rw-rw-rw-   1     265404 2017-02-24 05:59 s3://my-tables/processing/sample_table/2017-02-19/02/2017-02-19.02.26671.log
...
drwxrwxrwx   -          0 1970-01-01 00:00 s3://my-tables/processing/sample_table/2017-02-23
drwxrwxrwx   -          0 1970-01-01 00:00 s3://my-tables/processing/sample_table/2017-02-23/00
-rw-rw-rw-   1     310425 2017-02-24 05:59 s3://my-tables/processing/sample_table/2017-02-23/00/2017-02-23.00.10061.log
-rw-rw-rw-   1    1030397 2017-02-24 05:59 s3://my-tables/processing/sample_table/2017-02-23/00/2017-02-23.00.22664.log
...

5. Aggregate files based on a pattern

Hadoop is optimized for reading a smaller number of large files rather than many small files, whether from S3 or HDFS. You can use S3DistCp to aggregate small files into fewer large files of a size that you choose, which can optimize your analysis for both performance and cost.

In the following example, we combine small files into bigger files. We do so by using a regular expression with the --groupBy option.

$ s3-dist-cp --src /data/incoming/hourly_table --dest s3://my-tables/processing/daily_table --targetSize=10 --groupBy='.*/hourly_table/.*/(\d\d)/.*\.log'

Let’s take a look into the target folders and compare them to the corresponding source folders:

$ hadoop fs -ls /data/incoming/hourly_table/2017-02-22/05/
Found 8 items
-rw-r--r--   1 hadoop hadoop     289949 2017-02-19 06:07 /data/incoming/hourly_table/2017-02-22/05/2017-02-22.05.11125.log
-rw-r--r--   1 hadoop hadoop     407290 2017-02-19 06:07 /data/incoming/hourly_table/2017-02-22/05/2017-02-22.05.19596.log
-rw-r--r--   1 hadoop hadoop     253434 2017-02-19 06:07 /data/incoming/hourly_table/2017-02-22/05/2017-02-22.05.30135.log
-rw-r--r--   1 hadoop hadoop     590655 2017-02-19 06:07 /data/incoming/hourly_table/2017-02-22/05/2017-02-22.05.36531.log
-rw-r--r--   1 hadoop hadoop     762076 2017-02-19 06:07 /data/incoming/hourly_table/2017-02-22/05/2017-02-22.05.47822.log
-rw-r--r--   1 hadoop hadoop     489783 2017-02-19 06:07 /data/incoming/hourly_table/2017-02-22/05/2017-02-22.05.80518.log
-rw-r--r--   1 hadoop hadoop     205976 2017-02-19 06:07 /data/incoming/hourly_table/2017-02-22/05/2017-02-22.05.99127.log
-rw-r--r--   1 hadoop hadoop        800 2017-02-19 06:07 /data/incoming/hourly_table/2017-02-22/05/processing.meta

 

$ hadoop fs -ls s3://my-tables/processing/daily_table/2017-02-22/05/
Found 2 items
-rw-rw-rw-   1   10541944 2017-02-28 05:16 s3://my-tables/processing/daily_table/2017-02-22/05/054
-rw-rw-rw-   1   10511516 2017-02-28 05:16 s3://my-tables/processing/daily_table/2017-02-22/05/055

As you can see, seven data files were combined into two with a size close to the requested 10 MB. The *.meta file was filtered out because the --groupBy pattern works in a similar way to --srcPattern. We recommend keeping files larger than the default block size, which is 128 MB on EMR.

The name of the final file is composed of groups in the regular expression used in --groupBy plus some number to make the name unique. The pattern must have at least one group defined.

Let’s consider one more example. This time, we want the file name to be formed from three parts: year, month, and file extension (.log in this case). Here is an updated command:

$ s3-dist-cp --src /data/incoming/hourly_table --dest s3://my-tables/processing/daily_table_2017 --targetSize=10 --groupBy='.*/hourly_table/.*(2017-).*/(\d\d)/.*\.(log)'

Now we have final files named in a different way:

$ hadoop fs -ls s3://my-tables/processing/daily_table_2017/2017-02-22/05/
Found 2 items
-rw-rw-rw-   1   10541944 2017-02-28 05:16 s3://my-tables/processing/daily_table_2017/2017-02-22/05/2017-05log4
-rw-rw-rw-   1   10511516 2017-02-28 05:16 s3://my-tables/processing/daily_table_2017/2017-02-22/05/2017-05log5

As you can see, the names of the final files are a concatenation of the three groups from the regular expression: (2017-), (\d\d), and (log).

You might find that occasionally you get an error that looks like the following:

$ s3-dist-cp --src /data/incoming/hourly_table --dest s3://my-tables/processing/daily_table_2017 --targetSize=10 --groupBy='.*/hourly_table/.*(2018-).*/(\d\d)/.*\.(log)'
...
17/04/27 15:37:45 INFO s3distcp.S3DistCp: Created 0 files to copy 0 files
...
Exception in thread "main" java.lang.RuntimeException: Error running job
	at com.amazon.elasticmapreduce.s3distcp.S3DistCp.run(S3DistCp.java:927)
	at com.amazon.elasticmapreduce.s3distcp.S3DistCp.run(S3DistCp.java:705)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
	at com.amazon.elasticmapreduce.s3distcp.Main.main(Main.java:22)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
…

In this case, the key information is contained in Created 0 files to copy 0 files. S3DistCp didn’t find any files to copy because the regular expression in the --groupBy option doesn’t match any files in the source location.

The reason for this issue varies. For example, it can be a mistake in the specified pattern. In the preceding example, we don’t have any files for the year 2018. Another common reason is incorrect escaping of the pattern when we submit the S3DistCp command as a step, which is addressed later in this post.

6. Upload files larger than 1 TB in size

The default upload chunk size when doing an S3 multipart upload is 128 MB. When files are larger than 1 TB, the total number of parts can reach over 10,000. Such a large number of parts can make the job run for a very long time or even fail.

In this case, you can improve job performance by increasing the size of each part. In S3DistCp, you can do this by using the --multipartUploadChunkSize option.

Let’s test how it works on several files about 200 GB in size. With the default part size, it takes about 84 minutes to copy them to S3 from HDFS.

We can increase the default part size to 1000 MB:

$ time s3-dist-cp --src /data/gb200 --dest s3://my-tables/data/S3DistCp/gb200_2 --multipartUploadChunkSize=1000
...
real    41m1.616s

The maximum part size is 5 GB. Keep in mind that larger parts have a higher chance to fail during upload and don’t necessarily speed up the process. Let’s run the same job with the maximum part size:

$ time s3-dist-cp --src /data/gb200 --dest s3://my-tables/data/S3DistCp/gb200_2 --multipartUploadChunkSize=5000
...
real    40m17.331s

7. Submit an S3DistCp step to an EMR cluster

You can run the S3DistCp tool in several ways. First, you can SSH to the master node and execute the command in a terminal window as we did in the preceding examples. This approach might be convenient for many use cases, but sometimes you might want to create a cluster that has some data on HDFS. You can do this by submitting a step directly in the AWS Management Console when creating a cluster.

In the console’s Add step dialog box, we can fill in the fields in the following way:

  • Step type: Custom JAR
  • Name*: S3DistCp step
  • JAR location: command-runner.jar
  • Arguments: s3-dist-cp --src s3://my-tables/incoming/hourly_table --dest /data/input/hourly_table --targetSize 10 --groupBy .*/hourly_table/.*(2017-).*/(\d\d)/.*\.(log)
  • Action of failure: Continue

Notice that we didn’t add quotation marks around our pattern. We needed quotation marks when we were using bash in the terminal window, but not here. The console takes care of escaping and transferring our arguments to the command on the cluster.

Another common use case is to run S3DistCp recurrently or on some event. We can always submit a new step to the existing cluster. The syntax here is slightly different than in previous examples. We separate arguments by commas. In the case of a complex pattern, we shield the whole step option with single quotation marks:

aws emr add-steps --cluster-id j-ABC123456789Z --steps 'Name=LoadData,Jar=command-runner.jar,ActionOnFailure=CONTINUE,Type=CUSTOM_JAR,Args=s3-dist-cp,--src,s3://my-tables/incoming/hourly_table,--dest,/data/input/hourly_table,--targetSize,10,--groupBy,.*/hourly_table/.*(2017-).*/(\d\d)/.*\.(log)'

Summary

This post showed you the basics of how S3DistCp works and highlighted some of its most useful features. It covered how you can use S3DistCp to optimize for raw files of different sizes and also selectively copy different files between locations. We also looked at several options for using the tool from SSH, the AWS Management Console, and the AWS CLI.

If you have questions or suggestions, leave a message in the comments.


Next Steps

Take your new knowledge to the next level! Click on the post below and learn the top 10 tips to improve query performance in Amazon Athena.

Top 10 Performance Tuning Tips for Amazon Athena


About the Author

Illya Yalovyy is a Senior Software Development Engineer with Amazon Web Services. He works on cutting-edge features of EMR and is heavily involved in open source projects such as Apache Hive, Apache Zookeeper, Apache Sqoop. His spare time is completely dedicated to his children and family.

 

New AWS Certification Specialty Exam for Big Data

Post Syndicated from Sara Snedeker original https://aws.amazon.com/blogs/big-data/new-aws-certification-specialty-exam-for-big-data/

AWS Certifications validate technical knowledge with an industry-recognized credential. Today, the AWS Certification team released the AWS Certified Big Data – Specialty exam. This new exam validates technical skills and experience in designing and implementing AWS services to derive value from data. The exam requires a current Associate AWS Certification and is intended for individuals who perform complex big data analyses.

Individuals who are interested in sitting for this exam should know how to do the following:

  • Implement core AWS big data services according to basic architectural best practices
  • Design and maintain big data
  • Leverage tools to automate data analysis

To prepare for the exam, we recommend the Big Data on AWS course, plus AWS whitepapers and documentation that are focused on big data.

This credential can help you stand out from the crowd, get recognized, and provide more evidence of your unique technical skills.

The AWS Certification team also released an AWS Certified Advanced Networking – Specialty exam and new AWS Certification Benefits. You can read more about these new releases on the AWS Blog.

Have more questions about AWS Certification? See our AWS Certification FAQ.

New AWS Certification Specialty Exams & Benefits

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/new-aws-certification-specialty-exams-benefits/

We are making two important updates to the AWS Certification program today. We are introducing two new AWS Certification Specialty Exams and our new AWS Certification Benefits Program, giving you another way to validate your skills and to showcase your expertise.

New AWS Certification Specialty Exams
Our new AWS Certified Advanced Networking – Specialty and AWS Certified Big Data – Specialty exams are designed for people with at least one current Associate AWS Certification and deep hands-on experience in the relevant specialty. These credentials can help you stand out from the crowd, get recognized, and provide more evidence of your unique technical skills.

New AWS Certification Benefits
Designed to help showcase your achievement and further advance your AWS expertise, tiered AWS Certification Benefits include newly designed AWS Certified logos and certificates, digital badges, free practice exams, branded merchandise, transcript sharing, and more. Benefits are accessed based on the AWS Certifications you have achieved. The more exams you successfully complete, the more benefits you will receive.

Access Your Specialty Exams and Benefits Today
Sign in to the AWS Training and Certification Portal using an Amazon account or (if you are an APN Partner) your APN Portal credentials. Then click on the Certification link on the AWS Training and Certification Portal to access your AWS Certification Account:

If you previously had an account in Webassessor, you can link your accounts so that your AWS Certification history shows in the portal (read “I already have an AWS Certification account in Webassessor. How do I access my AWS Certification history?” in the AWS Training FAQ to see how to do this).

Learn More
Check out the AWS Certifications FAQ and the AWS Training and Certification Portal FAQ if you have any questions.

Jeff;

Who Are the Shadow Brokers?

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2017/05/who_are_the_sha.html

In 2013, a mysterious group of hackers that calls itself the Shadow Brokers stole a few disks full of NSA secrets. Since last summer, they’ve been dumping these secrets on the Internet. They have publicly embarrassed the NSA and damaged its intelligence-gathering capabilities, while at the same time have put sophisticated cyberweapons in the hands of anyone who wants them. They have exposed major vulnerabilities in Cisco routers, Microsoft Windows, and Linux mail servers, forcing those companies and their customers to scramble. And they gave the authors of the WannaCry ransomware the exploit they needed to infect hundreds of thousands of computers worldwide this month.

After the WannaCry outbreak, the Shadow Brokers threatened to release more NSA secrets every month, giving cybercriminals and other governments worldwide even more exploits and hacking tools.

Who are these guys? And how did they steal this information? The short answer is: we don’t know. But we can make some educated guesses based on the material they’ve published.

The Shadow Brokers suddenly appeared last August, when they published a series of hacking tools and computer exploits — vulnerabilities in common software — from the NSA. The material was from autumn 2013, and seems to have been collected from an external NSA staging server, a machine that is owned, leased, or otherwise controlled by the US, but with no connection to the agency. NSA hackers find obscure corners of the Internet to hide the tools they need as they go about their work, and it seems the Shadow Brokers successfully hacked one of those caches.

In total, the group has published four sets of NSA material: a set of exploits and hacking tools against routers, the devices that direct data throughout computer networks; a similar collection against mail servers; another collection against Microsoft Windows; and a working directory of an NSA analyst breaking into the SWIFT banking network. Looking at the time stamps on the files and other material, they all come from around 2013. The Windows attack tools, published last month, might be a year or so older, based on which versions of Windows the tools support.

The releases are so different that they’re almost certainly from multiple sources at the NSA. The SWIFT files seem to come from an internal NSA computer, albeit one connected to the Internet. The Microsoft files seem different, too; they don’t have the same identifying information that the router and mail server files do. The Shadow Brokers have released all the material unredacted, without the care journalists took with the Snowden documents or even the care WikiLeaks has taken with the CIA secrets it’s publishing. They also posted anonymous messages in bad English but with American cultural references.

Given all of this, I don’t think the agent responsible is a whistleblower. While possible, it seems like a whistleblower wouldn’t sit on attack tools for three years before publishing. They would act more like Edward Snowden or Chelsea Manning, collecting for a time and then publishing immediately — and publishing documents that discuss what the US is doing to whom. That’s not what we’re seeing here; it’s simply a bunch of exploit code, which doesn’t have the political or ethical implications that a whistleblower would want to highlight. The SWIFT documents are records of an NSA operation, and the other posted files demonstrate that the NSA is hoarding vulnerabilities for attack rather than helping fix them and improve all of our security.

I also don’t think that it’s random hackers who stumbled on these tools and are just trying to harm the NSA or the US. Again, the three-year wait makes no sense. These documents and tools are cyber-Kryptonite; anyone who is secretly hoarding them is in danger from half the intelligence agencies in the world. Additionally, the publication schedule doesn’t make sense for the leakers to be cybercriminals. Criminals would use the hacking tools for themselves, incorporating the exploits into worms and viruses, and generally profiting from the theft.

That leaves a nation state. Whoever got this information years before and is leaking it now has to be both capable of hacking the NSA and willing to publish it all. Countries like Israel and France are capable, but would never publish, because they wouldn’t want to incur the wrath of the US. Countries like North Korea or Iran probably aren’t capable. (Additionally, North Korea is suspected of being behind WannaCry, which was written after the Shadow Brokers released that vulnerability to the public.) As I’ve written previously, the obvious list of countries who fit my two criteria is small: Russia, China, and — I’m out of ideas. And China is currently trying to make nice with the US.

It was generally believed last August, when the first documents were released and before it became politically controversial to say so, that the Russians were behind the leak, and that it was a warning message to President Barack Obama not to retaliate for the Democratic National Committee hacks. Edward Snowden guessed Russia, too. But the problem with the Russia theory is, why? These leaked tools are much more valuable if kept secret. Russia could use the knowledge to detect NSA hacking in its own country and to attack other countries. By publishing the tools, the Shadow Brokers are signaling that they don’t care if the US knows the tools were stolen.

Sure, there’s a chance the attackers knew that the US knew that the attackers knew — and round and round we go. But the “we don’t give a damn” nature of the releases points to an attacker who isn’t thinking strategically: a lone hacker or hacking group, which clashes with the nation-state theory.

This is all speculation on my part, based on discussion with others who don’t have access to the classified forensic and intelligence analysis. Inside the NSA, they have a lot more information. Many of the files published include operational notes and identifying information. NSA researchers know exactly which servers were compromised, and through that know what other information the attackers would have access to. As with the Snowden documents, though, they only know what the attackers could have taken and not what they did take. But they did alert Microsoft about the Windows vulnerability the Shadow Brokers released months in advance. Did they have eavesdropping capability inside whoever stole the files, as they claimed to when the Russians attacked the State Department? We have no idea.

So, how did the Shadow Brokers do it? Did someone inside the NSA accidentally mount the wrong server on some external network? That’s possible, but it seems very unlikely that the organization would make that kind of rookie mistake. Did someone hack the NSA itself? Could there be a mole inside the NSA?

If it is a mole, my guess is that the person was arrested before the Shadow Brokers released anything. No country would burn a mole working for it by publishing what that person delivered while he or she was still in danger. Intelligence agencies know that if they betray a source this severely, they’ll never get another one.

That points to two possibilities. The first is that the files came from Hal Martin. He’s the NSA contractor who was arrested in August for hoarding agency secrets in his house for two years. He can’t be the publisher, because the Shadow Brokers are in business even though he is in prison. But maybe the leaker got the documents from his stash, either because Martin gave the documents to them or because he himself was hacked. The dates line up, so it’s theoretically possible. There’s nothing in the public indictment against Martin that speaks to his selling secrets to a foreign power, but that’s just the sort of thing that would be left out. It’s not needed for a conviction.

If the source of the documents is Hal Martin, then we can speculate that a random hacker did in fact stumble on it — no need for nation-state cyberattack skills.

The other option is a mysterious second NSA leaker of cyberattack tools. Could this be the person who stole the NSA documents and passed them on to someone else? The only time I have ever heard about this was from a Washington Post story about Martin:

There was a second, previously undisclosed breach of cybertools, discovered in the summer of 2015, which was also carried out by a TAO employee [a worker in the Office of Tailored Access Operations], one official said. That individual also has been arrested, but his case has not been made public. The individual is not thought to have shared the material with another country, the official said.

Of course, “not thought to have” is not the same as not having done so.

It is interesting that there have been no public arrests of anyone in connection with these hacks. If the NSA knows where the files came from, it knows who had access to them — ­and it’s long since questioned everyone involved and should know if someone deliberately or accidentally lost control of them. I know that many people, both inside the government and out, think there is some sort of domestic involvement; things may be more complicated than I realize.

It’s also not over. Last week, the Shadow Brokers were back, with a rambling and taunting message announcing a “Data Dump of the Month” service. They’re offering to sell unreleased NSA attack tools — something they also tried last August — with the threat to publish them if no one pays. The group has made good on their previous boasts: In the coming months, we might see new exploits against web browsers, networking equipment, smartphones, and operating systems — Windows in particular. Even scarier, they’re threatening to release raw NSA intercepts: data from the SWIFT network and banks, and “compromised data from Russian, Chinese, Iranian, or North Korean nukes and missile programs.”

Whoever the Shadow Brokers are, however they stole these disks full of NSA secrets, and for whatever reason they’re releasing them, it’s going to be a long summer inside of Fort Meade — as it will be for the rest of us.

This essay previously appeared in the Atlantic, and is an update of this essay from Lawfare.

Reddit’s Piracy Sub-Reddit Reopens After Mutiny Shutdown

Post Syndicated from Andy original https://torrentfreak.com/reddits-piracy-sub-reddit-reopens-after-mutiny-shutdown-170523/

For millions of people, Reddit is one of the most popular sources of news online. Arguably, though, the site’s real value lies with its users.

Like any community, Reddit user comments can range from the brilliantly informed to the deliberately destructive. But, more often than not, the weight of the crowd tends to get to the truth, sometimes with the help of the site’s moderators.

Each section of Reddit (known as a ‘sub-Reddit’) is dedicated to a particular topic and is controlled by a team of moderators. While mileage can vary, moderators tend to do a good job and are often relied upon to settle disputes and hold errant users to the rules.

Last night in /r/piracy (a sub-Reddit with close to 100,000 subscribers) one moderator went rogue, which resulted in the sub-Reddit being shut down.

According to one of the moderators now in charge of /r/piracy, a now-former moderator by the name of Samewhiterabbits committed a sin by using the sub-Reddit to further his own agenda. ‘Dysgraphical’ says that the problems started when Samewhiterabbits began heavily spamming the ‘sub’ with links to his own streaming website projects.

Apparently, this has been going on for some time, with Samewhiterabbits standing accused of launching, promoting and spamming websites that have the same names as existing and/or defunct platforms, but claiming them to be the real deal.

“Samewhiterabbits is using r/Piracy as a platform to spam his monetized website forks which he claims as official,” Dysgraphical said in a statement.

“This isn’t recent activity but rather his model. He capitalizes from streaming sites that were shutdown and spams his new domain(s) as the new home for the aforementioned streaming site.

“This moderator explicitly deletes competing stream sites and uses alternative account(s) to spam his monetized stream sites. It is not only blatant spam, but censorship as well.”

After another post appeared promoting ‘popular streaming sites’ that the /r/piracy team as a whole had no hand in, moderators including Dysgraphical and TheWalkingTroll stepped in to sort out the problem.

They were met with resistance, with Samewhiterabbit – who still had moderator powers – taking several popular threads ‘hostage’ and stopping the rest of the mod team from ending the wave of misleading spam.

“He has held several threads hostage by locking/removing them to censor any critique or mention of his shady wrongdoings. With limited moderation privileges, the most we can do at the moment is delete his threads,” Dysgraphical reported last night.

While sorting out the problem, /r/piracy was shut down or, more accurately, made ‘private’. Then, in order to move forward, the moderators applied for more power (known as ‘permissions’ in forum speak) to remove the errant mod from the team.

To achieve that, an application was made to Reddit’s admins (those at the top of the site) who responded extremely quickly to help sort out the mess.

“A few of us now have full permissions. Thankfully the admins were rather quick in their response (given they can take several days) and we got this sorted quickly,” Dysgraphical reports.

Once that power was in the right hands, justice was served in the manner determined by the rest of the team. A few hours ago, Samewhiterabbits was reported banned from /r/piracy and everything started to get back to normal.

While ‘drama’ like this predates the Internet, this particular situation does highlight the importance of having responsible moderators on any discussion platform. There is often an assumption that these figures are in authority because they can be trusted, but that is not necessarily so.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

WannaBark (at the Moon)

Post Syndicated from Йовко Ламбрев original https://yovko.net/wannabark/

No. I'm not going to write about WannaCry. A great deal has been written about it already, and as usual only a small part of it was worth reading.

The problem is much bigger than this overhyped incident. The short version is that we are screwed. Fundamentally screwed! That is something we keep repeating to each other in tech circles from time to time, but it is high time we explained it in plain human terms to everyone and started, somehow, to fix things.

The Internet is a child of the laboratory. It has no perfect immune system. It was born and took its first steps in an atmosphere of academic romanticism, nurtured by the naive good will of its creators and first users. Until recently it was still somewhat true (on the Internet) that the majority is basically rational and more or less literate, and that what is useful and meaningful will naturally prevail over stupidity and hostility. That is no longer the case. The fairy tale is over!

It is time to wake up and admit that good will not defeat evil by default, not unless we help it.

We are connected. All of us. More than ever. And that is why we must recognize our responsibility to one another. Just as, when we catch the flu, we stay at home not only to recover faster but also to limit the spread to others, so today we cannot afford to use a computer, smartphone, or piece of software that is old and no longer maintained, because we are not the only ones who are vulnerable; we put everyone else at risk too.

As someone aptly summed it up on twitter these days: "It's not that you can't afford to update. You can't afford not to update!"

The systems we use around us, visibly or invisibly, will only become more interconnected, and the responsibility for keeping them safe is shared. That includes demanding accountability: from ourselves, from operators, and from governments.

WannaCry would not have had this effect if the victims had updated their software. So when an update lights up on your phone, or on any computer or smart device, for goodness' sake do not ignore it! Yes, it can be annoying at times. Brushing your teeth is not much fun either, but it is strongly recommended and good for your health.

But even if we all started following this strictly tomorrow, it still would not be enough if we are up against governments and organizations that abuse their position. WannaCry grew out of a Windows vulnerability that the United States' National Security Agency discovered, but instead of notifying Microsoft it exploited the flaw for who knows how long to break into other people's systems and to track and steal data from them. A cracker group in turn stole the tools from the agency some time ago and published the appropriated arsenal, and sure enough, someone quickly turned up to use it for their own gain.

Incidents like this will only become more frequent. And if our governments are playing against us... it will not be fun at all.

What is needed is global, massive, and persistent resistance against the practice of keeping vulnerabilities secret.

We are also playing another risky game. Daily. With generous indifference to the scale and impact of the problem. Our smartphones and tablets are computers too, and a huge share of their manufacturers, caught up in the drive to sell more new models, rush to "retire" the old ones, stopping their updates and pressuring customers to replace their devices. That does not play out the way the manufacturers would like, though: the older devices keep being used without updates and with known vulnerabilities, they are resold on the second-hand market, and they are handed down to children, relatives, or older people. Until one day... something like WannaCry turns that into news as well... or quietly siphons off data: phone numbers, messages, photos, passwords, credit cards, all kinds of personal information... And because we are so connected, it will not be only the owners of the compromised devices who suffer, but, indirectly, everyone they have any kind of relationship with.

The worst example is old Android phones and tablets, for which Google has no mechanism to force manufacturers to take better, more adequate, and longer-lasting care of them.

Look around you and see how many people you know are using very old devices.

As for the nightmarish security of many IoT gadgets for automating and managing smart homes and industrial sites, I do not even want to open that topic.

But since I mentioned Google... We need a new, changed Internet!

The centralized model of gigantic information silos, which we all fill but only a few control, is fundamentally broken.

And the problem here is not just security, although a breach of such a system is projected directly onto the many people who rely on it. There is also a secondary but very serious problem, tied to our dependence on it and to the abuse of our data held there.

By nudging us into using the "free" services of Google, Facebook, and the like, they condemn us to dependence and control. It elegantly turns out that the data we entrust to them is not our data but theirs. They use it to profile us, to guess our interests and the topics we are sensitive to, they manipulate us with it, and they sell it so that others can manipulate us too. That is the price of "free".

As Aral Balkan (who has already been to Bulgaria twice) says, this is not data farming but people farming, because our data is us. And the dismissive wave of the hand that we have nothing to hide is a crime against our community (let me stress this again) in our connected world, because, as Edward Snowden puts it: "Saying you don't need privacy because you have nothing to hide is like saying you don't need free speech because you have nothing to say."

Rights = Power

And the fight for them is (and must be) continuous.

  • We must take back control of our digital selves on the Internet, and reduce to a minimum our use of free services that collect data.
  • We should treat caring for the security of our software and devices as part of our personal hygiene.
  • We should cultivate a sensitivity to manipulation on the Internet, especially to fake news and poor journalism.
  • We should insist on transparency from governments, organizations, politicians, and corporations.
  • We should prefer decentralized or focused (single-purpose) services over global conglomerates that strive for a monopoly in as many areas as possible (for example, ProtonMail or FastMail instead of Gmail, our own blogs instead of Facebook, and so on).
  • We should use smaller, decentralized platforms (for media, services, and communication), support them financially, and, when we can, launch our own.
  • We should overcome our individualism and support one another within our community.
  • We should educate and encourage more people to do the same...

The future belongs not to the big mastodons but to networks of small, interconnected, independent, mutually supporting projects that we make happen together. The sooner we recognize the trend and our own strength, the better.

Photo: Markus Spiske

AWS Lambda Support for AWS X-Ray

Post Syndicated from Randall Hunt original https://aws.amazon.com/blogs/aws/aws-lambda-support-for-aws-x-ray/

Today we’re announcing general availability of AWS Lambda support for AWS X-Ray. As you may already know from Jeff’s GA post, X-Ray is an AWS service for analyzing the execution and performance behavior of distributed applications. Traditional debugging methods don’t work so well for microservice-based applications, in which there are multiple, independent components running on different services. X-Ray allows you to rapidly diagnose errors, slowdowns, and timeouts by breaking down the latency in your applications. I’ll demonstrate how you can use X-Ray in your own applications in just a moment by walking us through building and analyzing a simple Lambda-based application.

If you just want to get started right away you can easily turn on X-Ray for your existing Lambda functions by navigating to your function’s configuration page and enabling tracing:

Or in the AWS Command Line Interface (CLI) by updating the function’s tracing-config (be sure to pass in a --function-name as well):

$ aws lambda update-function-configuration --tracing-config '{"Mode": "Active"}'

When tracing mode is active, Lambda will attempt to trace your function (unless explicitly told not to trace by an upstream service). Otherwise, your function will only be traced if it is explicitly told to do so by an upstream service. Once tracing is enabled, you’ll start generating traces and you’ll get a visual representation of the resources in your application and the connections (edges) between them. One thing to note is that the X-Ray daemon does consume some of your Lambda function’s resources. If you’re getting close to your memory limit, Lambda will try to kill the X-Ray daemon to avoid throwing an out-of-memory error.
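
If you enable tracing from the CLI, you can confirm that the change took effect by reading the configuration back. A quick check, with a hypothetical function name:

# Expect {"Mode": "Active"} once tracing is enabled
aws lambda get-function-configuration --function-name my-image-labeler --query 'TracingConfig'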

Let’s test this new integration out by building a quick application that uses a few different services.


As a twenty-something with a smartphone, I have a lot of selfies (10,000+!) and I thought it would be great to analyze all of them. We’ll write a simple Lambda function with the Java 8 runtime that responds to new images uploaded into an Amazon Simple Storage Service (S3) bucket. We’ll use Amazon Rekognition on the photos and store the detected labels in Amazon DynamoDB.

service map

First, let’s define a few quick X-Ray vocabulary words: subsegments, segments, and traces. Got that? X-Ray is easy to understand if you remember that subsegments and segments make up traces which X-Ray processes to generate service graphs. Service graphs make a nice visual representation we can see above (with different colors indicating various request responses). The compute resources that run your applications send data about the work they’re doing in the form of segments. You can add additional annotations about that data and more granular timing of your code by creating subsegments. The path of a request through your application is tracked with a trace. A trace collects all the segments generated by a single request. That means you can easily trace Lambda events coming in from S3 all the way to DynamoDB and understand where errors and latencies are cropping up.

So, we’ll create an S3 bucket called selfies-bucket, a DynamoDB table called selfies-table, and a Lambda function. We’ll add a trigger to our Lambda function for the S3 bucket on ObjectCreated:All events. Our Lambda function code will be super simple and you can look at it in its entirety here. With no code changes we can enable X-Ray in our Java function by including the aws-xray-sdk and aws-xray-sdk-recorder-aws-sdk-instrumentor packages in our JAR.

Let’s trigger some photo uploads and get a look at the traces in X-Ray.
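
If you prefer the command line to the console, you can also pull recent trace summaries through the X-Ray API. A rough sketch that assumes GNU date is available (as it is on Amazon Linux):

# Summaries of traces recorded over the last 10 minutes
aws xray get-trace-summaries \
    --start-time $(date -d '10 minutes ago' +%s) \
    --end-time $(date +%s)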

We’ve got some data! We can click on one of these individual traces for a lot of detailed information on our invocation.

In the first AWS::Lambda segment we see the dwell time of the function, how long it spent waiting to execute, followed by the number of execution attempts.

In the second AWS::Lambda::Function segment there are a few possible subsegments:

  • The initialization subsegment includes all of the time spent before your function handler starts executing
  • The outbound service calls
  • Any of your custom subsegments (these are really easy to add)

Hmm, it seems like there’s a bit of an issue on the DynamoDB side. We can even dive deeper and get the full exception stacktrace by clicking on the error icon. You can see we’ve been throttled by DynamoDB because we’re out of write capacity units. Luckily we can add more with just a few clicks or a quick API call. As we do that we’ll see more and more green on our service map!

The X-Ray SDKs make it super easy to emit data to X-Ray, but you don’t have to use them to talk to the X-Ray daemon. For Python, you can check out this library from rackspace called fleece. The X-Ray service is full of interesting stuff and the best place to learn more is by hopping over to the documentation. I’ve been using it for my @awscloudninja bot and it’s working great! Just keep in mind that this isn’t an official library and isn’t supported by AWS.

Personally, I’m really excited to use X-Ray in all of my upcoming projects because it really will save me some time and effort debugging and operating. I look forward to seeing what our customers can build with it as well. If you come up with any cool tricks or hacks please let me know!

– Randall

The Quick vs. the Strong: Commentary on Cory Doctorow’s Walkaway

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2017/05/the_quick_vs_th.html

Technological advances change the world. That’s partly because of what they are, but even more because of the social changes they enable. New technologies upend power balances. They give groups new capabilities, increased effectiveness, and new defenses. The Internet decades have been a never-ending series of these upendings. We’ve seen existing industries fall and new industries rise. We’ve seen governments become more powerful in some areas and less in others. We’ve seen the rise of a new form of governance: a multi-stakeholder model where skilled individuals can have more power than multinational corporations or major governments.

Among the many power struggles, there is one type I want to particularly highlight: the battles between the nimble individuals who start using a new technology first, and the slower organizations that come along later.

In general, the unempowered are the first to benefit from new technologies: hackers, dissidents, marginalized groups, criminals, and so on. When they first encountered the Internet, it was transformative. Suddenly, they had access to technologies for dissemination, coordination, organization, and action — things that were impossibly hard before. This can be incredibly empowering. In the early decades of the Internet, we saw it in the rise of Usenet discussion forums and special-interest mailing lists, in how the Internet routed around censorship, and how Internet governance bypassed traditional government and corporate models. More recently, we saw it in the SOPA/PIPA debate of 2011-12, the Gezi protests in Turkey and the various “color” revolutions, and the rising use of crowdfunding. These technologies can invert power dynamics, even in the presence of government surveillance and censorship.

But that’s just half the story. Technology magnifies power in general, but the rates of adoption are different. Criminals, dissidents, the unorganized — all outliers — are more agile. They can make use of new technologies faster, and can magnify their collective power because of it. But when the already-powerful big institutions finally figured out how to use the Internet, they had more raw power to magnify.

This is true for both governments and corporations. We now know that governments all over the world are militarizing the Internet, using it for surveillance, censorship, and propaganda. Large corporations are using it to control what we can do and see, and the rise of winner-take-all distribution systems only exacerbates this.

This is the fundamental tension at the heart of the Internet, and information-based technology in general. The unempowered are more efficient at leveraging new technology, while the powerful have more raw power to leverage. These two trends lead to a battle between the quick and the strong: the quick who can make use of new power faster, and the strong who can make use of that same power more effectively.

This battle is playing out today in many different areas of information technology. You can see it in the security vs. surveillance battles between criminals and the FBI, or dissidents and the Chinese government. You can see it in the battles between content pirates and various media organizations. You can see it where social-media giants and Internet-commerce giants battle against new upstarts. You can see it in politics, where the newer Internet-aware organizations fight with the older, more established, political organizations. You can even see it in warfare, where a small cadre of military can keep a country under perpetual bombardment — using drones — with no risk to the attackers.

This battle is fundamental to Cory Doctorow’s new novel Walkaway. Our heroes represent the quick: those who have checked out of traditional society, and thrive because easy access to 3D printers enables them to eschew traditional notions of property. Their enemy is the strong: the traditional government institutions that exert their power mostly because they can. This battle rages through most of the book, as the quick embrace ever-new technologies and the strong struggle to catch up.

It’s easy to root for the quick, both in Doctorow’s book and in the real world. And while I’m not going to give away Doctorow’s ending — and I don’t know enough to predict how it will play out in the real world — right now, trends favor the strong.

Centralized infrastructure favors traditional power, and the Internet is becoming more centralized. This is true both at the endpoints, where companies like Facebook, Apple, Google, and Amazon control much of how we interact with information. It’s also true in the middle, where companies like Comcast increasingly control how information gets to us. It’s true in countries like Russia and China that increasingly legislate their own national agenda onto their pieces of the Internet. And it’s even true in countries like the US and the UK, that increasingly legislate more government surveillance capabilities.

At the 1996 World Economic Forum, cyber-libertarian John Perry Barlow issued his “Declaration of the Independence of Cyberspace,” telling the assembled world leaders and titans of Industry: “You have no moral right to rule us, nor do you possess any methods of enforcement that we have true reason to fear.” Many of us believed him a scant 20 years ago, but today those words ring hollow.

But if history is any guide, these things are cyclic. In another 20 years, even newer technologies — both the ones Doctorow focuses on and the ones no one can predict — could easily tip the balance back in favor of the quick. Whether that will result in more of a utopia or a dystopia depends partly on these technologies, but even more on the social changes resulting from these technologies. I’m short-term pessimistic but long-term optimistic.

This essay previously appeared on Crooked Timber.

Product or Project?

Post Syndicated from Matt Richardson original https://www.raspberrypi.org/blog/product-or-project/

This column is from The MagPi issue 57. You can download a PDF of the full issue for free, or subscribe to receive the print edition in your mailbox or the digital edition on your tablet. All proceeds from the print and digital editions help the Raspberry Pi Foundation achieve its charitable goals.

Image of MagPi magazine and AIY Project Kit

Taking inspiration from a widely known inspirational phrase, I like to tell people, “make the thing you wish to see in the world.” In other words, you don’t have to wait for a company to create the exact product you want. You can be a maker as well as a consumer! Prototyping with hardware has become easier and more affordable, empowering people to make products that suit their needs perfectly. And the people making these things aren’t necessarily electrical engineers, computer scientists, or product designers. They’re not even necessarily adults. They’re often self-taught hobbyists who are empowered by maker-friendly technology.

It’s a subject I’ve been very interested in, and I have written about it before. Here’s what I’ve noticed: the flow between maker project and consumer product moves in both directions. In other words, consumer products can start off as maker projects. Just take a look at the story behind many of the crowdfunded products on sites such as Kickstarter. Conversely, consumer products can evolve into maker products as well. The cover story for the latest issue of The MagPi is a perfect example of that. Google has given you the resources you need to build your own dedicated Google Assistant device. How cool is that?

David Pride on Twitter

@Raspberry_Pi @TheMagP1 Oh this is going to be a ridiculous amount of fun. 😊 #AIYProjects #woodchuck https://t.co/2sWYmpi6T1

But consumer products becoming hackable hardware isn’t always an intentional move by the product’s maker. In the 2000s, TiVo set-top DVRs were a hot product and their most enthusiastic fans figured out how to hack the product to customise it to meet their needs without any kind of support from TiVo.

Embracing change

But since then, things have changed. For example, when Microsoft’s Kinect for the Xbox 360 was released in 2010, makers were immediately enticed by its capabilities. It not only acted as a camera, but it could also sense depth, a feature that would be useful for identifying the position of objects in a space. At first, there was no hacker support from Microsoft, so Adafruit Industries announced a $3,000 bounty to create open-source drivers so that anyone could access the features of Kinect for their own projects. Since then, Microsoft has embraced the use of Kinect for these purposes.

The Create 2 from iRobot

iRobot’s Create 2, a hackable version of the Roomba

Consumer product companies even make versions of their products that are specifically meant for hacking, making, and learning. Belkin’s WeMo home automation product line includes the WeMo Maker, a device that can act as a remote relay or sensor and hook into your home automation system. And iRobot offers Create 2, a hackable version of its Roomba floor-cleaning robot. While iRobot aimed the robot at STEM educators, you could use it for personal projects too. Electronic instrument maker Korg takes its maker-friendly approach to the next level by releasing the schematics for some of its analogue synthesiser products.

Why would a company want to do this? There are a few possible reasons. For one, it’s a way of encouraging consumers to create a community around a product. It could be a way for innovation with the product to continue, unchecked by the firm’s own limits on resources. For certain, it’s an awesome feel-good way for a company to empower their own users. Whatever the reason these products exist, it’s the digital maker who comes out ahead. They have more affordable tools, materials, and resources to create their own customised products and possibly learn a thing or two along the way.

With maker-friendly, hackable products, being a creator and a consumer aren’t mutually exclusive. In fact, you’re probably getting the best of both worlds: great products and great opportunities to make the thing you wish to see in the world.

The post Product or Project? appeared first on Raspberry Pi.

YouTube Keeps People From Pirate Sites, Study Shows

Post Syndicated from Ernesto original https://torrentfreak.com/youtube-keeps-people-from-pirate-sites-study-shows-170511/

The music industry has witnessed some dramatic changes over the past decade and a half.

With the rise of digital, people’s music consumption habits evolved dramatically, followed by more change when subscription streaming services came along.

Another popular way for people to enjoy music nowadays is via YouTube. The video streaming platform offers free access to millions of songs, which are often uploaded by artists or the labels themselves.

Still, YouTube is getting little praise from the major labels. Instead, music insiders often characterize the video platform as a DMCA-protected piracy racketeer that exploits legal loopholes to profit from artists’ hard work.

YouTube is generating healthy profits at a minimal cost and drives people away from legal platforms, the argument goes.

In an attempt to change this perception, YouTube has commissioned a study from the research outfit RBB Economics to see how the service impacts the music industry. The first results, published today, are a positive start.

The study examined exclusive YouTube data and a survey of 1,500 users across Germany, France, Italy, and the U.K., asking them about their consumption habits. In particular, they were asked if YouTube keeps them away from paid music alternatives.

According to YouTube, which just unveiled the results, the data paints a different picture.

“The study finds that this is not the case. In fact, if YouTube didn’t exist, 85% of time spent on YouTube would move to lower value channels, and would result in a significant increase in piracy,” YouTube’s Simon Morrison writes.

If YouTube disappeared overnight, roughly half of all the time spent there on music would be “lost.” Furthermore, a significant portion of YouTube users would switch to using pirate sites and services instead.

“The results suggest that if YouTube were no longer able to offer music, time spent listening to pirated content would increase by +29%. This is consistent with YouTube being a substitute for pirated content,” RBB Economics writes.

In addition, the researchers also found that blocking music on YouTube doesn’t lead to an increase in streaming on other platforms, such as Spotify.

While YouTube doesn’t highlight it, the report also finds that some people would switch to “higher value” (e.g. paid) services if YouTube weren’t available. This amounts to roughly 15% of the total.

In other words, if the music industry is willing to pass on the $1 billion YouTube currently pays out and accept a hefty increase in piracy, there would be a boost in revenue through other channels. Whether that’s worth it is up for debate of course.

YouTube believes that the results are pretty convincing though. They rely on RBB Economics’ conclusion that there is no evidence of “significant cannibalization” and believe that their service has a positive impact overall.

“The cumulative effect of these findings is that YouTube has a market expansion effect, not a cannibalizing one,” YouTube writes.

The full results are available here (pdf), courtesy of RBB Economics. YouTube announced that more of these reports will follow in the near future.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

Steal This Show S03E01: ‘Toxic Avenger Loves Pirates’

Post Syndicated from J.J. King original https://torrentfreak.com/steal-show-s03e01-toxic-avenger-loves-pirates/

stslogo180If you enjoy this episode, consider becoming a patron and getting involved with the show. Check out Steal This Show’s Patreon campaign: support us and get all kinds of fantastic benefits!

In this first episode of the new season, we meet the irrepressible Lloyd Kaufman, co-founder of Troma Entertainment and director of The Toxic Avenger.

Find out why Lloyd loves pirates – except state-sponsored ones; why vertical integration, media consolidation and the end of Net Neutrality would kill Troma; and why Lloyd is skeptical that crowdfunding systems like Patreon have a role in crowdfunding independent movies.

Check out Lloyd’s movies on his YouTube channel!

Steal This Show aims to release bi-weekly episodes featuring insiders discussing copyright and file-sharing news. It complements our regular reporting by adding more room for opinion, commentary, and analysis.

The guests for our news discussions will vary, and we’ll aim to introduce voices from different backgrounds and persuasions. In addition to news, STS will also produce features interviewing some of the great innovators and minds.

Host: Jamie King

Guest: Lloyd Kaufman

Produced by Jamie King
Edited & Mixed by Riley Byrne
Original Music by David Triana
Web Production by Siraje Amarniss

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

John Oliver is wrong about Net Neutrality

Post Syndicated from Robert Graham original http://blog.erratasec.com/2017/05/john-oliver-is-wrong-about-net.html

People keep linking to John Oliver bits. We should stop doing this. This is comedy, but people are confused into thinking Oliver is engaging in rational political debate:
Enlightened people know that reasonable people disagree, that there’s two sides to any debate. John Oliver’s bit erodes that belief, making one side (your side) sound smart, and the other side sound unreasonable.
The #1 thing you should know about Net Neutrality is that reasonable people disagree. It doesn’t mean they are right, only that they are reasonable. They aren’t stupid. They aren’t shills for the telcom lobby, or confused by the telcom lobby. Indeed, those opposed to Net Neutrality are the tech experts who know how packets are routed, whereas the supporters tend only to be lawyers, academics, and activists. If you think that the anti-NetNeutrality crowd is unreasonable, then you are in a dangerous filter bubble.
Most everything in John Oliver’s piece is incorrect.
For example, he says that without Net Neutrality, Comcast can prefer original shows it produces, and slow down competing original shows by Netflix. This is silly: Comcast already does that, even with NetNeutrality rules.
Comcast owns NBC, which produces a lot of original shows. During prime time (8pm to 11pm), Comcast delivers those shows at 6-mbps to its customers, while Netflix is throttled to around 3-mbps. Because of this, Comcast original shows are seen at higher quality than Netflix shows.
Comcast can do this, even with NetNeutrality rules, because it separates its cables into “channels”. One channel carries public Internet traffic, like Netflix. The other channels carry private Internet traffic, for broadcast TV shows and pay-per-view.
All NetNeutrality means is that if Comcast wants to give preference to its own contents/services, it has to do so using separate channels on the wire, rather than pushing everything over the same channel. This is a detail nobody tells you because NetNeutrality proponents aren’t techies. They are lawyers and academics. They maximize moral outrage, while ignoring technical details.
Another example in Oliver’s show is whether search engines like Google or the (hypothetical) Bing can pay to get faster access to customers. They already do that. The average distance a packet travels on the web is less than 100-miles. That’s because the biggest companies (Google, Facebook, Netflix, etc.) pay to put servers in your city close to you. Smaller companies, such as search engine DuckDuckGo.com, also pay third-party companies like Akamai or Amazon Web Services to get closer to you. The smallest companies, however, get poor performance, being a thousand miles away.
You can test this out for yourself. Run a packet-sniffer on your home network for a week, then for each address, use mapping tools like ping and traceroute to figure out how far away things are.
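
If you want to try that yourself, the basic tools are already on most machines. A rough sketch; the host names are just examples, and your numbers will vary by ISP and region:

# Compare round-trip times to a heavily distributed site and a smaller one
ping -c 4 www.google.com
ping -c 4 duckduckgo.com

# Count the hops along each path
traceroute www.netflix.com
traceroute duckduckgo.com
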
The Oliver bit mentioned how Verizon banned Google Wallet. Again, technical details are important here. It had nothing to do with Net Neutrality issues blocking network packets, but only had to do with Verizon-branded phones blocking access to the encrypted enclave. You could use Google Wallet on unlocked phones you bought separately. Moreover, market forces won in the end, with Google Wallet (aka. Android Wallet) now the preferred wallet on their network. In other words, this incident shows that the “free market” fixes things in the long run without the heavy hand of government.
Oliver shows a piece where FCC chief Ajit Pai points out that Internet companies didn’t do evil without Net Neutrality rules, and thus NetNeutrality rules were unneeded. Oliver claimed this was a “disingenuous” argument. No, it’s not “disingenuous”; it’s entirely the point of why Net Neutrality is bad. It’s chasing the theoretical possibility of abuse, not the real thing. Sure, Internet companies will occasionally go down misguided paths. If it’s truly bad, customers will rebel. In some cases, it’s not actually a bad thing, and will end up being a benefit to customers (e.g. throttling BitTorrent during primetime would benefit most BitTorrent users). It’s the pro-NetNeutrality side that’s being disingenuous, knowingly trumping up things as problems that really aren’t.
The point is this. The argument here is a complicated one, between reasonable sides. For humor, John Oliver has created a one-sided debate that falls apart under any serious analysis. Those like the EFF should not mistake such humor for intelligent technical debate.

Visualize Big Data with Amazon QuickSight, Presto, and Apache Spark on Amazon EMR

Post Syndicated from Luis Wang original https://aws.amazon.com/blogs/big-data/visualize-big-data-with-amazon-quicksight-presto-and-apache-spark-on-amazon-emr/

Last December, we introduced the Amazon Athena connector in Amazon QuickSight, in the Derive Insights from IoT in Minutes using AWS IoT, Amazon Kinesis Firehose, Amazon Athena, and Amazon QuickSight post.

The connector allows you to visualize your big data easily in Amazon S3 using Athena’s interactive query engine in a serverless fashion. This turned out to be a very popular combination, as customers benefit from the speed, agility, and cost benefit that serverless business intelligence (BI) and analytics architecture brings.

Today, we’re excited to announce two new native connectors in QuickSight for big data analytics: Presto and Spark. With the Presto and SparkSQL connectors in QuickSight, you can easily create interactive visualizations over large datasets using Amazon EMR.

EMR provides a simple and cost effective way to run highly distributed processing frameworks such as Presto and Spark when compared to on-premises deployments. EMR provides you with the flexibility to define specific compute, memory, storage, and application parameters and optimize your analytic requirements.

In this post, I walk you through connecting QuickSight to an EMR cluster running Presto. If you’d like a walkthrough with Spark, let us know in the comments section!

Presto overview

Presto is an open source, distributed SQL query engine for running interactive analytic queries against data sources ranging from gigabytes to petabytes. It supports the ANSI SQL standard, including complex queries, aggregations, joins, and window functions. Presto can run on multiple data sources, including Amazon S3.

Presto’s execution framework is fundamentally different from that of Hive/MapReduce. Presto has a custom query and execution engine where the stages of execution are pipelined, similar to a directed acyclic graph (DAG), and all processing occurs in memory to reduce disk I/O. This pipelined execution model can run multiple stages in parallel and streams data from one stage to another as the data becomes available. This reduces end-to-end latency and makes Presto a great tool for ad hoc data exploration over large data sets.
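
If you want to see this staged, DAG-style plan for yourself, Presto can print it. A hedged example, assuming the cluster and the cloudfront_logs table that the walkthrough below sets up (with LDAP and SSL enabled, the same --server, --user, and --password flags as in the final query apply):

presto-cli --server https://<PUBLIC_DNS_NAME_OF_EMR_CLUSTER>:<PORT_NUMBER> \
    --user <USERNAME> --password --catalog hive --schema default \
    --execute "EXPLAIN (TYPE DISTRIBUTED) SELECT os, count(*) FROM cloudfront_logs GROUP BY os"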

Walkthrough

Use the following steps to connect QuickSight to an EMR cluster running Presto:

  1. Create an EMR cluster with the latest 5.5.0 release.
  2. Configure LDAP for user authentication in QuickSight.
  3. Configure SSL using a QuickSight supported certificate authority (CA).
  4. Create tables for Presto in the Hive metastore.
  5. Whitelist the QuickSight IP address range in your EMR master security group rules.
  6. Connect QuickSight to Presto and create some visualizations.

Prerequisites

You need to run Presto version 0.167 at a minimum, which is the first release that supports LDAP authentication. LDAP authentication is a requirement for the Presto and Spark connectors, and QuickSight refuses to connect if LDAP is not configured on your cluster.

Create an EMR cluster with release version 5.5.0

In the EMR console, use the Quick Create option to create a cluster.  For this post, use most of the default settings with a few exceptions. To install both Presto and Spark on your cluster (and customize other settings), create your cluster from the Advanced Options wizard instead.

Make sure that EMR release 5.5.0 is selected and under Applications, choose Presto. If you have an EC2 key pair, you can use it. Otherwise, create a key pair (.PEM file) and then return to this page to create the cluster. 
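
If you prefer the CLI to the console, a roughly equivalent cluster can be created with aws emr create-cluster. A minimal sketch; the name, key pair, instance type, and instance count are placeholders you would adjust:

aws emr create-cluster \
    --name "presto-quicksight-demo" \
    --release-label emr-5.5.0 \
    --applications Name=Hadoop Name=Hive Name=Presto \
    --ec2-attributes KeyName=YOUR_KEY_PAIR_NAME \
    --instance-type m4.large \
    --instance-count 3 \
    --use-default-roles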

Make sure that you configure your cluster’s security group inbound rules to allow SSH from your machine’s IP address range. 
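
A hedged sketch of opening SSH from a single address range with the CLI; the security group ID and CIDR below are placeholders for your master node's security group and your own IP range:

aws ec2 authorize-security-group-ingress \
    --group-id sg-0123456789abcdef0 \
    --protocol tcp \
    --port 22 \
    --cidr 203.0.113.0/24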

Configure LDAP for user authentication in QuickSight

After your cluster is in a running state, connect using SSH to your cluster to configure LDAP authentication.

To SSH into your EMR cluster, use the following commands in the terminal:

chmod 600 ~/YOUR_PEM_FILE.pem
ssh -i ~/YOUR_PEM_FILE.pem hadoop@YOUR_MASTER_PUBLIC_DNS_FROM_EMR_CLUSTER

After you log in, install OpenLDAP, configure it, and create users in the directory. For more about configuring LDAP, see Editing /etc/openldap/slapd.conf in the OpenLDAP documentation.

# Install LDAP Server
 sudo yum install openldap openldap-servers openldap-clients
# Create the config files
sudo cp /usr/share/openldap-servers/DB_CONFIG.example /var/lib/ldap/DB_CONFIG
sudo cp /usr/share/openldap-servers/slapd.conf.obsolete /etc/openldap/slapd.conf
# Bounce LDAP
sudo service slapd restart

After LDAP is installed and restarted, you issue a couple of commands to set the LDAP root password. First, run the following command and enter a root password for LDAP when prompted:

slappasswd

Save the hash it prints, which looks like this:

{SSHA}DmD616c3yZyKndsccebZK/vmWiaQde83

Now, prepare the commands to set the password for the LDAP root. Make sure to replace the hash below with the one that you generated in the previous step:

cat > /tmp/config.ldif <<EOF
dn: olcDatabase={0}config,cn=config
changetype: modify
add: olcRootPW
olcRootPW: {SSHA}DmD616c3yZyKndsccebZK/vmWiaQde83

dn: olcDatabase={2}bdb,cn=config
changetype: modify
add: olcRootPW
olcRootPW: {SSHA}DmD616c3yZyKndsccebZK/vmWiaQde83
-
replace: olcRootDN
olcRootDN: cn=dev,dc=example,dc=com
-
replace: olcSuffix
olcSuffix: dc=example,dc=com
EOF

Run the following command to execute the above commands against LDAP:

sudo ldapadd -Y EXTERNAL -H ldapi:/// -f /tmp/config.ldif

Next, create a user account with a password in the LDAP directory using the following commands. When prompted for a password, use the LDAP root password that you created in the previous step.

cat > /tmp/accounts.ldif <<EOF
dn: dc=example,dc=com
objectclass: domain
objectclass: top
dc: example

dn: ou=dev,dc=example,dc=com
objectclass: organizationalUnit
ou: dev
description: Container for developer entries

dn: uid=<REPLACE_WITH_YOUR_USER_NAME>,ou=dev,dc=example,dc=com
uid: <REPLACE_WITH_YOUR_USER_NAME>
objectClass: inetOrgPerson
userPassword: <REPLACE_WITH_STRONG_PASSWORD>
sn: <REPLACE_WITH_SURNAME>
cn: dev
EOF

ldapadd -D "cn=dev,dc=example,dc=com" -W -f /tmp/accounts.ldif

You now have OpenLDAP configured on your EMR cluster running Presto and a user that you later use to authenticate against when connecting to Presto.

Configure SSL using a QuickSight supported certificate authority

To ensure that any communication between QuickSight and Presto is secured, QuickSight requires that the connection be established with SSL enabled. You need to obtain a certificate from a certificate authority (CA) that QuickSight trusts. You can find the full list of public CAs accepted by QuickSight in the Network and Database Configuration Requirements topic.

To set up SSL on LDAP and Presto, obtain the following three SSL certificate files from your CA and store them in the /home/hadoop/ directory.

  • Certificate key file
  • Certificate file
  • CA certificate
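
Before configuring these files, it can help to confirm who issued the certificate and that it is still valid. A quick check with openssl, assuming the file names used below:

# Print the issuer, subject, and validity window of the certificate
openssl x509 -in /home/hadoop/certificate.pem -noout -issuer -subject -dates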

Configure the keys in LDAP with the following commands:

cat > /tmp/ca.ldif <<EOF
dn: cn=config
replace: olcTLSCertificateKeyFile
olcTLSCertificateKeyFile: /home/hadoop/certificateKey.pem

replace: olcTLSCertificateFile
olcTLSCertificateFile: /home/hadoop/certificate.pem

replace: olcTLSCACertificateFile
olcTLSCACertificateFile: /home/hadoop/cacertificate.pem
EOF

sudo ldapmodify -Y EXTERNAL -H ldapi:/// -f /tmp/ca.ldif

Now, enable SSL in LDAP by editing the /etc/sysconfig/ldap file and setting SLAPD_LDAPS=yes:

sudo vi /etc/sysconfig/ldap

SLAPD_LDAPS=yes

sudo service slapd restart

Use the following commands to generate the keystore. You will be prompted to provide a password for the keystore.

openssl pkcs12 -inkey certificatekey.pem -in certificate.pem -export -out server-key.p12

keytool -importkeystore -srckeystore server-key.p12 -srcstoretype PKCS12 -destkeystore server.keystore
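
To double-check the keystore before pointing Presto at it, you can list its contents; you will be prompted for the keystore password you just set:

# Confirm that the imported key is present in the keystore
keytool -list -keystore server.keystore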

Edit the configuration files for Presto in EMR.

SERVERNAME=<PUBLIC_DNS_NAME_OF_EMR_CLUSTER>
cd /etc/presto/conf

# Enable LDAPS auth for Presto
echo http-server.authentication.type=LDAP | sudo tee -a config.properties
echo authentication.ldap.url=ldaps://${SERVERNAME}:636 | sudo tee -a config.properties
echo authentication.ldap.user-bind-pattern=uid=\${USER},OU=dev,DC=example,DC=com | sudo tee -a config.properties

# Enable SSL for the Presto server
echo http-server.https.enabled=true | sudo tee -a config.properties
echo http-server.https.port=<PORT_NUMBER> | sudo tee -a config.properties
echo http-server.https.keystore.path=/home/hadoop/server.keystore | sudo tee -a config.properties
echo http-server.https.keystore.key=<KEYSTORE_PASSWORD> | sudo tee -a config.properties

# Bounce Presto to pick up the new config
sudo pkill presto
# wait until presto is up
while [[ 1 ]]; do pgrep presto; if [ $? -eq 0 ]; then break; else echo -n .; sleep 1; fi; done
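
Once the loop above reports a running process, you can optionally confirm that Presto answers over HTTPS. A rough check with curl; -k skips CA validation for this quick test, and the port is the one you configured above:

# Expect a small JSON document with the node's version and uptime
curl -k https://${SERVERNAME}:<PORT_NUMBER>/v1/info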

Create tables for Presto in the Hive metastore

Now that you have a running EMR cluster with Presto and LDAP set up, you can load some sample data into the cluster for analysis. Use the same CloudFront log sample data set that is available for Athena. The following HiveQL statement creates an external table over the sample data set stored in Amazon S3:

# Run hive
hive

#Create table and load data
CREATE EXTERNAL TABLE IF NOT EXISTS cloudfront_logs (
  `Date` Date,
  Time STRING,
  Location STRING,
  Bytes INT,
  RequestIP STRING,
  Method STRING,
  Host STRING,
  Uri STRING,
  Status INT,
  Referrer STRING,
  OS String,
  Browser String,
  BrowserVersion String
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  "input.regex" = "^(?!#)([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+[^\()+[\()([^\;]+).*\%20([^\/]+)[\/](.*)$"
) LOCATION 's3://athena-examples/cloudfront/plaintext/'; 

# quit hive
quit;

Try to query the data using the Presto CLI with the following commands:

#Run the Presto CLI
presto-cli --server https://<PUBLIC_DNS_NAME_OF_EMR_CLUSTER>:<PORT_NUMBER> --user <USERNAME> --password --catalog hive

#Issue query to Presto
SELECT * FROM cloudfront_logs limit 10;

You should see output from Presto showing the first 10 rows of the CloudFront sample data.

Whitelist the QuickSight IP address range in your EMR master security group rules

Now you’re ready to connect QuickSight to Presto. For QuickSight to connect to Presto, you need to make sure that Presto is reachable by QuickSight’s public endpoints by adding QuickSight’s IP address ranges to your EMR master node security group.
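
You can add the rule in the EC2 console, or with the AWS CLI as sketched below. The security group ID and the QuickSight IP range for your region are values you look up yourself; the regional ranges are listed in the same Network and Database Configuration Requirements topic mentioned earlier.

# Allow inbound connections from QuickSight to the Presto HTTPS port
aws ec2 authorize-security-group-ingress \
  --group-id <EMR_MASTER_SECURITY_GROUP_ID> \
  --protocol tcp \
  --port <PORT_NUMBER> \
  --cidr <QUICKSIGHT_IP_RANGE_FOR_YOUR_REGION>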

Connect QuickSight to Presto and create some visualizations

If you have not already signed up for QuickSight, you can do so at https://quicksight.aws. QuickSight offers a 1 user and 1 GB perpetual free tier.

After you’re signed up for QuickSight, navigate to the New Analysis page and then to the New Data Set page, where you see the new Presto and Spark connector.

Open the Presto connector, provide the connection details in the modal window, and choose Create data source.

Select the default schema and choose the cloudfront_logs table that you just created.

In QuickSight, you can choose between importing the data into SPICE for analysis or directly querying your data in Presto. SPICE is an in-memory optimized columnar engine in QuickSight that enables fast, interactive visualization as you explore your data. For this post, import the data into SPICE and then choose Visualize.

In the analysis view, you see a notification that the import is complete, with 4,996 rows imported. On the left is the list of fields available in the data set, and below that, the visualization types from which you can choose.

QuickSight makes it easy for you to create visualizations and analyze data with AutoGraph, a feature that automatically selects the best visualization for you based on selected fields.

To create a visualization, select fields in the left panel. In this case, look at the number of connections to CloudFront by OS type by selecting the OS field. You can also select the Bytes field to look at the total bytes transferred by OS instead of the count.

Summary

You just finished creating an EMR cluster, setting up Presto and LDAP with SSL, and using QuickSight to visualize your data. I hope this post was helpful. Feel free to reach out if you have any questions or suggestions.

Learn more

To learn more about these capabilities and start using them in your dashboards, check out the QuickSight User Guide.

Stay engaged

If you have questions and suggestions, you can post them on the QuickSight forum. 

Not a QuickSight user?

Go to the QuickSight website to get started for FREE.

 

FBI’s Comey dangerous definition of "valid" journalism

Post Syndicated from Robert Graham original http://blog.erratasec.com/2017/05/fbis-comey-dangerous-definition-of.html

The First Amendment, the “freedom of speech” one, does not mention journalists. When it says “freedom of the press” it means the physical printing press. Yes, that does include newspapers, but it also includes anybody else publishing things, such as the famous agitprop pamphlets published by James Otis, John Dickinson, and Thomas Paine. There was no journalistic value to Thomas Paine’s Common Sense. The pamphlet argued for abolishing the monarchy and for American independence.

Today in testimony before Congress, FBI director James Comey came out in support of journalism, pointing out that they would not prosecute journalists doing their jobs. But he then modified his statement, describing “valid” journalists as those who, in possession of leaks, would first check with the government to avoid publishing anything that would damage national security. It’s a power the government has abused in the past to delay or censor leaks. It’s specifically why Edward Snowden contacted Glenn Greenwald and Laura Poitras — he wanted journalists who would not kowtow to the government on publishing the leaks.

Comey’s testimony today was in regards to prosecuting Assange and Wikileaks. Under the FBI’s official “journalist” classification scheme, Wikileaks are not real journalists, but instead publish “intelligence porn” and are hostile to America’s interests.

To be fair, there may be good reasons to prosecute Assange. Publishing leaks is one thing, but the suspicion with Wikileaks is that they do more, that they actively help getting the leaks in the first place. The original leaks that started Wikileaks may have come from hacks by Assange himself. Assange may have helped Manning grab the diplomatic cables. Wikileaks may have been involved in hacking the DNC and Podesta emails, more than simply receiving and publishing the information.

If that’s the case, then the US government would have good reason to prosecute Wikileaks.

But that’s not what Comey said today. Instead, Comey referred only to Wikileaks’ constitutionally protected publishing activities, and how, since they didn’t fit his definition of “journalism”, they were open to prosecution. This is fundamentally wrong, and a violation of both the spirit and the letter of the First Amendment. The FBI should not have a definition of “journalism” it thinks is valid. Yes, Assange is an anti-American douchebag. Being an apologist for Putin’s Russia disproves his claim of being a neutral journalist targeting the corrupt and powerful. But these activities are specifically protected by the Constitution.

If this were 1776, Comey would of course be going after Thomas Paine, for publishing “revolution porn”, and not being a real journalist.