Tag Archives: ip address

Copyright Troll Piracy ‘Witness’ Went Back to the Future – and Lost

Post Syndicated from Andy original https://torrentfreak.com/copyright-troll-piracy-witness-went-back-to-the-future-and-lost-170526/

Since the early 2000s, copyright trolls have been attempting to squeeze cash from pirating Internet users and fifteen years later the practice is still going strong.

While there’s little doubt that trolls catch some genuine infringers in their nets, the claim that these actions are all about protecting copyrights is a shallow one. The aim is to turn piracy into profit, and history has shown us that the bigger the operation, the more likely it is they’ll cut corners to cut costs.

The notorious Guardaley trolling operation is a prime example. After snaring the IP addresses of hundreds of thousands of Internet users, the company extracts cash settlements in the United States, Europe and beyond. It’s a project of industrial scale based on intimidation of alleged infringers. But, when those people fight back, the scary trolls suddenly become less so.

The latest case of Guardaley running for the hills comes courtesy of SJD from troll-watching site FightCopyrightTrolls, who reports on an attempt by Guardaley partner Criminal Productions to extract settlement from Zach Bethke, an alleged downloader of the Ryan Reynolds movie, Criminal.

On May 12, Bethke’s lawyer, J. Christopher Lynch, informed Criminal Productions’ lawyer David A. Lowe that Bethke is entirely innocent.

“Neither Mr. Bethke nor his girlfriend copied your client’s movie and they do not know who, if anyone, may have done so,” Lynch wrote.

“Mr. Bethke does not use BitTorrent. Prior to this lawsuit, Mr. Bethke had never heard of your client’s movie and he has no interest in it. If he did have any interest in it, he could have rented it for no marginal cost using his Netflix or Amazon Prime accounts.”

Lynch went on to request that Criminal Productions drop the case. Failing that, he said, things would probably get more complicated. As reported last year, Lynch and Lowe have been regularly locking horns over these cases, with Lynch largely coming out on top.

Part of Lynch’s strategy has been to shine light on Guardaley’s often shadowy operations. He previously noted that its investigators were not properly licensed to operate in the U.S. and the company had been found to put forward a fictitious witness, among other things.

In the past, these efforts to bring Guardaley out into the open have resulted in its clients, which include several film companies, dropping cases. Lynch, it appears, wants that to happen again in Bethke’s case, noting in his letter that it’s “long past due for a judge to question the qualifications” of the company’s so-called technical experts.

In doing so he calls Guardaley’s evidence into question once more, noting inconsistencies in the way alleged infringements were supposedly “observed” by “foreign investigator[s], with a direct financial interest in the matter.”

One of Lynch’s findings is that the “observations” of two piracy investigators in separate cases overlap each other’s monitoring periods, despite both reportedly monitoring the same torrent hash.

“Both declarations cover the same ‘hash number’ of the movie, i.e. the same swarm. This overlap seems impossible if we stick with the fictions of the Complaint and Motion for Expedited Discovery that the declarant ‘observed’ the defendant ‘infringing’,” Lynch notes.

While these are interesting points, the quality of evidence presented by Guardaley and Criminal Productions is really called into question following another revelation. Daniel Macek, an ‘observing’ investigator used in numerous Guardaley cases, apparently has a unique talent.

As seen from the image below, the alleged infringements relating to Mr. Bethke’s case were carried out between June 25 and 28, 2016.

However, the declaration (pdf) filed with the Court on witness Macek’s behalf was signed and dated either June 14 or 16, more than a week before the infringements allegedly took place.

Time-traveler? Lynch thinks not.

“How can a witness sign a declaration that he observed something BEFORE it happened?” he writes.

“Criminal Productions submitted four such Declarations of Mr. Macek that were executed BEFORE the dates of the accompanying typed up list of observations that Mr. Macek swore that he made.

“Unless Daniel Macek is also Marty McFly, it is impossible to execute a declaration claiming to observe something that has yet to happen.”

So what could explain this strange phenomenon? Lynch believes he’s got to the bottom of that one too.

After comparing all four Macek declarations, he found that aside from the case numbers, the dates and signatures were identical. Instead of taking the issue of presenting evidence before the Court seriously, he believes Criminal Productions and partner Guardaley have been taking short cuts.

“From our review, it appears these metaphysical Macek declarations are not just temporally improper, they are also photocopies, including the signatures not separately executed,” he notes.

“We are astonished by your client’s foreign representatives’ apparent lack of respect for our federal judicial system. Use of duplicate signatures from a witness testifying to events that have yet to happen is on the same level of horror as the use of a fictitious witness and ‘his’ initials as a convenience to obtain subpoenas.”

Not entirely unexpectedly, five days later the case against Bethke and other defendants was voluntarily dismissed (pdf), indicating once again that like vampires, trolls do not like the light. Other lawyers defending similar cases globally should take note.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

IPTV Providers Counter Premier League Piracy Blocks

Post Syndicated from Andy original https://torrentfreak.com/iptv-providers-counter-premier-league-piracy-blocks-170520/

In the UK, top tier football is handled by The Premier League and its broadcasting partners Sky and BT Sport. All are facing problems with Internet piracy.

In a nutshell, official subscriptions are far from cheap, so people are always on the lookout for more affordable alternatives. As a result, large numbers of fans are turning to piracy-enabled set-top boxes for their fix.

These devices, often running Kodi with third-party addons, not only provide free or cheap football streams but also enable fans to watch matches at 3pm on Saturdays, a time traditionally covered by the blackout.

To mitigate this threat, earlier this year the Premier League obtained a rather special High Court injunction.

While similar in its aims to earlier orders targeting torrent sites including The Pirate Bay, this injunction enables the Premier League to act quickly, forcing local ISPs such as Sky, BT, and Virgin to block football streams in real-time.

“This will enable us to target the suppliers of illegal streams to IPTV boxes, and the internet, in a proportionate and precise manner,” the Premier League said at the time.

Ever since the injunction was issued, TF has monitored for signs that it has been achieving its stated aim of stopping or at least reducing stream availability. Based on information obtained from several popular IPTV suppliers, after several weeks we have concluded that Premier League streams are still easy to find, with some conditions.

HD sources for games across all Sky channels are commonplace on paid services, with SD sources available for free. High-quality streams have been consistently available on Saturday afternoons for the sensitive 3pm kick-off, with little to no interference or signs of disruption.

Of course, the Internet is a very big place, so it is certainly possible that disruption has been experienced by users elsewhere. However, what we do know is that some IPTV providers have been working behind the scenes to keep their services going.

According to a low-level contact at one IPTV provider who demanded total anonymity, servers used by his ‘company’ (he uses the term loosely) have seen their loads drop unexpectedly during match times, an indication that ISPs might be targeting their customers with blocks.

A re-seller for another well-known provider told TF that some intermittent disruption had been felt but that it was “being handled” as and when it “becomes a problem.” Complaint levels from customers are not yet considered a concern, he added.

That the Premier League’s efforts are having at least some effect doesn’t appear to be in doubt, but it’s pretty difficult to find evidence in public. That being said, an IPTV provider whose identity we were asked to conceal has taken more easily spotted measures.

After Premier League matches got underway this past Tuesday night, the provider in question launched a new beta service in its Kodi addon. Perhaps unsurprisingly, it allows users to cycle through proxy servers in order to bypass blocks put in place by ISPs on behalf of the Premier League.

Embedded proxy service in Kodi

As seen from the image above, the beta unblocking service is accessible via the service’s Kodi addon and requires no special skills to operate. Simply clicking on the “Find a Proxy to Use” menu item opens up the page below.

The servers used to bypass the blocks

Once a working proxy is found, access to the streams is routed through it, evading the Premier League’s attempts at having IP addresses blocked at the UK’s ISPs, and the list of streams becomes accessible again.

Sky Sports streams ready, in HD

The use of proxies for this kind of traffic is of interest, at least as far as the injunction goes.

What we know already is that the Premier League only has permission to block servers if it “reasonably believes” they have the “sole or predominant purpose of enabling or facilitating access to infringing streams of Premier League match footage.”

If any server “is being used for any other substantial purpose”, the football organization cannot block it, meaning that non-dedicated or multi-function proxies cannot be blocked by ISPs, legally at least.

On Thursday evening, however, a TF source monitoring a popular IPTV provider using proxies reported that the match between Southampton and Manchester United suddenly became blocked. Whether that was due to Premier League action is unclear, but normal service was restored by using a VPN.

The use of VPNs with IPTV services raises other issues, however. All Premier League blockades can be circumvented with the use of a VPN but many IPTV providers are known for being intolerant of them, since they can also be used by restreamers to ‘pirate’ their service.

The Premier League injunction came into force on March 18, 2017, and will run out this weekend when the football season ends.

It’s reasonable to presume that the period will have been used for testing and that the Premier League will be back in court again this year seeking a further injunction for the new season starting in August. Expect it to be more effective than it has been thus far.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

How to Update AWS CloudHSM Devices and Client Instances to the Software and Firmware Versions Supported by AWS

Post Syndicated from Tracy Pierce original https://aws.amazon.com/blogs/security/how-to-update-aws-cloudhsm-devices-and-client-instances-to-the-software-and-firmware-versions-supported-by-aws/

As I explained in my previous Security Blog post, a hardware security module (HSM) is a hardware device designed with the security of your data and cryptographic key material in mind. It is tamper-resistant hardware that prevents unauthorized users from attempting to pry open the device, plug in any extra devices to access data or keys such as subtokens, or damage the outside housing. The HSM device AWS CloudHSM offers is the Luna SA 7000 (also called Safenet Network HSM 7000), which is created by Gemalto. Depending on the firmware version you install, many of the security properties of these HSMs will have been validated under Federal Information Processing Standard (FIPS) 140-2, a standard issued by the National Institute of Standards and Technology (NIST) for cryptography modules. These standards are in place to protect the integrity and confidentiality of the data stored on cryptographic modules.

To help ensure its continued use, functionality, and support from AWS, we suggest that you update your AWS CloudHSM device software and firmware as well as the client instance software to current versions offered by AWS. As of the publication of this blog post, the current non-FIPS-validated versions are 5.4.9/client, 5.3.13/software, and 6.20.2/firmware, and the current FIPS-validated versions are 5.4.9/client, 5.3.13/software, and 6.10.9/firmware. (The firmware version determines FIPS validation.) It is important to know your current versions before updating so that you can follow the correct update path.

In this post, I demonstrate how to update your current CloudHSM devices and client instances so that you are using the most current versions of software and firmware. If you contact AWS Support for CloudHSM hardware and application issues, you will be required to update to these supported versions before proceeding. Also, any newly provisioned CloudHSM devices will use these supported software and firmware versions only, and AWS does not offer “downgrade options.”

Note: Before you perform any updates, check with your local CloudHSM administrator and application developer to verify that these updates will not conflict with your current applications or architecture.

Overview of the update process

To update your client and CloudHSM devices, you must use both update paths offered by AWS. The first path involves updating the software on your client instance, also known as a control instance. The second path updates the software and then the firmware on your CloudHSM devices. The CloudHSM software must be updated before the firmware because the firmware depends on the software to work correctly.

As I demonstrate in this post, the correct update order is:

  1. Updating your client instance
  2. Updating your CloudHSM software
  3. Updating your CloudHSM firmware

To update your client instance, you must have the private SSH key you created when you first set up your client environment. This key is used to connect via SSH protocol on port 22 of your client instance. If you have more than one client instance, you must repeat this connection and update process on each of them. The following diagram shows the flow of an SSH connection from your local network to your client instances in the AWS Cloud.

Diagram that shows the flow of an SSH connection from your local network to your client instances in the AWS Cloud

After you update your client instance to the most recent software (5.3.13), you then must update the CloudHSM device software and firmware. First, you must initiate an SSH connection from any one client instance to each CloudHSM device, as illustrated in the following diagram. A successful SSH connection will have you land at the Luna shell, denoted by lunash:>. Second, you must be able to initiate a Secure Copy (SCP) of files to each device from the client instance. Because the software and firmware updates require an elevated level of privilege, you must have the Security Officer (SO) password that you created when you initialized your CloudHSM devices.

Diagram illustrating the initiation of an SSH connection from any one client instance to each CloudHSM device

After you have completed all updates, you can receive enhanced troubleshooting assistance from AWS, if you need it. When new versions of software and firmware are released, AWS performs extensive testing to ensure your smooth transition from version to version.

Detailed guidance for updating your client instance, CloudHSM software, and CloudHSM firmware

1.  Updating your client instance

Let’s start by updating your client instances. My client instance and CloudHSM devices are in the eu-west-1 region, but these steps work the same in any AWS region. Because Gemalto offers client instances in both Linux and Windows, I will cover steps to update both. I will start with Linux. Please note that all commands should be run as the “root” user.

Updating the Linux client

  1. SSH from your local network into the client instance. You can do this from Linux or Windows. Typically, you would do this from the directory where you have stored your private SSH key, by using a command like the following in a terminal or PuTTY. This initiates the SSH connection by pointing to the path of your SSH key and denoting the user name and IP address of your client instance.
    ssh -i /Users/Bob/Keys/CloudHSM_SSH_Key.pem ec2-user@1.1.1.1

  2. After the SSH connection is established, you must stop all applications and services on the instance that are using the CloudHSM device. This is required because you are removing old software and installing new software in its place. After you have stopped all applications and services, you can move on to remove the existing version of the client software.
    /usr/safenet/lunaclient/bin/uninstall.sh

This command will remove the old client software, but will not remove your configuration file or certificates. These will be saved in the Chrystoki.conf file of your /etc directory and your /usr/safenet/lunaclient/cert directory. Do not delete these files because you will lose the configuration of your CloudHSM devices and client connections.

  3. Download the new client software package: cloudhsm-safenet-client. Extract the archive, and then run the install script.
    SafeNet-Luna-client-5-4-9/linux/64/install.sh

    Make sure you choose the Luna SA option when presented with it. Because the directory where your certificates are installed is the same, you do not need to copy these certificates to another directory. You do, however, need to ensure that the Chrystoki.conf file, located at /etc/Chrystoki.conf, has the same path and name for the certificates as when you created them. (The path or names should not have changed, but you should still verify they are the same as before the update.)

  4. Check that the PATH environment variable points to the directory /usr/safenet/lunaclient/bin, so that you have no issues when you restart applications and services; a quick way to verify this is sketched below. The update process for your Linux client instance is now complete.
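
The following is a minimal sketch of such a check, assuming a bash shell on the client instance; the export line is only needed if the directory is missing, and it does not persist beyond the current session unless you also add it to your shell profile.

    # Print the PATH entries one per line and look for the Luna client directory
    echo "$PATH" | tr ':' '\n' | grep -x '/usr/safenet/lunaclient/bin'
    # If nothing is printed, add the directory for the current session
    export PATH="$PATH:/usr/safenet/lunaclient/bin"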

Updating the Windows client

Use the following steps to update your Windows client instances:

  1. Use Remote Desktop Protocol (RDP) from your local network into the client instance. You can accomplish this with the RDP application of your choice.
  2. After you establish the RDP connection, stop all applications and services on the instance that are using the CloudHSM device. This is required because you will remove old software and install new software in its place, or overwrite it. If your client software version is older than 5.4.1, you need to completely remove it and all patches by using Programs and Features in the Windows Control Panel. If your client software version is 5.4.1 or newer, proceed without removing the software. Your configuration file will remain intact in the crystoki.ini file of your C:\Program Files\SafeNet\Lunaclient\ directory. All certificates are preserved in the C:\Program Files\SafeNet\Lunaclient\cert\ directory. Again, do not delete these files, or you will lose all configuration and client connection data.
  3. After you have completed these steps, download the new client software: cloudhsm-safenet-client. Extract the archive from the downloaded file, and launch the SafeNet-Luna-client-5-4-9\win\64\Lunaclient.msi installer. Choose the Luna SA option when it is presented to you. Because the directory where your certificates are installed is the same, you do not need to copy these certificates to another directory. You do, however, need to ensure that the crystoki.ini file, which is located at C:\Program Files\SafeNet\Lunaclient\crystoki.ini, has the same path and name for the certificates as when you created them. (The path and names should not have changed, but you should still verify they are the same as before the update.)
  4. Finally, check that the PATH environment variable points to the directory C:\Program Files\SafeNet\Lunaclient\ to help avoid issues when you restart applications and services. The update process for your Windows client instance is now complete.

2.  Updating your CloudHSM software

Now that your clients are up to date with the most current software version, it’s time to move on to your CloudHSM devices. A few important notes:

  • Back up your data to a Luna SA Backup device. AWS does not sell or support the Luna SA Backup devices, but you can purchase them from Gemalto. We do, however, offer the steps to back up your data to a Luna SA Backup device. Do not update your CloudHSM devices without backing up your data first.
  • If the names of your clients used for Network Trust Link Service (NTLS) connections have a capital “T” as the eighth character, the clients will not work after this update. This is because of a Gemalto naming convention. Before upgrading, ensure you modify your client names accordingly. The NTLS connection uses two-way digital certificate authentication and SSL data encryption to protect sensitive data transmitted between your CloudHSM device and the client instances.
  • The syslog configuration for the CloudHSM devices will be lost. After the update is complete, notify AWS Support and we will update the configuration for you.

Now on to updating the software versions. There are actually three different update paths to follow, and I will cover each. Depending on the current software versions on your CloudHSM devices, you might need to follow all three or just one.

Updating the software from version 5.1.x to 5.1.5

If you are running any version of the software older than 5.1.5, you must first update to version 5.1.5 before proceeding. To update to version 5.1.5:

  1. Stop all applications and services that access the CloudHSM device.
  2. Download the Luna SA software update package.
  3. Extract all files from the archive.
  4. Run the following command from your client instance to copy the lunasa_update-5.1.5-2.spkg file to the CloudHSM device.
    $ scp -i <private_key_file> lunasa_update-5.1.5-2.spkg manager@<hsm_ip_address>:

    <private_key_file> is the private portion of your SSH key pair and <hsm_ip_address> is the IP address of your CloudHSM elastic network interface (ENI). The ENI is the network endpoint that permits access to your CloudHSM device. The IP address was supplied to you when the CloudHSM device was provisioned.

  5. Use the following command to connect to your CloudHSM device and log in with your Security Officer (SO) password.
    $ ssh -i <private_key_file> manager@<hsm_ip_address>
    
    lunash:> hsm login

  6. Run the following commands to verify and then install the updated Luna SA software package.
    lunash:> package verify lunasa_update-5.1.5-2.spkg -authcode <auth_code>
    
    lunash:> package update lunasa_update-5.1.5-2.spkg -authcode <auth_code>

    The value you will use for <auth_code> is contained in the lunasa_update-5.1.5-2.auth file found in the 630-010165-018_reva.tar archive you downloaded in Step 2.

  7. Reboot the CloudHSM device by running the following command.
    lunash:> sysconf appliance reboot

When all the steps in this section are completed, you will have updated your CloudHSM software to version 5.1.5. You can now move on to update to version 5.3.10.

Updating the software to version 5.3.10

You can update to version 5.3.10 only if you are currently running version 5.1.5. To update to version 5.3.10:

  1. Stop all applications and services that access the CloudHSM device.
  2. Download the v 5.3.10 Luna SA software update package.
  3. Extract all files from the archive.
  4. Run the following command to copy the lunasa_update-5.3.10-7.spkg file to the CloudHSM device.
    $ scp -i <private_key_file> lunasa_update-5.3.10-7.spkg manager@<hsm_ip_address>:

    <private_key_file> is the private portion of your SSH key pair and <hsm_ip_address> is the IP address of your CloudHSM ENI.

  5. Run the following command to connect to your CloudHSM device and log in with your SO password.
    $ ssh -i <private_key_file> manager@<hsm_ip_address>
    
    lunash:> hsm login

  6. Run the following commands to verify and then install the updated Luna SA software package.
    lunash:> package verify lunasa_update-5.3.10-7.spkg -authcode <auth_code>
    
    lunash:> package update lunasa_update-5.3.10-7.spkg -authcode <auth_code>

The value you will use for <auth_code> is contained in the lunasa_update-5.3.10-7.auth file found in the SafeNet-Luna-SA-5-3-10.zip archive you downloaded in Step 2.

  7. Reboot the CloudHSM device by running the following command.
    lunash:> sysconf appliance reboot

When all the steps in this section are completed, you will have updated your CloudHSM software to version 5.3.10. You can now move on to update to version 5.3.13.

Note: Do not configure your applog settings at this point; you must first update the software to version 5.3.13 in the following step.

Updating the software to version 5.3.13

You can update to version 5.3.13 only if you are currently running version 5.3.10. If you are not already running version 5.3.10, follow the two update paths mentioned previously in this section.

To update to version 5.3.13:

  1. Stop all applications and services that access the CloudHSM device.
  2. Download the Luna SA software update package.
  3. Extract all files from the archive.
  4. Run the following command to copy the lunasa_update-5.3.13-1.spkg file to the CloudHSM device.
    $ scp -i <private_key_file> lunasa_update-5.3.13-1.spkg manager@<hsm_ip_address>:

<private_key_file> is the private portion of your SSH key pair and <hsm_ip_address> is the IP address of your CloudHSM ENI.

  5. Run the following command to connect to your CloudHSM device and log in with your SO password.
    $ ssh -i <private_key_file> manager@<hsm_ip_address>
    
    lunash:> hsm login

  6. Run the following commands to verify and then install the updated Luna SA software package.
    lunash:> package verify lunasa_update-5.3.13-1.spkg -authcode <auth_code>
    
    lunash:> package update lunasa_update-5.3.13-1.spkg -authcode <auth_code>

The value you will use for <auth_code> is contained in the lunasa_update-5.3.13-1.auth file found in the SafeNet-Luna-SA-5-3-13.zip archive that you downloaded in Step 2.

  7. When updating to this software version, the option to update the firmware also is offered. If you do not require a version of the firmware validated under FIPS 140-2, accept the firmware update to version 6.20.2. If you do require a version of the firmware validated under FIPS 140-2, do not accept the firmware update and instead update by using the steps in the next section, “Updating your CloudHSM FIPS 140-2 validated firmware.”
  8. After updating the CloudHSM device, reboot it by running the following command.
    lunash:> sysconf appliance reboot

  9. Disable NTLS IP checking on the CloudHSM device so that it can operate within its VPC. To do this, run the following command.
    lunash:> ntls ipcheck disable

When all the steps in this section are completed, you will have updated your CloudHSM software to version 5.3.13. If you don’t need the FIPS 140-2 validated firmware, you will have also updated the firmware to version 6.20.2. If you do need the FIPS 140-2 validated firmware, proceed to the next section.

3.  Updating your CloudHSM FIPS 140-2 validated firmware

To update the FIPS 140-2 validated version of the firmware to 6.10.9, use the following steps:

  1. Download version 6.10.9 of the firmware package.
  2. Extract all files from the archive.
  3. Run the following command to copy the 630-010430-010_SPKG_LunaFW_6.10.9.spkg file to the CloudHSM device.
    $ scp -i <private_key_file> 630-010430-010_SPKG_LunaFW_6.10.9.spkg manager@<hsm_ip_address>:

<private_key_file> is the private portion of your SSH key pair, and <hsm_ip_address> is the IP address of your CloudHSM ENI.

  4. Run the following command to connect to your CloudHSM device and log in with your SO password.
    $ ssh -i <private_key_file> manager@<hsm_ip_address>
    
    lunash:> hsm login

  5. Run the following commands to verify and then install the updated Luna SA firmware package.
    lunash:> package verify 630-010430-010_SPKG_LunaFW_6.10.9.spkg -authcode <auth_code>
    
    lunash:> package update 630-010430-010_SPKG_LunaFW_6.10.9.spkg -authcode <auth_code>

The value you will use for <auth_code> is contained in the 630-010430-010_SPKG_LunaFW_6.10.9.auth file found in the 630-010430-010_SPKG_LunaFW_6.10.9.zip archive that you downloaded in Step 1.

  6. Run the following command to update the firmware of the CloudHSM devices.
    lunash:> hsm update firmware

  7. After you have updated the firmware, reboot the CloudHSM devices to complete the installation.
    lunash:> sysconf appliance reboot

Summary

In this blog post, I walked you through how to update your existing CloudHSM devices and clients so that they are using supported client, software, and firmware versions. Per AWS Support and CloudHSM Terms and Conditions, your devices and clients must use the most current supported software and firmware for continued troubleshooting assistance. Software and firmware versions regularly change based on customer use cases and requirements. Because AWS tests and validates all updates from Gemalto, you must install all updates for firmware and software by using our package links described in this post and elsewhere in our documentation.

If you have comments about this blog post, submit them in the “Comments” section below. If you have questions about implementing this solution, please start a new thread on the CloudHSM forum.

– Tracy

News from the AWS Summit in Berlin – 3rd AZ & Lightsail in Frankfurt and Another Polly Voice

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/news-from-the-aws-summit-in-berlin-3rd-az-lightsail-in-frankfurt-and-another-polly-voice/

We launched the AWS Region in Frankfurt in the fall of 2014 and opened the AWS Marketplace for the Region the next year.

Our customers in Germany come in all shapes and sizes: startups, mid-market, enterprise, and public sector. These customers have made great use of the new Region, building and running applications and businesses that serve Germany, Europe, and more. They rely on the broad collection of security features, certifications, and assurances provided by AWS to help protect and secure their customer data, in accord with internal and legal requirements and regulations. Our customers in Germany also take advantage of the sales, support, and architecture resources and expertise located in Berlin, Dresden, and Munich.

The AWS Summit in Berlin is taking place today and we made some important announcements from the stage. Here’s a summary:

  • Third Availability Zone in Frankfurt
  • Amazon Lightsail in Frankfurt
  • New voice for Amazon Polly

Third Availability Zone in Frankfurt
We will be opening an additional Availability Zone (AZ) in the EU (Frankfurt) Region in mid-2017 in response to the continued growth in the use of AWS. This brings us up to 43 Availability Zones within 16 geographic Regions around the world. We are also planning to open five Availability Zones in new AWS Regions in France and China later this year (see the AWS Global Infrastructure maps for more information).

AWS customers in Germany are already making plans to take advantage of the new AZ. For example:

Siemens expects to gain additional flexibility by mirroring their services across all of the AZs. It will also allow them to store all of their data in Germany.

Zalando will do the same, mirroring their services across all of the AZs and looking ahead to moving more applications to the cloud.

Amazon Lightsail in Frankfurt
Amazon Lightsail lets you launch a virtual machine preconfigured with SSD storage, DNS management, and a static IP address in a matter of minutes (read Amazon Lightsail – The Power of AWS, the Simplicity of a VPS to learn more).

Amazon Lightsail is now available in the EU (Frankfurt) Region and you can start using it today. This allows you to use it to host applications that are required to store customer data or other sensitive information in Germany.
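
If you use the AWS Command Line Interface (CLI), you can launch a Lightsail instance in Frankfurt with a single command. The following is a rough sketch only; the instance name, blueprint, and bundle shown here are placeholder values that you would replace with your own choices.

    $ aws lightsail create-instances \
        --region eu-central-1 \
        --availability-zone eu-central-1a \
        --instance-names my-frankfurt-instance \
        --blueprint-id ubuntu_16_04 \
        --bundle-id nano_1_0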

New Voice for Amazon Polly
Polly gives you high-quality, natural-sounding male and female speech in multiple languages. Today we are adding another German-speaking female voice to Polly, bringing the total number of voices to 48.

Like the German voice of Alexa, Vicki (the new voice) is fluent and natural. She is also able to intelligently pronounce the Anglicisms frequently used in German texts, including the fully inflected versions. To get started with Polly, open up the Polly Console or read the Polly Documentation.
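
If you prefer the command line, you can also try the new voice with the AWS CLI. This is a minimal sketch; the sample text and the output file name are illustrative only.

    $ aws polly synthesize-speech \
        --output-format mp3 \
        --voice-id Vicki \
        --text "Hallo aus Berlin!" \
        hallo.mp3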

I’m looking forward to hearing more about the continued growth and success of our customers in and around Germany!

Jeff;

EU Votes Today On Content Portability to Reduce Piracy (Updated)

Post Syndicated from Andy original https://torrentfreak.com/eu-votes-today-on-content-portability-to-reduce-piracy-170518/

Being a fully-paid up customer of a streaming service such as Spotify or Netflix should be a painless experience, but for citizens of the EU, complexities exist.

Subscribers of Netflix, for example, have access to different libraries, depending on where they’re located. This means that a viewer in the Netherlands could begin watching a movie at home, travel to France for a weekend break, and find on arrival that the content he paid for is not available there.

A similar situation can arise with a UK citizen’s access to BBC’s iPlayer. While he has access at home to the service he has already paid for, travel to Spain for a week and access is denied, since the service believes he’s not entitled to view.

While the EU is fiercely protective of its aim to grant free movement to both people and goods, this clearly hasn’t always translated well to the digital domain. There are currently no explicit provisions under EU law which mandate cross-border portability of online content services.

Following a vote today, however, all that may change.

In a few hours’ time, Members of the European Parliament will vote on whether to introduce new ‘Cross-border portability’ rules (pdf) that will give citizens the freedom to enjoy their media wherever they are in the EU, without having to resort to piracy.

“If you live for instance in Germany but you go on holiday or visit your family or work in Spain, you will be able to access the services that you had in Germany in any other country in the Union, because the text covers the EU,” says Jean-Marie Cavada, the French ALDE member responsible for steering the new rules through Parliament.

But while freedom to receive content is the aim, there will be a number of restrictions in practice. While travelers to other EU countries will get access to the same content they would back home on the same range of devices, it will only be available on a temporary basis.

People traveling on a holiday, business, or study trip will enjoy the freedom to consume “for a limited period.” Extended stays will not be catered for under the new rules so as not to upset licensing arrangements already in place between rightsholders and service providers.

So how will the system work in practice?

At the moment, services like Netflix use the current IP address of the subscriber to determine where they are and therefore which regional library they’ll have access to when they sign in.

It appears that a future system would have to consider in which country the user signed up, before checking to ensure that the user trying to access the service in another EU country is the same person. That being said, if copyright holders agree, service providers can omit the verification process.

“The draft text to be voted on calls for safeguarding measures to be included in the regulation to ensure that the data and privacy of users are respected throughout the verification process,” European Parliament news said this week.

If adopted, the new rules would come into play during the first six months of 2018 and would apply to subscriptions already in place.

Separately, MEPs are also considering new rules on geo-blocking “to ensure that online sellers do not discriminate against consumers” because of where they live in the EU.

Update: The vote has passed. Here is the full statement by Vice-President for the Digital Single Market, Andrus Ansip.

I welcome today’s positive vote of the European Parliament on the portability of online content across borders, following the agreement reached between the European Parliament, Council and Commission at the beginning of the year.

I warmly thank the European Parliament rapporteur Jean-Marie Cavada for his work in achieving this and look forward to final approval by Member States in the coming weeks.

The rules voted today mean that, as of the beginning of next year, people who have subscribed to their favourite series, music and sports events at home will be able to enjoy them when they travel in the European Union.

Combined with the end of roaming charges, it means that watching films or listening to music while on holiday abroad will not bring any additional costs to people who use mobile networks.

This is an important step in breaking down barriers in the Digital Single Market.

We now need agreements on our other proposals to modernise EU copyright rules and ensure wider access to creative content across borders and fairer rules for creators. I rely on the European Parliament and Member States to make swift progress to make this happen.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

ISP Bombarded With 82,000+ Demands to Reveal Alleged Pirates

Post Syndicated from Andy original https://torrentfreak.com/isp-bombarded-with-82000-demands-to-reveal-alleged-pirates-170513/

It was once a region where people could share files without fear of reprisal, but over the years Scandinavia has become a hotbed of ‘pirate’ prosecutions.

Sweden, in particular, has seen many sites shut down and their operators sentenced, notably those behind The Pirate Bay but also more recent cases such as those against DreamFilm and Swefilmer.

Against this backdrop, members of the public have continued to share files, albeit in decreasing numbers. However, at the same time copyright trolls have hit countries like Sweden, Finland, and Denmark, hoping to scare alleged file-sharers into cash settlements.

This week regional ISP Telia revealed that the activity has already reached epidemic proportions.

Under the EU IPR Enforcement Directive (IPRED), Internet service providers are required to hand over the personal details of suspected pirates to copyright holders, if local courts deem that appropriate. Telia says it is now being bombarded with such demands.

“Telia must adhere to court decisions. At the same time we have a commitment to respect the privacy of our customers and therefore to be transparent,” the company says.

“While in previous years Telia has normally received less than ten such [disclosure] requests per market, per year, lately the number of requests has increased significantly.”

The scale is huge. The company reports that in Sweden during the past year alone, it has been ordered to hand over the identities of subscribers behind more than 45,000 IP addresses.

In Finland during the same period, court orders covered almost 37,000 IP addresses. Four court orders in Denmark currently require the surrendering of data on “hundreds” of customers.

Telia says that a Danish law firm known as Njord Law is behind many of the demands. The company is connected to international copyright trolls operating out of the United States, United Kingdom, and elsewhere.

“A Danish law firm (NJORD Law firm), representing the London-based copyright holder Copyright Management Services Ltd, was recently (2017-01-31) granted a court order forcing Telia Sweden to disclose to the law firm the subscriber identities behind 25,000 IP-addresses,” the company notes.

Copyright Management Services Ltd was incorporated in the UK during October 2014. Its sole director is Patrick Achache, who also operates German-based BitTorrent tracking company MaverickEye. Both are part of the notorious international trolling operation Guardaley.

Copyright Management Services, which is based at the same London address as fellow UK copyright-trolling partner Hatton and Berkeley, filed accounts in June 2016 claiming to be a dormant company. Other than that, it has never filed any financial information.

Copyright Management Services will be legally required to publish more detailed accounts next time around, since the company is now clearly trading, but its role in this operation is far from clear. For its part, Telia hopes the court has done the necessary checking when handing information over to partner firm, Njord Law.

“Telia assumes that the courts perform adequate assessments of the evidence provided by the above law firm, and also that the courts conduct a sufficient assessment of proportionality between copyright and privacy,” the company says.

“Telia does not know what the above law firm intends to do with the large amount of customer data which they are now collecting.”

While that statement from Telia is arguably correct, it doesn’t take a genius to work out where this is going. Every time that these companies can match an IP address to an account holder, they will receive a letter in the mail demanding a cash settlement. Anything that substantially deviates from this outcome would be a very surprising development indeed.

In the meantime, Jon Karlung, the outspoken boss of ISP Bahnhof, has pointed out that if Telia didn’t store customer IP addresses in the first place, it wouldn’t have anything to hand out to copyright trolls.

“Bahnhof does not store this data – and we can’t give out something we do not have. The same logic should apply to Telia,” he said.

Bahnhof says it stores customer data including IP addresses for 24 hours, just long enough to troubleshoot technical issues but nowhere near long enough to be useful to trolls.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

Judge Threatens to Bar ‘Copyright Troll’ Cases Over Lacking IP-location Evidence

Post Syndicated from Ernesto original https://torrentfreak.com/judge-threatens-to-bar-copyright-troll-cases-over-lacking-ip-location-evidence-170212/

While relatively underreported, many U.S. district courts are still swamped with lawsuits against alleged film pirates.

The copyright holders who initiate these cases generally rely on an IP address as evidence. This information is collected from BitTorrent swarms and linked to a geographical location using geolocation tools.
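
As a rough illustration of what such a lookup involves, the commands below query MaxMind’s freely downloadable GeoLite2 City database with the mmdblookup utility from libmaxminddb; the IP address shown is a documentation example standing in for a real one, and the result depends entirely on how current the database file is.

    $ mmdblookup --file GeoLite2-City.mmdb --ip 203.0.113.5 country names en
    $ mmdblookup --file GeoLite2-City.mmdb --ip 203.0.113.5 city names en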

With this information in hand, they then ask the courts to grant a subpoena, directing Internet providers to hand over the personal details of the associated account holders.

Malibu Media, the Los Angeles-based company behind the ‘X-Art’ adult movies, is responsible for most of these cases. The company has filed thousands of lawsuits in recent years, targeting Internet subscribers whose accounts were allegedly used to share Malibu’s films via BitTorrent.

Increasingly, judges around the country have grown wary of these litigation efforts. This includes US Federal Judge William Alsup, who’s tasked with handling all such cases in the Northern District of California.

Responding to a recent request, Judge Alsup highlights the fact that Malibu filed a “monsoon” of hundreds of lawsuits over the past 18 months, but later dismissed many of them without specifying a reason.

The judge is skeptical about the motivation for these dismissals, in particular because courts have previously highlighted that Maxmind’s geolocation tools, which are cited in the complaints, may not be entirely accurate. This could mean that the cases have been filed in the wrong court.

“Malibu Media’s voluntary dismissal without prejudice of groups of its cases is not a new pattern. A sizable portion of the cases from previous waves were terminated in the same way,” Judge Alsup writes (pdf).

“The practice has just become more frequent, and it follows skepticism by the undersigned judge and others around the country about the accuracy of the Maxmind database,” he adds.

This is not the first time that geolocation tools have been called into doubt, and to move the accuracy claims beyond Maxmind’s own “hearsay,” Judge Alsup now demands extra evidence.

In his order, he denies the request to continue a case management conference in one of Malibu’s cases. Instead, he will use that hearing to address the geolocation issues. In addition, all Malibu cases in the district may be barred if the accuracy of these tools isn’t “fully vetted.”

“That request is DENIED. Instead, Malibu Media is hereby ordered to SHOW CAUSE at that hearing, why the Court should not bar further Malibu Media cases in this district until the accuracy of the geolocation technology is fully vetted,” the order reads.

“To be clear, this order applies even if Malibu Media voluntarily dismisses this action,” Judge Alsup adds.

Denied

SJD, who follows the developments closely and first reported on the order, suspects that the IP-address ‘error rate’ may in fact be higher than most people believe. She therefore recommends that defense lawyers depose ISP employees to get to the bottom of the issue.

“If you are a defense attorney who litigates one of the BitTorrent infringement cases, I suggest deposing a Comcast employee tasked with subpoena processing. I suspect that the error rate is much higher than trolls want everyone to believe, and such testimony has a potential to become a heavy weapon in every troll victim’s arsenal,” SJD says.

In any case, it’s no secret that geolocation databases are far from perfect. Most are not updated instantly, which means that the information could be outdated, and other entries are plainly inaccurate.

This is something the residents of a Kansas farm know all too well, as their house is the default location of 600 million IP-addresses, which causes them quite a bit of trouble.

It will be interesting to see if Malibu will make any efforts to properly “vet” Maxmind’s database. It’s clear, however, that Judge Alsup will not let the company use his court before fully backing up their claims.

To be continued.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

New – USASpending.gov on an Amazon RDS Snapshot

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/new-usaspending-gov-on-an-amazon-rds-snapshot/

My colleague Jed Sundwall runs the AWS Public Datasets program. He wrote the guest post below to tell you about an important new dataset that is available as an Amazon RDS Snapshot. In the post, Jed introduces the dataset and shows you how to create an Amazon RDS DB Instance from the snapshot.

Jeff;


I am very excited to announce that, starting today, the entire public USAspending.gov database is available for anyone to copy via Amazon Relational Database Service (RDS). USAspending.gov includes data on all spending by the federal government, including contracts, grants, loans, employee salaries, and more. The data is available via a PostgreSQL snapshot, which provides bulk access to the entire USAspending.gov database, and is updated nightly. At this time, the database includes all USAspending.gov data for the second quarter of fiscal year 2017, and data going back to the year 2000 will be added over the summer. You can learn more about the database and how to access it on its AWS Public Dataset landing page.

Through the AWS Public Datasets program, we work with AWS customers to experiment with ways that the cloud can make data more accessible to more people. Most of our AWS Public Datasets are made available through Amazon S3 because of its tremendous flexibility and ability to scale to serve any volume of any kind of data files. What’s exciting about the USAspending.gov database is that it provides a great example of how Amazon RDS can be used to share an entire relational database quickly and easily. Typically, sharing a relational database requires extract, transform, and load (ETL) processes that require redundant storage capacity, time for data transfer, and often scripts to migrate your database schema from one database engine to another. ETL processes can be so intimidating and cumbersome that they’re effectively impossible for many people to carry out.

By making their data available as a public Amazon RDS snapshot, the team at USASPending.gov has made it easy for anyone to get a copy of their entire production database for their own use within minutes. This will be useful for researchers and businesses who want to work with real data about all US Government spending and quickly combine it with their own data or other data resources.

Deploying the USASpending.gov Database Using the AWS Management Console
Let’s go through the steps involved in deploying the database in your AWS account using the AWS Management Console.

  1. Sign in to the AWS Management Console and select the US East (N. Virginia) region in the menu bar.
  2. Open the Amazon RDS Console and choose Snapshots in the navigation pane.
  3. In the filter for the search bar, select All Public Snapshots and search for 515495268755:
  4. Select the snapshot named arn:aws:rds:us-east-1:515495268755:snapshot:usaspending-db.
  5. Select Snapshot Actions -> Restore Snapshot. Select an instance size, and enter the other details, then click on Restore DB Instance.
  6. You will see that a DB Instance is being created from the snapshot, within your AWS account.
  7. After a few minutes, the status of the instance will change to Available.
  8. You can see the endpoint for your database on the main page along with other useful info:

Deploying the USASpending.gov Database Using the AWS CLI
You can also install the AWS Command Line Interface (CLI) and use it to create a DB Instance from the snapshot. Here’s a sample command:

$ aws rds restore-db-instance-from-db-snapshot --db-instance-identifier my-test-db-cli \
  --db-snapshot-identifier arn:aws:rds:us-east-1:515495268755:snapshot:usaspending-db \
  --region us-east-1

This will give you an ARN (Amazon Resource Name) that you can use to reference the DB Instance. For example:

$ aws rds describe-db-instances \
  --db-instance-identifier arn:aws:rds:us-east-1:917192695859:db:my-test-db-cli

This command will display the Endpoint.Address that you use to connect to the database.

Connecting to the DB Instance
After following the AWS Management Console or AWS CLI instructions above, you will have access to the full USAspending.gov database within this Amazon RDS DB instance, and you can connect to it using any PostgreSQL client using the following credentials:

  • Username: root
  • Password: password
  • Database: data_store_api

If you use psql, you can access the database using this command:

$ psql -h my-endpoint.rds.amazonaws.com -U root -d data_store_api

You should change the database password after you log in:

ALTER USER "root" WITH ENCRYPTED PASSWORD '{new password}';

If you can’t connect to your instance but think you should be able to, you may need to check your VPC Security Groups and make sure inbound and outbound traffic on the port (usually 5432) is allowed from your IP address.
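
For example, you can add such an inbound rule from the AWS CLI as sketched below; the security group ID and the IP address are placeholders to replace with your own values.

    $ aws ec2 authorize-security-group-ingress \
        --group-id sg-0123456789abcdef0 \
        --protocol tcp \
        --port 5432 \
        --cidr 203.0.113.5/32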

Exploring the Data
The USAspending.gov data is very rich, so it will be hard to do it justice in this blog post, but hopefully these queries will give you an idea of what’s possible. To learn about the contents of the database, please review the USAspending.gov Data Dictionary.

The following query will return the total amount of money the government is obligated to pay for contracts awarded by NASA that include “Mars” or “Martian” in the description of the award:

select sum(total_obligation) from awards, subtier_agency 
  where (awards.description like '% MARTIAN %' OR awards.description like '% MARS %') 
  AND subtier_agency.name = 'National Aeronautics and Space Administration';

As I write this, the result I get for this query is $55,411,025.42. Note that the database is updated nightly and will include more historical data in the coming months, so you may get a different result if you run this query.

Now, here’s the same query, but looking for awards with “Jupiter” or “Jovian” in the description:

select sum(total_obligation) from awards, subtier_agency
  where (awards.description like '%JUPITER%' OR awards.description like '%JOVIAN%') 
  AND subtier_agency.name = 'National Aeronautics and Space Administration';

The result I get is $14,766,392.96.

Questions & Comments
I’m looking forward to seeing what people can do with this data. If you have any questions about the data, please create an issue on the USAspending.gov API’s issue tracker on GitHub.

— Jed

scanless – A Public Port Scan Scraper

Post Syndicated from Darknet original http://feedproxy.google.com/~r/darknethackers/~3/BzB9c8HkhZo/

scanless is a Python-based command-line utility that functions as a public port scan scraper, it can use websites that can perform port scans on your behalf. This is useful for early stages of penetration tests when you’d like to run a port scan on a host without having it originate from your IP address. Public […]

Read the full post at darknet.org.uk

ISP Lands Supreme Court Win Over Copyright Trolls

Post Syndicated from Andy original https://torrentfreak.com/isp-lands-supreme-court-win-over-copyright-trolls-170505/

Every day, millions of people use BitTorrent to obtain free movies, TV shows, and music but many aren’t aware that their activities can be monitored. Most monitoring is relatively benign but there are companies out there who make a living from threatening to sue file-sharers.

These so-called ‘copyright trolls’ share files along with regular users, capture their IP addresses and trace them back to their ISPs. From there, ISPs are asked to hand over the alleged pirates’ names and addresses so trolls can extract a cash settlement from them, but most ISPs demand a court process before doing so.

Over in Norway, a company called Scanbox Entertainment hired German anti-piracy outfit Excipio to track people sharing the movie ‘The Captive’. Between November 27 and December 1, 2015, the company reportedly found eight customers of telecoms giant Telenor doing so. While the numbers are small, initial cases are often presented this way to attract less attention in advance of bigger moves.

During December 2015, Scanbox sent a request to the Oslo District Court to force Telenor to hand over its subscribers’ information. It also asked the Court to prevent the ISP from deleting or anonymizing logs that could identify the alleged infringers.

In May 2016 Scanbox won its case, and Telenor was ordered to hand over the names and postal addresses of its subscribers. However, determined to protect its customers’ privacy (now and for similar cases in the future), the ISP filed an appeal.

At the Court of Appeal in September 2016, the tables were turned when it was decided that Telenor wouldn’t have to hand over the personal information of its customers after all. The evidence of the alleged infringements failed to show that any sharing was substantial.

But after coming this far and with lots of potential settlement payments at stake, Scanbox refused to give in, taking its case all the way to the Supreme Court where a panel of judges was asked to issue a definitive ruling. The decision just handed down by the Court is bad news for Scanbox.

In essence, the Court weighed Scanbox’s right to protect copyright versus Norwegian citizens’ right to privacy. If the former is to trump the latter, then any copyright infringements must be of a serious nature. The panel of judges at the Supreme Court felt that the evidence presented against Telenor’s customers was not good enough to prove infringement beyond the threshold. The panel, therefore, upheld the earlier decision of the Court of Appeal.

Torgeir Waterhouse of Internet interest group ICT Norway says that online privacy should always be respected and not disregarded as the rightsholders and their law firm, Denmark-based Njord Law, would like.

“This is not about enforcing copyright, this is about what methods are acceptable to use within the law,” Waterhouse says.

“This is an important decision that sends an important message to the licensees and Njord Law that the rule of law cannot be set aside in their eagerness to deal with illegal file-sharing. We are very pleased that Njord’s frivolous activity has been stopped. We expect licensees to act responsibly and respect both privacy and the rule of law.”

ScanBox is now required to pay Telenor almost $70,000 in costs, a not insignificant amount that should give reason to pause before future trolling efforts get underway in Norway.

Full decision (Norwegian, pdf)

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

Visualize Big Data with Amazon QuickSight, Presto, and Apache Spark on Amazon EMR

Post Syndicated from Luis Wang original https://aws.amazon.com/blogs/big-data/visualize-big-data-with-amazon-quicksight-presto-and-apache-spark-on-amazon-emr/

Last December, we introduced the Amazon Athena connector for Amazon QuickSight in the Derive Insights from IoT in Minutes using AWS IoT, Amazon Kinesis Firehose, Amazon Athena, and Amazon QuickSight post.

The connector allows you to visualize your big data easily in Amazon S3 using Athena’s interactive query engine in a serverless fashion. This turned out to be a very popular combination, as customers benefit from the speed, agility, and cost benefit that serverless business intelligence (BI) and analytics architecture brings.

Today, we’re excited to announce two new native connectors in QuickSight for big data analytics: Presto and Spark. With the Presto and SparkSQL connector in QuickSight, you can easily create interactive visualizations over large datasets using Amazon EMR.

EMR provides a simple and cost effective way to run highly distributed processing frameworks such as Presto and Spark when compared to on-premises deployments. EMR provides you with the flexibility to define specific compute, memory, storage, and application parameters and optimize your analytic requirements.

In this post, I walk you through connecting QuickSight to an EMR cluster running Presto. If you’d like a walkthrough with Spark, let us know in the comments section!

Presto overview

Presto is an open source, distributed SQL query engine for running interactive analytic queries against data sources ranging from gigabytes to petabytes. It supports the ANSI SQL standard, including complex queries, aggregations, joins, and window functions. Presto can run on multiple data sources, including Amazon S3.

Presto’s execution framework is fundamentally different from that of Hive/MapReduce. Presto has a custom query and execution engine where the stages of execution are pipelined, similar to a directed acyclic graph (DAG), and all processing occurs in memory to reduce disk I/O. This pipelined execution model can run multiple stages in parallel and streams data from one stage to another as the data becomes available. This reduces end-to-end latency and makes Presto a great tool for ad hoc data exploration over large data sets.

Walkthrough

Use the following steps to connect QuickSight to an EMR cluster running Presto:

  1. Create an EMR cluster with the latest 5.5.0 release.
  2. Configure LDAP for user authentication in QuickSight.
  3. Configure SSL using a QuickSight supported certificate authority (CA).
  4. Create tables for Presto in the Hive metastore.
  5. Whitelist the QuickSight IP address range in your EMR master security group rules.
  6. Connect QuickSight to Presto and create some visualizations.

Prerequisites

You need to run Presto version 0.167 at a minimum, which is the first release that supports LDAP authentication. LDAP authentication is a requirement for the Presto and Spark connectors, and QuickSight refuses to connect if LDAP is not configured on your cluster.

Create an EMR cluster with release version 5.5.0

In the EMR console, use the Quick Create option to create a cluster.  For this post, use most of the default settings with a few exceptions. To install both Presto and Spark on your cluster (and customize other settings), create your cluster from the Advanced Options wizard instead.

Make sure that EMR release 5.5.0 is selected and under Applications, choose Presto. If you have an EC2 key pair, you can use it. Otherwise, create a key pair (.PEM file) and then return to this page to create the cluster. 

Make sure that you configure your cluster’s security group inbound rules to allow SSH from your machine’s IP address range. 
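If you prefer to script the cluster creation instead of using the console, the following boto3 sketch creates an EMR 5.5.0 cluster with Presto and Hive. The key pair name, instance types and counts, and log bucket are placeholders for your own values.

import boto3

emr = boto3.client("emr")

# Minimal sketch: create an EMR 5.5.0 cluster with Presto and Hive installed.
# The key pair name, instance types/counts, and log URI are placeholders.
response = emr.run_job_flow(
    Name="presto-quicksight",
    ReleaseLabel="emr-5.5.0",
    Applications=[{"Name": "Presto"}, {"Name": "Hive"}],
    Instances={
        "InstanceGroups": [
            {"Name": "Master", "InstanceRole": "MASTER",
             "InstanceType": "m3.xlarge", "InstanceCount": 1},
            {"Name": "Core", "InstanceRole": "CORE",
             "InstanceType": "m3.xlarge", "InstanceCount": 2},
        ],
        "Ec2KeyName": "YOUR_KEY_PAIR_NAME",
        "KeepJobFlowAliveWhenNoSteps": True,
    },
    LogUri="s3://YOUR_BUCKET/emr-logs/",
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
    VisibleToAllUsers=True,
)
print(response["JobFlowId"])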

Configure LDAP for user authentication in QuickSight

After your cluster is in a running state, connect using SSH to your cluster to configure LDAP authentication.

To SSH into your EMR cluster, use the following commands in the terminal:

chmod 600 ~/YOUR_PEM_FILE.pem
ssh -i ~/YOUR_PEM_FILE.pem hadoop@YOUR_MASTER_PUBLIC_DNS_FROM_EMR_CLUSTER

After you log in, install OpenLDAP, configure it, and create users in the directory. For more about configuring LDAP, see Editing /etc/openldap/slapd.conf in the OpenLDAP documentation.

# Install LDAP Server
 sudo yum install openldap openldap-servers openldap-clients
# Create the config files
sudo cp /usr/share/openldap-servers/DB_CONFIG.example /var/lib/ldap/DB_CONFIG
sudo cp /usr/share/openldap-servers/slapd.conf.obsolete /etc/openldap/slapd.conf
# Bounce LDAP
sudo service slapd restart

After LDAP is installed and restarted, issue a couple of commands to set the LDAP root password. First, run the following command and enter a root password for LDAP when prompted:

slappasswd

Save the output hash, which looks like the following; you need it in the next step:

{SSHA}DmD616c3yZyKndsccebZK/vmWiaQde83

Now, prepare the commands to set the password for the LDAP root. Make sure to replace the hash below with the one that you generated in the previous step:

cat > /tmp/config.ldif <<EOF
dn: olcDatabase={0}config,cn=config
changetype: modify
add: olcRootPW
olcRootPW: {SSHA}DmD616c3yZyKndsccebZK/vmWiaQde83

dn: olcDatabase={2}bdb,cn=config
changetype: modify
add: olcRootPW
olcRootPW: {SSHA}DmD616c3yZyKndsccebZK/vmWiaQde83
-
replace: olcRootDN
olcRootDN: cn=dev,dc=example,dc=com
-
replace: olcSuffix
olcSuffix: dc=example,dc=com
EOF

Run the following command to execute the above commands against LDAP:

sudo ldapadd -Y EXTERNAL -H ldapi:/// -f /tmp/config.ldif

Next, create a user account with a password in the LDAP directory with the following commands. When prompted for a password, use the LDAP root password that you created in the previous step.

cat > /tmp/accounts.ldif <<EOF
dn: dc=example,dc=com
objectclass: domain
objectclass: top
dc: example

dn: ou=dev,dc=example,dc=com
objectclass: organizationalUnit
ou: dev
description: Container for developer entries

dn: uid=<REPLACE_WITH_YOUR_USER_NAME>,ou=dev,dc=example,dc=com
uid: <REPLACE_WITH_YOUR_USER_NAME>
objectClass: inetOrgPerson
userPassword: <REPLACE_WITH_STRONG_PASSWORD>
sn: <REPLACE_WITH_SURNAME>
cn: dev
EOF

ldapadd -D "cn=dev,dc=example,dc=com" -W -f /tmp/accounts.ldif

You now have OpenLDAP configured on your EMR cluster running Presto, along with a user that you will later use to authenticate when connecting to Presto.

Configure SSL using a QuickSight supported certificate authority

To ensure that any communication between QuickSight and Presto is secured, QuickSight requires that the connection be established with SSL enabled. You need to obtain a certificate from a certificate authority (CA) that QuickSight trusts. You can find the full list of public CAs accepted by QuickSight in the Network and Database Configuration Requirements topic.

To set up SSL on LDAP and Presto, obtain the following three SSL certificate files from your CA and store them in the /home/hadoop/ directory.

  • Certificate key file
  • Certificate file
  • CA certificate

Configure the keys in LDAP with the following commands:

cat > /tmp/ca.ldif <<EOF
dn: cn=config
replace: olcTLSCertificateKeyFile
olcTLSCertificateKeyFile: /home/hadoop/certificateKey.pem

replace: olcTLSCertificateFile
olcTLSCertificateFile: /home/hadoop/certificate.pem

replace: olcTLSCACertificateFile
olcTLSCACertificateFile: /home/hadoop/cacertificate.pem
EOF

sudo ldapmodify -Y EXTERNAL -H ldapi:/// -f /tmp/ca.ldif

Now, enable SSL in LDAP by editing the /etc/sysconfig/ldap file and setting SLAPD_LDAPS=yes:

sudo vi /etc/sysconfig/ldap

SLAPD_LDAPS=yes

sudo service slapd restart

Use the following commands to generate the keystore. You are prompted to provide a password for the keystore.

openssl pkcs12 -inkey certificatekey.pem -in certificate.pem -export -out server-key.p12

keytool -importkeystore -srckeystore server-key.p12 -srcstoretype PKCS12 -destkeystore server.keystore

Edit the configuration files for Presto in EMR.

SERVERNAME=<PUBLIC_DNS_NAME_OF_EMR_CLUSTER>
cd /etc/presto/conf

# Enable LDAPS auth for Presto
echo http-server.authentication.type=LDAP | sudo tee -a config.properties
echo authentication.ldap.url=ldaps://${SERVERNAME}:636 | sudo tee -a config.properties
echo authentication.ldap.user-bind-pattern=uid=\${USER},OU=dev,DC=example,DC=com | sudo tee -a config.properties

# Enable SSL for the Presto server
echo http-server.https.enabled=true | sudo tee -a config.properties
echo http-server.https.port=<PORT_NUMBER> | sudo tee -a config.properties
echo http-server.https.keystore.path=/home/hadoop/server.keystore | sudo tee -a config.properties
echo http-server.https.keystore.key=<KEYSTORE_PASSWORD> | sudo tee -a config.properties

# Bounce Presto to pick up the new config
sudo pkill presto
# wait until presto is up
while [[ 1 ]]; do pgrep presto; if [ $? -eq 0 ]; then break; else echo -n .; sleep 1; fi; done

Create tables for Presto in the Hive metastore

Now that you have a running EMR cluster with Presto and LDAP set up, you can make some sample data available to the cluster for analysis. Use the same CloudFront log sample data set that is available for Athena. The following HiveQL statement creates an external table in the Hive metastore over the sample data set in S3:

# Run hive
$hive

#Create table and load data
CREATE EXTERNAL TABLE IF NOT EXISTS cloudfront_logs (
  Date Date,
  Time STRING,
  Location STRING,
  Bytes INT,
  RequestIP STRING,
  Method STRING,
  Host STRING,
  Uri STRING,
  Status INT,
  Referrer STRING,
  OS String,
  Browser String,
  BrowserVersion String
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  "input.regex" = "^(?!#)([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+[^\()+[\()([^\;]+).*\%20([^\/]+)[\/](.*)$"
) LOCATION 's3://athena-examples/cloudfront/plaintext/'; 

# quit hive
quit;

Try to query the data using the Presto CLI with the following commands:

#Run the Presto CLI
presto-cli --server https://<PUBLIC_DNS_NAME_OF_EMR_CLUSTER>:<PORT_NUMBER>     --user <USERNAME> --password --catalog hive

#Issue query to Presto
SELECT * FROM cloudfront_logs limit 10;

You should see an output from Presto like the following:

Whitelist the QuickSight IP address range in your EMR master security group rules

Now you’re ready to connect QuickSight to Presto. For QuickSight to connect to Presto, you need to make sure that Presto is reachable by QuickSight’s public endpoints by adding QuickSight’s IP address ranges to your EMR master node security group.
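As a rough sketch, you can also add that rule programmatically. The security group ID is a placeholder, the port should match the Presto HTTPS port you configured earlier, and the CIDR shown is only an example; look up the QuickSight IP address range for your region in the QuickSight documentation.

import boto3

ec2 = boto3.client("ec2")

# Sketch: allow QuickSight to reach Presto on the EMR master node.
# The security group ID, port, and CIDR below are placeholders/assumptions.
ec2.authorize_security_group_ingress(
    GroupId="sg-0123456789abcdef0",  # EMR master security group
    IpPermissions=[{
        "IpProtocol": "tcp",
        "FromPort": 8446,            # Presto HTTPS port you configured earlier
        "ToPort": 8446,
        "IpRanges": [{"CidrIp": "52.23.63.224/27"}],  # example QuickSight range; check the docs for your region
    }],
)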

Connect QuickSight to Presto and create some visualizations

If you have not already signed up for QuickSight, you can do so at https://quicksight.aws. QuickSight offers a 1 user and 1 GB perpetual free tier.

After you’re signed up for QuickSight, navigate to the New Analysis page and the New Data Set page. You see the new Presto and Spark connector as in the following screenshot.

Open the Presto connector, provide the connection details in the modal window, and choose Create data source.

Select the default schema and choose the cloudfront_logs table that you just created.

In QuickSight, you can choose between importing the data into SPICE for analysis or directly querying your data in Presto. SPICE is an in-memory optimized columnar engine in QuickSight that enables fast, interactive visualization as you explore your data. For this post, choose to import the data into SPICE and choose Visualize.

In the analysis view, you can see the notification that shows import is complete with 4996 rows imported. On the left, you see the list of fields available in the data set and below, the various types of visualizations from which you can choose.

QuickSight makes it easy for you to create visualizations and analyze data with AutoGraph, a feature that automatically selects the best visualization for you based on selected fields.

To create a visualization, select the fields on the left panel. In this case, look at the number of connections to CloudFront ordered by the various OS types, by selecting the OS field. Additionally, you can select the bytes fields to look at total bytes transferred by OS instead of count.

Summary

You just finished creating an EMR cluster, setting up Presto and LDAP with SSL, and using QuickSight to visualize your data. I hope this post was helpful. Feel free to reach out if you have any questions or suggestions.

Learn more

To learn more about these capabilities and start using them in your dashboards, check out the QuickSight User Guide.

Stay engaged

If you have questions and suggestions, you can post them on the QuickSight forum. 

Not a QuickSight user?

Go to the QuickSight website to get started for FREE.

 

How to Visualize and Refine Your Network’s Security by Adding Security Group IDs to Your VPC Flow Logs

Post Syndicated from Guy Denney original https://aws.amazon.com/blogs/security/how-to-visualize-and-refine-your-networks-security-by-adding-security-group-ids-to-your-vpc-flow-logs/

Many organizations begin their cloud journey to AWS by moving a few applications to demonstrate the power and flexibility of AWS. This initial application architecture includes building security groups that control the network ports, protocols, and IP addresses that govern access and traffic to their AWS Virtual Private Cloud (VPC). When the architecture process is complete and an application is fully functional, some organizations forget to revisit their security groups to optimize rules and help ensure the appropriate level of governance and compliance. Not optimizing security groups can create less-than-optimal security, with ports open that may not be needed or source IP ranges set that are broader than required.

Last year, I published an AWS Security Blog post that showed how to optimize and visualize your security groups. Today’s post continues in the vein of that post by using Amazon Kinesis Firehose and AWS Lambda to enrich the VPC Flow Logs dataset and enhance your ability to optimize security groups. The capabilities in this post’s solution are based on the Lambda functions available in this VPC Flow Log Appender GitHub repository.

Solution overview

Removing unused rules or limiting source IP addresses requires either an in-depth knowledge of an application’s active ports on Amazon EC2 instances or analysis of active network traffic. In this blog post, I discuss a method to:

  • Use VPC Flow Logs to capture information about the IP traffic in an Amazon VPC.
  • Enrich the VPC Flow Logs dataset with security group IDs by using Firehose and Lambda.
  • Demonstrate how to visualize and analyze network traffic from VPC Flow Logs by using Amazon Elasticsearch Service (Amazon ES).

Using this approach can help you remediate security group rules to necessary source IPs, ports, and nested security groups, helping to improve the security of your AWS resources while minimizing the potential risk to production environments.

Solution diagram

As illustrated in the preceding diagram, this is how the data flows in this model:

  1. The VPC posts its flow log data to Amazon CloudWatch Logs.
  2. The Lambda ingestor function passes the data to Firehose.
  3. Firehose then passes the data to the Lambda decorator function.
  4. The Lambda decorator function performs a number of lookups for each record and returns the data to Firehose with additional fields.
  5. Firehose then posts the enhanced dataset to the Amazon ES endpoint and any errors to Amazon S3.

The solution

Step 1: Set up your Amazon ES cluster and VPC Flow Logs

Create an Amazon ES cluster

The first step in this solution is to create an Amazon ES cluster. Do this first because it takes some time for the cluster to become available. If you are new to Amazon ES, you can learn more about it in the Amazon ES documentation.

To create an Amazon ES cluster:

  1. In the AWS Management Console, choose Elasticsearch Service under Analytics.
  2. Choose Create a new domain or Get started.
  3. Type es-flowlogs for the Elasticsearch domain name.
  4. Set Version to 1 in the drop-down list. Choose Next.
  5. Set Instance count to 2 and select the Enable zone awareness check box. (This ensures cluster stability in the event of an Availability Zone outage.) Accept the defaults for the rest of the page.
    • [Optional] If you use this domain for production purposes, I recommend using dedicated master nodes. Select the Enable dedicated master check box and select medium.elasticsearch from the Instance type drop-down list. Leave the Instance count at 3, which is the default.
  6. Choose Next.
  7. From the Set the domain access policy to drop-down list on the next page, select Allow access to the domain from specific IP(s). In the dialog box, type or paste the comma-separated list of valid IPv4 addresses or Classless Inter-Domain Routing (CIDR) blocks you would like to be able to access the Amazon ES domain.
  8. Choose Next.
  9. On the next page, choose Confirm and create.

It will take a few minutes for the cluster to be available. In the meantime, you can begin enabling VPC Flow Logs.
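If you would rather create the domain programmatically, the following boto3 sketch mirrors the console settings above. The Elasticsearch version string, instance type, volume size, and access policy are assumptions to adjust for your account and region.

import json

import boto3

es = boto3.client("es")

# Sketch: create the es-flowlogs domain with two data nodes and zone awareness.
# The version, instance type, volume size, and source IP range are assumptions;
# restrict AccessPolicies to your own IP ranges as described in the console steps.
es.create_elasticsearch_domain(
    DomainName="es-flowlogs",
    ElasticsearchVersion="5.1",
    ElasticsearchClusterConfig={
        "InstanceType": "m3.medium.elasticsearch",
        "InstanceCount": 2,
        "ZoneAwarenessEnabled": True,
    },
    EBSOptions={"EBSEnabled": True, "VolumeType": "gp2", "VolumeSize": 10},
    AccessPolicies=json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": "*"},
            "Action": "es:*",
            "Condition": {"IpAddress": {"aws:SourceIp": ["203.0.113.0/24"]}},
            "Resource": "*",
        }],
    }),
)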

Enable VPC Flow Logs

VPC Flow Logs is a feature that lets you capture information about the IP traffic going to and from network interfaces in your VPC. Flow log data is stored using Amazon CloudWatch Logs. For more information about VPC Flow Logs, see VPC Flow Logs and CloudWatch Logs.

To enable VPC Flow Logs:

  1. In the AWS Management Console, choose CloudWatch under Management Tools.
  2. Click Logs in the navigation pane.
  3. From the Actions drop-down list, choose Create log group.
  4. Type Flowlogs as the Log Group Name.
  5. In the AWS Management Console, choose VPC under Networking & Content Delivery.
  6. Choose Your VPCs in the navigation pane, and select the VPC you would like to analyze. (You can also enable VPC Flow Logs on only a subnet if you do not want to enable it on the entire VPC.)
  7. Choose the Flow Logs tab in the bottom pane, and then choose Create Flow Log.
  8. In the text beneath the Role box, choose Set Up Permissions (this will open an IAM management page).
  9. Choose Allow on the IAM management page. Return to the VPC Flow Logs setup page.
  10. Choose All from the Filter drop-down list.
  11. Choose flowlogsRole from the Role drop-down list (you created this role in steps 8 and 9 of this procedure).
  12. Choose Flowlogs from the Destination Log Group drop-down list.
  13. Choose Create Flow Log.
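If you prefer to script this setup, a rough boto3 equivalent of the preceding steps looks like the following; the VPC ID and IAM role ARN are placeholders for your own resources.

import boto3

ec2 = boto3.client("ec2")
logs = boto3.client("logs")

# Sketch: create the CloudWatch Logs group and enable flow logs on the VPC.
# The VPC ID and role ARN are placeholders/assumptions.
logs.create_log_group(logGroupName="Flowlogs")

ec2.create_flow_logs(
    ResourceIds=["vpc-0123456789abcdef0"],
    ResourceType="VPC",
    TrafficType="ALL",
    LogGroupName="Flowlogs",
    DeliverLogsPermissionArn="arn:aws:iam::123456789012:role/flowlogsRole",
)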

Step 2: Set up AWS Lambda to enrich the VPC Flow Logs dataset with security group IDs

If you completed Step 1, VPC Flow Logs data is now streaming to CloudWatch Logs. Next, you will deploy two Lambda functions. The first, the ingestor function, moves the data into Firehose, and the second, the decorator function, adds three new fields to the VPC Flow Logs dataset and returns records to Firehose for delivery to Amazon ES.

The new fields added by the decorator function are:

  1. Direction – By comparing the primary IP address of the elastic network interface (ENI) with the destination IP address, you can determine the direction of the IP connection.
  2. Security group IDs – Each ENI can be associated with as many as five security groups. The security group IDs are added as an array in the record.
  3. Source – This includes a number of fields that result from looking up the srcaddr field against a free geographical lookup service. The Source fields are:
      • source-country-code
      • source-country-name
      • source-region-code
      • source-region-name
      • source-city
      • source-location (latitude and longitude)

Follow the instructions in this GitHub repository to deploy the two Lambda functions and the associated permissions that are required.
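For orientation before you deploy them, the following is the general shape of a Firehose data-transformation handler like the decorator function. It is a simplified sketch rather than the repository's actual code; enrich() merely stands in for the security group and geographic lookups described above.

import base64
import json

# Simplified sketch of a Kinesis Firehose transformation handler; not the
# repository's actual decorator function. enrich() stands in for the security
# group and geo lookups described above.
def enrich(flow_record):
    flow_record["security-group-ids"] = ["sg-example"]  # placeholder lookup
    flow_record["direction"] = "inbound"                # placeholder lookup
    return flow_record

def handler(event, context):
    output = []
    for record in event["records"]:
        payload = json.loads(base64.b64decode(record["data"]))
        enriched = enrich(payload)
        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": base64.b64encode(json.dumps(enriched).encode()).decode(),
        })
    return {"records": output}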

Step 3: Set up Firehose

Firehose is a fully managed service that allows you to transform flow log data and stream it into Amazon ES. The service scales automatically with load, and you only pay for the data transmitted through the service.

To create a Firehose delivery stream:

  1. In the AWS Management Console, choose Kinesis under Analytics.
  2. Choose Go to Firehose and then choose Create Delivery Stream.

Step 3.1: Define the destination

  1. Choose Amazon Elasticsearch Service from the Destination drop-down list.
  2. For Delivery stream name, type VPCFlowLogsToElasticSearch (the name must match the default environment variable in the ingestion Lambda function).
  3. Choose es-flowlogs from the Elasticsearch domain drop-down list. (The Amazon ES cluster configuration state needs to be Active for es-flowlogs to be available in the drop-down list.)
  4. For Index, type cwl.
  5. Choose OneDay from the Index rotation drop-down list.
  6. For Type, type log.
  7. For Backup mode, select Failed Documents Only.
  8. For S3 bucket, select New S3 bucket in the drop-down list and type a bucket name of your choice. Choose Create bucket.
  9. Choose Next.

Step 3.2: Configure Lambda

  1. Choose Enable for Data transformation.
  2. Choose vpc-flow-log-appender-dev-FlowLogDecoratorFunction-xxxxx from the Lambda function drop-down list (make sure you select the Decorator function).
  3. Choose Create/Update existing IAM role, Firehose delivery IAM role from the IAM role drop-down list.
  4. Choose Allow. This takes you back to the Firehose Configuration.
  5. Choose Next and then choose Create Delivery Stream.

Step 4: Stream data to Firehose

The next step is to enable the data to stream from CloudWatch Logs to Firehose. You will use the Lambda ingestion function you deployed earlier: vpc-flow-log-appender-dev-FlowLogIngestionFunction-xxxxxxx.

  1. In the AWS Management Console, choose CloudWatch under Management Tools.
  2. Choose Logs in the navigation pane, and select the check box next to Flowlogs under Log Groups.
  3. From the Actions menu, choose Stream to AWS Lambda. Choose vpc-flow-log-appender-dev-FlowLogIngestionFunction-xxxxxxx (select the Ingestion function). Choose Next.
  4. Choose Amazon VPC Flow Logs from the Log Format drop-down list. Choose Next.
    Screenshot of Log Format drop-down list
  5. Choose Start Streaming.

VPC Flow Logs will now be forwarded to Firehose, capturing information about the IP traffic going to and from network interfaces in your VPC. Firehose appends additional data fields and forwards the enriched data to your Amazon ES cluster.

Data is now flowing to your Amazon ES cluster, but be patient because it can take up to 30 minutes for the data to begin appearing in your Amazon ES cluster.

Step 5: Verify that the flow log data is streaming through Firehose to the Amazon ES cluster

You should see VPC Flow Logs with ENI IDs under Log Streams (see the following screenshot) and Stored Bytes greater than zero in the CloudWatch log group.

Do you have logs from the Lambda ingestion function in the CloudWatch log group? As shown in the following screenshot, you should see START, END and REPORT records. These show that the ingestion function is running and streaming data to Firehose.

Screenshot showing logs from the Lambda ingestion function

Do you have logs from the Lambda decorator function in the CloudWatch log group? You should see START, END, and REPORT records as well as entries similar to: “Processing completed. Successful records XXX, Failed records 0.”

Screenshot showing logs from the Lambda decorator function

Do you have cwl-* indexes in the Amazon ES dashboard, as shown in the following screenshot? If you do, you are successfully streaming through Firehose and populating the Amazon ES cluster, and you are ready to proceed to Step 6. Remember, it can take up to 30 minutes for the flow logs from your workloads to begin flowing to the Amazon ES cluster.

Screenshot showing cwl-* indexes in the Amazon ES dashboard

Step 6: Using the SGDashboard to analyze VPC network traffic

You now need to set up a Kibana dashboard to monitor the traffic in your VPC.

To find the Kibana URL:

  1. In the AWS Management Console, click Elasticsearch Service under Analytics.
  2. Choose es-flowlogs under Elasticsearch domain name.
  3. Click the link next to Kibana, as shown in the following screenshot.
    Screenshot showing the Kibana link

The first time you access Kibana, you will be asked to set the default index. To set the default index in the Amazon ES cluster:

  1. Set the Index name or pattern to cwl-*.
    Screenshot of configuring an index pattern
  2. For Time-field name, type @timestamp.
  3. Choose Create.

Load the SGDashboard:

  1. Download this JSON file and save it to your computer. The file includes a dashboard and visualizations I created for this blog post’s purposes.
  2. In Kibana, choose Management in the navigation pane, choose Saved Objects, and then import the file you just downloaded.
  3. Choose Dashboard and Open to load the SGDashboard you just imported. (You might have to press Enter in the top search box to have the dashboard load the first time.)

The following screenshot shows the SGDashboard after it has loaded.

Screenshot showing the dashboard after it has loaded

The SGDashboard is composed of a set of visualizations. Each visualization contains a view or summary of the underlying data contained in the Amazon ES cluster, as shown in the preceding screenshot. You can control the timeframe for the dashboard in the upper right corner. When you click the timeframe, the dashboard exposes alternative timeframes that you can select.

The SGDashboard includes a list of security groups, destination ports, source IP addresses, actions, protocols, and connection directions as well as raw VPC Flow Log records. This information is useful because you can compare this to your security group configurations. Ports might be open in the security group but have no network traffic flowing to the instances on those ports, which means the corresponding rules can probably be removed. Also, by evaluating IP ranges in use, you can narrow the ranges to only those IP addresses required for the application. The following screenshot on the left shows a view of the SGDashboard for a specific security group. By comparing its accepted inbound IP addresses with the security group rules in the following screenshot on the right, you can ensure the source IP ranges are sufficiently restrictive.

Screenshot showing a view of the SGDashboard for a specific security group   Screenshot showing security group rules

Analyze VPC Flow Logs data

Amazon ES allows you to quickly view and filter VPC Flow Logs data to determine what network traffic is flowing in your VPC. This analysis requires an understanding of security groups and elastic network interfaces (ENIs). If two security groups are associated with the same ENI, traffic to that ENI is registered against both groups: each flow log record lists every security group attached to the ENI, regardless of which group's rule allowed the traffic. Therefore, when you filter on a security group, additional groups might still appear in the list because they are included in the same VPC Flow Logs records.

The following screenshot on the left is a view of the SGDashboard with a security group selected (sg-978414e8). Even though that security group has a filter, two additional security groups remain in the dashboard. The following screenshot on the right shows the raw log data where each record contains all three security groups and demonstrates that all three security groups share a common set of flow log records.

Screenshot showing the SGDashboard with a security group selected   Screenshot showing raw log data

Also, note that security groups are stateful, so if the instance itself is initiating traffic to a different location, the return traffic will be displayed in the Kibana dashboard. The best example of this is port 123 Network Time Protocol (NTP). This type of traffic can be easily removed from the display by choosing the port on the right side of the dashboard, and then reversing the filter, as shown in the following screenshot. By reversing the filter, you can exclude data from the view.

Screenshot of reversing the filter on a port

Example: Unused security groups

Let’s say that some security groups are no longer in use. First, I change the time range by clicking the current time range in the top right corner of the dashboard, as shown in the following screenshot. I select Week to date.

Screenshot of changing the time range

As the following screenshot shows, the dashboard has identified five security groups that have had traffic during the week to date.

Screenshot showing five security groups that have had traffic during the week to date

As you can see in the following screenshot, I have many security groups in my test account that are not in use. Any security groups not in the SGDashboard are candidates for removal.

Example: Unused inbound rules

Let’s take a look at security group sg-63ed8c1c from the preceding screenshot. When I click sg-63ed8c1c (the security group ID) in the dashboard, a filter is applied that reduces the security groups displayed to only the records with that security group included. We can compare the traffic associated with this security group in the SGDashboard (shown in the following screenshot) to the security group rules in the EC2 console.

Screenshot showing the traffic of the sg-63ed8c1c security group

As the following screenshot of the EC2 console shows, this security group has only 2 inbound rules: one for HTTP on port 80 and one for RDP. The SGDashboard shows that traffic is not flowing on port 80, so I can safely remove that rule from the security group.

Screenshot showing this security group has only 2 inbound rules

Summary

It can be challenging to help ensure that your AWS Cloud environment allows only intended traffic and is as secure and manageable as possible. In this post, I have shown how to enable VPC Flow Logs. I then showed how to use Firehose and Lambda to add security group IDs, directions, and locations to the VPC Flow Logs dataset. The SGDashboard then enables you to analyze the flow log data and compare it with your security group configurations to improve your cloud security.

If you have comments about this blog post, submit them in the “Comments” section below. If you have implementation or troubleshooting questions about the solution in this post, please start a new thread on the AWS WAF forum.

– Guy

Canada and Switzerland Remain on US ‘Pirate Watchlist’ Under President Trump

Post Syndicated from Ernesto original https://torrentfreak.com/canada-and-switzerland-remain-on-us-pirate-watchlist-under-president-trump-170501/

Every year, the Office of the United States Trade Representative (USTR) publishes its Special 301 Report highlighting countries that aren’t doing enough to protect U.S. intellectual property rights.

The format remains the same as in previous years and lists roughly two dozen countries that, for different reasons, threaten the intellectual property rights of US companies.

The latest report, which just came out, is the first under the administration of President Trump and continues where Obama left off. China, Russia, Ukraine, and India are listed among the priority threats, and Canada and Switzerland remain on the general Watch List.

“One of the top trade priorities for the Trump Administration is to use all possible sources of leverage to encourage other countries to open their markets to U.S. exports of goods and services, and provide adequate and effective protection and enforcement of U.S. intellectual property (IP) rights,” the USTR writes.

One of the main problems the US has with Canada is that it doesn’t allow border protection officials to seize or destroy pirated and counterfeit goods that are passing through.

In addition, the US is fiercely against Canada’s fair dealing rules, which add educational use to the list of copyright infringement exceptions. According to the US, the language used in the law is too broad, damaging the rights of educational publishers.

“The United States also remains deeply troubled by the broad interpretation of an ambiguous education-related exception to copyright that has significantly damaged the market for educational publishers and authors.”

In the past, Canada has also been called out for offering a safe haven to pirate sites, but there is no mention of this in the 2017 report (pdf).

That said, pirate site hosting remains a problem in many other countries including Switzerland, with the USTR noting that the country has become an “increasingly popular host country for websites offering infringing content” since 2010.

While the Swiss Government is taking steps to address these concerns, another enforcement problem also requires attention. One of the key issues the United States has with Switzerland originates from the so-called ‘Logistep Decision.‘

In 2010 the Swiss Federal Supreme Court barred anti-piracy outfit Logistep from harvesting the IP addresses of file-sharers. The Court ruled that IP addresses amount to private data, and outlawed the tracking of file-sharers in Switzerland.

According to the US, this ruling prevents copyright holders from enforcing their rights, and they call on the Swiss Government to address this concern.

“Switzerland remains on the Watch List this year due to U.S. concerns regarding specific difficulties in Switzerland’s system of online copyright protection and enforcement,” the USTR writes.

“Seven years have elapsed since the issuance of a decision by the Swiss Federal Supreme Court, which has been implemented to essentially deprive copyright holders in Switzerland of the means to enforce their rights against online infringers. Enforcement is a critical element of providing meaningful IP protection.”

The above points are merely a selection of the many complaints the United States has about a variety of countries. As is often the case, the allegations are in large part based on reports from copyright-heavy industries, in some cases demanding measures that are not even in effect in the US itself.

By calling out foreign governments, the USTR hopes to elicit change. However, not all countries are receptive to this kind of diplomatic pressure. Canada, for one, said it doesn’t recognize the Special 301 Report and plans to follow its own path.

“Canada does not recognize the validity of the Special 301 and considers the process and the Report to be flawed,” the Government wrote in a previous memo regarding last year’s report.

“The Report fails to employ a clear methodology and the findings tend to rely on industry allegations rather than empirical evidence and objective analysis,” it added.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

Looking at the Netgear Arlo home IP camera

Post Syndicated from Matthew Garrett original http://mjg59.dreamwidth.org/48215.html

Another in the series of looking at the security of IoT type objects. This time I’ve gone for the Arlo network connected cameras produced by Netgear, specifically the stock Arlo base system with a single camera. The base station is based on a Broadcom 5358 SoC with an 802.11n radio, along with a single Broadcom gigabit ethernet interface. Other than it only having a single ethernet port, this looks pretty much like a standard Netgear router. There’s a convenient unpopulated header on the board that turns out to be a serial console, so getting a shell is only a few minutes work.

Normal setup is straightforward. You plug the base station into a router, wait for all the lights to come on and then you visit arlo.netgear.com and follow the setup instructions – by this point the base station has connected to Netgear’s cloud service and you’re just associating it to your account. Security here is straightforward: you need to be coming from the same IP address as the Arlo. For most home users with NAT this works fine. I sat frustrated as it repeatedly failed to find any devices, before finally moving everything behind a backup router (my main network isn’t NATted) for initial setup. Once you and the Arlo are on the same IP address, the site shows you the base station’s serial number for confirmation and then you attach it to your account. The next step is adding cameras. Each base station is broadcasting an 802.11 network on the 2.4GHz spectrum. You connect a camera by pressing the sync button on the base station and then the sync button on the camera. The camera associates with the base station via WDS and now you’re up and running.

This is the point where I get bored and stop following instructions, but if you’re using a desktop browser (rather than using the mobile app) you appear to need Flash in order to actually see any of the camera footage. Bleah.

But back to the device itself. The first thing I traced was the initial device association. What I found was that once the device is associated with an account, it can’t be attached to another account. This is good – I can’t simply request that devices be rebound to my account from someone else’s. Further, while the serial number is displayed to the user to disambiguate between devices, it doesn’t seem to be what’s used internally. Tracing the logon traffic from the base station shows it sending a long random device ID along with an authentication token. If you perform a factory reset, these values are regenerated. The device to account mapping seems to be based on this random device ID, which means that once the device is reset and bound to another account there’s no way for the initial account owner to regain access (other than resetting it again and binding it back to their account). This is far better than many devices I’ve looked at.

Performing a factory reset also changes the WPA PSK for the camera network. Newsky Security discovered that doing so originally reset it to 12345678, which is, uh, suboptimal? That’s been fixed in newer firmware, along with their discovery that the original random password choice was not terribly random.

All communication from the base station to the cloud seems to be over SSL, and everything validates certificates properly. This also seems to be true for client communication with the cloud service – camera footage is streamed back over port 443 as well.

Most of the functionality of the base station is provided by two daemons, xagent and vzdaemon. xagent appears to be responsible for registering the device with the cloud service, while vzdaemon handles the camera side of things (including motion detection). All of this is running as root, so in the event of any kind of vulnerability the entire platform is owned. For such a single purpose device this isn’t really a big deal (the only sensitive data it has is the camera feed – if someone has access to that then root doesn’t really buy them anything else). They’re statically linked and stripped so I couldn’t be bothered spending any significant amount of time digging into them. In any case, they don’t expose any remotely accessible ports and only connect to services with verified SSL certificates. They’re probably not a big risk.

Other than the dependence on Flash, there’s nothing immediately concerning here. What is a little worrying is a family of daemons running on the device and listening to various high numbered UDP ports. These appear to be provided by Broadcom and a standard part of all their router platforms – they’re intended for handling various bits of wireless authentication. It’s not clear why they’re listening on 0.0.0.0 rather than 127.0.0.1, and it’s not obvious whether they’re vulnerable (they mostly appear to receive packets from the driver itself, process them and then stick packets back into the kernel so who knows what’s actually going on), but since you can’t set one of these devices up in the first place without it being behind a NAT gateway it’s unlikely to be of real concern to most users. On the other hand, the same daemons seem to be present on several Broadcom-based router platforms where they may end up being visible to the outside world. That’s probably investigation for another day, though.

Overall: pretty solid, frustrating to set up if your network doesn’t match their expectations, wouldn’t have grave concerns over having it on an appropriately firewalled network.


Near Zero Downtime Migration from MySQL to DynamoDB

Post Syndicated from YongSeong Lee original https://aws.amazon.com/blogs/big-data/near-zero-downtime-migration-from-mysql-to-dynamodb/

Many companies consider migrating from relational databases like MySQL to Amazon DynamoDB, a fully managed, fast, highly scalable, and flexible NoSQL database service. For example, DynamoDB can increase or decrease capacity based on traffic, in accordance with business needs. The total cost of servicing can be optimized more easily than for the typical media-based RDBMS.

However, migrations can have two common issues:

  • Service outage due to downtime, especially when customer service must be seamlessly available 24/7/365
  • Different key design between RDBMS and DynamoDB

This post introduces two methods of seamlessly migrating data from MySQL to DynamoDB, minimizing downtime and converting the MySQL key design into one more suitable for NoSQL.

AWS services

I’ve included sample code that uses the following AWS services:

  • AWS Database Migration Service (AWS DMS) can migrate your data to and from most widely used commercial and open-source databases. It supports homogeneous and heterogeneous migrations between different database platforms.
  • Amazon EMR is a managed Hadoop framework that helps you process vast amounts of data quickly. Build EMR clusters easily with preconfigured software stacks that include Hive and other business software.
  • Amazon Kinesis can continuously capture and retain a vast amount of data such as transactions, IT logs, or clickstreams for up to 7 days.
  • AWS Lambda helps you run your code without provisioning or managing servers. Your code can be automatically triggered by other AWS services such as Amazon Kinesis Streams.

Migration solutions

Here are the two options I describe in this post:

  1. Use AWS DMS

AWS DMS supports migration to a DynamoDB table as a target. You can use object mapping to restructure original data to the desired structure of the data in DynamoDB during migration.

  2. Use EMR, Amazon Kinesis, and Lambda with custom scripts

Consider this method when more complex conversion processes and flexibility are required. Fine-grained user control is needed for grouping MySQL records into fewer DynamoDB items, determining attribute names dynamically, adding business logic programmatically during migration, supporting more data types, or adding parallel control for one big table.

After the initial load/bulk-puts are finished, and the most recent real-time data is caught up by the CDC (change data capture) process, you can change the application endpoint to DynamoDB.

The method of capturing changed data in option 2 is covered in the AWS Database post Streaming Changes in a Database with Amazon Kinesis. All code in this post is available in the big-data-blog GitHub repo, including test codes.

Solution architecture

The following diagram shows the overall architecture of both options.

Option 1:  Use AWS DMS

This section discusses how to connect to MySQL, read the source data, and then format the data for consumption by the target DynamoDB database using DMS.

Create the replication instance and source and target endpoints

Create a replication instance that has sufficient storage and processing power to perform the migration job, as mentioned in the AWS Database Migration Service Best Practices whitepaper. For example, if your migration involves a large number of tables, or if you intend to run multiple concurrent replication tasks, consider using one of the larger instances. The service consumes a fair amount of memory and CPU.

DMS connects to MySQL as a user that has the SUPER and REPLICATION CLIENT privileges and retrieves data from the database. Enable the binary log and set the binlog_format parameter to ROW for CDC in the MySQL configuration. For more information about how to use DMS, see Getting Started in the AWS Database Migration Service User Guide.

mysql> CREATE USER 'repl'@'%' IDENTIFIED BY 'welcome1';
mysql> GRANT all ON <database name>.* TO 'repl'@'%';
mysql> GRANT SUPER,REPLICATION CLIENT  ON *.* TO 'repl'@'%';

Before you begin to work with a DynamoDB database as a target for DMS, make sure that you create an IAM role for DMS to assume, and grant access to the DynamoDB target tables. Two endpoints must be created to connect the source and target. The following screenshot shows sample endpoints.

The following screenshot shows the details for one of the endpoints, source-mysql.

Create a task with an object mapping rule

In this example, assume that the MySQL table has a composite primary key (customerid + orderid + productid). You are going to restructure the key to the desired structure of the data in DynamoDB, using an object mapping rule.

In this case, the DynamoDB table has the hash key that is a combination of the customerid and orderid columns, and the sort key is the productid column. However, the partition key should be decided by the user in an actual migration, based on data ingestion and access pattern. You would usually use high-cardinality attributes. For more information about how to choose the right DynamoDB partition key, see the Choosing the Right DynamoDB Partition Key AWS Database blog post.

DMS automatically creates a corresponding attribute on the target DynamoDB table for the quantity column from the source table because rule-action is set to map-record-to-record and the column is not listed in the exclude-columns attribute list. For more information about map-record-to-record and map-record-to-document, see Using an Amazon DynamoDB Database as a Target for AWS Database Migration Service.

Migration starts immediately after the task is created, unless you clear the Start task on create option. I recommend enabling logging to make sure that you are informed about what is going on with the migration task in the background.

The following screenshot shows the task creation page.

You can use the console to specify the individual database tables to migrate and the schema to use for the migration, including transformations. On the Guided tab, use the Where section to specify the schema, table, and action (include or exclude). Use the Filter section to specify the column name in a table and the conditions to apply.

Table mappings also can be created in JSON format. On the JSON tab, check Enable JSON editing.

Here’s an example of an object mapping rule that determines where the source data is located in the target. If you copy the code, replace the values of the following attributes. For more examples, see Using an Amazon DynamoDB Database as a Target for AWS Database Migration Service.

  • schema-name
  • table-name
  • target-table-name
  • mapping-parameters
  • attribute-mappings
{
  "rules": [
   {
      "rule-type": "selection",
      "rule-id": "1",
      "rule-name": "1",
      "object-locator": {
        "schema-name": "mydatabase",
        "table-name": "purchase"
      },
      "rule-action": "include"
    },
    {
      "rule-type": "object-mapping",
      "rule-id": "2",
      "rule-name": "2",
      "rule-action": "map-record-to-record",
      "object-locator": {
        "schema-name": "mydatabase",
        "table-name": "purchase"
 
      },
      "target-table-name": "purchase",
      "mapping-parameters": {
        "partition-key-name": "customer_orderid",
        "sort-key-name": "productid",
        "exclude-columns": [
          "customerid",
          "orderid"           
        ],
        "attribute-mappings": [
          {
            "target-attribute-name": "customer_orderid",
            "attribute-type": "scalar",
            "attribute-sub-type": "string",
            "value": "${customerid}|${orderid}"
          },
          {
            "target-attribute-name": "productid",
            "attribute-type": "scalar",
            "attribute-sub-type": "string",
            "value": "${productid}"
          }
        ]
      }
    }
  ]
}

Start the migration task

If the target table specified in the target-table-name property does not exist in DynamoDB, DMS creates the table according to data type conversion rules for source and target data types. There are many metrics to monitor the progress of migration. For more information, see Monitoring AWS Database Migration Service Tasks.
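If you prefer to script this step, the following boto3 sketch creates and starts a full-load-and-CDC task with the object mapping shown above. The ARNs and the mappings file name are placeholders for your own resources, and in practice you would wait for the task status to become ready before starting it.

import boto3

dms = boto3.client("dms")

# Sketch only: the ARNs and the mappings file are placeholders for your resources.
with open("table-mappings.json") as f:
    table_mappings = f.read()

task = dms.create_replication_task(
    ReplicationTaskIdentifier="mysql-to-dynamodb",
    SourceEndpointArn="arn:aws:dms:us-east-1:123456789012:endpoint:SOURCE",
    TargetEndpointArn="arn:aws:dms:us-east-1:123456789012:endpoint:TARGET",
    ReplicationInstanceArn="arn:aws:dms:us-east-1:123456789012:rep:INSTANCE",
    MigrationType="full-load-and-cdc",
    TableMappings=table_mappings,
)

# Wait until the task status is "ready" before starting it (polling omitted here).
dms.start_replication_task(
    ReplicationTaskArn=task["ReplicationTask"]["ReplicationTaskArn"],
    StartReplicationTaskType="start-replication",
)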

The following screenshot shows example events and errors recorded by CloudWatch Logs.

DMS replication instances that you used for the migration should be deleted once all migration processes are completed. Any CloudWatch logs data older than the retention period is automatically deleted.

Option 2: Use EMR, Amazon Kinesis, and Lambda

This section discusses an alternative option using EMR, Amazon Kinesis, and Lambda to provide more flexibility and precise control. If you have a MySQL replica in your environment, it would be better to dump data from the replica.

Change the key design

When you decide to change your database from an RDBMS to NoSQL, you need to find a more suitable key design for NoSQL, for performance as well as cost-effectiveness.

Similar to option #1, assume that the MySQL source has a composite primary key (customerid + orderid + productid). However, for this option, group the MySQL records into fewer DynamoDB items by customerid (hash key) and orderid (sort key). Also, remove the last column of the composite key (productid) by converting each productid value in MySQL into an attribute name in DynamoDB, with the corresponding quantity as the attribute value.

This conversion method reduces the number of items. You can retrieve the same amount of information with fewer read capacity units, resulting in cost savings and better performance. For more information about how to calculate read/write capacity units, see Provisioned Throughput.
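To make the regrouping concrete, here is a small illustration with made-up values, shown as the Python dictionary you would write to DynamoDB:

# Three MySQL rows (customerid, orderid, productid, quantity) -- made-up example values:
#   ("customer1", "order1", "productA", 2)
#   ("customer1", "order1", "productB", 1)
#   ("customer1", "order1", "productC", 4)
#
# become a single DynamoDB item keyed by customerid (hash) and orderid (sort),
# with each productid as an attribute name and its quantity as the value:
item = {
    "customerid": "customer1",   # hash key
    "orderid": "order1",         # sort key
    "productA": 2,
    "productB": 1,
    "productC": 4,
}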

Migration steps

Option 2 has two paths for migration, performed at the same time:

  • Batch-puts: Export MySQL data, upload it to Amazon S3, and import into DynamoDB.
  • Real-time puts: Capture changed data in MySQL, send the insert/update/delete transaction to Amazon Kinesis Streams, and trigger the Lambda function to put data into DynamoDB.

To maintain data consistency and integrity, capturing and feeding data to Amazon Kinesis Streams should be started before the batch-puts process. The Lambda function should stand by, and Streams should retain the captured data in the stream, until the batch-puts process on EMR finishes. Here’s the order:

  1. Start real-time puts to Amazon Kinesis Streams.
  2. As soon as real-time puts commences, start batch-puts.
  3. After batch-puts finishes, trigger the Lambda function to execute put_item from Amazon Kinesis Streams to DynamoDB.
  4. Change the application endpoints from MySQL to DynamoDB.

Step 1:  Capture changing data and put into Amazon Kinesis Streams

First, create an Amazon Kinesis stream to retain transaction data from MySQL. Set the Data retention period value based on your estimate of how long the batch-puts migration will take. For data integrity, the retention period should be long enough to hold all transactions until the batch-puts migration finishes. However, you do not necessarily need to select the maximum retention period; it depends on the amount of data to migrate.

In the MySQL configuration, set binlog_format to ROW to capture transactions by using the BinLogStreamReader module. The log_bin parameter must be set as well to enable the binlog. For more information, see the Streaming Changes in a Database with Amazon Kinesis AWS Database blog post.

 

[mysqld]
secure-file-priv = ""
log_bin=/data/binlog/binlog
binlog_format=ROW
server-id = 1
tmpdir=/data/tmp

The following sample code is a Python example that captures transactions and sends them to Amazon Kinesis Streams.

 

#!/usr/bin/env python
import json

import boto3
from pymysqlreplication import BinLogStreamReader
from pymysqlreplication.row_event import (
  DeleteRowsEvent,
  UpdateRowsEvent,
  WriteRowsEvent,
)

def main():
  kinesis = boto3.client("kinesis")

  stream = BinLogStreamReader(
    connection_settings= {
      "host": "<host IP address>",
      "port": <port number>,
      "user": "<user name>",
      "passwd": "<password>"},
    server_id=100,
    blocking=True,
    resume_stream=True,
    only_events=[DeleteRowsEvent, WriteRowsEvent, UpdateRowsEvent])

  for binlogevent in stream:
    for row in binlogevent.rows:
      event = {"schema": binlogevent.schema,
      "table": binlogevent.table,
      "type": type(binlogevent).__name__,
      "row": row
      }

      kinesis.put_record(StreamName="<Amazon Kinesis stream name>", Data=json.dumps(event), PartitionKey="default")
      print json.dumps(event)

if __name__ == "__main__":
main()

The following code is sample JSON data generated by the Python script. The type attribute defines the transaction recorded by that JSON record:

  • WriteRowsEvent = INSERT
  • UpdateRowsEvent = UPDATE
  • DeleteRowsEvent = DELETE
{"table": "purchase_temp", "row": {"values": {"orderid": "orderidA1", "quantity": 100, "customerid": "customeridA74187", "productid": "productid1"}}, "type": "WriteRowsEvent", "schema": "test"}
{"table": "purchase_temp", "row": {"before_values": {"orderid": "orderid1", "quantity": 1, "customerid": "customerid74187", "productid": "productid1"}, "after_values": {"orderid": "orderid1", "quantity": 99, "customerid": "customerid74187", "productid": "productid1"}}, "type": "UpdateRowsEvent", "schema": "test"}
{"table": "purchase_temp", "row": {"values": {"orderid": "orderid100", "quantity": 1, "customerid": "customerid74187", "productid": "productid1"}}, "type": "DeleteRowsEvent", "schema": "test"}

Step 2. Dump data from MySQL to DynamoDB

The easiest way is to use DMS, which recently added Amazon S3 as a migration target. For an S3 target, both full load and CDC data is written to CSV format. However, CDC is not a good fit as UPDATE and DELETE statements are not supported. For more information, see Using Amazon S3 as a Target for AWS Database Migration Service.

Another way to upload data to Amazon S3 is to use the INTO OUTFILE SQL clause and the aws s3 sync CLI command in parallel with your own script. The degree of parallelism depends on your server capacity and local network bandwidth. You might find a third-party tool useful, such as pt-archiver (part of the Percona Toolkit; see the appendix for details).

SELECT * FROM purchase WHERE <condition_1>
INTO OUTFILE '/data/export/purchase/1.csv' FIELDS TERMINATED BY ',' ESCAPED BY '\\' LINES TERMINATED BY '\n';
SELECT * FROM purchase WHERE <condition_2>
INTO OUTFILE '/data/export/purchase/2.csv' FIELDS TERMINATED BY ',' ESCAPED BY '\\' LINES TERMINATED BY '\n';
...
SELECT * FROM purchase WHERE <condition_n>
INTO OUTFILE '/data/export/purchase/n.csv' FIELDS TERMINATED BY ',' ESCAPED BY '\\' LINES TERMINATED BY '\n';

I recommend the aws s3 sync command for this use case. This command works internally with the S3 multipart upload feature. Pattern matching can exclude or include particular files. In addition, if the sync process crashes in the middle of processing, you do not need to upload the same files again. The sync command compares the size and modified time of files between local and S3 versions, and synchronizes only local files whose size and modified time are different from those in S3. For more information, see the sync command in the S3 section of the AWS CLI Command Reference.

$ aws s3 sync /data/export/purchase/ s3://<your bucket name>/purchase/ 
$ aws s3 sync /data/export/<other path_1>/ s3://<your bucket name>/<other path_1>/
...
$ aws s3 sync /data/export/<other path_n>/ s3://<your bucket name>/<other path_n>/ 

After all data is uploaded to S3, put it into DynamoDB. There are two ways to do this:

  • Use Hive with an external table
  • Write MapReduce code

Hive with an external table

Create a Hive external table over the data on S3 and insert it into another external table that maps to the DynamoDB table, using the org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler property. To improve productivity and scalability, consider using Brickhouse, a collection of UDFs for Hive.

The following sample code assumes that the Hive table for DynamoDB is created with a products column of type ARRAY&lt;STRING&gt;. The productid and quantity columns are aggregated, grouped by customerid and orderid, and inserted into the products column with the CollectUDAF function provided by Brickhouse.

hive> DROP TABLE purchase_ext_s3; 
--- To read data from S3 
hive> CREATE EXTERNAL TABLE purchase_ext_s3 (
customerid string,
orderid    string,
productid  string,
quantity   string) 
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' 
LOCATION 's3://<your bucket name>/purchase/';

hive> DROP TABLE purchase_ext_dynamodb;
--- To connect to the DynamoDB table
hive> CREATE EXTERNAL TABLE purchase_ext_dynamodb (
      customerid STRING, orderid STRING, products ARRAY<STRING>)
      STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler' 
      TBLPROPERTIES ("dynamodb.table.name" = "purchase", 
      "dynamodb.column.mapping" = "customerid:customerid,orderid:orderid,products:products");

--- Batch-puts to DynamoDB using Brickhouse 
hive> add jar /<jar file path>/brickhouse-0.7.1-SNAPSHOT.jar ; 
hive> create temporary function collect as 'brickhouse.udf.collect.CollectUDAF';
hive> INSERT INTO purchase_ext_dynamodb 
select customerid as customerid , orderid as orderid
       ,collect(concat(productid,':' ,quantity)) as products
      from purchase_ext_s3
      group by customerid, orderid; 

Unfortunately, the MAP, LIST, BOOLEAN, and NULL data types are not supported by the DynamoDBStorageHandler class, so the ARRAY&lt;STRING&gt; data type was chosen. The products column of ARRAY&lt;STRING&gt; type in Hive maps to a StringSet attribute in DynamoDB. The sample code mainly shows how Brickhouse works and is only relevant if you want to aggregate multiple records into one StringSet attribute in DynamoDB.
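
As a quick illustration of that mapping, the following sketch reads one of the aggregated items back with boto3 and splits the productid:quantity strings into a dictionary. The table and key values follow the samples in this post and are placeholders.

import boto3

# assumes the purchase table created above, with customerid/orderid keys
dynamodb = boto3.client('dynamodb')

response = dynamodb.get_item(
    TableName='purchase',
    Key={'customerid': {'S': 'customerid74187'},
         'orderid': {'S': 'orderid1'}})

item = response.get('Item', {})
# the Hive ARRAY<STRING> column arrives as a DynamoDB StringSet ("SS")
products = item.get('products', {}).get('SS', [])

# split "productid:quantity" strings into a dict, e.g. {"productid1": "99"}
quantities = dict(p.split(':', 1) for p in products)
print(quantities)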

Python MapReduce with Hadoop Streaming

A mapper task reads each record from the input data on S3 and maps input key-value pairs to intermediate key-value pairs. It divides the source data from S3 into two parts (a key part and an attribute part) delimited by a TAB character ("\t"). The mapper output is sorted by its intermediate key (customerid and orderid) and sent to the reducer. Records are put into DynamoDB in the reducer step.

#!/usr/bin/env python
import sys
 
# get all lines from stdin
for line in sys.stdin:
    line = line.strip()
    cols = line.split(',')
# divide source data into a key part and an attribute part
# example output : "customer1,order1\tproduct1,10"
    print '%s,%s\t%s,%s' % (cols[0], cols[1], cols[2], cols[3])

Generally, the reduce task receives the output produced after map processing (which is key/list-of-values pairs) and then performs an operation on the list of values against each key.

In this case, the reducer is written in Python and is based on STDIN/STDOUT Hadoop streaming, so enumerable data types are not available. The reducer receives data sorted by the intermediate key set in the mapper (customerid and orderid, that is, cols[0] and cols[1]) and stores all attributes for a given key in the item_data dictionary. The attributes in the item_data dictionary are put, or flushed, into DynamoDB every time a new intermediate key arrives from sys.stdin.

#!/usr/bin/env python
import sys
import boto.dynamodb
 
# create connection to DynamoDB
current_keys = None
conn = boto.dynamodb.connect_to_region( '<region>', aws_access_key_id='<access key id>', aws_secret_access_key='<secret access key>')
table = conn.get_table('<dynamodb table name>')
item_data = {}

# input comes from STDIN emitted by Mapper
for line in sys.stdin:
    line = line.strip()
    dickeys, items  = line.split('\t')
    products = items.split(',')
    if current_keys == dickeys:
       item_data[products[0]]=products[1]  
    else:
        if current_keys:
          try:
              mykeys = current_keys.split(',') 
              item = table.new_item(hash_key=mykeys[0],range_key=mykeys[1], attrs=item_data )
              item.put() 
          except Exception ,e:
              print 'Exception occurred! :', e.message,'==> Data:' , mykeys
        item_data = {}
        item_data[products[0]]=products[1]
        current_keys = dickeys

# put the last group of data after the input ends
if current_keys == dickeys:
   print 'Last one:' , current_keys #, item_data
   try:
       mykeys = dickeys.split(',')
       item = table.new_item(hash_key=mykeys[0], range_key=mykeys[1], attrs=item_data)
       item.put()
   except Exception, e:
       print 'Exception occurred! :', e.message, '==> Data:', mykeys

To run the MapReduce job, connect to the EMR master node and run a Hadoop streaming job. The hadoop-streaming.jar file location or name could be different, depending on your EMR version. Exception messages that occur while reducers run are stored in the directory assigned as the --output option. Hash key and range key values are also logged to identify which data causes exceptions or errors.

$ hadoop fs -rm -r s3://<bucket name>/<output path>
$ hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar \
           -input s3://<bucket name>/<input path> -output s3://<bucket name>/<output path> \
           -file /<local path>/mapper.py -mapper /<local path>/mapper.py \
           -file /<local path>/reducer.py -reducer /<local path>/reducer.py

In my migration experiment using the above scripts, with self-generated test data, I found the following results, including database size and the time taken to complete the migration.

Server              MySQL instance             m4.2xlarge
                    EMR cluster                master: 1 x m3.xlarge
                                               core:   2 x m4.4xlarge
                    DynamoDB                   2,000 write capacity units
Data                Number of records          1,000,000,000
                    Database file size (.ibd)  100.6 GB
                    CSV files size             37 GB
Performance (time)  Export to CSV              6 min 10 sec
                    Upload to S3 (sync)        3 min 30 sec
                    Import to DynamoDB         depends on write capacity units

The following screenshot shows the performance results by write capacity.

Note that these performance results can vary depending on server capacity, network bandwidth, degree of parallelism, conversion logic, programming language, and other conditions. The MapReduce job consumes all provisioned write capacity units during the data import, so the larger the EMR cluster and the more write capacity units provisioned on the DynamoDB table, the less time the import takes to complete. Java-based MapReduce code would give you more flexibility over both the logic and the MapReduce framework itself.
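
Because the import time scales with provisioned write capacity, one practical step is to raise the table's write capacity just before the bulk import and lower it again afterward. The following is a minimal boto3 sketch under that assumption; the table name and capacity values are placeholders, and note that DynamoDB limits how many capacity decreases you can make per day.

import boto3

dynamodb = boto3.client('dynamodb')

def set_write_capacity(table_name, read_units, write_units):
    # update provisioned throughput; the change is applied asynchronously
    dynamodb.update_table(
        TableName=table_name,
        ProvisionedThroughput={'ReadCapacityUnits': read_units,
                               'WriteCapacityUnits': write_units})
    # wait until the table returns to ACTIVE before starting the import
    waiter = dynamodb.get_waiter('table_exists')
    waiter.wait(TableName=table_name)

# raise capacity for the bulk import, then scale back down afterward
set_write_capacity('purchase', 100, 2000)
# ... run the MapReduce import ...
set_write_capacity('purchase', 100, 100)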

Step 3: AWS Lambda function updates DynamoDB by reading data from Amazon Kinesis

In the Lambda console, choose Create a Lambda function and select the kinesis-process-record-python blueprint. Next, on the Configure triggers page, select the stream that you just created.

The Lambda function must have an IAM role with permissions to read from Amazon Kinesis and put items into DynamoDB.
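
A minimal sketch of such a policy, created here with boto3, might look like the following. The policy name and resource ARNs are placeholders; your function may need narrower or broader permissions, and the policy still has to be attached to the Lambda execution role.

import json
import boto3

# allow the Lambda function to read from the Kinesis stream and write to DynamoDB
policy_document = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow",
         "Action": ["kinesis:GetRecords", "kinesis:GetShardIterator",
                    "kinesis:DescribeStream", "kinesis:ListStreams"],
         "Resource": "<Amazon Kinesis stream ARN>"},
        {"Effect": "Allow",
         "Action": ["dynamodb:GetItem", "dynamodb:PutItem", "dynamodb:UpdateItem"],
         "Resource": "<DynamoDB table ARN>"},
        {"Effect": "Allow",
         "Action": ["logs:CreateLogGroup", "logs:CreateLogStream", "logs:PutLogEvents"],
         "Resource": "*"}
    ]
}

iam = boto3.client('iam')
iam.create_policy(PolicyName='lambda-kinesis-to-dynamodb',
                  PolicyDocument=json.dumps(policy_document))
# attach the policy to the Lambda execution role separately, for example with attach_role_policy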

The Lambda function can recognize the transaction type of the record by looking up the type attribute. The transaction type determines the method for conversion and update.

For example, when a JSON record is passed to the function, the function looks up the type attribute. It also checks whether an existing item in the DynamoDB table has the same key as the incoming record. If so, the existing item is retrieved and saved in a dictionary variable (item, in this case), and the new update information is applied to that dictionary before it is put back into the DynamoDB table. This prevents the existing item from being overwritten by the incoming record.

from __future__ import print_function

import base64
import json
import boto3

print('Loading function')
client = boto3.client('dynamodb')

def lambda_handler(event, context):
    #print("Received event: " + json.dumps(event, indent=2))
    for record in event['Records']:
        # Amazon Kinesis data is base64-encoded so decode here
        payload = base64.b64decode(record['kinesis']['data'])
        print("Decoded payload: " + payload)
        data = json.loads(payload)
        
        # user logic for data triggered by WriteRowsEvent
        if data["type"] == "WriteRowsEvent":
            my_table = data["table"]
            my_hashkey = data["row"]["values"]["customerid"]
            my_rangekey = data["row"]["values"]["orderid"]
            my_productid = data["row"]["values"]["productid"]
            my_quantity = str( data["row"]["values"]["quantity"] )
            try:
                response = client.get_item( Key={'customerid':{'S':my_hashkey} , 'orderid':{'S':my_rangekey}} ,TableName = my_table )
                if 'Item' in response:
                    item = response['Item']
                    item[data["row"]["values"]["productid"]] = {"S":my_quantity}
                    result1 = client.put_item(Item = item , TableName = my_table )
                else:
                    item = { 'customerid':{'S':my_hashkey} , 'orderid':{'S':my_rangekey} , my_productid :{"S":my_quantity}  }
                    result2 = client.put_item( Item = item , TableName = my_table )
            except Exception, e:
                print( 'WriteRowsEvent Exception ! :', e.message  , '==> Data:' ,data["row"]["values"]["customerid"]  , data["row"]["values"]["orderid"] )
        
        # user logic for data triggered by UpdateRowsEvent
        if data["type"] == "UpdateRowsEvent":
            my_table = data["table"]
            
        # user logic for data triggered by DeleteRowsEvent    
        if data["type"] == "DeleteRowsEvent":
            my_table = data["table"]
            
            
    return 'Successfully processed {} records.'.format(len(event['Records']))
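
The UpdateRowsEvent and DeleteRowsEvent branches above are placeholders for your own logic. As one possible sketch of the update path, consistent with the key design used in this post (the table, key, and attribute names follow the samples and may differ in your schema), a handler could write the changed quantity back to the product attribute:

import boto3

client = boto3.client('dynamodb')

def handle_update_rows_event(data):
    # sketch only: apply an UpdateRowsEvent from the binlog to the matching DynamoDB item
    my_table = data["table"]
    after = data["row"]["after_values"]
    try:
        # set the product attribute to the new quantity for the same hash/range key;
        # if productid itself changed, you may also need to remove the old attribute (not handled here)
        client.update_item(
            TableName=my_table,
            Key={'customerid': {'S': after["customerid"]},
                 'orderid': {'S': after["orderid"]}},
            UpdateExpression='SET #p = :q',
            ExpressionAttributeNames={'#p': after["productid"]},
            ExpressionAttributeValues={':q': {'S': str(after["quantity"])}})
    except Exception as e:
        print('UpdateRowsEvent Exception ! :', e, '==> Data:',
              after["customerid"], after["orderid"])

A DeleteRowsEvent handler could similarly call delete_item, or remove only the affected product attribute, depending on how you want deletes to behave.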

Step 4: Switch the application endpoint to DynamoDB

Application code needs to be refactored when you change from MySQL to DynamoDB. The following simple Java code snippets focus on the connection and query parts, because it is difficult to cover all cases for all applications. For more information, see Programming with DynamoDB and the AWS SDKs.

Query to MySQL

The following sample code shows a common way to connect to MySQL and retrieve data.

import java.sql.* ;
...
try {
    Connection conn =  DriverManager.getConnection("jdbc:mysql://<host name>/<database name>" , "<user>" , "<password>");
    stmt = conn.createStatement();
    String sql = "SELECT quantity as quantity FROM purchase WHERE customerid = '<customerid>' and orderid = '<orderid>' and productid = '<productid>'";
    ResultSet rs = stmt.executeQuery(sql);

    while (rs.next()) {
       int quantity = rs.getInt("quantity");        // Retrieve by column name
       System.out.print("quantity: " + quantity);   // Display values
    }
} catch (SQLException ex) {
    // handle any errors
    System.out.println("SQLException: " + ex.getMessage());
}
...
==== Output ====
quantity:1

Query to DynamoDB

To retrieve items from DynamoDB, follow these steps:

  1. Create an instance of the DynamoDB class.
  2. Create an instance of the Table class.
  3. Add the withHashKey and withRangeKeyCondition methods to an instance of the QuerySpec class.
  4. Execute the query method with the QuerySpec instance created previously. Items are retrieved in JSON format, so use the getJSON method to look up a specific attribute in an item.
...
DynamoDB dynamoDB = new DynamoDB( new AmazonDynamoDBClient(new ProfileCredentialsProvider()));

Table table = dynamoDB.getTable("purchase");

QuerySpec querySpec = new QuerySpec()
        .withHashKey("customerid", "customer1")                                 // hash key name and its value
        .withRangeKeyCondition(new RangeKeyCondition("orderid").eq("order1"));  // range key and its condition value

ItemCollection<QueryOutcome> items = table.query(querySpec);

Iterator<Item> iterator = items.iterator();
while (iterator.hasNext()) {
    Item item = iterator.next();
    System.out.println("quantity: " + item.getJSON("product1"));   // display the quantity stored in the product1 attribute
}
...
==== Output ====
quantity:1
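
For comparison, the same lookup from Python with boto3 might look like the following sketch; the table, key, and attribute names follow the samples above and are placeholders for your own schema.

import boto3

# assumes the purchase table with customerid (hash key) and orderid (range key)
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('purchase')

response = table.get_item(Key={'customerid': 'customer1', 'orderid': 'order1'})
item = response.get('Item', {})

# product attributes hold the quantity as a string, per the key design in this post
print('quantity: ' + item.get('product1', '0'))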

Conclusion

In this post, I introduced two options for seamlessly migrating data from MySQL to DynamoDB and minimizing downtime during the migration. Option #1 used DMS, and option #2 combined EMR, Amazon Kinesis, and Lambda. I also showed you how to convert the key design in accordance with database characteristics to improve read/write performance and reduce costs. Each option has advantages and disadvantages, so the best option depends on your business requirements.

The sample code in this post is not enough for a complete, efficient, and reliable data migration code base to be reused across many different environments. Use it to get started, but design for other variables in your actual migration.

I hope this post helps you plan and implement your migration and minimizes service outages. If you have questions or suggestions, please leave a comment below.

Appendix

To install the Percona Toolkit:

# Install Percona Toolkit

$ wget https://www.percona.com/downloads/percona-toolkit/3.0.2/binary/redhat/6/x86_64/percona-toolkit-3.0.2-1.el6.x86_64.rpm

$ sudo yum install perl-IO-Socket-SSL

$ sudo yum install perl-TermReadKey

$ sudo rpm -Uvh percona-toolkit-3.0.2-1.el6.x86_64.rpm

# run pt-archiver

Example command:

$ pt-archiver --source h=localhost,D=blog,t=purchase --file '/data/export/%Y-%m-%d-%D.%t' --where "1=1" --limit 10000 --commit-each

About the Author

Yong Seong Lee is a Cloud Support Engineer for AWS Big Data Services. He is interested in every technology related to data/databases and helping customers who have difficulties in using AWS services. His motto is “Enjoy life, be curious and have maximum experience.”


Converging Data Silos to Amazon Redshift Using AWS DMS

 

FBI Uses BitTorrent to Find and Catch Child Porn Offenders

Post Syndicated from Ernesto original https://torrentfreak.com/fbi-uses-bittorrent-to-find-and-catch-child-porn-offenders-170415/

To combat the distribution of child pornography on the Internet, U.S. law enforcement is using BitTorrent to track down and catch perpetrators.

File-sharing networks and tools are used to transfer all sorts of files, including pornographic footage of children.

The Department of Justice in the U.S. sees these cases as a high priority and has successfully prosecuted many of them in recent years. Several were concluded with help from P2P file-sharing software.

A few years ago, applications with shared folders, such as LimeWire, allowed the FBI to pinpoint infringers who were actively sharing illegal content. The evidence in these cases was relatively strong and led to many convictions.

However, now that LimeWire and other popular “shared folder” applications are no longer available, law enforcement has switched to BitTorrent.

While there have been similar cases before, this week we spotted for the first time an indictment in which BitTorrent was used to find someone sharing these files. In the affidavit, signed by a Homeland Security Investigations agent, the process is explained in detail.

The agent describes BitTorrent as a “very popular” file-sharing network that users typically connect to through torrents they download from search engines such as Isohunt or The Pirate Bay.

These torrent sites don’t store any material themselves, the affidavit clarifies, but the perpetrators and law enforcement can use these sites to find illegal content.

“Law enforcement can search the BitTorrent network in order to locate individuals sharing previously identified child exploitation material in the same way a user searches this network,” the affidavit reads.

“By searching the network for these known torrents, law enforcement can quickly identify targets in the searcher’s jurisdiction.”

The FBI and other law enforcement agencies use these search engines to find torrents that are known to link to child porn. They then load the torrent files into modified torrent clients and obtain IP addresses and other information from the associated trackers.

The software in question is modified to download complete files from a single source, so the investigator knows that the person on the other end has a full copy.

“There is law enforcement-specific BitTorrent network software which allows for single-source downloads from a computer at a single IP address, meaning that an entire file or files are downloaded only from a computer at a single IP address as opposed to obtaining the file from multiple peers/clients on the BitTorrent network.

“This procedure allows for the detection and investigation of those computers involved in sharing digital files of known or suspected child pornography on the BitTorrent network,” the affidavit adds.

In the present case, a search by FBI special agent David Hand led to a Simi Valley man, who was arrested and indicted by a federal grand jury last week.

A follow-up investigation unveiled more gruesome details beyond the distribution of child pornography. The indictment alleges that the man also took 83 images and three videos of a 6-year-old girl with his iPhone.

Based on the above, the man faces lengthy prison terms for producing, distributing, and possessing child pornography.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

Introducing DnsControl – “DNS as Code” has Arrived

Post Syndicated from Craig Peterson original http://blog.serverfault.com/2017/04/11/introducing-dnscontrol-dns-as-code-has-arrived/

DNS at Stack Overflow is… complex.  We have hundreds of DNS domains and thousands of DNS records. We have gone from running our own BIND server to hosting DNS with multiple cloud providers, and we change things fairly often. Keeping everything up to date and synced at multiple DNS providers is difficult. We built DnsControl to allow us to perform updates easily and automatically across all providers we use.

The old way

Originally, our DNS was hosted by our own BIND servers, using artisanal, hand-crafted zone files. Large changes involved liberal sed usage, and every change was pretty error-prone. We decided to start using cloud DNS providers for performance reasons, but those each have their own web panels, which are universally painful to use. Web interfaces rarely have any import/export functionality, and generally lack change control, history tracking, or comments. We quickly decided that web panels were not how we wanted to manage our zones.

Introducing DnsControl

DNSControl is the system we built to manage our DNS. It permits “describe once, use anywhere” DNS management. It consists of a few key components:

  1. A Domain Specific Language (DSL) for describing domains in a single, provider-independent way.
  2. An “interpreter” application that executes the DSL and creates a standardized representation of your desired DNS state.
  3. Back-end “providers” that sync the desired state to a DNS provider.

At the time of this writing we have 9 different providers implemented, with 3 more on the way shortly. We use it to manage our domains with our own BIND servers, as well as Route 53, Google Cloud DNS, name.com, Cloudflare, and more.

A sample might look like this description of stackoverflow.com:

D("stackoverflow.com", REG_NAMEDOTCOM, DnsProvider(R53), DnsProvider(GCLOUD),
    A("@", "198.252.206.16"),
    A("blog", "198.252.206.20"),
    CNAME("chat", "chat.stackexchange.com."),
    CNAME("www", "@", TTL(3600)),
    A("meta", "198.252.206.16")
)

This is just a small, simple example. The DSL is a fully-featured way to express your DNS config. It is actually just JavaScript with some helpful functions. We have an examples page with more examples of the power of the language.

Running “dnscontrol preview” with this input will show what updates would be needed to bring DNS providers up to date with the new desired configuration. Running “dnscontrol push” will actually make the changes.

This allows us to manage our DNS configuration as code. Storing it this way has a bunch of advantages:

  • We can use variables to store common IP addresses or repeated data. We can make complicated changes, like failing-over services between data centers, by changing a single variable. We can activate or deactivate our CDN, which involves thousands of record changes, by commenting or uncommenting a single line of code.
  • We are not locked into any single provider, since the automation can sync to any of them. Keeping records synchronized between different cloud providers requires no manual steps.
  • We store our DNS config in git. Our build server runs all changes. We have central logging, access control, and history for our DNS changes. We’re trying to apply DevOps best practices to an area that has not seen those benefits so much yet.

I think the biggest benefit of this tool, though, is the freedom it has given us with our DNS. It has allowed us to:

  • Switch providers with no fear of breaking things. We have changed CDNs or DNS providers at least 4 times in the last two years, and it has never been scary at all.
  • Dual-host our DNS with multiple providers simultaneously. The tool keeps them in sync for us.
  • Test fail-over procedures before an emergency happens. We are confident we can point DNS at our secondary datacenter easily, and we can quickly switch providers if one is being DDoSed.

DNS configuration is often difficult and error-prone. We hope DnsControl makes it easier and more reliable. It has for us.

Some resources: