
How to Prepare for AWS’s Move to Its Own Certificate Authority

Post Syndicated from Jonathan Kozolchyk original https://aws.amazon.com/blogs/security/how-to-prepare-for-aws-move-to-its-own-certificate-authority/


Transport Layer Security (TLS, formerly called Secure Sockets Layer [SSL]) is essential for encrypting information that is exchanged on the internet. For example, Amazon.com uses TLS for all traffic on its website, and AWS uses it to secure calls to AWS services.

An electronic document called a certificate verifies the identity of the server when creating such an encrypted connection. The certificate helps establish proof that your web browser is communicating securely with the website that you typed in your browser’s address field. Certificate Authorities, also known as CAs, issue certificates to specific domains. When a domain presents a certificate that is issued by a trusted CA, your browser or application knows it’s safe to make the connection.

In January 2016, AWS launched AWS Certificate Manager (ACM), a service that lets you easily provision, manage, and deploy SSL/TLS certificates for use with AWS services. These certificates are available for no additional charge through Amazon’s own CA: Amazon Trust Services. For browsers and other applications to trust a certificate, the certificate’s issuer must be included in the browser’s trust store, which is a list of trusted CAs. If the issuing CA is not in the trust store, the browser will display an error message (see an example) and applications will show an application-specific error. To ensure the ubiquity of the Amazon Trust Services CA, AWS purchased the Starfield Services CA, a root certificate that is found in most browsers and has been valid since 2005. This means you shouldn’t have to take any action to use the certificates issued by Amazon Trust Services.

AWS has been offering free certificates to AWS customers from the Amazon Trust Services CA. Now, AWS is in the process of moving certificates for services such as Amazon EC2 and Amazon DynamoDB to use certificates from Amazon Trust Services as well. Most software doesn’t need to be changed to handle this transition, but there are exceptions. In this blog post, I show you how to verify that you are prepared to use the Amazon Trust Services CA.

How to tell if the Amazon Trust Services CAs are in your trust store

The following list shows the Amazon Trust Services certificates. To verify that these certificates are in your browser’s trust store, click each Test URL to verify that it works for you. When a Test URL does not work, it displays an error similar to this example.

  • Distinguished name: CN=Amazon Root CA 1,O=Amazon,C=US
    SHA-256 hash of subject public key information: fbe3018031f9586bcbf41727e417b7d1c45c2f47f93be372a17b96b50757d5a2
    Test URL
  • Distinguished name: CN=Amazon Root CA 2,O=Amazon,C=US
    SHA-256 hash of subject public key information: 7f4296fc5b6a4e3b35d3c369623e364ab1af381d8fa7121533c9d6c633ea2461
    Test URL
  • Distinguished name: CN=Amazon Root CA 3,O=Amazon,C=US
    SHA-256 hash of subject public key information: 36abc32656acfc645c61b71613c4bf21c787f5cabbee48348d58597803d7abc9
    Test URL
  • Distinguished name: CN=Amazon Root CA 4,O=Amazon,C=US
    SHA-256 hash of subject public key information: f7ecded5c66047d28ed6466b543c40e0743abe81d109254dcf845d4c2c7853c5
    Test URL
  • Distinguished name: CN=Starfield Services Root Certificate Authority – G2,O=Starfield Technologies\, Inc.,L=Scottsdale,ST=Arizona,C=US
    SHA-256 hash of subject public key information: 2b071c59a0a0ae76b0eadb2bad23bad4580b69c3601b630c2eaf0613afa83f92
    Test URL
  • Distinguished name: Starfield Class 2 Certification Authority
    SHA-256 hash of subject public key information: 2ce1cb0bf9d2f9e102993fbe215152c3b2dd0cabde1c68e5319b839154dbb7f5
    Test URL
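
You can also run this check programmatically from the environment in question. The following is a minimal sketch in Python, not an official AWS tool; the TEST_URL value is a placeholder that you should replace with one of the Test URLs above.

import urllib.error
import urllib.request

TEST_URL = "https://example.invalid/"  # placeholder; substitute one of the Test URLs above

try:
    urllib.request.urlopen(TEST_URL, timeout=10)
    print("OK: this runtime trusts the certificate presented by %s" % TEST_URL)
except urllib.error.URLError as err:
    # Certificate verification failures surface as a URLError with an SSL-related reason.
    print("Verification failed; the trust store may need updating: %s" % err.reason)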

What to do if the Amazon Trust Services CAs are not in your trust store

If your tests of any of the Test URLs failed, you must update your trust store. The easiest way to update your trust store is to upgrade the operating system or browser that you are using.

You will find the Amazon Trust Services CAs in the following operating systems (release dates are in parentheses):

  • Microsoft Windows versions that have January 2005 or later updates installed, Windows Vista, Windows 7, Windows Server 2008, and newer versions
  • Mac OS X 10.4 with Java for Mac OS X 10.4 Release 5, Mac OS X 10.5 and newer versions
  • Red Hat Enterprise Linux 5 (March 2007), 6, and 7, and CentOS 5, 6, and 7
  • Ubuntu 8.10
  • Debian 5.0
  • Amazon Linux (all versions)
  • Java 1.4.2_12, Java 5 update 2, and all newer versions, including Java 6, Java 7, and Java 8

All modern browsers trust Amazon’s CAs. You can update the certificate bundle in your browser simply by updating your browser. You can find update instructions on each browser vendor’s website.

If your application is using a custom trust store, you must add the Amazon root CAs to your application’s trust store. The instructions for doing this vary based on the application or platform. Please refer to the documentation for the application or platform you are using.
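
As one illustration only (the file name is a placeholder and the exact steps depend on your platform), a Python application that manages its own trust store could add a downloaded Amazon root certificate to an SSL context like this:

import ssl

# Start from the platform defaults, then additionally trust a downloaded Amazon
# root certificate. "AmazonRootCA1.pem" is a placeholder path to a root you have
# obtained from Amazon Trust Services, not a file bundled with Python.
context = ssl.create_default_context()
context.load_verify_locations(cafile="AmazonRootCA1.pem")

# Pass `context` to your HTTPS client, for example:
# urllib.request.urlopen("https://example.com/", context=context)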

AWS SDKs and CLIs

Most AWS SDKs and CLIs are not impacted by the transition to the Amazon Trust Services CA. If you are using a version of the Python AWS SDK or CLI released before February 5, 2015, you must upgrade. The .NET, Java, PHP, Go, JavaScript, and C++ SDKs and CLIs do not bundle any certificates, so their certificates come from the underlying operating system. The Ruby SDK has included at least one of the required CAs since June 10, 2015. Before that date, the Ruby V2 SDK did not bundle certificates.

Certificate pinning

If you are using a technique called certificate pinning to lock down the CAs you trust on a domain-by-domain basis, you must adjust your pinning to include the Amazon Trust Services CAs. Certificate pinning helps defend you from an attacker using misissued certificates to fool an application into creating a connection to a spoofed host (an illegitimate host masquerading as a legitimate host). The restriction to a specific, pinned certificate is made by checking that the certificate issued is the expected certificate. This is done by checking that the hash of the certificate public key received from the server matches the expected hash stored in the application. If the hashes do not match, the code stops the connection.

AWS recommends against using certificate pinning because it introduces a potential availability risk. If the certificate to which you pin is replaced, your application will fail to connect. If your use case requires pinning, we recommend that you pin to a CA rather than to an individual certificate. If you are pinning to an Amazon Trust Services CA, you should pin to all of the CAs listed earlier in this post.
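
If you do pin, the check itself comes down to hashing the certificate’s SubjectPublicKeyInfo (SPKI) and comparing it against your pinned set. The following is a minimal sketch, not AWS-recommended code; it assumes the third-party cryptography package is installed and that your TLS library can hand you the CA certificate from the presented chain in PEM form.

import hashlib

from cryptography import x509
from cryptography.hazmat.primitives.serialization import Encoding, PublicFormat

# SHA-256 SPKI hashes of the Amazon Trust Services CAs listed earlier in this post.
PINNED_SPKI_SHA256 = {
    "fbe3018031f9586bcbf41727e417b7d1c45c2f47f93be372a17b96b50757d5a2",  # Amazon Root CA 1
    "7f4296fc5b6a4e3b35d3c369623e364ab1af381d8fa7121533c9d6c633ea2461",  # Amazon Root CA 2
    "36abc32656acfc645c61b71613c4bf21c787f5cabbee48348d58597803d7abc9",  # Amazon Root CA 3
    "f7ecded5c66047d28ed6466b543c40e0743abe81d109254dcf845d4c2c7853c5",  # Amazon Root CA 4
    "2b071c59a0a0ae76b0eadb2bad23bad4580b69c3601b630c2eaf0613afa83f92",  # Starfield Services Root CA - G2
    "2ce1cb0bf9d2f9e102993fbe215152c3b2dd0cabde1c68e5319b839154dbb7f5",  # Starfield Class 2 CA
}

def spki_sha256(pem_bytes):
    # Hash the DER-encoded SubjectPublicKeyInfo of a PEM-encoded certificate.
    cert = x509.load_pem_x509_certificate(pem_bytes)
    spki = cert.public_key().public_bytes(Encoding.DER, PublicFormat.SubjectPublicKeyInfo)
    return hashlib.sha256(spki).hexdigest()

def is_pinned_amazon_ca(pem_bytes):
    # True only if the CA certificate matches one of the pinned hashes above.
    return spki_sha256(pem_bytes) in PINNED_SPKI_SHA256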

If you have comments about this post, submit them in the “Comments” section below. If you have questions about this post, start a new thread on the ACM forum.

– Jonathan

Federate Database User Authentication Easily with IAM and Amazon Redshift

Post Syndicated from Thiyagarajan Arumugam original https://aws.amazon.com/blogs/big-data/federate-database-user-authentication-easily-with-iam-and-amazon-redshift/

Managing database users through federation allows you to manage authentication and authorization procedures centrally. Amazon Redshift now supports database authentication with IAM, enabling user authentication through enterprise federation. There is no need to manage separate database users and passwords, which further eases database administration. You can now manage users outside of AWS and authenticate them for access to an Amazon Redshift data warehouse. Do this by integrating IAM authentication and a third-party SAML 2.0 identity provider (IdP), such as AD FS, PingFederate, or Okta. In addition, database users can also be automatically created at their first login based on corporate permissions.

In this post, I demonstrate how you can extend the federation to enable single sign-on (SSO) to the Amazon Redshift data warehouse.

SAML and Amazon Redshift

AWS supports Security Assertion Markup Language (SAML) 2.0, which is an open standard for identity federation used by many IdPs. SAML enables federated SSO, which lets your users sign in to the AWS Management Console. Users can also make programmatic calls to AWS API actions by using assertions from a SAML-compliant IdP. For example, if you use Microsoft Active Directory for corporate directories, you may be familiar with how Active Directory and AD FS work together to enable federation. For more information, see the AWS Security Blog post Enabling Federation to AWS Using Windows Active Directory, AD FS, and SAML 2.0.

Amazon Redshift now provides the GetClusterCredentials API operation that allows you to generate temporary database user credentials for authentication. You can set up an IAM permissions policy that generates these credentials for connecting to Amazon Redshift. Extending the IAM authentication, you can configure federation of AWS access through a SAML 2.0-compliant IdP. An IAM role can be configured to permit federated users to call the GetClusterCredentials action and generate temporary credentials to log in to Amazon Redshift databases. You can also set up policies to restrict access to Amazon Redshift clusters, databases, database user names, and user groups.
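
For illustration, the following minimal sketch shows what a direct GetClusterCredentials call looks like through the AWS SDK for Python (Boto3); the cluster, database, user, and group names are taken from the example later in this post and are placeholders for your own environment.

import boto3

redshift = boto3.client("redshift", region_name="us-west-2")

creds = redshift.get_cluster_credentials(
    DbUser="nancy",                   # database user to generate credentials for
    DbName="analytics",
    ClusterIdentifier="examplecorp-dw",
    DbGroups=["sales_grp"],           # groups to join for this session
    AutoCreate=True,                  # create the database user at first login
    DurationSeconds=3600,
)

# creds["DbUser"] and creds["DbPassword"] are temporary credentials for the database login.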

Amazon Redshift federation workflow

In this post, I demonstrate how you can use a JDBC- or ODBC-based SQL client to log in to the Amazon Redshift cluster using this feature. The SQL clients used with Amazon Redshift JDBC or ODBC drivers automatically manage the process of calling the GetClusterCredentials action, retrieving the database user credentials, and establishing a connection to your Amazon Redshift database. You can also use your database application to programmatically call the GetClusterCredentials action, retrieve database user credentials, and connect to the database. I demonstrate these features using an example company to show how different database user accounts can be managed easily using federation.

The SSO process works as follows:

  1. The user connects with a JDBC/ODBC SQL client.
  2. The client authenticates against the corporate IdP with the corporate user name and password.
  3. The IdP sends a SAML assertion.
  4. The client calls AWS STS to assume a role with SAML.
  5. STS returns temporary AWS credentials.
  6. The temporary AWS credentials are used to get temporary cluster credentials.
  7. The client connects to Amazon Redshift using the temporary cluster credentials.

Walkthrough

Example Corp. is using Active Directory (IdP host: demo.examplecorp.com) to manage federated access for users in its organization. It has AWS account 123456789012 and currently manages an Amazon Redshift cluster with the cluster ID “examplecorp-dw” and database “analytics” in the us-west-2 region for its Sales and Data Science teams. It wants the following access:

  • Sales users can access the examplecorp-dw cluster using the sales_grp database group
  • Sales users access examplecorp-dw through a JDBC-based SQL client
  • Sales users access examplecorp-dw through an ODBC connection for their reporting tools
  • Data Science users access the examplecorp-dw cluster using the data_science_grp database group
  • Partners access the examplecorp-dw cluster and query using the partner_grp database group
  • Partners are not federated through Active Directory and are provided with separate IAM user credentials (with the IAM user name examplecorpsalespartner)
  • Partners can connect to the examplecorp-dw cluster programmatically, using a language such as Python
  • All users are automatically created in Amazon Redshift when they log in for the first time
  • (Optional) Internal users do not specify database user or group information in their connection string; it is assigned automatically
  • Data warehouse users can use SSO for the Amazon Redshift data warehouse using the preceding permissions

Step 1:  Set up IdPs and federation

The Enabling Federation to AWS Using Windows Active Directory post demonstrated how to prepare Active Directory and enable federation to AWS. Using those instructions, you can establish trust between your AWS account and the IdP and enable user access to AWS using SSO.  For more information, see Identity Providers and Federation.

For this walkthrough, assume that this company has already configured SSO to their AWS account: 123456789012 for their Active Directory domain demo.examplecorp.com. The Sales and Data Science teams are not required to specify database user and group information in the connection string. The connection string can be configured by adding SAML Attribute elements to your IdP. Configuring these optional attributes enables internal users to conveniently avoid providing the DbUser and DbGroup parameters when they log in to Amazon Redshift.

The user-name attribute can be set up as follows, with a user ID (for example, nancy) or an email address (for example, nancy@examplecorp.com):

<Attribute Name="https://redshift.amazon.com/SAML/Attributes/DbUser">  
  <AttributeValue>user-name</AttributeValue>
</Attribute>

The AutoCreate attribute can be defined as follows:

<Attribute Name="https://redshift.amazon.com/SAML/Attributes/AutoCreate">
    <AttributeValue>true</AttributeValue>
</Attribute>

The sales_grp database group can be included as follows:

<Attribute Name="https://redshift.amazon.com/SAML/Attributes/DbGroups">
    <AttributeValue>sales_grp</AttributeValue>
</Attribute>

For more information about attribute element configuration, see Configure SAML Assertions for Your IdP.

Step 2: Create IAM roles for access to the Amazon Redshift cluster

The next step is to create IAM policies with permissions to call GetClusterCredentials and provide authorization for Amazon Redshift resources. To grant a SQL client the ability to retrieve the cluster endpoint, region, and port automatically, include the redshift:DescribeClusters action with the Amazon Redshift cluster resource in the IAM role.  For example, users can connect to the Amazon Redshift cluster using a JDBC URL without the need to hardcode the Amazon Redshift endpoint:

Previous:  jdbc:redshift://endpoint:port/database

Current:  jdbc:redshift:iam://clustername:region/dbname

Use IAM to create the following policies. You can also use an existing user or role and assign these policies. For example, if you already created an IAM role for IdP access, you can attach the necessary policies to that role. Here is the policy created for sales users for this example:

Sales_DW_IAM_Policy

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "redshift:DescribeClusters"
            ],
            "Resource": [
                "arn:aws:redshift:us-west-2:123456789012:cluster:examplecorp-dw"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "redshift:GetClusterCredentials"
            ],
            "Resource": [
                "arn:aws:redshift:us-west-2:123456789012:cluster:examplecorp-dw",
                "arn:aws:redshift:us-west-2:123456789012:dbuser:examplecorp-dw/${redshift:DbUser}"
            ],
            "Condition": {
                "StringEquals": {
                    "aws:userid": "AIDIODR4TAW7CSEXAMPLE:${redshift:DbUser}@examplecorp.com"
                }
            }
        },
        {
            "Effect": "Allow",
            "Action": [
                "redshift:CreateClusterUser"
            ],
            "Resource": [
                "arn:aws:redshift:us-west-2:123456789012:dbuser:examplecorp-dw/${redshift:DbUser}"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "redshift:JoinGroup"
            ],
            "Resource": [
                "arn:aws:redshift:us-west-2:123456789012:dbgroup:examplecorp-dw/sales_grp"
            ]
        }
    ]
}

The policy uses the following parameter values:

  • Region: us-west-2
  • AWS Account: 123456789012
  • Cluster name: examplecorp-dw
  • Database group: sales_grp
  • IAM role: AIDIODR4TAW7CSEXAMPLE
The following sections explain each statement in this policy.

{
    "Effect": "Allow",
    "Action": [
        "redshift:DescribeClusters"
    ],
    "Resource": [
        "arn:aws:redshift:us-west-2:123456789012:cluster:examplecorp-dw"
    ]
}

Allows users to retrieve the cluster endpoint, region, and port automatically for the Amazon Redshift cluster examplecorp-dw. This specification uses the resource format arn:aws:redshift:region:account-id:cluster:clustername. For example, the SQL client JDBC URL can be specified in the format jdbc:redshift:iam://clustername:region/dbname.

For more information, see Amazon Resource Names.

{
    "Effect": "Allow",
    "Action": [
        "redshift:GetClusterCredentials"
    ],
    "Resource": [
        "arn:aws:redshift:us-west-2:123456789012:cluster:examplecorp-dw",
        "arn:aws:redshift:us-west-2:123456789012:dbuser:examplecorp-dw/${redshift:DbUser}"
    ],
    "Condition": {
        "StringEquals": {
            "aws:userid": "AIDIODR4TAW7CSEXAMPLE:${redshift:DbUser}@examplecorp.com"
        }
    }
}

Generates a temporary token to authenticate into the examplecorp-dw cluster. “arn:aws:redshift:us-west-2:123456789012:dbuser:examplecorp-dw/${redshift:DbUser}” restricts the corporate user name to the database user name for that user. This resource is specified using the format: arn:aws:redshift:region:account-id:dbuser:clustername/dbusername.

The Condition block enforces that the AWS user ID should match “AIDIODR4TAW7CSEXAMPLE:${redshift:DbUser}@examplecorp.com”, so that individual users can authenticate only as themselves. The AIDIODR4TAW7CSEXAMPLE role has the Sales_DW_IAM_Policy policy attached.

{
    "Effect": "Allow",
    "Action": [
        "redshift:CreateClusterUser"
    ],
    "Resource": [
        "arn:aws:redshift:us-west-2:123456789012:dbuser:examplecorp-dw/${redshift:DbUser}"
    ]
}

Automatically creates database users in examplecorp-dw when they log in for the first time. Subsequent logins reuse the existing database user.
{
    "Effect": "Allow",
    "Action": [
        "redshift:JoinGroup"
    ],
    "Resource": [
        "arn:aws:redshift:us-west-2:123456789012:dbgroup:examplecorp-dw/sales_grp"
    ]
}

Allows sales users to join the sales_grp database group through the resource “arn:aws:redshift:us-west-2:123456789012:dbgroup:examplecorp-dw/sales_grp” that is specified in the format arn:aws:redshift:region:account-id:dbgroup:clustername/dbgroupname.

Similar policies can be created for Data Science users with access to join the data_science_grp group in examplecorp-dw. You can now attach the Sales_DW_IAM_Policy policy to the role that is mapped to the IdP application for SSO. For more information about how to define the claim rules, see Configuring SAML Assertions for the Authentication Response.

Because partners are not authorized using Active Directory, they are provided with IAM credentials and added to the partner_grp database group. The Partner_DW_IAM_Policy is attached to the IAM users for partners. The following policy allows partners to log in using the IAM user name as the database user name.

Partner_DW_IAM_Policy

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "redshift:DescribeClusters"
            ],
            "Resource": [
                "arn:aws:redshift:us-west-2:123456789012:cluster:examplecorp-dw"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "redshift:GetClusterCredentials"
            ],
            "Resource": [
                "arn:aws:redshift:us-west-2:123456789012:cluster:examplecorp-dw",
                "arn:aws:redshift:us-west-2:123456789012:dbuser:examplecorp-dw/${redshift:DbUser}"
            ],
            "Condition": {
                "StringEquals": {
                    "redshift:DbUser": "${aws:username}"
                }
            }
        },
        {
            "Effect": "Allow",
            "Action": [
                "redshift:CreateClusterUser"
            ],
            "Resource": [
                "arn:aws:redshift:us-west-2:123456789012:dbuser:examplecorp-dw/${redshift:DbUser}"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "redshift:JoinGroup"
            ],
            "Resource": [
                "arn:aws:redshift:us-west-2:123456789012:dbgroup:examplecorp-dw/partner_grp"
            ]
        }
    ]
}

The condition "redshift:DbUser": "${aws:username}" forces an IAM user to use the IAM user name as the database user name.

With the previous steps configured, you can now establish the connection to Amazon Redshift through JDBC- or ODBC-supported clients.

Step 3: Set up database user access

Before you start connecting to Amazon Redshift using the SQL client, set up the database groups for appropriate data access. Log in to your Amazon Redshift database as superuser to create a database group, using CREATE GROUP.

Log in to examplecorp-dw/analytics as superuser and create the following groups and users:

CREATE GROUP sales_grp;
CREATE GROUP data_science_grp;
CREATE GROUP partner_grp;

Use the GRANT command to define access permissions to database objects (tables/views) for the preceding groups.
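
As a minimal sketch only (the "sales" schema and the superuser credentials are placeholders), these grants could also be issued through the same PyGreSQL (pg) module that the partner script later in this post uses:

import pg

# Placeholder connection values; connect to examplecorp-dw/analytics as a superuser.
conn = pg.connect(dbname="analytics",
                  host="examplecorp-dw.xxxxxxxxxxxx.us-west-2.redshift.amazonaws.com",
                  port=5439, user="masteruser", passwd="MasterUserPassword")

# Grant read-only access on an illustrative "sales" schema to each group.
for group in ("sales_grp", "data_science_grp", "partner_grp"):
    conn.query("GRANT USAGE ON SCHEMA sales TO GROUP %s" % group)
    conn.query("GRANT SELECT ON ALL TABLES IN SCHEMA sales TO GROUP %s" % group)

conn.close()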

Step 4: Connect to Amazon Redshift using the JDBC SQL client

Assume that sales user “nancy” is using the SQL Workbench client and JDBC driver to log in to the Amazon Redshift data warehouse. The following steps help set up the client and establish the connection:

  1. Download the latest Amazon Redshift JDBC driver from the Configure a JDBC Connection page
  2. Build the JDBC URL with the IAM option in the following format:
    jdbc:redshift:iam://examplecorp-dw:us-west-2/sales_db

Because the redshift:DescribeClusters action is assigned to the preceding IAM roles, it automatically resolves the cluster endpoints and the port. Otherwise, you can specify the endpoint and port information in the JDBC URL, as described in Configure a JDBC Connection.

Identify the following JDBC options for providing the IAM credentials (see the “Prepare your environment” section) and configure them in the SQL Workbench Connection Profile:

plugin_name=com.amazon.redshift.plugin.AdfsCredentialsProvider
idp_host=demo.examplecorp.com (The name of the corporate identity provider host)
idp_port=443 (The port of the corporate identity provider host)
user=examplecorp\nancy (corporate user name)
password=*** (corporate user password)

The SQL workbench configuration looks similar to the following screenshot:

Now, “nancy” can connect to examplecorp-dw by authenticating using the corporate Active Directory. Because the SAML attribute elements are already configured for nancy, she logs in as database user nancy and is assigned to the sales_grp database group. Similarly, other Sales and Data Science users can connect to the examplecorp-dw cluster. A custom Amazon Redshift ODBC driver can also be used to connect using a SQL client. For more information, see Configure an ODBC Connection.

Step 5: Connecting to Amazon Redshift using JDBC SQL Client and IAM Credentials

This optional step is necessary only when you want to enable users that are not authenticated with Active Directory. Partners are provided with IAM credentials that they can use to connect to the examplecorp-dw Amazon Redshift cluster. These IAM users are attached to the Partner_DW_IAM_Policy, which allows them to join the partner_grp database group in Amazon Redshift. The following JDBC URL enables them to connect to the Amazon Redshift cluster:

jdbc:redshift:iam://examplecorp-dw:us-west-2/analytics?AccessKeyID=XXX&SecretAccessKey=YYY&DbUser=examplecorpsalespartner&DbGroup=partner_grp&AutoCreate=true

The AutoCreate option automatically creates a new database user the first time the partner logs in. There are several other options available to conveniently specify the IAM user credentials. For more information, see Options for providing IAM credentials.

Step 6: Connecting to Amazon Redshift using an ODBC client for Microsoft Windows

Assume that another sales user “uma” is using an ODBC-based client to log in to the Amazon Redshift data warehouse using Example Corp Active Directory. The following steps help set up the ODBC client and establish the Amazon Redshift connection in a Microsoft Windows operating system connected to your corporate network:

  1. Download and install the latest Amazon Redshift ODBC driver.
  2. Create a system DSN entry.
    1. In the Start menu, locate the driver folder or folders:
      • Amazon Redshift ODBC Driver (32-bit)
      • Amazon Redshift ODBC Driver (64-bit)
      • If you installed both drivers, you have a folder for each driver.
    2. Choose ODBC Administrator, and then type your administrator credentials.
    3. To configure the driver for all users on the computer, choose System DSN. To configure the driver for your user account only, choose User DSN.
    4. Choose Add.
  3. Select the Amazon Redshift ODBC driver, and choose Finish. Configure the following attributes:
    Data Source Name=any friendly name to identify the ODBC connection
    Database=analytics
    User=uma (corporate user name)
    Auth Type=Identity Provider: AD FS
    Password=leave blank (Windows automatically authenticates)
    Cluster ID=examplecorp-dw
    idp_host=demo.examplecorp.com (The name of the corporate IdP host)

This configuration looks like the following:

  4. Choose OK to save the ODBC connection.
  5. Verify that uma is set up with the SAML attributes, as described in the “Set up IdPs and federation” section.

The user uma can now use this ODBC connection to establish the connection to the Amazon Redshift cluster using any ODBC-based tools or reporting tools such as Tableau. Internally, uma authenticates using the IAM role that has the Sales_DW_IAM_Policy attached, and is assigned to the sales_grp database group.

Step 7: Connecting to Amazon Redshift using Python and IAM credentials

To enable partners to connect to the examplecorp-dw cluster programmatically, use Python on a computer such as an Amazon EC2 instance. Reuse the IAM users that are attached to the Partner_DW_IAM_Policy policy defined in Step 2.

The following steps show this set up on an EC2 instance:

  1. Launch a new EC2 instance with an IAM role that has the Partner_DW_IAM_Policy attached, as described in Using an IAM Role to Grant Permissions to Applications Running on Amazon EC2 Instances. Alternatively, you can attach an existing IAM role to an EC2 instance.
  2. This example uses Python PostgreSQL Driver (PyGreSQL) to connect to your Amazon Redshift clusters. To install PyGreSQL on Amazon Linux, use the following command as the ec2-user:
    sudo easy_install pip
    sudo yum install postgresql postgresql-devel gcc python-devel
    sudo pip install PyGreSQL

  3. The following code snippet demonstrates programmatic access to Amazon Redshift for partner users:
    #!/usr/bin/env python
    """
    Usage:
    python redshiftscript.py
    
    * Copyright 2014, Amazon.com, Inc. or its affiliates. All Rights Reserved.
    *
    * Licensed under the Amazon Software License (the "License").
    * You may not use this file except in compliance with the License.
    * A copy of the License is located at
    *
    * http://aws.amazon.com/asl/
    *
    * or in the "license" file accompanying this file. This file is distributed
    * on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either
    * express or implied. See the License for the specific language governing
    * permissions and limitations under the License.
    """
    
    import sys
    import pg
    import boto3
    
    REGION = 'us-west-2'
    CLUSTER_IDENTIFIER = 'examplecorp-dw'
    DB_NAME = 'sales_db'
    DB_USER = 'examplecorpsalespartner'
    
    options = """keepalives=1 keepalives_idle=200 keepalives_interval=200
                 keepalives_count=6"""
    
    set_timeout_stmt = "set statement_timeout = 1200000"
    
    def conn_to_rs(host, port, db, usr, pwd, opt=options, timeout=set_timeout_stmt):
        rs_conn_string = """host=%s port=%s dbname=%s user=%s password=%s
                             %s""" % (host, port, db, usr, pwd, opt)
        print "Connecting to %s:%s:%s as %s" % (host, port, db, usr)
        rs_conn = pg.connect(dbname=rs_conn_string)
        rs_conn.query(timeout)
        return rs_conn
    
    def main():
        # describe the cluster and fetch the IAM temporary credentials
        global redshift_client
        redshift_client = boto3.client('redshift', region_name=REGION)
        response_cluster_details = redshift_client.describe_clusters(ClusterIdentifier=CLUSTER_IDENTIFIER)
        response_credentials = redshift_client.get_cluster_credentials(DbUser=DB_USER,DbName=DB_NAME,ClusterIdentifier=CLUSTER_IDENTIFIER,DurationSeconds=3600)
        rs_host = response_cluster_details['Clusters'][0]['Endpoint']['Address']
        rs_port = response_cluster_details['Clusters'][0]['Endpoint']['Port']
        rs_db = DB_NAME
        rs_iam_user = response_credentials['DbUser']
        rs_iam_pwd = response_credentials['DbPassword']
        # connect to the Amazon Redshift cluster
        conn = conn_to_rs(rs_host, rs_port, rs_db, rs_iam_user,rs_iam_pwd)
        # execute a query
        result = conn.query("SELECT sysdate as dt")
        # fetch results from the query
        for dt_val in result.getresult() :
            print dt_val
        # close the Amazon Redshift connection
        conn.close()
    
    if __name__ == "__main__":
        main()

You can save this Python program in a file (redshiftscript.py) and execute it at the command line as ec2-user:

python redshiftscript.py

Now partners can connect to the Amazon Redshift cluster using the Python script, and authentication is federated through the IAM user.

Summary

In this post, I demonstrated how to use federated access using Active Directory and IAM roles to enable single sign-on to an Amazon Redshift cluster. I also showed how partners outside an organization can be managed easily using IAM credentials.  Using the GetClusterCredentials API action, now supported by Amazon Redshift, lets you manage a large number of database users and have them use corporate credentials to log in. You don’t have to maintain separate database user accounts.

Although this post demonstrated the integration of IAM with AD FS and Active Directory, you can replicate this solution with your choice of SAML 2.0-compliant third-party identity provider (IdP), such as PingFederate or Okta. For the different supported federation options, see Configure SAML Assertions for Your IdP.

If you have questions or suggestions, please comment below.


Additional Reading

Learn how to establish federated access to your AWS resources by using Active Directory user attributes.


About the Author

Thiyagarajan Arumugam is a Big Data Solutions Architect at Amazon Web Services and designs customer architectures to process data at scale. Prior to AWS, he built data warehouse solutions at Amazon.com. In his free time, he enjoys all outdoor sports and practices the Indian classical drum mridangam.

 

Abandon Proactive Copyright Filters, Huge Coalition Tells EU Heavyweights

Post Syndicated from Andy original https://torrentfreak.com/abandon-proactive-copyright-filters-huge-coalition-tells-eu-heavyweights-171017/

Last September, EU Commission President Jean-Claude Juncker announced plans to modernize copyright law in Europe.

The proposals (pdf) are part of the Digital Single Market reforms, which have been under development for the past several years.

One of the proposals is causing significant concern. Article 13 would require some online service providers to become ‘Internet police’, proactively detecting and filtering allegedly infringing copyright works uploaded to their platforms by users.

Currently, users are generally able to share whatever they like but should a copyright holder take exception to their upload, mechanisms are available for that content to be taken down. It’s envisioned that proactive filtering, whereby user uploads are routinely scanned and compared to a database of existing protected content, will prevent content becoming available in the first place.

These proposals are of great concern to digital rights groups, who believe that such filters will not only undermine users’ rights but will also place unfair burdens on Internet platforms, many of which will struggle to fund such a program. Yesterday, in the latest wave of opposition to Article 13, a huge coalition of international rights groups came together to underline their concerns.

Headed up by Civil Liberties Union for Europe (Liberties) and European Digital Rights (EDRi), the coalition is formed of dozens of influential groups, including Electronic Frontier Foundation (EFF), Human Rights Watch, Reporters without Borders, and Open Rights Group (ORG), to name just a few.

In an open letter to European Commission President Jean-Claude Juncker, President of the European Parliament Antonio Tajani, President of the European Council Donald Tusk and a string of others, the groups warn that the proposals undermine the trust established between EU member states.

“Fundamental rights, justice and the rule of law are intrinsically linked and constitute core values on which the EU is founded,” the letter begins.

“Any attempt to disregard these values undermines the mutual trust between member states required for the EU to function. Any such attempt would also undermine the commitments made by the European Union and national governments to their citizens.”

Those citizens, the letter warns, would have their basic rights undermined, should the new proposals be written into EU law.

“Article 13 of the proposal on Copyright in the Digital Single Market include obligations on internet companies that would be impossible to respect without the imposition of excessive restrictions on citizens’ fundamental rights,” it notes.

A major concern is that by placing new obligations on Internet service providers that allow users to upload content – think YouTube, Facebook, Twitter and Instagram – they will be forced to err on the side of caution. Should there be any concern whatsoever that content might be infringing, fair use considerations and exceptions will be abandoned in favor of staying on the right side of the law.

“Article 13 appears to provoke such legal uncertainty that online services will have no other option than to monitor, filter and block EU citizens’ communications if they are to have any chance of staying in business,” the letter warns.

But while the potential problems for service providers and users are numerous, the groups warn that Article 13 could also be illegal since it contradicts case law of the Court of Justice.

According to the E-Commerce Directive, platforms are already required to remove infringing content, once they have been advised it exists. The new proposal, should it go ahead, would force the monitoring of uploads, something which goes against the ‘no general obligation to monitor‘ rules present in the Directive.

“The requirement to install a system for filtering electronic communications has twice been rejected by the Court of Justice, in the cases Scarlet Extended (C70/10) and Netlog/Sabam (C 360/10),” the rights groups warn.

“Therefore, a legislative provision that requires internet companies to install a filtering system would almost certainly be rejected by the Court of Justice because it would contravene the requirement that a fair balance be struck between the right to intellectual property on the one hand, and the freedom to conduct business and the right to freedom of expression, such as to receive or impart information, on the other.”

Specifically, the groups note that the proactive filtering of content would violate freedom of expression set out in Article 11 of the Charter of Fundamental Rights. That being the case, the groups expect national courts to disapply it and the rule to be annulled by the Court of Justice.

The latest protests against Article 13 come in the wake of large-scale objections earlier in the year, voicing similar concerns. However, despite the groups’ fears, they have powerful adversaries, each determined to stop the flood of copyrighted content currently being uploaded to the Internet.

Front and center in support of Article 13 is the music industry and its current hot topic, the so-called Value Gap (1,2,3). The industry feels that platforms like YouTube are able to avoid paying expensive licensing fees (for music in particular) by exploiting the safe harbor protections of the DMCA and similar legislation.

They believe that proactively filtering uploads would significantly help to diminish this problem, which may very well be the case. But at what cost to the general public and the platforms they rely upon? Citizens and scholars feel that freedoms will be affected and it’s likely the outcry will continue.

The ball is now with the EU, whose members will soon have to make what could be the most important decision in recent copyright history. The rights groups, who are urging for Article 13 to be deleted, are clear where they stand.

The full letter is available here (pdf)


EU Piracy Report Suppression Raises Questions Over Transparency

Post Syndicated from Andy original https://torrentfreak.com/eu-piracy-report-suppression-raises-questions-transparency-170922/

Over the years, copyright holders have made hundreds of statements against piracy, mainly that it risks bringing industries to their knees through widespread and uncontrolled downloading from the Internet.

But while TV shows like Game of Thrones have been downloaded millions of times, the big question (one could argue the only really important question) is whether this activity actually affects sales. After all, if piracy has a massive negative effect on industry, something needs to be done. If it does not, why all the panic?

Quite clearly, the EU Commission wanted to find out the answer to this potential multi-billion dollar question when it made the decision to invest a staggering 360,000 euros in a dedicated study back in January 2014.

With a final title of ‘Estimating displacement rates of copyrighted content in the EU’, the completed study is an intimidating 307 pages deep. Shockingly, until this week, few people even knew it existed because, for reasons unknown, the EU Commission decided not to release it.

However, thanks to the sheer persistence of Member of the European Parliament Julia Reda, the public now has a copy and it contains quite a few interesting conclusions. But first, some background.

The study uses data from 2014 and covers four broad types of content: music, audio-visual material, books and videogames. Unlike other reports, the study also considered live attendances of music and cinema visits in the key regions of Germany, UK, Spain, France, Poland and Sweden.

On average, 51% of adults and 72% of minors in the EU were found to have illegally downloaded or streamed any form of creative content, with Poland and Spain coming out as the worst offenders. However, here’s the kicker.

“In general, the results do not show robust statistical evidence of displacement of sales by online copyright infringements,” the study notes.

“That does not necessarily mean that piracy has no effect but only that the statistical analysis does not prove with sufficient reliability that there is an effect.”

For a study commissioned by the EU with huge sums of public money, this is a potentially damaging conclusion, not least for the countless industry bodies that lobby day in, day out, for tougher copyright law based on the “fact” that piracy is damaging to sales.

That being said, the study did find that certain sectors can be affected by piracy, notably recent top movies.

“The results show a displacement rate of 40 per cent which means that for every ten recent top films watched illegally, four fewer films are consumed legally,” the study notes.

“People do not watch many recent top films a second time but if it happens, displacement is lower: two legal consumptions are displaced by every ten illegal second views. This suggests that the displacement rate for older films is lower than the 40 per cent for recent top films. All in all, the estimated loss for recent top films is 5 per cent of current sales volumes.”

But while there is some negative effect on the movie industry, others can benefit. The study found that piracy had a slightly positive effect on the videogames industry, suggesting that those who play pirate games eventually become buyers of official content.

On top of displacement rates, the study also looked at the public’s willingness to pay for content, to assess whether price influences pirate consumption. Interestingly, the industry that had the most displaced sales – the movie industry – had the greatest number of people unhappy with its pricing model.

“Overall, the analysis indicates that for films and TV-series current prices are higher than 80 per cent of the illegal downloaders and streamers are willing to pay,” the study notes.

For other industries, where sales were not found to have been displaced or were positively affected by piracy, consumer satisfaction with pricing was greatest.

“For books, music and games, prices are at a level broadly corresponding to the willingness to pay of illegal downloaders and streamers. This suggests that a decrease in the price level would not change piracy rates for books, music and games but that prices can have an effect on displacement rates for films and TV-series,” the study concludes.

So, it appears that products that are priced fairly do not suffer significant displacement from piracy. Those that are priced too high, on the other hand, can expect to lose some sales.

Now that it’s been released, the findings of the study should help to paint a more comprehensive picture of the infringement climate in the EU, while laying to rest some of the wild claims of the copyright lobby. That being said, it shouldn’t have taken the toils of Julia Reda to bring them to light.

“This study may have remained buried in a drawer for several more years to come if it weren’t for an access to documents request I filed under the European Union’s Freedom of Information law on July 27, 2017, after having become aware of the public tender for this study dating back to 2013,” Reda explains.

“I would like to invite the Commission to become a provider of more solid and timely evidence to the copyright debate. Such data that is valuable both financially and in terms of its applicability should be available to everyone when it is financed by the European Union – it should not be gathering dust on a shelf until someone actively requests it.”

The full study can be downloaded here (pdf)


No, Google Drive is Definitely Not The New Pirate Bay

Post Syndicated from Andy original https://torrentfreak.com/no-google-drive-is-definitely-not-the-new-pirate-bay-170910/

Running close to two decades old, the world of true mainstream file-sharing is less of a mystery to the general public than it’s ever been.

Most people now understand the concept of shifting files from one place to another, and a significant majority will be aware of the opportunities to do so with infringing content.

Unsurprisingly, this is a major thorn in the side of rightsholders all over the world, who have been scrambling since the turn of the century in a considerable effort to stem the tide. The results of their work have varied, with some sectors hit harder than others.

One area that has taken a bit of a battering recently involves the dominant peer-to-peer platforms reliant on underlying BitTorrent transfers. Several large-scale sites have shut down recently, not least KickassTorrents, Torrentz, and ExtraTorrent, raising questions of what bad news may arrive next for inhabitants of Torrent Land.

Of course, like any other Internet-related activity, sharing has continued to evolve over the years, with streaming and cloud-hosting now a major hit with consumers. In the main, sites which skirt the borders of legality have been the major hosting and streaming players over the years, but more recently it’s become clear that even the most legitimate companies can become unwittingly involved in the piracy scene.

As reported here on TF back in 2014 and again several times this year (1,2,3), cloud-hosting services operated by Google, including Google Drive, are being used to store and distribute pirate content.

That news was echoed again this week, with a report on Gadgets360 reiterating that Google Drive is still being used for movie piracy. What followed were a string of follow up reports, some of which declared Google’s service to be ‘The New Pirate Bay.’

No. Just no.

While it’s always tempting for publications to squeeze a reference to The Pirate Bay into a piracy article due to the site’s popularity, it’s particularly out of place in this comparison. In no way, shape, or form can a centralized store of data like Google Drive ever replace the underlying technology of sites like The Pirate Bay.

While the casual pirate might love the idea of streaming a movie with a couple of clicks to a browser of his or her choice, the weakness of the cloud system cannot be overstated. To begin with, anything hosted by Google is vulnerable to immediate takedown on demand, usually within a matter of hours.

“Google Drive has a variety of piracy counter-measures in place,” a spokesperson told Mashable this week, “and we are continuously working to improve our protections to prevent piracy across all of our products.”

When will we ever hear anything like that from The Pirate Bay? Answer: When hell freezes over. But it’s not just compliance with takedown requests that make Google Drive-hosted files vulnerable.

At the point Google Drive responds to a takedown request, it takes down the actual file. On the other hand, even if Pirate Bay responded to notices (which it doesn’t), it would be unable to do anything about the sharing going on underneath. Removing a torrent file or magnet link from TPB does nothing to negatively affect the decentralized swarm of people sharing files among themselves. Those files stay intact and sharing continues, no matter what happens to the links above.

Importantly, people sharing using BitTorrent do so without any need for central servers – the whole process is decentralized as long as a user can lay his or her hands on a torrent file or magnet link. Those using Google Drive, however, rely on a totally centralized system, where not only is Google king, but it can and will stop the entire party after receiving a few lines of text from a rightsholder.

There is a very good reason why sites like The Pirate Bay have been around for close to 15 years while platforms such as Megaupload, Hotfile, Rapidshare, and similar platforms have all met their makers. File-hosting platforms are expensive-to-run warehouses full of files, each of which brings direct liability for their hosts, once they’re made aware that those files are infringing. These days the choice is clear – take the files down or get brought down, it’s as simple as that.

The Pirate Bay, on the other hand, is nothing more than a treasure map (albeit a valuable one) that points the way to content spread all around the globe in the most decentralized way possible. There are no files to delete, no content to disappear. Comparing a vulnerable Google Drive to this kind of robust system couldn’t be further from the mark.

That being said, this is the way things are going. The cloud, it seems, is here to stay in all its forms. Everyone has access to it and uploading content is easier – much easier – than uploading it to a BitTorrent network. A Google Drive upload is simplicity itself for anyone with a mouse and a file; the same cannot be said about The Pirate Bay.

For this reason alone, platforms like Google Drive and the many dozens of others offering a similar service will continue to become havens for pirated content, until the next big round of legislative change. At the moment, each piece of content has to be removed individually but in the future, it’s possible that pre-emptive filters will kill uploads of pirated content before they see the light of day.

When this comes to pass, millions of people will understand why Google Drive, with its bots checking every file upload for alleged infringement, is not The Pirate Bay. At this point, if people have left it too long, it might be too late to reinvigorate BitTorrent networks to their former glory.

People will try to rebuild them, of course, but realizing why they shouldn’t have been left behind at all is probably the best protection.


From Data Lake to Data Warehouse: Enhancing Customer 360 with Amazon Redshift Spectrum

Post Syndicated from Dylan Tong original https://aws.amazon.com/blogs/big-data/from-data-lake-to-data-warehouse-enhancing-customer-360-with-amazon-redshift-spectrum/

Achieving a 360° view of your customer has become increasingly challenging as companies embrace omni-channel strategies, engaging customers across websites, mobile, call centers, social media, physical sites, and beyond. The promise of a web where online and physical worlds blend makes understanding your customers more challenging, but also more important. Businesses that are successful in this medium have a significant competitive advantage.

The big data challenge requires the management of data at high velocity and volume. Many customers have identified Amazon S3 as a great data lake solution that removes the complexities of managing a highly durable, fault-tolerant data lake infrastructure at scale and at low cost.

AWS data services substantially lessen the heavy lifting of adopting technologies, allowing you to spend more time on what matters most—gaining a better understanding of customers to elevate your business. In this post, I show how a recent Amazon Redshift innovation, Redshift Spectrum, can enhance a customer 360 initiative.

Customer 360 solution

A successful customer 360 view benefits from using a variety of technologies to deliver different forms of insights. These could range from real-time analysis of streaming data from wearable devices and mobile interactions to historical analysis that requires interactive, on demand queries on billions of transactions. In some cases, insights can only be inferred through AI via deep learning. Finally, the value of your customer data and insights can’t be fully realized until it is operationalized at scale—readily accessible by fleets of applications. Companies are leveraging AWS for the breadth of services that cover these domains, to drive their data strategy.

A number of AWS customers stream data from various sources into an S3 data lake through Amazon Kinesis. They use Kinesis and technologies in the Hadoop ecosystem like Spark running on Amazon EMR to enrich this data. High-value data is loaded into an Amazon Redshift data warehouse, which allows users to analyze and interact with data through a choice of client tools. Redshift Spectrum expands on this analytics platform by enabling Amazon Redshift to blend and analyze data beyond the data warehouse and across a data lake.

The following diagram illustrates the workflow for such a solution.

This solution delivers value by:

  • Reducing complexity and time to value to deeper insights. For instance, an existing data model in Amazon Redshift may provide insights across dimensions such as customer, geography, time, and product on metrics from sales and financial systems. Down the road, you may gain access to streaming data sources like customer-care call logs and website activity that you want to blend in with the sales data on the same dimensions to understand how web and call center experiences may be correlated with sales performance. Redshift Spectrum can join these dimensions in Amazon Redshift with data in S3 to allow you to quickly gain new insights, and avoid the slow and more expensive alternative of fully integrating these sources with your data warehouse.
  • Providing an additional avenue for optimizing costs and performance. In cases like call logs and clickstream data where volumes could be many TBs to PBs, storing the data exclusively in S3 yields significant cost savings. Interactive analysis on massive datasets may now be economically viable in cases where data was previously analyzed periodically through static reports generated by inexpensive batch processes. In some cases, you can improve the user experience while simultaneously lowering costs. Spectrum is powered by a large-scale infrastructure external to your Amazon Redshift cluster, and excels at scanning and aggregating large volumes of data. For instance, your analysts may be performing data discovery on customer interactions across millions of consumers over years of data across various channels. On this large dataset, certain queries could be slow if you didn’t have a large Amazon Redshift cluster. Alternatively, you could use Redshift Spectrum to achieve a better user experience with a smaller cluster.

Proof of concept walkthrough

To make evaluation easier for you, I’ve conducted a Redshift Spectrum proof-of-concept (PoC) for the customer 360 use case. For those who want to replicate the PoC, the instructions, AWS CloudFormation templates, and public data sets are available in the GitHub repository.

The remainder of this post is a journey through the project, observing best practices in action, and learning how you can achieve business value. The walkthrough involves:

  • An analysis of performance data from the PoC environment involving queries that demonstrate blending and analysis of data across Amazon Redshift and S3. Observe that great results are achievable at scale.
  • Guidance by example on query tuning, design, and data preparation to illustrate the optimization process. This includes tuning a query that combines clickstream data in S3 with customer and time dimensions in Amazon Redshift, and aggregates ~1.9 B out of 3.7 B+ records in under 10 seconds with a small cluster!
  • Guidance and measurements to help assess deciding between two options: accessing and analyzing data exclusively in Amazon Redshift, or using Redshift Spectrum to access data left in S3.

Stream ingestion and enrichment

The focus of this post isn’t stream ingestion and enrichment on Kinesis and EMR, but be mindful of performance best practices on S3 to ensure good streaming and query performance:

  • Use random object keys: The data files provided for this project are prefixed with SHA-256 hashes to prevent hot partitions. This is important to ensure optimal request rates, both for PUT requests from the incoming stream and for the large number of parallel GET requests that certain queries from large Amazon Redshift clusters can send (see the sketch after this list).
  • Micro-batch your data stream: S3 isn’t optimized for small random write workloads. Your datasets should be micro-batched into large files. For instance, the “parquet-1” dataset provided batches >7 million records per file. The optimal file size for Redshift Spectrum is usually in the 100 MB to 1 GB range.
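
The following minimal sketch illustrates the random-key idea from the first bullet; the key layout and file naming are illustrative only, not the exact scheme used for the PoC datasets.

import hashlib

def hashed_key(customer, visit_year_month, batch_id):
    # Prepend a short hash so that incoming micro-batches spread across the S3
    # key space instead of concentrating on a single prefix.
    base = "clickstream/customer=%d/visitYearMonth=%d/batch-%05d.parquet" % (
        customer, visit_year_month, batch_id)
    prefix = hashlib.sha256(base.encode("utf-8")).hexdigest()[:8]
    return "%s/%s" % (prefix, base)

print(hashed_key(1, 199801, 42))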

If you have an edge case that may pose scalability challenges, AWS would love to hear about it. For further guidance, talk to your solutions architect.

Environment

The project consists of the following environment:

  • Amazon Redshift cluster: 4 X dc1.large
  • Data:
    • Time and customer dimension tables are stored on all Amazon Redshift nodes (ALL distribution style):
      • The data originates from the DWDATE and CUSTOMER tables in the Star Schema Benchmark
      • The customer table contains attributes for 3 million customers.
      • The time data is at the day-level granularity, and spans 7 years, from the start of 1992 to the end of 1998.
    • The clickstream data is stored in an S3 bucket, and serves as a fact table.
      • Various copies of this dataset in CSV and Parquet format have been provided, for reasons to be discussed later.
      • The data is a modified version of the uservisits dataset from AMPLab’s Big Data Benchmark, which was generated by Intel’s Hadoop benchmark tools.
      • Changes were minimal, so that existing test harnesses for this test can be adapted:
        • Increased the 751,754,869-row dataset 5X to 3,758,774,345 rows.
        • Added surrogate keys to support joins with customer and time dimensions. These keys were distributed evenly across the entire dataset to represent user visits from six customers over seven years.
        • Values for the visitDate column were replaced to align with the 7-year timeframe, and the added time surrogate key.

Queries across the data lake and data warehouse 

Imagine a scenario where a business analyst plans to analyze clickstream metrics like ad revenue over time and by customer, market segment and more. The example below is a query that achieves this effect: 
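
A minimal sketch of what such a query might look like is shown below. It assumes the table and column names that appear in later snippets in this post (clickstream.uservisits_csv10, custKey, yearMonthKey, adRevenue, prettyMonthYear) and a hypothetical cutoff of 199810 for the last three months of the dataset; the original query may differ in its details.

SELECT c.c_name, c.c_mktsegment, t.prettyMonthYear, SUM(uv.adRevenue) AS totalRevenue
FROM clickstream.uservisits_csv10 AS uv
JOIN customer AS c
  ON c.c_custkey = uv.custKey
JOIN (SELECT * FROM dwdate WHERE d_yearmonthnum >= 199810) AS t
  ON uv.yearMonthKey = t.d_yearmonthnum
WHERE c.c_custkey <= 3
GROUP BY c.c_name, c.c_mktsegment, t.prettyMonthYear, uv.yearMonthKey
ORDER BY c.c_name, c.c_mktsegment, uv.yearMonthKey ASC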

The scan of the clickstream table retrieves data in S3, and the joins combine it with the time and customer dimension tables in Amazon Redshift. The query returns the total ad revenue for three customers over the last three months, along with information on their respective market segments.

Unfortunately, this query takes around three minutes to run, and doesn’t enable the interactive experience that you want. However, there are a number of performance optimizations that you can implement to achieve the desired performance.

Performance analysis

Two key utilities provide visibility into Redshift Spectrum:

  • EXPLAIN
    Provides the query execution plan, which includes information about which processing is pushed down to Redshift Spectrum. Steps in the plan prefixed with S3 are executed on Redshift Spectrum. For instance, the plan for the previous query includes the step “S3 Seq Scan clickstream.uservisits_csv10”, indicating that Redshift Spectrum performs a scan on S3 as part of the query execution.
  • SVL_S3QUERY_SUMMARY
    Statistics for Redshift Spectrum queries are stored in this table. While the execution plan presents cost estimates, this table stores actual statistics for past query runs.

You can get the statistics of your last query by inspecting the SVL_S3QUERY_SUMMARY table with the condition (query = pg_last_query_id()). Inspecting the previous query reveals that the entire dataset of nearly 3.8 billion rows was scanned to retrieve less than 66.3 million rows. Improving scan selectivity in your query could yield substantial performance improvements.
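
For example, a query along the following lines retrieves the scan statistics for the most recent query; it assumes the standard columns of SVL_S3QUERY_SUMMARY (such as s3_scanned_rows and s3query_returned_rows), so adjust the column list to your Amazon Redshift version if needed:

SELECT query,
       elapsed,
       s3_scanned_rows,
       s3_scanned_bytes,
       s3query_returned_rows,
       s3query_returned_bytes,
       files,
       avg_request_parallelism
FROM svl_s3query_summary
WHERE query = pg_last_query_id()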

Partitioning

Partitioning is a key means of improving scan efficiency. In this environment, the data and tables have already been organized and configured to support partitions. For more information, see the PoC project setup instructions. The clickstream table was defined as:

CREATE EXTERNAL TABLE clickstream.uservisits_csv10
…
PARTITIONED BY(customer int4, visitYearMonth int4)
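
For reference, a fuller sketch of how such an external table and one of its partitions might be registered is shown below. The column list, bucket name, and S3 paths are illustrative assumptions rather than the exact PoC definitions; see the project setup instructions for the real ones.

CREATE EXTERNAL TABLE clickstream.uservisits_csv10 (
  custKey      int4,
  yearMonthKey int4,
  visitDate    varchar(30),
  adRevenue    float8
  -- remaining clickstream attributes omitted in this sketch
)
PARTITIONED BY (customer int4, visitYearMonth int4)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 's3://<your-bucket>/clickstream/uservisits_csv10/'

-- Register one partition (one customer, one month) pointing at its own S3 prefix
ALTER TABLE clickstream.uservisits_csv10
ADD PARTITION (customer=1, visitYearMonth=199201)
LOCATION 's3://<your-bucket>/clickstream/uservisits_csv10/customer=1/visitYearMonth=199201/'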

The entire 3.8 billion-row dataset is organized as a collection of large files where each file contains data exclusive to a particular customer and month in a year. This allows you to partition your data into logical subsets by customer and year/month. With partitions, the query engine can target a subset of files:

  • Only for specific customers
  • Only data for specific months
  • A combination of specific customers and year/months

You can take advantage of partitions in your queries. Instead of joining your customer data on the surrogate customer key (that is, c.c_custkey = uv.custKey), use the partition key “customer”:

SELECT c.c_name, c.c_mktsegment, t.prettyMonthYear, SUM(uv.adRevenue)
…
ON c.c_custkey = uv.customer
…
ORDER BY c.c_name, c.c_mktsegment, uv.yearMonthKey  ASC

This query should run approximately twice as fast as the previous query. If you look at the statistics for this query in SVL_S3QUERY_SUMMARY, you see that only half the dataset was scanned. This is expected because your query is on three out of six customers on an evenly distributed dataset. However, the scan is still inefficient, and you can benefit from using your year/month partition key as well:

SELECT c.c_name, c.c_mktsegment, t.prettyMonthYear, SUM(uv.adRevenue)
…
ON c.c_custkey = uv.customer
…
ON uv.visitYearMonth = t.d_yearmonthnum
…
ORDER BY c.c_name, c.c_mktsegment, uv.visitYearMonth ASC

All joins between the tables are now using partitions. Upon reviewing the statistics for this query, you should observe that Redshift Spectrum scans and returns the exact number of rows, 66,270,117. If you run this query a few times, you should see execution time in the range of 8 seconds, which is a 22.5X improvement on your original query!

Predicate pushdown and storage optimizations 

Previously, I mentioned that Redshift Spectrum performs processing through large-scale infrastructure external to your Amazon Redshift cluster. It is optimized for performing large scans and aggregations on S3. In fact, with the proper optimizations, Redshift Spectrum may even out-perform a medium-sized Amazon Redshift cluster on these types of workloads. There are two important variables to consider for optimizing large scans and aggregations:

  • File size and count. As a general rule, use files 100 MB–1 GB in size, as Redshift Spectrum and S3 are optimized for reading objects of this size. However, the number of files a query operates on is directly correlated with the parallelism achievable by that query. There is an inverse relationship between file size and count: the bigger the files, the fewer files there are for the same dataset. Consequently, there is a trade-off between optimizing for object read performance and the amount of parallelism achievable on a particular query. Large files are best for large scans, as the query likely operates on a sufficiently large number of files. For queries that are more selective and touch fewer files, you may find that smaller files allow for more parallelism.
  • Data format. Redshift Spectrum supports various data formats. Columnar formats like Parquet can sometimes lead to substantial performance benefits by providing compression and more efficient I/O for certain workloads. Generally, format types like Parquet should be used for query workloads involving large scans, and high attribute selectivity. Again, there are trade-offs as formats like Parquet require more compute power to process than plaintext. For queries on smaller subsets of data, the I/O efficiency benefit of Parquet is diminished. At some point, Parquet may perform the same or slower than plaintext. Latency, compression rates, and the trade-off between user experience and cost should drive your decision.

To help illustrate how Redshift Spectrum performs on these large aggregation workloads, run a basic query that aggregates the entire ~3.7 billion record dataset on Redshift Spectrum, and compare that with running the query exclusively on Amazon Redshift:

SELECT uv.custKey, COUNT(uv.custKey)
FROM <your clickstream table> as uv
GROUP BY uv.custKey
ORDER BY uv.custKey ASC

For the Amazon Redshift test case, the clickstream data is loaded and distributed evenly across all nodes (EVEN distribution style), with the optimal column compression encodings prescribed by Amazon Redshift’s ANALYZE COMPRESSION command.

The Redshift Spectrum test case uses a Parquet data format, with each file containing all the data for a particular customer in a month. This results in files mostly in the range of 220–280 MB, which is, in effect, the largest file size possible for this partitioning scheme. If you run tests with the other datasets provided, you will see that this data format and size is optimal and out-performs the others by ~60X.

Performance differences will vary depending on the scenario. The important takeaway is to understand the testing strategy and the workload characteristics where Redshift Spectrum is likely to yield performance benefits. 

The following chart compares the query execution time for the two scenarios. The results indicate that you would have to pay for 12 dc1.large nodes to get performance comparable to that of a small Amazon Redshift cluster that leverages Redshift Spectrum.

Chart showing simple aggregation on ~3.7 billion records

So you’ve validated that Spectrum excels at performing large aggregations. Could you benefit by pushing more work down to Redshift Spectrum in your original query? It turns out that you can, by making the following modification:

The clickstream data is stored at a day-level granularity for each customer while your query rolls up the data to the month level per customer. In the earlier query that uses the day/month partition key, you optimized the query so that it only scans and retrieves the data required, but the day level data is still sent back to your Amazon Redshift cluster for joining and aggregation. The query shown here pushes aggregation work down to Redshift Spectrum as indicated by the query plan:
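
A sketch of such a rewrite is shown below; the inner subquery aggregates ad revenue per customer and month against the external table, so that Redshift Spectrum can perform the aggregation before the join. The Parquet table name (clickstream.uservisits_parquet1) and the three-month cutoff of 199810 are assumptions consistent with the rest of this walkthrough.

SELECT c.c_name, c.c_mktsegment, t.prettyMonthYear, uv.totalRevenue
FROM (
    SELECT customer, visitYearMonth, SUM(adRevenue) AS totalRevenue
    FROM clickstream.uservisits_parquet1
    WHERE customer <= 3 AND visitYearMonth >= 199810
    GROUP BY customer, visitYearMonth
) AS uv
JOIN customer AS c
  ON c.c_custkey = uv.customer
JOIN (SELECT * FROM dwdate WHERE d_yearmonthnum >= 199810) AS t
  ON uv.visitYearMonth = t.d_yearmonthnum
ORDER BY c.c_name, c.c_mktsegment, uv.visitYearMonth ASC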

In this query, Redshift Spectrum aggregates the clickstream data to the month level before it is returned to the Amazon Redshift cluster and joined with the dimension tables. This query should complete in about 4 seconds, which is roughly twice as fast as only using the partition key. The speed increase is evident upon reviewing the SVL_S3QUERY_SUMMARY table:

  • Bytes scanned is 21.6X less because of the Parquet data format.
  • Only 90 records are returned back to the Amazon Redshift cluster as a result of the push-down, instead of ~66.2 million, leading to substantially less join overhead, and about 530 MB less data sent back to your cluster.
  • No adverse change in average parallelism.

Assessing the value of Amazon Redshift vs. Redshift Spectrum

At this point, you might be asking yourself, why would I ever not use Redshift Spectrum? Well, you still get additional value for your money by loading data into Amazon Redshift and querying it there, rather than querying it in S3.

In fact, it turns out that the last version of our query runs even faster when executed exclusively in native Amazon Redshift, as shown in the following chart:

Chart comparing Amazon Redshift vs. Redshift Spectrum with pushdown aggregation over 3 months of data

As a general rule, queries that aren’t dominated by I/O and that involve multiple joins are better optimized in native Amazon Redshift. For instance, the performance difference between running the partition key query entirely in Amazon Redshift versus with Redshift Spectrum is twice as large as that of the pushdown aggregation query, partly because the former case benefits more from better join performance.

Furthermore, the variability in latency in native Amazon Redshift is lower. For use cases where you have tight performance SLAs on queries, you may want to consider using Amazon Redshift exclusively to support those queries.

On the other hand, when you perform large scans, you could benefit from the best of both worlds: higher performance at lower cost. For instance, imagine that you wanted to enable your business analysts to interactively discover insights across a vast amount of historical data. In the example below, the pushdown aggregation query is modified to analyze seven years of data instead of three months:

SELECT c.c_name, c.c_mktsegment, t.prettyMonthYear, uv.totalRevenue
…
WHERE customer <= 3 and visitYearMonth >= 199201
… 
FROM dwdate WHERE d_yearmonthnum >= 199201) as t
…
ORDER BY c.c_name, c.c_mktsegment, uv.visitYearMonth ASC

This query requires scanning and aggregating nearly 1.9 billion records. As shown in the chart below, Redshift Spectrum substantially speeds up this query. A large Amazon Redshift cluster would have to be provisioned to support this use case. With the aid of Redshift Spectrum, you could use an existing small cluster, keep a single copy of your data in S3, and benefit from economical, durable storage while only paying for what you use via the pay per query pricing model.

Chart comparing Amazon Redshift vs. Redshift Spectrum with pushdown aggregation over 7 years of data

Summary

Redshift Spectrum lowers the time to value for deeper insights on customer data queries spanning the data lake and data warehouse. It can enable interactive analysis on datasets in cases that weren’t economically practical or technically feasible before.

There are cases where you can get the best of both worlds from Redshift Spectrum: higher performance at lower cost. However, there are still latency-sensitive use cases where you may want native Amazon Redshift performance. For more best practice tips, see the 10 Best Practices for Amazon Redshift post.

Please visit the Amazon Redshift Spectrum PoC Environment Github page. If you have questions or suggestions, please comment below.

 


Additional Reading

Learn more about how Amazon Redshift Spectrum extends data warehousing out to exabytes – no loading required.


About the Author

Dylan Tong is an Enterprise Solutions Architect at AWS. He works with customers to help drive their success on the AWS platform through thought leadership and guidance on designing well architected solutions. He has spent most of his career building on his expertise in data management and analytics by working for leaders and innovators in the space.

 

 

Awesome Raspberry Pi cases to 3D print at home

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/3d-printed-raspberry-pi-cases/

Unless you’re planning to fit your Raspberry Pi inside a build, you may find yourself in need of a case to protect it from dust, damage and/or the occasional pet attack. Here are some of our favourite 3D-printed cases, for which files are available online so you can recreate them at home.

TARDIS

TARDIS Raspberry PI 3 case – 3D Printing Time lapse

A time-lapse of the TARDIS Raspberry Pi 3 case by Jason3030 (https://www.thingiverse.com/thing:2430122/), printed on a BCN3D Sigma in blue PLA over roughly 3 hours 20 minutes (0.4 mm layers, 0.6 mm nozzle, 0% infill).

Since I am an avid Whovian, it’s not surprising that this case made its way onto the list. Its outside is aesthetically pleasing to the aspiring Time Lord, and it snugly fits your treasured Pi.



Pop this case on your desk and chuckle with glee every time someone asks what’s inside it:

Person: What’s that?
You: My Raspberry Pi.
Person: What’s a Raspberry Pi?
You: It’s a computer!
Person: There’s a whole computer in that tiny case?
You: Yes…it’s BIGGER ON THE INSIDE!

I’ll get my coat.

Pi crust

Yes, we all wish we’d thought of it first. What better case for a Raspberry Pi than a pie crust?

3D-printed Raspberry Pi cases

While the case is designed to fit the Raspberry Pi Model B, you will be able to upgrade the build to accommodate newer models with a few tweaks.



Just make sure that if you do, you credit Marco Valenzuela, its original baker.

Consoles

Since many people use the Raspberry Pi to run RetroPie, there is a growing trend of 3D-printed console-style Pi cases.

3D-printed Raspberry Pi cases

So why not pop your Raspberry Pi into a case made to look like your favourite vintage console, such as the Nintendo NES or N64?



You could also use an adapter to fit a Raspberry Pi Zero within an actual Atari cartridge, or go modern and print a PlayStation 4 case!

Functional

Maybe you’re looking to use your Raspberry Pi as a component of a larger project, such as a home automation system, learning suite, or makerspace. In that case you may need to attach it to a wall, under a desk, or behind a monitor.

3D-printed Raspberry Pi cases

Coo! Coo!

The Pidgeon, shown above, allows you to turn your Zero W into a surveillance camera, while the piPad lets you keep a breadboard attached for easy access to your Pi’s GPIO pins.



Functional cases with added brackets are great for incorporating your Pi on the sly. The VESA mount case will allow you to attach your Pi to any VESA-compatible monitor, and the Fallout 4 Terminal is just really cool.

Cute

You might want your case to just look cute, especially if it’s going to sit in full view on your desk or shelf.

3D-printed Raspberry Pi cases

The tired cube above is the only one of our featured 3D prints for which you have to buy the files ($1.30), but its adorable face begged to be shared anyway.



If you’d rather save your money for another day, you may want to check out this adorable monster from Adafruit. Be aware that this case will also need some altering to fit newer versions of the Pi.

Our cases

Finally, there are great options for you if you don’t have access to a 3D printer, or if you would like to help the Raspberry Pi Foundation’s mission. You can buy one of the official Raspberry Pi cases for the Raspberry Pi 3 and Raspberry Pi Zero (and Zero W)!

3D-printed Raspberry Pi cases



As with all official Raspberry Pi accessories (and with the Pi itself), your money goes toward helping the Foundation to put the power of digital making into the hands of people all over the world.

3D-printed Raspberry Pi cases

You could also print a replica of the official Astro Pi cases, in which two Pis are currently orbiting the earth on the International Space Station.

Design your own Raspberry Pi case!

If you’ve built a case for your Raspberry Pi, be it with a 3D printer, laser-cutter, or your bare hands, make sure to share it with us in the comments below, or via our social media channels.

And if you’d like to give 3D printing a go, there are plenty of free online learning resources, and sites that offer tutorials and software to get you started, such as TinkerCAD, Instructables, and Adafruit.

The post Awesome Raspberry Pi cases to 3D print at home appeared first on Raspberry Pi.

Running an elastic HiveMQ cluster with auto discovery on AWS

Post Syndicated from The HiveMQ Team original http://www.hivemq.com/blog/running-hivemq-cluster-aws-auto-discovery

hivemq-aws

HiveMQ is a cloud-first MQTT broker with elastic clustering capabilities and a resilient software design, which makes it a perfect fit for common cloud infrastructures. An earlier blog post discussed the benefits an MQTT broker cluster offers. Today’s post aims to be more practical and talks about how to set up a HiveMQ cluster on one of the most popular cloud computing platforms: Amazon Web Services.

Running HiveMQ on cloud infrastructure

Running a HiveMQ cluster on cloud infrastructure like AWS not only offers the possibility of elastically scaling the infrastructure, it also assures that state-of-the-art security standards are in place on the infrastructure side. These platforms are typically highly available, and new virtual machines can be spawned in a snap if they are needed. HiveMQ’s ability to add (and remove) cluster nodes at runtime without any manual reconfiguration of the cluster allows it to scale linearly on IaaS providers. New cluster nodes can be started (manually or automatically) and the cluster size adapts automatically. For more detailed information about HiveMQ clustering and how to achieve true high availability and linear scalability with HiveMQ, we recommend reading the HiveMQ Clustering Paper.

As Amazon Web Services is among the best-known and most-used cloud platforms, we want to illustrate the setup of a HiveMQ cluster on AWS in this post. Note that the concepts shown in this step-by-step guide for running an elastic HiveMQ cluster on AWS also apply to other cloud platforms such as Microsoft Azure or Google Cloud Platform.

Setup and Configuration

Amazon Web Services prohibits the use of UDP multicast, which is the default HiveMQ cluster discovery mode. Using Amazon Simple Storage Service (S3) buckets for auto-discovery is a perfect alternative if the brokers are running on AWS EC2 instances anyway. HiveMQ has a free off-the-shelf plugin available for AWS S3 cluster discovery.

The following is a step-by-step guide on how to set up the brokers on AWS EC2 with automatic cluster member discovery via S3.

Setup Security Group

The first step is creating a security group that allows inbound traffic to the listeners we are going to configure for MQTT communication. It is also vital to have SSH access to the instances. After you create the security group, edit it and add an additional rule for internal communication between the cluster nodes (meaning the source is the security group itself) on all TCP ports.

To create and edit security groups go to the EC2 console – NETWORK & SECURITY – Security Groups

Inbound traffic

Outbound traffic

The next step is to create an S3 bucket in the S3 console. Make sure to choose a region close to the one where you want to run your HiveMQ instances.

Option A: Create IAM role and assign to EC2 instance

Our recommendation is to configure your EC2 instances in a way that allows them to access the S3 bucket. This way you don’t need to create a specific user, and you don’t need to put the user’s credentials in the s3discovery.properties file.

Create IAM Role

EC2 Instance Role Type

Select S3 Full Access

Assign new Role to Instance

Option B: Create user and assign IAM policy

The next step is creating a user in the IAM console.

Choose name and set programmatic access

Assign s3 full access role

Review and create

Download credentials

It is important that you store these credentials, as they will be needed later for configuring the S3 Cluster Discovery Plugin.

Start EC2 instances with HiveMQ

The next step is spawning two or more EC2 instances with HiveMQ. Follow the steps in the HiveMQ User Guide.

Install s3 discovery plugin

The final step is downloading, installing and configuring the S3 Cluster Discovery Plugin.
After you download the plugin, you need to configure the S3 access in the s3discovery.properties file according to which S3 access option you chose.

Option A:

# AWS Credentials                                          #
############################################################

#
# Use environment variables to specify your AWS credentials
# the following variables need to be set:
# AWS_ACCESS_KEY_ID
# AWS_SECRET_ACCESS_KEY
#
#credentials-type:environment_variables

#
# Use Java system properties to specify your AWS credentials
# the following variables need to be set:
# aws.accessKeyId
# aws.secretKey
#
#credentials-type:java_system_properties

#
# Uses the credentials file which
# can be created by calling 'aws configure' (AWS CLI)
# usually this file is located at ~/.aws/credentials (platform dependent)
# The location of the file can be configured by setting the environment variable
# AWS_CREDENTIAL_PROFILE_FILE to the location of your file
#
#credentials-type:user_credentials_file

#
# Uses the IAM Profile assigned to the EC2 instance running HiveMQ to access S3
# Notice: This only works if HiveMQ is running on an EC2 instance !
#
credentials-type:instance_profile_credentials

#
# Tries to access S3 via the default mechanisms in the following order
# 1) Environment variables
# 2) Java system properties
# 3) User credentials file
# 4) IAM profiles assigned to EC2 instance
#
#credentials-type:default

#
# Uses the credentials specified in this file.
# The variables you must provide are:
# credentials-access-key-id
# credentials-secret-access-key
#
#credentials-type:access_key
#credentials-access-key-id:
#credentials-secret-access-key:

#
# Uses the credentials specified in this file to authenticate with a temporary session
# The variables you must provide are:
# credentials-access-key-id
# credentials-secret-access-key
# credentials-session-token
#
#credentials-type:temporary_session
#credentials-access-key-id:{access_key_id}
#credentials-secret-access-key:{secret_access_key}
#credentials-session-token:{session_token}


############################################################
# S3 Bucket                                                #
############################################################

#
# Region for the S3 bucket used by hivemq
# see http://docs.aws.amazon.com/general/latest/gr/rande.html#s3_region for a list of regions for S3
# example: us-west-2
#
s3-bucket-region:<your region here>

#
# Name of the bucket used by HiveMQ
#
s3-bucket-name:<your s3 bucket name here>

#
# Prefix for the filename of every node's file (optional)
#
file-prefix:hivemq/cluster/nodes/

#
# Expiration timeout (in minutes).
# Files with a timestamp older than (timestamp + expiration) will be automatically deleted
# Set to 0 if you do not want the plugin to handle expiration.
#
file-expiration:360

#
# Interval (in minutes) in which the own information in S3 is updated.
# Set to 0 if you do not want the plugin to update its own information.
# If you disable this you also might want to disable expiration.
#
update-interval:180

Option B:

# AWS Credentials                                          #
############################################################

#
# Use environment variables to specify your AWS credentials
# the following variables need to be set:
# AWS_ACCESS_KEY_ID
# AWS_SECRET_ACCESS_KEY
#
#credentials-type:environment_variables

#
# Use Java system properties to specify your AWS credentials
# the following variables need to be set:
# aws.accessKeyId
# aws.secretKey
#
#credentials-type:java_system_properties

#
# Uses the credentials file which
# can be created by calling 'aws configure' (AWS CLI)
# usually this file is located at ~/.aws/credentials (platform dependent)
# The location of the file can be configured by setting the environment variable
# AWS_CREDENTIAL_PROFILE_FILE to the location of your file
#
#credentials-type:user_credentials_file

#
# Uses the IAM Profile assigned to the EC2 instance running HiveMQ to access S3
# Notice: This only works if HiveMQ is running on an EC2 instance !
#
#credentials-type:instance_profile_credentials

#
# Tries to access S3 via the default mechanisms in the following order
# 1) Environment variables
# 2) Java system properties
# 3) User credentials file
# 4) IAM profiles assigned to EC2 instance
#
#credentials-type:default

#
# Uses the credentials specified in this file.
# The variables you must provide are:
# credentials-access-key-id
# credentials-secret-access-key
#
credentials-type:access_key
credentials-access-key-id:<your access key id here>
credentials-secret-access-key:<your secret access key here>

#
# Uses the credentials specified in this file to authenticate with a temporary session
# The variables you must provide are:
# credentials-access-key-id
# credentials-secret-access-key
# credentials-session-token
#
#credentials-type:temporary_session
#credentials-access-key-id:{access_key_id}
#credentials-secret-access-key:{secret_access_key}
#credentials-session-token:{session_token}


############################################################
# S3 Bucket                                                #
############################################################

#
# Region for the S3 bucket used by hivemq
# see http://docs.aws.amazon.com/general/latest/gr/rande.html#s3_region for a list of regions for S3
# example: us-west-2
#
s3-bucket-region:<your region here>

#
# Name of the bucket used by HiveMQ
#
s3-bucket-name:<your s3 bucket name here>

#
# Prefix for the filename of every node's file (optional)
#
file-prefix:hivemq/cluster/nodes/

#
# Expiration timeout (in minutes).
# Files with a timestamp older than (timestamp + expiration) will be automatically deleted
# Set to 0 if you do not want the plugin to handle expiration.
#
file-expiration:360

#
# Interval (in minutes) in which the own information in S3 is updated.
# Set to 0 if you do not want the plugin to update its own information.
# If you disable this you also might want to disable expiration.
#
update-interval:180

This file has to be identical on all your cluster nodes.

That’s it. Starting HiveMQ on multiple EC2 instances will now result in them forming a cluster, taking advantage of the S3 bucket for discovery.
You know that your setup was successful when HiveMQ logs something similar to this.

Cluster size = 2, members : [0QMpE, jw8wu].

Enjoy an elastic MQTT broker cluster

We are now able to take advantage of rapid elasticity. Scaling the HiveMQ cluster up or down by adding or removing EC2 instances, without the need for administrative intervention, is now possible.

For production environments it’s recommended to use automatic provisioning of the EC2 instances (e.g. by using Chef, Puppet, Ansible or similar tools) so you don’t need to configure each EC2 instance manually. Of course HiveMQ can also be used with Docker, which can also ease the provisioning of HiveMQ nodes.

Bitcoin, UASF… and politics

Post Syndicated from Григор original http://www.gatchev.info/blog/?p=2064

Lately the Net has been buzzing about UASF in the context of Bitcoin. Hardly anyone, though, has paid much attention to these acronyms. (Articles on the subject are usually a salad of yet more acronyms, which doesn’t make them any easier to understand.) What on earth does it mean? And is it important?

It isn’t actually all that important, except for people who are seriously involved with cryptocurrencies. Everyone else can safely ignore it.

At least at first glance. Because it also offers real insight into how effective some fundamental political notions are. That is why I intend to devote some of my time to it here, and to waste some of yours.

1. Bitcoin’s problems

An electronic currency controlled not by politicians and their hangers-on but by strict rules: a dream, isn’t it? No more fear that the next populist will fire up the money printer and turn your savings into colourful toilet paper… But there are no ideas without problems (to say nothing of their implementations). So it is with Bitcoin.

All bitcoin transactions are recorded in blocks that form a chain, the so-called blockchain. That way every penny (pardon, satoshi 🙂 ) can be traced back to its creation. The addresses between which money moves are anonymous, but the exchanges themselves are public and visible. Anyone who has the necessary software (freely available) and runs a “full node”, that is, who is willing to set aside a hundred or so gigabytes of disk space, can trace them and verify their validity.

The problem is that a Bitcoin block has a fixed maximum size of up to 1 megabyte. It holds at most 2,000–3,000 transactions. At 6 blocks per hour, that means around 15,000 transactions per hour, or about 360,000 per day. That sounds like a lot, but it is utterly insufficient: quite a few large banks process more transactions per second. So for some time now the demand for transactions has exceeded the blockchain’s capacity, which creates a problem for the currency’s users. Some of them are beginning to abandon it for traditional currencies or other cryptocurrencies, and its influence and role are declining accordingly.

2. The state of the solutions

Quite a few solutions to this problem have been proposed. The latest is called SegWit (segregated witness). All of them, and this one in particular, face serious resistance from key players in the Bitcoin world.

Fairly soon after Bitcoin was created, a rule was introduced that transactions carry a fee. (Otherwise it was very easy to clog the blockchain by generating an enormous number of transactions shuffling a minimal sum back and forth.) Every transaction states how much it will pay to be included in a block. (That is what “legitimizes” it.)

Which of the waiting transactions get included in a block is decided by whoever creates the block. That is the “miner” who solved the target from the previous block. The miner collects the fees of the included transactions, on top of the standard block “reward”. Miners therefore have an interest in transactions being as expensive as possible, that is, in the blockchain’s capacity remaining insufficient.

In addition, quite a few miners use a “hack” in the system’s technology, the so-called ASICBOOST. One of SegWit’s advantages is that it blocks such hacks, and with them those “miners”. (You can find the details here.)

The result is that some miners resist the introduction of SegWit. And it is “mining power” that serves as the “democratic vote” in the Bitcoin system. An attempt to introduce SegWit has already been made and failed. To ensure a stronger consensus, that attempt required SegWit to be adopted once 95% of the mining power supported it. It soon became clear that this was not going to happen.

3. UASF? WTF? (In other words, what is this UASF thing?)

I don’t know exactly what percentage of miners reject SegWit. But at the moment mining is centralized to the point that almost all of it is done by a small number of powerful companies. It is entirely possible that the SegWit rejecters control more than 50% of the mining power. If so, introducing SegWit through its support would be impossible. (Of course, that would mean the near-term decline of Bitcoin and its transformation from “king of the cryptocurrencies” into a cheap museum exhibit. In the end those miners would have dug their own grave. But if there is one thing in this world that can always be relied upon to the very end, it is human stupidity.)

To avoid such a scenario, the developers of the Bitcoin Core team proposed the so-called User-Activated Soft Fork, UASF for short. Its essence is that from 1 August onward, the nodes in the Bitcoin network that support SegWit will start treating blocks that do not signal support for it as invalid.

Miners who reject SegWit can keep mining the old way. Those who support it will continue the new way. From that moment on, the Bitcoin blockchain will accordingly split in two: a branch without SegWit and a branch with it.

4. What will the outcome be?

The majority of the mining power may end up on the first branch; by Satoshi Nakamoto’s rules, that would make it the main one. But if the network is split in two, each side will have its own main branch, so they will not be technically unified. There will be two different currencies named Bitcoin, each claiming to be the real one.

How will that dispute be resolved? Bitcoin users look for lower transaction fees, so the vast majority of them will quickly move to the SegWit chain. And Bitcoin’s value and acceptance rest simply on the fact that people accept it and are willing to use it. So the SegWit-enabled Bitcoin will keep the role (and the price) of the original Bitcoin, while the one without SegWit will lose value and most of its relevance.

(In fact, a similar “split” has already happened to the No. 2 in the cryptocurrency world, Ethereum. That is why there are both Ethereum and Ethereum Classic. The latter lost the fight to be the successor of the original Ethereum, but it still exists, albeit with a much smaller role and price.)

The miners who rejected SegWit will soon find themselves mining something worth pennies. So they will probably switch, loudly or quietly, to supporting SegWit. I wouldn’t be surprised if quite a few of them do so on 1 August itself. (Although some will surely keep telling the whole world how bad the decision is and what losses it inflicts on them. There may even be lawsuits… We shall see the details.)

5. The politics

If you have made it this far, read carefully: the essence of this post is in this part.

Recently I talked with a proud graduate of a Bulgarian school of economics. I sat through an explanation of how economies of scale do not exist and how it is exactly the opposite, how small firms are more efficient than large ones, and so on…

No wonder they are taught nonsense. Whoever pays, even from behind the scenes, calls the tune. What puzzles me is that the students believe this nonsense when reality is right before their eyes, a reality in which big firms ruin and/or buy up the small ones, not the other way around. It cannot be otherwise. Just as Newton’s laws apply equally to laboratory weights and to shipping containers, dissipative laws apply equally to pots of water and to economic systems.

In the IT business the dynamics are far above average. Where it is not and cannot easily be regulated, where things are more laissez-faire, as in bitcoin mining for example, they are greater still. No wonder mining moved so quickly from millions of individual participants to a small number of easily cartelized tyrannosaurs. Every system evolves internally in that direction… That is why a “perfect system” and “happiness ever after” cannot exist. That is why, if you like, freedom has to be kneaded and baked anew every day.

“The prevailing mining power”, whether as the prevailing number of individuals in a species, as the bulk of the money, or as control of the memes most popular among voters, can easily end up concentrated in a narrow circle of hands. And the laws of the internal evolution of systems, as a concrete expression of the dissipative laws, lead precisely there… At that point every vote starts to support the status quo. Democracy ceases to be an opportunity for change; the only such opportunity left is the separation of views into distinct systems. Only then does the new get a real chance to compete with the old.

That is why every biological species around us once began as a tiny divergent twig on the then-mighty trunk of another species, one known today only to paleobiologists. And every mighty bank, manufacturer, or media company began, as a pile of money, or production capacity, or intellectual property, as an ordinary loan booth, a small workshop, or a studio, in the shadow of the tyrannosaurs of its day, remembered now only by historians. They found a way to break off and somehow hide from those giants, so they could gather the strength to compete with them…

Those who understand will understand.

Perl 5.26.0 released

Post Syndicated from corbet original https://lwn.net/Articles/724363/rss

The Perl 5.26.0 release is out. “Perl 5.26.0 represents approximately 13 months of development since Perl 5.24.0 and contains approximately 360,000 lines of changes across 2,600 files from 86 authors.” See this page for a list of changes in this release; new features include indented here-documents, the ability to declare references to variables, Unicode 9.0 support, and the removal of the current directory (“.”) from @INC by default.

Product or Project?

Post Syndicated from Matt Richardson original https://www.raspberrypi.org/blog/product-or-project/

This column is from The MagPi issue 57. You can download a PDF of the full issue for free, or subscribe to receive the print edition in your mailbox or the digital edition on your tablet. All proceeds from the print and digital editions help the Raspberry Pi Foundation achieve its charitable goals.

Image of MagPi magazine and AIY Project Kit

Taking inspiration from a widely known inspirational phrase, I like to tell people, “make the thing you wish to see in the world.” In other words, you don’t have to wait for a company to create the exact product you want. You can be a maker as well as a consumer! Prototyping with hardware has become easier and more affordable, empowering people to make products that suit their needs perfectly. And the people making these things aren’t necessarily electrical engineers, computer scientists, or product designers. They’re not even necessarily adults. They’re often self-taught hobbyists who are empowered by maker-friendly technology.

It’s a subject I’ve been very interested in, and I have written about it before. Here’s what I’ve noticed: the flow between maker project and consumer product moves in both directions. In other words, consumer products can start off as maker projects. Just take a look at the story behind many of the crowdfunded products on sites such as Kickstarter. Conversely, consumer products can evolve into maker products as well. The cover story for the latest issue of The MagPi is a perfect example of that. Google has given you the resources you need to build your own dedicated Google Assistant device. How cool is that?

David Pride on Twitter

@Raspberry_Pi @TheMagP1 Oh this is going to be a ridiculous amount of fun. 😊 #AIYProjects #woodchuck https://t.co/2sWYmpi6T1

But consumer products becoming hackable hardware isn’t always an intentional move by the product’s maker. In the 2000s, TiVo set-top DVRs were a hot product and their most enthusiastic fans figured out how to hack the product to customise it to meet their needs without any kind of support from TiVo.

Embracing change

But since then, things have changed. For example, when Microsoft’s Kinect for the Xbox 360 was released in 2010, makers were immediately enticed by its capabilities. It not only acted as a camera, but it could also sense depth, a feature that would be useful for identifying the position of objects in a space. At first, there was no hacker support from Microsoft, so Adafruit Industries announced a $3,000 bounty to create open-source drivers so that anyone could access the features of Kinect for their own projects. Since then, Microsoft has embraced the use of Kinect for these purposes.

The Create 2 from iRobot

iRobot’s Create 2, a hackable version of the Roomba

Consumer product companies even make versions of their products that are specifically meant for hacking, making, and learning. Belkin’s WeMo home automation product line includes the WeMo Maker, a device that can act as a remote relay or sensor and hook into your home automation system. And iRobot offers Create 2, a hackable version of its Roomba floor-cleaning robot. While iRobot aimed the robot at STEM educators, you could use it for personal projects too. Electronic instrument maker Korg takes its maker-friendly approach to the next level by releasing the schematics for some of its analogue synthesiser products.

Why would a company want to do this? There are a few possible reasons. For one, it’s a way of encouraging consumers to create a community around a product. It could be a way for innovation with the product to continue, unchecked by the firm’s own limits on resources. For certain, it’s an awesome feel-good way for a company to empower their own users. Whatever the reason these products exist, it’s the digital maker who comes out ahead. They have more affordable tools, materials, and resources to create their own customised products and possibly learn a thing or two along the way.

With maker-friendly, hackable products, being a creator and a consumer aren’t mutually exclusive. In fact, you’re probably getting the best of both worlds: great products and great opportunities to make the thing you wish to see in the world.

The post Product or Project? appeared first on Raspberry Pi.

Tinkernut’s do-it-yourself Pi Zero audio HAT

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/tinkernut-diy-pi-zero-audio/

Why buy a Raspberry Pi Zero audio HAT when Tinkernut can show you how to make your own?

Adding Audio Output To The Raspberry Pi Zero – Tinkernut Workbench

The Raspberry Pi Zero W is an amazing miniature computer piece of technology. I want to turn it into an epic portable Spotify radio that displays visuals such as Album Art. So in this new series called “Tinkernut Workbench”, I show you step by step what it takes to build a product from the ground up.

Raspberry Pi Zero audio

Unlike their grown-up siblings, the Pi Zero and Zero W lack an onboard audio jack, but that doesn’t stop you from using them to run an audio output. Various audio HATs exist on the market, from Adafruit, Pimoroni and Pi Supply to name a few, providing easy audio output for the Zero. But where would the fun be in a Tinkernut video that shows you how to attach a HAT?

Tinkernut Pi Zero Audio

“Take this audio HAT, press it onto the header pins and, errr, done? So … how was your day?”

DIY Audio: Tinkernut style

For the first video in his Hipster Spotify Radio using a Raspberry Pi Tinkernut Workbench series, Tinkernut – real name Daniel Davis – goes through the steps of researching, prototyping and finishing his own audio HAT for his newly acquired Raspberry Pi Zero W.

The build utilises the GPIO pins on the Zero W, specifically pins #18 and #13. FYI, this hidden gem of information comes from the Adafruit Pi Zero PWM Audio guide. Before he can use #18 and #13, header pins need to be soldered. If the thought of soldering pins to the Pi is somewhat daunting, check out the Pimoroni Hammer Header.

Pimoroni Hammer Header for Raspberry Pi

You’re welcome.

Once complete, with Raspbian installed on the micro SD, and SSH enabled for remote access, he’s ready to start prototyping.

Ingredients

Tinkernut uses two 270 ohm resistors, two 150 ohm resistors, two 10 µF electrolytic capacitors, two 0.01 µF polyester film capacitors, an audio jack and some wire. You’ll also need a breadboard for prototyping. For the final build, you’ll need a single row female pin header and some prototyping board, if you want to join in at home.

Tinkernut audio board Raspberry Pi Zero W

It should look like this…hopefully.

Once the prototype is able to run audio through to a cheap speaker (thanks to an edit of the config.txt file), the final board can be finished.

What’s next?

The audio board is just one step in the build.

Spotify is such an awesome music service. Raspberry Pi Zero is such an awesome ultra-mini computing device. Obviously, combining the two is something I must do!!! The idea here is to make something that’s stylish, portable, can play Spotify, and hopefully also display visuals such as album art.

Subscribe to Tinkernut’s YouTube channel to keep up to date with the build, and check out some of his other Raspberry Pi builds, such as his cheap 360 video camera, security camera and digital vintage camera.

Have you made your own Raspberry Pi HAT? Show it off in the comments below!

The post Tinkernut’s do-it-yourself Pi Zero audio HAT appeared first on Raspberry Pi.

Introducing DnsControl – “DNS as Code” has Arrived

Post Syndicated from Craig Peterson original http://blog.serverfault.com/2017/04/11/introducing-dnscontrol-dns-as-code-has-arrived/

DNS at Stack Overflow is… complex.  We have hundreds of DNS domains and thousands of DNS records. We have gone from running our own BIND server to hosting DNS with multiple cloud providers, and we change things fairly often. Keeping everything up to date and synced at multiple DNS providers is difficult. We built DnsControl to allow us to perform updates easily and automatically across all providers we use.

The old way

Originally, our DNS was hosted by our own BIND servers, using artisanal, hand-crafted zone files. Large changes involved liberal sed usage, and every change was pretty error-prone. We decided to start using cloud DNS providers for performance reasons, but each of those has its own web panel, and those panels are universally painful to use. Web interfaces rarely have any import/export functionality, and generally lack change control, history tracking, or comments. We quickly decided that web panels were not how we wanted to manage our zones.

Introducing DnsControl

DNSControl is the system we built to manage our DNS. It permits “describe once, use anywhere” DNS management. It consists of a few key components:

  1. A Domain Specific Language (DSL) for describing domains in a single, provider-independent way.
  2. An “interpreter” application that executes the DSL and creates a standardized representation of your desired DNS state.
  3. Back-end “providers” that sync the desired state to a DNS provider.

At the time of this writing we have 9 different providers implemented, with 3 more on the way shortly. We use it to manage our domains with our own BIND servers, as well as Route 53, Google Cloud DNS, name.com, Cloudflare, and more.

A sample might look like this description of stackoverflow.com:

D("stackoverflow.com", REG_NAMEDOTCOM, DnsProvider(R53), DnsProvider(GCLOUD),
    A("@", "198.252.206.16"),
    A("blog", "198.252.206.20"),
    CNAME("chat", "chat.stackexchange.com."),
    CNAME("www", "@", TTL(3600)),
    A("meta", "198.252.206.16")
)

This is just a small, simple example. The DSL is a fully-featured way to express your DNS config. It is actually just JavaScript with some helpful functions. We have an examples page that shows more of the power of the language.
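Because the DSL is plain JavaScript, you can, for example, keep a shared IP address in a variable and reuse it across records. Here is a hypothetical sketch in the same style as the example above; the domain and address are made up:

var serverIP = "203.0.113.10";  // shared address, reused below

D("example.com", REG_NAMEDOTCOM, DnsProvider(R53), DnsProvider(GCLOUD),
    A("@", serverIP),
    A("blog", serverIP),
    CNAME("www", "@")
)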

Running “dnscontrol preview” with this input will show what updates would be needed to bring the DNS providers up to date with the new desired configuration. “dnscontrol push” will actually make the changes.

This allows us to manage our DNS configuration as code. Storing it this way has a bunch of advantages:

  • We can use variables to store common IP addresses or repeated data. We can make complicated changes, like failing-over services between data centers, by changing a single variable. We can activate or deactivate our CDN, which involves thousands of record changes, by commenting or uncommenting a single line of code.
  • We are not locked into any single provider, since the automation can sync to any of them. Keeping records synchronized between different cloud providers requires no manual steps.
  • We store our DNS config in git. Our build server runs all changes. We have central logging, access control, and history for our DNS changes. We’re trying to apply DevOps best practices to an area that has not seen those benefits so much yet.

I think the biggest benefit of this tool, though, is the freedom it has given us with our DNS. It has allowed us to:

  • Switch providers with no fear of breaking things. We have changed CDNs or DNS providers at least 4 times in the last two years, and it has never been scary at all.
  • Dual-host our DNS with multiple providers simultaneously. The tool keeps them in sync for us.
  • Test fail-over procedures before an emergency happens. We are confident we can point DNS at our secondary datacenter easily, and we can quickly switch providers if one is being DDoSed.

DNS configuration is often difficult and error-prone. We hope DnsControl makes it easier and more reliable. It has for us.


Online Piracy Can Boost Comic Book Sales, Research Finds

Post Syndicated from Ernesto original https://torrentfreak.com/online-piracy-can-boost-comic-book-sales-research-finds/

Research into online piracy comes in all shapes and sizes, with equally mixed results. Often the main question is whether piracy hurts legitimate revenue streams.

In recent years we have seen a plethora of studies and most are focused on the effects on movies, TV-shows and music revenues. But what about comic books?

Manga in particular has traditionally been very popular on file-sharing networks and sites. There are dozens of large sites dedicated to these comics, which are downloaded in their millions.

According to the anti-piracy group CODA, which represents Japanese comic publishers, piracy losses overseas are estimated to be double the size of overseas legal revenue.

With this in mind, Professor Tatsuo Tanaka of the Faculty of Economics at Keio University decided to look more closely at how piracy interacts with legal sales. In a natural experiment, he examined how the availability of pirated comic books affected revenue.

The research uses a massive takedown campaign conducted by CODA in 2015, which directly impacted the availability of many pirated comics on various download sites, to see how this affected sales of 3,360 comic book volumes.

Interestingly, the results show that decreased availability of pirated comics doesn’t always help sales. In fact, for comics that no longer release new volumes, the effect is reversed.

“Piracy decreases sales of ongoing comics, but it increases sales of completed comics,” Professor Tanaka writes.

“To put this another way, displacement effect is dominant for ongoing comics, and advertisement effect is dominant for completed comics,” he adds.

For these finished comic series, the promotional element weighs heavier. According to the Professor, this suggests that piracy can effectively be seen as a form of advertising.

“Since completed comics series have already ended, and publishers no longer do any promotion for them, consumers almost forget completed comics. We can interpret that piracy reminds consumers of past comics and stimulates sales.”

The question that remains is whether the overall effect on the industry is positive or negative. The current study provides no answer to that question, as it’s unknown how big the sales share is for ongoing versus completed comics, but future research could look into this.

Professor Tanaka stresses that there is an important policy implication of his findings. Since piracy doesn’t affect all sales the same (it’s heterogeneous), anti-piracy strategies may have to be adapted.

“If the effect of piracy is heterogeneous, it is not the best solution to shut down the piracy sites but to delete harmful piracy files selectively if possible,” Professor Tanaka adds.

“In this case, deleting piracy files of ongoing comics only is the first best strategy for publishers regardless of whether the total effect is positive or negative, because the availability of piracy files of completed comics is beneficial to both publishers and consumers,” he adds.

The research shows once again that piracy is a complex phenomenon that can have a positive or negative impact depending on the context. This isn’t limited to comics of course, as previous studies have shown similar effects in the movie and music industries.

The full paper titled The Effects of Internet Book Piracy: The Case of Japanese Comics is available here (pdf).

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

Implementing Serverless Manual Approval Steps in AWS Step Functions and Amazon API Gateway

Post Syndicated from Bryan Liston original https://aws.amazon.com/blogs/compute/implementing-serverless-manual-approval-steps-in-aws-step-functions-and-amazon-api-gateway/


Ali Baghani, Software Development Engineer

A common use case for AWS Step Functions is a task that requires human intervention (for example, an approval process). Step Functions makes it easy to coordinate the components of distributed applications as a series of steps in a visual workflow called a state machine. You can quickly build and run state machines to execute the steps of your application in a reliable and scalable fashion.

In this post, I describe a serverless design pattern for implementing manual approval steps. You can use a Step Functions activity task to generate a unique token that can be returned later indicating either approval or rejection by the person making the decision.

Key steps to implementation

When the execution of a Step Functions state machine reaches an activity task state, Step Functions schedules the activity and waits for an activity worker. An activity worker is an application that polls for activity tasks by calling GetActivityTask. When the worker successfully calls the API action, the activity is vended to that worker as a JSON blob that includes a token for callback.

At this point, the activity task state and the branch of the execution that contains the state are paused. Unless a timeout is specified in the state machine definition (it can be up to one year), the activity task state waits until the activity worker calls either SendTaskSuccess or SendTaskFailure using the vended token. This pause is the first key to implementing a manual approval step.

The second key is the ability, in a serverless environment, to separate the code that fetches the work and acquires the token from the code that responds with the completion status and sends the token back, as long as the token can be shared. In other words, the activity worker in this example is a serverless application supervised by a single activity task state.

In this walkthrough, you use a short-lived AWS Lambda function invoked on a schedule to implement the activity worker, which acquires the token associated with the approval step, and prepares and sends an email to the approver using Amazon SES.

It is very convenient if the application that returns the token can directly call the SendTaskSuccess and SendTaskFailure API actions on Step Functions. This can be achieved more easily by exposing these two actions through Amazon API Gateway so that an email client or web browser can return the token to Step Functions. By combining a Lambda function that acquires the token with the application that returns the token through API Gateway, you can implement a serverless manual approval step, as shown below.
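To make the callback half of the pattern concrete, here is a minimal Node.js sketch of what those two calls look like when made directly with the AWS SDK. In this walkthrough API Gateway issues the equivalent requests for you, so treat this as an illustration of the mechanism rather than code you need to deploy:

'use strict';
const aws = require('aws-sdk');
const stepfunctions = new aws.StepFunctions();

// Complete the paused activity task using the token vended by GetActivityTask.
function respond(taskToken, approved, callback) {
    if (approved) {
        stepfunctions.sendTaskSuccess({
            taskToken: taskToken,
            output: JSON.stringify('Approve link was clicked.')
        }, callback);
    } else {
        stepfunctions.sendTaskFailure({
            taskToken: taskToken,
            error: 'Rejected',
            cause: 'Reject link was clicked.'
        }, callback);
    }
}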

In this pattern, when the execution reaches a state that requires manual approval, the Lambda function prepares and sends an email to the user with two embedded hyperlinks for approval and rejection.

If the authorized user clicks on the approval hyperlink, the state succeeds. If the authorized user clicks on the rejection link, the state fails. You can also choose to set a timeout for approval and, upon timeout, take action, such as resending the email request using retry/catch conditions in the activity task state.

Employee promotion process

As an example pattern use case, you can design a simple employee promotion process which involves a single task: getting a manager’s approval through email. When an employee is nominated for promotion, a new execution starts. The name of the employee and the email address of the employee’s manager are provided to the execution.

You’ll use the design pattern to implement the manual approval step, and SES to send the email to the manager. After acquiring the task token, the Lambda function generates and sends an email to the manager with embedded hyperlinks to URIs hosted by API Gateway.

In this example, I have administrative access to my account, so that I can create IAM roles. Moreover, I have already registered my email address with SES, so that I can send emails with the address as the sender/recipient. For detailed instructions, see Send an Email with Amazon SES.

Here is a list of what you do:

  1. Create an activity
  2. Create a state machine
  3. Create and deploy an API
  4. Create an activity worker Lambda function
  5. Test that the process works

Create an activity

In the Step Functions console, choose Tasks and create an activity called ManualStep.

stepfunctionsfirst_1.png

Remember to keep the ARN of this activity at hand.

stepfunctionsfirst_2.png

Create a state machine

Next, create the state machine that models the promotion process on the Step Functions console. Use StatesExecutionRole-us-east-1, the default role created by the console. Name the state machine PromotionApproval, and use the following code. Remember to replace the value for Resource with your activity ARN.

{
  "Comment": "Employee promotion process!",
  "StartAt": "ManualApproval",
  "States": {
    "ManualApproval": {
      "Type": "Task",
      "Resource": "arn:aws:states:us-east-1:ACCOUNT_ID:activity:ManualStep",
      "TimeoutSeconds": 3600,
      "End": true
    }
  }
}

Create and deploy an API

Next, create and deploy public URIs for calling the SendTaskSuccess or SendTaskFailure API action using API Gateway.

First, navigate to the IAM console and create the role that API Gateway can use to call Step Functions. Name the role APIGatewayToStepFunctions, choose Amazon API Gateway as the role type, and create the role.

After the role has been created, attach the managed policy AWSStepFunctionsFullAccess to it.

stepfunctionsfirst_3.png

In the API Gateway console, create a new API called StepFunctionsAPI. Create two new resources under the root (/) called succeed and fail, and for each resource, create a GET method.

stepfunctionsfirst_4.png

You now need to configure each method. Start with the /fail GET method and configure it with the following values:

  • For Integration type, choose AWS Service.
  • For AWS Service, choose Step Functions.
  • For HTTP method, choose POST.
  • For Region, choose your region of interest instead of us-east-1. (For a list of regions where Step Functions is available, see AWS Region Table.)
  • For Action Type, enter SendTaskFailure.
  • For Execution, enter the APIGatewayToStepFunctions role ARN.

stepfunctionsfirst_5.png

To be able to pass the taskToken through the URI, navigate to the Method Request section, and add a URL Query String parameter called taskToken.

stepfunctionsfirst_6.png

Then, navigate to the Integration Request section and add a Body Mapping Template of type application/json to inject the query string parameter into the body of the request. Accept the change suggested by the security warning. This sets the body pass-through behavior to When there are no templates defined (Recommended). The following code does the mapping:

{
   "cause": "Reject link was clicked.",
   "error": "Rejected",
   "taskToken": "$input.params('taskToken')"
}

When you are finished, choose Save.

Next, configure the /succeed GET method. The configuration is very similar to the /fail GET method. The only difference is for Action: choose SendTaskSuccess, and set the mapping as follows:

{
   "output": "\"Approve link was clicked.\"",
   "taskToken": "$input.params('taskToken')"
}

The last step on the API Gateway console after configuring your API actions is to deploy them to a new stage called respond. You can test your API by choosing the Invoke URL links under either of the GET methods. Because no token is provided in the URI, a ValidationException message should be displayed.

stepfunctionsfirst_7.png

Create an activity worker Lambda function

In the Lambda console, create a Lambda function with a CloudWatch Events Schedule trigger using a blank function blueprint for the Node.js 4.3 runtime. The rate entered for Schedule expression is the poll rate for the activity. This should be above the rate at which the activities are scheduled by a safety margin.

The safety margin accounts for the possibility of lost tokens, retried activities, and polls that happen while no activities are scheduled. For example, if you expect 3 promotions to happen in a certain week, you can schedule the Lambda function to run 4 times a day during that week. Alternatively, a single Lambda function can poll for multiple activities, either in parallel or in series. For this example, use a rate of once per minute, but do not enable the trigger yet.

stepfunctionsfirst_8.png

Next, create the Lambda function ManualStepActivityWorker using the following Node.js 4.3 code. The function receives the taskToken, employee name, and manager’s email address from Step Functions. It embeds the information into an email and sends the email to the manager.


'use strict';
console.log('Loading function');
const aws = require('aws-sdk');
const stepfunctions = new aws.StepFunctions();
const ses = new aws.SES();
exports.handler = (event, context, callback) => {
    
    var taskParams = {
        activityArn: 'arn:aws:states:us-east-1:ACCOUNT_ID:activity:ManualStep'
    };
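    // Long-poll the activity (for up to 60 seconds); if a task is available,
    // the response carries the execution input and a callback taskToken.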
    
    stepfunctions.getActivityTask(taskParams, function(err, data) {
        if (err) {
            console.log(err, err.stack);
            context.fail('An error occurred while calling getActivityTask.');
        } else {
            if (!data || !data.taskToken) {
                // No activities scheduled: GetActivityTask returns an empty
                // taskToken when the poll times out without work.
                context.succeed('No activities received after 60 seconds.');
            } else {
                var input = JSON.parse(data.input);
                var emailParams = {
                    Destination: {
                        ToAddresses: [
                            input.managerEmailAddress
                            ]
                    },
                    Message: {
                        Subject: {
                            Data: 'Your Approval Needed for Promotion!',
                            Charset: 'UTF-8'
                        },
                        Body: {
                            Html: {
                                Data: 'Hi!<br />' +
                                    input.employeeName + ' has been nominated for promotion!<br />' +
                                    'Can you please approve:<br />' +
                                    'https://API_DEPLOYMENT_ID.execute-api.us-east-1.amazonaws.com/respond/succeed?taskToken=' + encodeURIComponent(data.taskToken) + '<br />' +
                                    'Or reject:<br />' +
                                    'https://API_DEPLOYMENT_ID.execute-api.us-east-1.amazonaws.com/respond/fail?taskToken=' + encodeURIComponent(data.taskToken),
                                Charset: 'UTF-8'
                            }
                        }
                    },
                    Source: input.managerEmailAddress,
                    ReplyToAddresses: [
                            input.managerEmailAddress
                        ]
                };
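                // Email the manager; the embedded approve/reject links hand the
                // taskToken back to Step Functions through the API Gateway stage.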
                    
                ses.sendEmail(emailParams, function (err, data) {
                    if (err) {
                        console.log(err, err.stack);
                        context.fail('Internal Error: The email could not be sent.');
                    } else {
                        console.log(data);
                        context.succeed('The email was successfully sent.');
                    }
                });
            }
        }
    });
};

In the Lambda function handler and role section, for Role, choose Create a new role and name it LambdaManualStepActivityWorkerRole.

stepfunctionsfirst_9.png

Add two policies to the role: one that allows the Lambda function to call the GetActivityTask API action on Step Functions, and one that allows it to send an email by calling SES. The result should look as follows:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "arn:aws:logs:*:*:*"
    },
    {
      "Effect": "Allow",
      "Action": "states:GetActivityTask",
      "Resource": "arn:aws:states:*:*:activity:ManualStep"
    },
    {
      "Effect": "Allow",
      "Action": "ses:SendEmail",
      "Resource": "*"
    }
  ]
}

In addition, as the GetActivityTask API action performs long-polling with a timeout of 60 seconds, increase the timeout of the Lambda function to 1 minute 15 seconds. This allows the function to wait for an activity to become available, and gives it extra time to call SES to send the email. For all other settings, use the Lambda console defaults.

stepfunctionsfirst_10.png

After this, you can create your activity worker Lambda function.

Test the process

You are now ready to test the employee promotion process.

In the Lambda console, enable the ManualStepPollSchedule trigger on the ManualStepActivityWorker Lambda function.

In the Step Functions console, start a new execution of the state machine with the following input:

{ "managerEmailAddress": "[email protected]", "employeeName" : "Jim" } 

Within a minute, you should receive an email with links to approve or reject Jim’s promotion. Choosing one of those links should succeed or fail the execution.

stepfunctionsfirst_11.png

Summary

In this post, you created a state machine containing an activity task with Step Functions, an API with API Gateway, and a Lambda function to dispatch the approval/failure process. Your Step Functions activity task generated a unique token that was returned later indicating either approval or rejection by the person making the decision. Your Lambda function acquired the task token by polling the activity task, and then generated and sent an email to the manager for approval or rejection with embedded hyperlinks to URIs hosted by API Gateway.

If you have questions or suggestions, please comment below.

US and KickassTorrents Go Head to Head in Court

Post Syndicated from Ernesto original https://torrentfreak.com/us-and-kickasstorrents-go-head-to-head-in-court-170202/

This week KickassTorrents’ alleged owner Artem Vaulin asked the Illinois District Court to dismiss the criminal indictment and set him free.

The fundamental flaw of the case, according to defense lawyer Ira Rothken, is that torrent files themselves are not copyrighted content.

In addition, he argued that the secondary copyright infringement claims would fail as these are non-existent under criminal law.

District Court Judge John Lee previously questioned the evidence in the case and, according to Rothken, it is certainly not enough to keep his client behind bars. This is also what he told the court during the hearing this week, stressing that torrents themselves are not copyrighted.

“We believe that the indictment against Artem Vaulin in the KAT torrent files case is defective and should be dismissed. Torrent files are not content files. The reproduction and distribution of torrent files are not a crime,” Rothken tells TF.

“If a third party uses torrent files to infringe it is after they leave the KAT site behind and such conduct is too random, inconsistent, and attenuated to impose criminal liability on Mr. Vaulin. The government cannot use the civil judge-made law in Grokster as a theory in a criminal case.”

Furthermore, Rothken argued that the US indictment is flawed because it fails to allege an actual criminal copyright infringement anywhere in the world, the United States included. The defense likened KickassTorrents to general search engines such as Google instead.

On the other side of the aisle stood US Department of Justice prosecutor Devlin Su. He urged the court to wait for the extradition hearing in Poland before ruling on the request, noting that Vaulin should come to the US voluntarily if he wanted to speed things up.

According to the prosecution, KickassTorrents operated as a piracy flea market, with an advertising revenue of about $12.5 million to $22.3 million. Comparing it with Google is nonsense, Su argued.

“Google is not dedicated to uploading and distributing copyrighted works,” Law360 quotes the prosecutor.

It is now up to the Illinois District Court to decide how to move forward. The defense is hoping for an outright dismissal, while the U.S. wants to move forward.

Meanwhile, over in Poland, Vaulin remains in custody after he was denied bail. Facing severe health issues, the Ukrainian was transferred from Polish prison to a local hospital a few weeks ago, where he remains under heavy guard.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.