Tag Archives: Proxima

Prepare to run a Code Club on FutureLearn

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/code-club-futurelearn/

Prepare to run a Code Club with our newest free online course, available now on FutureLearn!

FutureLearn: Prepare to Run a Code Club

Ready to launch! Our free FutureLearn course ‘Prepare to Run a Code Club’ starts next week and you can sign up now: https://www.futurelearn.com/courses/code-club

Code Club

As of today, more than 10,000 Code Clubs run in 130 countries, delivering free coding opportunities to approximately 150,000 children across the globe.

A child absorbed in a task at a Code Club

As an organisation, Code Club provides free learning resources and training materials to support the ever-growing and truly inspiring community of volunteers and educators who set up and run Code Clubs.

FutureLearn

Today we’re launching our latest free online course on FutureLearn, dedicated to training and supporting new Code Club volunteers. It will give you practical guidance on all things Code Club, as well as a taste of beginner programming!

Split over three weeks and running for 3–4 hours in total, the course provides hands-on advice and tips on everything you need to know to run a successful, fun, and educational club.

“Week 1 kicks off with advice on how to prepare to start a Code Club, for example which hardware and software are needed. Week 2 focusses on how to deliver Code Club sessions, with practical tips on helping young people learn and an easy taster coding project to try out. In the final week, the course looks at interesting ideas to enrich and extend club sessions.”
— Sarah Sherman-Chase, Code Club Participation Manager

The course is available wherever you live, and it is completely free — sign up now!

If you’re already a volunteer, the course will be a great refresher, and a chance to share your insights with newcomers. It is also useful for parents and guardians who wish to learn more about Code Club.

Your next step

Interested in learning more? You can start the course today by visiting FutureLearn. And to find out more about Code Clubs in your country, visit Code Club UK or Code Club International.

Code Club partners from across the globe gathered together for a group photo at the International Meetup

We love hearing your Code Club stories! If you’re a volunteer, are in the process of setting up a club, or are inspired to learn more, share your story in the comments below or via social media, making sure to tag @CodeClub and @CodeClubWorld.

You might also be interested in our other free courses on the FutureLearn platform, including Teaching Physical Computing with Raspberry Pi and Python and Teaching Programming in Primary Schools.

 

The post Prepare to run a Code Club on FutureLearn appeared first on Raspberry Pi.

Danes Deploy ‘Disruption Machine’ to Curb Online Piracy

Post Syndicated from Ernesto original https://torrentfreak.com/danes-deploy-disruption-machine-to-curb-online-piracy-171119/

Over the years copyright holders have tried a multitude of measures to curb copyright infringement, with varying levels of success.

By now it’s well known that blocking or even shutting down a pirate site doesn’t help much. As long as there are alternatives, people will simply continue to download or stream elsewhere.

Increasingly, major entertainment industry companies are calling for a broader and more coordinated response. They would like to see ISPs, payment processors, advertisers, search engines, and social media companies assisting in their anti-piracy efforts. Voluntarily, or even with a legal incentive, if required.

In Denmark, local anti-piracy group RettighedsAlliancen has a similar goal and they are starting to make progress. The outfit is actively building a piracy “disruption machine” that tackles the issue from as many sides as it can.

The disruption machine is built around an Infringing Website List (IWL), which is not related to a similarly-named initiative from the UK police. This list is made up of pirate sites that have been found to facilitate copyright infringement by a Danish court.

“The IWL is a part of the disruption machine that RettighedsAlliancen has developed in collaboration with many stakeholders in the online community,” the group’s CEO Maria Fredenslund tells TorrentFreak.

The stakeholders include major ISPs, but also media companies, MasterCard, Google, and Microsoft. With help from the local government they signed a Memorandum of Understanding. Their goal is to make the internet a safe and legitimate platform for consumers and businesses while limiting copyright infringement and associated crime.

MoU signees

There are currently twelve court orders on which the list is based and two more are expected to come in before the end of the year. As a result, approximately 600 pirate sites are on the IWL, making them harder to find.

Every time a new court order is handed down, RettighedsAlliancen distributes an updated list to its network of stakeholders.

“Currently, all major ISPs in Denmark have agreed to implement the IWL in their systems based on a joint Code of Conduct. This means that all the ISPs jointly will block their customers’ access to infringing services thus amplifying the impact of a blocking order by magnitudes,” Fredenslund explains.

Thus far ISPs are actively blocking 100 pirate sites, resulting in significant traffic drops. The rest of the list has yet to be implemented.

The IWL is also used in the online advertising industry, where several major advertising brokers have signed a joint agreement not to show advertising on these sites. This shuts off part of the revenue streams to pirate sites which, in theory, should make them less profitable.

A similar approach is being taken by major payment providers, who are preventing known pirate sites from processing transactions through their services. Every company has its own measures, but the overlapping goal is to frustrate pirate sites and reduce copyright infringement.

The Disruption Machine

It’s interesting to see that Google is listed as a partner since they don’t support general website blockades. However, Google said that it would demote sites on the IWL in its search results.

While these are all positive developments, according to the anti-piracy group, it’s just the start. RettighedsAlliancen also believes other tools and services could join in. Browser plugins could use the IWL to identify illegal sites, for example, and the options are endless.

“Likewise, large companies, institutions, and public authorities are also well-suited to implement the IWL in their local networks. For example, to prevent students from accessing illegal content while at school or university,” Fredenslund says.

“Looking further ahead, social media platforms such as Facebook are used to a great extent to consume content online and it is therefore obvious that they should also incorporate the IWL in their systems to prevent their users from harm and preventing copyright infringement.”

This model is not completely unique, of course. We’ve seen several elements being implemented in other countries as well, and copyright holders have been pushing voluntary agreements for quite some time now.

What’s new, however, is that it’s clearly defined as a strategy by the Danish group. And by labeling the strategy as a “disruption machine” it already sounds effective, which is part of the job.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more.

Capturing Custom, High-Resolution Metrics from Containers Using AWS Step Functions and AWS Lambda

Post Syndicated from Nathan Taber original https://aws.amazon.com/blogs/compute/capturing-custom-high-resolution-metrics-from-containers-using-aws-step-functions-and-aws-lambda/

Contributed by Trevor Sullivan, AWS Solutions Architect

When you deploy containers with Amazon ECS, are you gathering all of the key metrics so that you can correctly monitor the overall health of your ECS cluster?

By default, ECS writes metrics to Amazon CloudWatch in 5-minute increments. For complex or large services, this may not be sufficient to make scaling decisions quickly. You may want to respond immediately to changes in workload or to identify application performance problems. Last July, CloudWatch announced support for high-resolution metrics, up to a per-second basis.

These high-resolution metrics can give you a clearer picture of the load and performance of your applications, containers, clusters, and hosts. In this post, I discuss how you can use AWS Step Functions, along with AWS Lambda, to cost-effectively record high-resolution metrics into CloudWatch. You implement this solution using a serverless architecture, which keeps your costs low and makes the solution easier to troubleshoot.

To show how this works, you retrieve some useful metric data from an ECS cluster running in the same AWS account and region (Oregon, us-west-2) as the Step Functions state machine and Lambda function. However, you can use this architecture to retrieve any custom application metrics from any resource in any AWS account and region.

Why Step Functions?

Step Functions enables you to orchestrate multi-step tasks in the AWS Cloud that run for any period of time, up to a year. Effectively, you’re building a blueprint for an end-to-end process. After it’s built, you can execute the process as many times as you want.

For this architecture, you gather metrics from an ECS cluster every five seconds and then write the metric data to CloudWatch. After your ECS cluster metrics are stored in CloudWatch, you can create CloudWatch alarms that notify you when a metric exceeds a threshold that you define. An alarm can also trigger an automated remediation activity, such as scaling ECS services.
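As a sketch of such an alarm, assuming the `ECS` namespace, `PendingTask` metric, and `ClusterName` dimension that the Lambda function later in this post writes, the following builds the arguments for boto3's `put_metric_alarm`. The alarm name, threshold, and evaluation count are hypothetical placeholders, and no `AlarmActions` (for example, an SNS topic) are attached. For high-resolution metrics, CloudWatch accepts alarm periods of 10 or 30 seconds, which may be billed at a higher rate than standard alarms.

```python
def build_pending_task_alarm(cluster_name, threshold=5):
    """Return keyword arguments for CloudWatch put_metric_alarm.

    Hypothetical example: alarm when the cluster has more than `threshold`
    pending tasks for three consecutive 10-second periods.
    """
    return {
        'AlarmName': 'ECSPendingTasks-{}'.format(cluster_name),  # hypothetical name
        'Namespace': 'ECS',
        'MetricName': 'PendingTask',
        'Dimensions': [{'Name': 'ClusterName', 'Value': cluster_name}],
        'Statistic': 'Maximum',
        'Period': 10,  # high-resolution alarm period, in seconds
        'EvaluationPeriods': 3,
        'Threshold': float(threshold),
        'ComparisonOperator': 'GreaterThanThreshold',
    }


def create_pending_task_alarm(cloudwatch, cluster_name, threshold=5):
    """Create the alarm; `cloudwatch` is a boto3 CloudWatch client."""
    cloudwatch.put_metric_alarm(**build_pending_task_alarm(cluster_name, threshold))
```

Usage would be `create_pending_task_alarm(boto3.client('cloudwatch'), 'ECSEsgaroth')`. Keeping the request builder separate from the API call makes the alarm definition easy to inspect without AWS credentials.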

When you build a Step Functions state machine, you define the different states inside it as JSON objects. The bulk of the work in Step Functions is handled by the common task state, which invokes Lambda functions or Step Functions activities. There is also a built-in library of other useful states that allow you to control the execution flow of your program.

One of the most useful state types in Step Functions is the parallel state. Each parallel state in your state machine can have one or more branches, each of which is executed in parallel. Another useful state type is the wait state, which waits for a period of time before moving to the next state.

In this walkthrough, you combine these three state types (parallel, wait, and task) to create a state machine that triggers a Lambda function, which then gathers metrics from your ECS cluster.

Step Functions pricing

This state machine is executed every minute, resulting in 60 executions per hour and 1,440 executions per day. Step Functions is billed per state transition, including the Start and End state transitions, which works out to approximately 37,440 state transitions per day. To reach this number, I’m using this estimated math:

26 state transitions per execution x 60 minutes x 24 hours = 37,440 state transitions per day

Based on current pricing, at $0.000025 per state transition, the daily cost of this metric gathering state machine would be $0.936.
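The same estimate as a short Python calculation, assuming the 26 transitions per execution and the $0.000025 per-transition price stated above, and ignoring the free tier:

```python
TRANSITIONS_PER_EXECUTION = 26    # estimate from this post
PRICE_PER_TRANSITION = 0.000025   # USD, price quoted in this post
EXECUTIONS_PER_DAY = 60 * 24      # one execution per minute


def step_functions_daily_cost(transitions=TRANSITIONS_PER_EXECUTION,
                              price=PRICE_PER_TRANSITION,
                              executions=EXECUTIONS_PER_DAY):
    """Return (state transitions per day, USD per day), before any free tier."""
    daily_transitions = transitions * executions
    return daily_transitions, daily_transitions * price


transitions, cost = step_functions_daily_cost()
print(transitions, round(cost, 3))  # 37440 0.936
```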

Step Functions offers an indefinite 4,000 free state transitions every month. This benefit is available to all customers, not just customers who are still under the 12-month AWS Free Tier. For more information and cost example scenarios, see Step Functions pricing.

Why Lambda?

The goal is to capture metrics from an ECS cluster, and write the metric data to CloudWatch. This is a straightforward, short-running process that makes Lambda the perfect place to run your code. Lambda is one of the key services that makes up “Serverless” application architectures. It enables you to consume compute capacity only when your code is actually executing.

The process of gathering metric data from ECS and writing it to CloudWatch takes very little time. In fact, while I was developing this post, my Lambda function’s average execution time was only about 250 milliseconds. For each five-second interval, I’m using only about 1/20th of the compute time that I’d otherwise be paying for.

Lambda pricing

For billing purposes, Lambda execution time is rounded up to the nearest 100-ms interval. Based on the metrics that I observed during development, a 250-ms run would be billed as 300 ms. Here, I calculate the cost of running this Lambda function for a 31-day month.

Assuming 31 days in each month, there are 535,680 five-second intervals per month (31 days x 24 hours x 60 minutes x 12 five-second intervals = 535,680). The Step Functions state machine invokes the Lambda function once per five-second interval, and each invocation is billed for 300 ms. At current Lambda pricing, for a 128-MB function, you would pay approximately the following:

Total compute

Total executions = 535,680
Total compute = total executions x (3 x $0.000000208 per 100 ms) = $0.334 per month

Total requests

Total request cost = (535,680 / 1,000,000) x $0.20 per million requests = $0.11 per month

Total Lambda Cost

$0.11 requests + $0.334 compute time = $0.444 per month

Similar to Step Functions, Lambda offers an indefinite free tier. For more information, see Lambda Pricing.
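The Lambda estimate can be reproduced in Python as a quick check. Because 535,680 is the number of five-second intervals in a 31-day month, the result is a monthly figure. This sketch assumes the 250-ms average duration, the 100-ms rounding rule, and the 128-MB prices quoted in this post, and ignores the free tier:

```python
import math

PRICE_PER_100MS_128MB = 0.000000208  # USD per 100 ms at 128 MB, as quoted above
PRICE_PER_MILLION_REQUESTS = 0.20    # USD


def lambda_monthly_cost(duration_ms=250, days=31, invocations_per_minute=12):
    """Estimate (invocations, compute USD, request USD) for one month, before the free tier."""
    invocations = days * 24 * 60 * invocations_per_minute
    billed_units = math.ceil(duration_ms / 100)  # 250 ms is billed as 3 x 100 ms
    compute = invocations * billed_units * PRICE_PER_100MS_128MB
    requests = invocations / 1_000_000 * PRICE_PER_MILLION_REQUESTS
    return invocations, compute, requests


invocations, compute, requests = lambda_monthly_cost()
print(invocations, round(compute, 3), round(requests, 3), round(compute + requests, 3))
# 535680 0.334 0.107 0.441
```

Without rounding the request cost up to $0.11 first, the monthly total comes to about $0.441.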

Walkthrough

In the following sections, I step through the process of configuring the solution just discussed. If you follow along, at a high level, you will:

  • Configure an IAM role and policy
  • Create a Step Functions state machine to control metric gathering execution
  • Create a metric-gathering Lambda function
  • Configure a CloudWatch Events rule to trigger the state machine
  • Validate the solution

Prerequisites

You should already have an AWS account with a running ECS cluster. If you don’t have one running, you can easily deploy a Docker container on an ECS cluster using the AWS Management Console. In the example produced for this post, I use an ECS cluster running Windows Server (currently in beta), but either a Linux or Windows Server cluster works.

Create an IAM role and policy

First, create an IAM role and policy that enables Step Functions, Lambda, and CloudWatch to communicate with each other.

  • The CloudWatch Events rule needs permissions to trigger the Step Functions state machine.
  • The Step Functions state machine needs permissions to trigger the Lambda function.
  • The Lambda function needs permissions to query ECS and then write to CloudWatch Logs and metrics.

When you create the state machine, Lambda function, and CloudWatch Events rule, you assign this role to each of those resources. Upon execution, each of these resources assumes the specified role and executes using the role’s permissions.

  1. Open the IAM console.
  2. Choose Roles, Create New Role.
  3. For Role Name, enter WriteMetricFromStepFunction.
  4. Choose Save.

Create the IAM role trust relationship
The trust relationship (also known as the assume role policy document) for your IAM role looks like the following JSON document. As you can see from the document, your IAM role needs to trust the Lambda, CloudWatch Events, and Step Functions services. By configuring your role to trust these services, they can assume this role and inherit the role permissions.

  1. Open the IAM console.
  2. Choose Roles and select the IAM role previously created.
  3. Choose Trust Relationships, Edit Trust Relationships.
  4. Enter the following trust policy text and choose Save.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "lambda.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    },
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "events.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    },
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "states.us-west-2.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
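If you prefer scripting to the console, a minimal boto3 sketch of the steps above (create the role and set its trust relationship in one call) might look like the following. It assumes you pass in an IAM client such as `boto3.client('iam')`; the trust policy is the same document shown above, generated from a list of service principals.

```python
import json

TRUSTED_SERVICES = [
    'lambda.amazonaws.com',
    'events.amazonaws.com',
    'states.us-west-2.amazonaws.com',
]


def build_trust_policy(services=TRUSTED_SERVICES):
    """Return the assume-role policy document, one Allow statement per service."""
    return {
        'Version': '2012-10-17',
        'Statement': [
            {'Effect': 'Allow', 'Principal': {'Service': service}, 'Action': 'sts:AssumeRole'}
            for service in services
        ],
    }


def create_role(iam, role_name='WriteMetricFromStepFunction'):
    """Create the role with its trust policy and return the role ARN.

    `iam` is a boto3 IAM client.
    """
    response = iam.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(build_trust_policy()),
    )
    return response['Role']['Arn']
```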

Create an IAM policy

After you’ve finished configuring your role’s trust relationship, grant the role access to the other AWS resources that make up the solution.

The IAM policy is what gives your IAM role permissions to access various resources. Because IAM denies access to AWS resources by default, you must explicitly allow each resource that your role needs to access.

I’ve tried to keep this policy document as generic as possible without making the permissions too open. If the name of your ECS cluster is different from the one in the example policy below, update the policy document before attaching it to your IAM role. The steps below attach it as an inline policy instead of creating a separate policy first; either approach is valid.

  1. Open the IAM console.
  2. Select the IAM role, and choose Permissions.
  3. Choose Add in-line policy.
  4. Choose Custom Policy and then enter the following policy. The inline policy name does not matter.
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [ "logs:*" ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [ "cloudwatch:PutMetricData" ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [ "states:StartExecution" ],
            "Resource": [
                "arn:aws:states:*:*:stateMachine:WriteMetricFromStepFunction"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [ "lambda:InvokeFunction" ],
            "Resource": "arn:aws:lambda:*:*:function:WriteMetricFromStepFunction"
        },
        {
            "Effect": "Allow",
            "Action": [ "ecs:Describe*" ],
            "Resource": "arn:aws:ecs:*:*:cluster/ECSEsgaroth"
        }
    ]
}
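Because the only part of this policy you normally need to change is the cluster name, you could generate it instead of editing it by hand. A sketch, assuming a boto3 IAM client and a hypothetical inline policy name `WriteMetricPolicy` (the name does not matter, as noted above):

```python
import json


def build_metric_policy(cluster_name, name='WriteMetricFromStepFunction'):
    """Return the inline policy above with the ECS cluster name filled in.

    `name` is the shared name of the state machine and Lambda function.
    """
    return {
        'Version': '2012-10-17',
        'Statement': [
            {'Effect': 'Allow', 'Action': ['logs:*'], 'Resource': '*'},
            {'Effect': 'Allow', 'Action': ['cloudwatch:PutMetricData'], 'Resource': '*'},
            {'Effect': 'Allow', 'Action': ['states:StartExecution'],
             'Resource': ['arn:aws:states:*:*:stateMachine:' + name]},
            {'Effect': 'Allow', 'Action': ['lambda:InvokeFunction'],
             'Resource': 'arn:aws:lambda:*:*:function:' + name},
            {'Effect': 'Allow', 'Action': ['ecs:Describe*'],
             'Resource': 'arn:aws:ecs:*:*:cluster/' + cluster_name},
        ],
    }


def attach_metric_policy(iam, cluster_name, role_name='WriteMetricFromStepFunction',
                         policy_name='WriteMetricPolicy'):
    """Attach the policy inline; `iam` is a boto3 IAM client, `policy_name` is hypothetical."""
    iam.put_role_policy(
        RoleName=role_name,
        PolicyName=policy_name,
        PolicyDocument=json.dumps(build_metric_policy(cluster_name)),
    )
```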

Create a Step Functions state machine

In this section, you create a Step Functions state machine that invokes the metric-gathering Lambda function every five seconds for a one-minute period. Dividing one minute (60 seconds) into five-second intervals gives 12 intervals, so you create 12 branches in a single parallel state. Each branch triggers the metric-gathering Lambda function at a different five-second marker within the minute. After all of the parallel branches finish executing, the Step Functions execution completes, and the CloudWatch Events rule that you create later starts the next execution on the following minute.
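Because the 12 branches differ only in their wait offset, the definition can also be generated rather than typed by hand. A minimal sketch, assuming you supply the Lambda function ARN and the IAM role ARN and pass in a boto3 Step Functions client (`boto3.client('stepfunctions')`). Its state names (`WriteMetric0`, `Wait5`, and so on) are hypothetical and differ from the hand-written JSON below; state names do not affect behavior.

```python
import json


def build_definition(lambda_arn, interval=5, window=60):
    """Build a Parallel state with one branch per `interval` seconds in `window`.

    The first branch invokes the function immediately; every other branch
    waits for its offset (5, 10, ..., 55 seconds) before invoking it.
    """
    branches = []
    for offset in range(0, window, interval):
        task = {'Type': 'Task', 'Resource': lambda_arn, 'End': True}
        if offset == 0:
            branches.append({'StartAt': 'WriteMetric0', 'States': {'WriteMetric0': task}})
            continue
        wait_name, task_name = 'Wait{}'.format(offset), 'WriteMetric{}'.format(offset)
        branches.append({
            'StartAt': wait_name,
            'States': {
                wait_name: {'Type': 'Wait', 'Seconds': offset, 'Next': task_name},
                task_name: task,
            },
        })
    return {
        'Comment': 'Writes ECS metrics to CloudWatch every {} seconds for {} seconds.'.format(
            interval, window),
        'StartAt': 'ParallelMetric',
        'States': {'ParallelMetric': {'Type': 'Parallel', 'Branches': branches, 'End': True}},
    }


def create_state_machine(sfn, lambda_arn, role_arn, name='WriteMetricFromStepFunction'):
    """Create the state machine and return its ARN; `sfn` is a boto3 Step Functions client."""
    response = sfn.create_state_machine(
        name=name,
        definition=json.dumps(build_definition(lambda_arn)),
        roleArn=role_arn,
    )
    return response['stateMachineArn']
```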

Follow these steps to create your Step Functions state machine:

  1. Open the Step Functions console.
  2. Choose Dashboard, Create State Machine.
  3. For State Machine Name, enter WriteMetricFromStepFunction.
  4. Enter the state machine code below into the editor. Make sure that you insert your own AWS account ID for every instance of “676655494xxx”.
  5. Choose Create State Machine.
  6. Select the WriteMetricFromStepFunction IAM role that you previously created.
{
  "Comment": "Writes ECS metrics to CloudWatch every five seconds, for a one-minute period.",
  "StartAt": "ParallelMetric",
  "States": {
    "ParallelMetric": {
      "Type": "Parallel",
      "Branches": [
        {
          "StartAt": "WriteMetricLambda",
          "States": {
            "WriteMetricLambda": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:us-west-2:676655494xxx:function:WriteMetricFromStepFunction",
              "End": true
            }
          }
        },
        {
          "StartAt": "WaitFive",
          "States": {
            "WaitFive": {
              "Type": "Wait",
              "Seconds": 5,
              "Next": "WriteMetricLambdaFive"
            },
            "WriteMetricLambdaFive": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:us-west-2:676655494xxx:function:WriteMetricFromStepFunction",
              "End": true
            }
          }
        },
        {
          "StartAt": "WaitTen",
          "States": {
            "WaitTen": {
              "Type": "Wait",
              "Seconds": 10,
              "Next": "WriteMetricLambda10"
            },
            "WriteMetricLambda10": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:us-west-2:676655494xxx:function:WriteMetricFromStepFunction",
              "End": true
            }
          }
        },
        {
          "StartAt": "WaitFifteen",
          "States": {
            "WaitFifteen": {
              "Type": "Wait",
              "Seconds": 15,
              "Next": "WriteMetricLambda15"
            },
            "WriteMetricLambda15": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:us-west-2:676655494xxx:function:WriteMetricFromStepFunction",
              "End": true
            }
          }
        },
        {
          "StartAt": "Wait20",
          "States": {
            "Wait20": {
              "Type": "Wait",
              "Seconds": 20,
              "Next": "WriteMetricLambda20"
            },
            "WriteMetricLambda20": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:us-west-2:676655494xxx:function:WriteMetricFromStepFunction",
              "End": true
            }
          }
        },
        {
          "StartAt": "Wait25",
          "States": {
            "Wait25": {
              "Type": "Wait",
              "Seconds": 25,
              "Next": "WriteMetricLambda25"
            },
            "WriteMetricLambda25": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:us-west-2:676655494xxx:function:WriteMetricFromStepFunction",
              "End": true
            }
          }
        },
        {
          "StartAt": "Wait30",
          "States": {
            "Wait30": {
              "Type": "Wait",
              "Seconds": 30,
              "Next": "WriteMetricLambda30"
            },
            "WriteMetricLambda30": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:us-west-2:676655494xxx:function:WriteMetricFromStepFunction",
              "End": true
            }
          }
        },
        {
          "StartAt": "Wait35",
          "States": {
            "Wait35": {
              "Type": "Wait",
              "Seconds": 35,
              "Next": "WriteMetricLambda35"
            },
            "WriteMetricLambda35": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:us-west-2:676655494xxx:function:WriteMetricFromStepFunction",
              "End": true
            }
          }
        },
        {
          "StartAt": "Wait40",
          "States": {
            "Wait40": {
              "Type": "Wait",
              "Seconds": 40,
              "Next": "WriteMetricLambda40"
            },
            "WriteMetricLambda40": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:us-west-2:676655494xxx:function:WriteMetricFromStepFunction",
              "End": true
            }
          }
        },
        {
          "StartAt": "Wait45",
          "States": {
            "Wait45": {
              "Type": "Wait",
              "Seconds": 45,
              "Next": "WriteMetricLambda45"
            },
            "WriteMetricLambda45": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:us-west-2:676655494xxx:function:WriteMetricFromStepFunction",
              "End": true
            }
          }
        },
        {
          "StartAt": "Wait50",
          "States": {
            "Wait50": {
              "Type": "Wait",
              "Seconds": 50,
              "Next": "WriteMetricLambda50"
            },
            "WriteMetricLambda50": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:us-west-2:676655494xxx:function:WriteMetricFromStepFunction",
              "End": true
            }
          }
        },
        {
          "StartAt": "Wait55",
          "States": {
            "Wait55": {
              "Type": "Wait",
              "Seconds": 55,
              "Next": "WriteMetricLambda55"
            },
            "WriteMetricLambda55": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:us-west-2:676655494xxx:function:WriteMetricFromStepFunction",
              "End": true
            }
          }
        }
      ],
      "End": true
    }
  }
}

Now you’ve got a shiny new Step Functions state machine! However, you might ask yourself, “After the state machine has been created, how does it get executed?” Before I answer that question, create the Lambda function that writes the custom metric, and then you get the end-to-end process moving.

Create a Lambda function

The core of the solution is a Lambda function, written for the Python 3.6 runtime, that retrieves metric values from ECS and then writes them to CloudWatch. This is the function that the Step Functions state machine triggers every five seconds through its Task states. Key points to remember:

The Lambda function needs permission to:

  • Write CloudWatch metrics (PutMetricData API).
  • Retrieve metrics from ECS clusters (DescribeClusters API).
  • Write StdOut to CloudWatch Logs.

Boto3, the AWS SDK for Python, is included in the Lambda execution environment for Python 2.x and 3.x.

Because Lambda includes the AWS SDK, you don’t have to worry about packaging it up and uploading it to Lambda. You can focus on writing code and automatically take a dependency on boto3.

As for permissions, you’ve already created the IAM role and attached a policy to it that enables your Lambda function to access the necessary API actions. When you create your Lambda function, make sure that you select the correct IAM role, to ensure it is invoked with the correct permissions.

The following Lambda function code is generic. So how does the Lambda function know which ECS cluster to gather metrics for? Your Step Functions state machine automatically passes in its state to the Lambda function. When you create your CloudWatch Events rule, you specify a simple JSON object that passes the desired ECS cluster name into your Step Functions state machine, which then passes it to the Lambda function.
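To see how the input flows before the rule exists, you can start a single execution by hand with the same JSON that the rule will send. A sketch, assuming a boto3 Step Functions client (`boto3.client('stepfunctions')`) and the state machine ARN. The Parallel state hands this input to every branch, Wait states pass it through unchanged, and each Task state delivers it to the Lambda function as `event`.

```python
import json


def build_execution_input(cluster_name):
    """Return the JSON string that the CloudWatch Events rule passes as constant input."""
    return json.dumps({'ECSClusterName': cluster_name})


def start_test_execution(sfn, state_machine_arn, cluster_name):
    """Start one execution manually and return its ARN; `sfn` is a boto3 Step Functions client."""
    response = sfn.start_execution(
        stateMachineArn=state_machine_arn,
        input=build_execution_input(cluster_name),
    )
    return response['executionArn']
```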

Use the following property values as you create your Lambda function:

Function Name: WriteMetricFromStepFunction
Description: This Lambda function retrieves metric values from an ECS cluster and writes them to Amazon CloudWatch.
Runtime: Python 3.6
Memory: 128 MB
IAM Role: WriteMetricFromStepFunction

import boto3

# Clients are created once per execution environment and reused across warm invocations.
cloudwatch = boto3.client('cloudwatch')
ecs = boto3.client('ecs')

# CloudWatch metric name -> field of the ECS DescribeClusters response that supplies its value.
METRIC_FIELDS = {
    'RunningTask': 'runningTasksCount',
    'PendingTask': 'pendingTasksCount',
    'ActiveServices': 'activeServicesCount',
    'RegisteredContainerInstances': 'registeredContainerInstancesCount',
}


# Set the function's Handler setting to <file name>.handler, for example lambda_function.handler.
def handler(event, context):
    # The cluster name comes from the CloudWatch Events rule input,
    # which the Step Functions state machine passes through to this function.
    cluster_name = event['ECSClusterName']
    dimension = {'Name': 'ClusterName', 'Value': cluster_name}

    cluster = get_ecs_cluster(ecs, cluster_name)

    metric_data = [
        {
            'MetricName': metric_name,
            'Dimensions': [dimension],
            'Value': cluster[field],
            'Unit': 'Count',
            'StorageResolution': 1,  # 1 = high-resolution (per-second) storage
        }
        for metric_name, field in METRIC_FIELDS.items()
    ]
    cloudwatch.put_metric_data(Namespace='ECS', MetricData=metric_data)
    print('Finished writing metric data')


def get_ecs_cluster(client, cluster_name):
    response = client.describe_clusters(clusters=[cluster_name])
    # Unknown clusters are reported under 'failures' instead of raising an error.
    if not response['clusters']:
        raise ValueError('ECS cluster not found: {}'.format(cluster_name))
    print('Retrieved cluster details from ECS')
    return response['clusters'][0]

Create the CloudWatch Events rule

Now you’ve created an IAM role and policy, Step Functions state machine, and Lambda function. How do these components actually start communicating with each other? The final step in this process is to set up a CloudWatch Events rule that triggers your metric-gathering Step Functions state machine every minute. You have two choices for your CloudWatch Events rule expression: rate or cron. In this example, use the cron expression.

A few key points about creating the CloudWatch Events rule:

  • You can specify one or more targets, of different types (for example, Lambda function, Step Functions state machine, SNS topic, and so on).
  • You’re required to specify an IAM role with permissions to trigger your target.
    NOTE: This applies only to certain types of targets, including Step Functions state machines.
  • Each target that supports IAM roles can be triggered using a different IAM role, in the same CloudWatch Events rule.
  • Optional: You can provide custom JSON that is passed to your target Step Functions state machine as input.

Follow these steps to create the CloudWatch Events rule:

  1. Open the CloudWatch console.
  2. Choose Events, Rules, Create Rule.
  3. Select Schedule, Cron Expression, and then enter the following rule:
    0/1 * * * ? *
  4. Choose Add Target, Step Functions State Machine, WriteMetricFromStepFunction.
  5. For Configure Input, select Constant (JSON Text).
  6. Enter the following JSON input, which is passed to Step Functions, while changing the cluster name accordingly:
    { "ECSClusterName": "ECSEsgaroth" }
  7. Choose Use Existing Role, WriteMetricFromStepFunction (the IAM role that you previously created).
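The same rule and target can be created with boto3's CloudWatch Events client (`boto3.client('events')`). A sketch, assuming the state machine and role ARNs are already known; the target Id `WriteMetricStateMachine` is an arbitrary, hypothetical name.

```python
import json


def build_rule(rule_name):
    """Return put_rule arguments for the every-minute cron schedule used above."""
    return {
        'Name': rule_name,
        'ScheduleExpression': 'cron(0/1 * * * ? *)',
        'State': 'ENABLED',
    }


def build_target(state_machine_arn, role_arn, cluster_name):
    """Return one put_targets entry that starts the state machine with constant JSON input."""
    return {
        'Id': 'WriteMetricStateMachine',  # hypothetical target Id
        'Arn': state_machine_arn,
        'RoleArn': role_arn,              # role that CloudWatch Events assumes
        'Input': json.dumps({'ECSClusterName': cluster_name}),
    }


def create_schedule(events, state_machine_arn, role_arn, cluster_name,
                    rule_name='WriteMetricFromStepFunction'):
    """Create the rule and attach the target; `events` is a boto3 CloudWatch Events client."""
    events.put_rule(**build_rule(rule_name))
    events.put_targets(
        Rule=rule_name,
        Targets=[build_target(state_machine_arn, role_arn, cluster_name)],
    )
```

A `rate(1 minute)` schedule expression would trigger the state machine at the same frequency.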

After you’ve completed these steps, your screen should look similar to this:

Validate the solution

Now that you have finished implementing the solution to gather high-resolution metrics from ECS, validate that it’s working properly.

  1. Open the CloudWatch console.
  2. Choose Metrics.
  3. Choose custom and select the ECS namespace.
  4. Choose the ClusterName metric dimension.

You should see your metrics listed below.
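You can also check the data from code. The following sketch builds arguments for boto3's `get_metric_statistics`, assuming the `ECS` namespace and `ClusterName` dimension used by the Lambda function. The 5-second `Period` is an assumption that matches the write interval; CloudWatch returns sub-minute periods only for high-resolution metrics, and only for roughly the last three hours.

```python
from datetime import datetime, timedelta, timezone


def build_metric_query(cluster_name, metric_name='RunningTask', minutes=5, now=None):
    """Return keyword arguments for CloudWatch get_metric_statistics."""
    end = now or datetime.now(timezone.utc)
    return {
        'Namespace': 'ECS',
        'MetricName': metric_name,
        'Dimensions': [{'Name': 'ClusterName', 'Value': cluster_name}],
        'StartTime': end - timedelta(minutes=minutes),
        'EndTime': end,
        'Period': 5,  # seconds; valid for high-resolution metrics
        'Statistics': ['Maximum'],
    }
```

Usage: `boto3.client('cloudwatch').get_metric_statistics(**build_metric_query('ECSEsgaroth'))['Datapoints']` should return roughly one datapoint per five seconds once the solution is running.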

Troubleshoot configuration issues

If you aren’t receiving the expected ECS cluster metrics in CloudWatch, check for the following common configuration issues. Review the earlier procedures to make sure that the resources were properly configured.

  • The IAM role’s trust relationship is incorrectly configured.
    Make sure that the IAM role trusts Lambda, CloudWatch Events, and Step Functions in the correct region.
  • The IAM role does not have the correct policies attached to it.
    Make sure that you have copied the IAM policy correctly as an inline policy on the IAM role.
  • The CloudWatch Events rule is not triggering new Step Functions executions.
    Make sure that the target configuration on the rule has the correct Step Functions state machine and IAM role selected.
  • The Step Functions state machine is being executed, but failing part way through.
    Examine the detailed error message on the failed state within the failed Step Functions execution. It’s possible that the IAM role does not have permissions to trigger the target Lambda function, that the target Lambda function does not exist, or that the Lambda function failed to complete successfully due to invalid permissions.

Although the above list covers several potential configuration issues, it is not comprehensive. Make sure that you understand how the services connect to one another, how permissions are granted through IAM policies, and how IAM trust relationships work.
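The first troubleshooting item, the trust relationship, can be checked programmatically. The following minimal sketch builds a trust policy that lets Lambda, CloudWatch Events, and Step Functions assume the role. It also reports which of those services an existing policy document does not trust. It assumes the regional form of the Step Functions service principal (`states.<region>.amazonaws.com`), which matches the "in the correct region" advice above. If your account uses a different principal form, adjust `expected_services`.

```python
import json


def expected_services(region):
    """Service principals the role should trust (Step Functions form is an assumption)."""
    return [
        "lambda.amazonaws.com",
        "events.amazonaws.com",
        f"states.{region}.amazonaws.com",
    ]


def build_trust_policy(region):
    """Trust policy letting the three services call sts:AssumeRole on the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": expected_services(region)},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def missing_trusted_services(trust_policy, region):
    """Return expected service principals that no Allow statement trusts."""
    trusted = set()
    for statement in trust_policy.get("Statement", []):
        if statement.get("Effect") != "Allow":
            continue
        principal = statement.get("Principal", {})
        if not isinstance(principal, dict):
            continue  # e.g. "*"; not a per-service grant
        services = principal.get("Service", [])
        if isinstance(services, str):
            services = [services]
        trusted.update(services)
    return sorted(set(expected_services(region)) - trusted)


if __name__ == "__main__":
    print(json.dumps(build_trust_policy("us-east-1"), indent=2))
```

To check a live role, you could pass the `AssumeRolePolicyDocument` that boto3's IAM `get_role` call returns (already parsed into a dict) to `missing_trusted_services`.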

Conclusion

In this post, you implemented a Serverless solution to gather and record high-resolution application metrics from containers running on Amazon ECS into CloudWatch. The solution consists of a Step Functions state machine, Lambda function, CloudWatch Events rule, and an IAM role and policy. The data that you gather from this solution helps you rapidly identify issues with an ECS cluster.

To gather high-resolution metrics from any service, modify your Lambda function to gather the correct metrics from your target. If you prefer not to use Python, you can implement a Lambda function using one of the other supported runtimes, including Node.js, Java, or .NET Core. Either way, this post should give you the fundamentals of capturing high-resolution metrics in CloudWatch.
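The Lambda function from earlier in the post isn't reproduced in this section. As a reference point for modifying it, here is a minimal sketch of a handler that reads the `ECSClusterName` input and calls the boto3 ECS `describe_clusters` API. It then writes the cluster counters to CloudWatch with `StorageResolution=1`, which marks them as high-resolution metrics. The metric names and the `ECS` namespace are assumptions for illustration, not necessarily what the original function uses.

```python
def build_metric_data(cluster):
    """Turn one cluster from ecs.describe_clusters() into high-resolution datapoints.

    Metric names here are hypothetical; rename them to suit your dashboards.
    """
    dimensions = [{"Name": "ClusterName", "Value": cluster["clusterName"]}]
    counters = {
        "RunningTasks": cluster["runningTasksCount"],
        "PendingTasks": cluster["pendingTasksCount"],
        "ActiveServices": cluster["activeServicesCount"],
        "RegisteredContainerInstances": cluster["registeredContainerInstancesCount"],
    }
    return [
        {
            "MetricName": name,
            "Dimensions": dimensions,
            "Value": float(value),
            "Unit": "Count",
            "StorageResolution": 1,  # 1 = high resolution (1 second); 60 = standard
        }
        for name, value in counters.items()
    ]


def lambda_handler(event, context):
    """Entry point: the state machine passes {"ECSClusterName": "..."} as input."""
    import boto3  # available in the Lambda Python runtime

    cluster_name = event["ECSClusterName"]
    ecs = boto3.client("ecs")
    cloudwatch = boto3.client("cloudwatch")

    clusters = ecs.describe_clusters(clusters=[cluster_name])["clusters"]
    if not clusters:  # unknown clusters are reported under "failures" instead
        raise ValueError(f"ECS cluster not found: {cluster_name}")

    metric_data = build_metric_data(clusters[0])
    cloudwatch.put_metric_data(Namespace="ECS", MetricData=metric_data)  # assumed namespace
    return {"ECSClusterName": cluster_name, "MetricsWritten": len(metric_data)}
```

To target a different service, you would mainly swap out `build_metric_data` and the describe call. The `put_metric_data` call and `StorageResolution` setting stay the same.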

If you found this post useful, or have questions, please comment below.

Hot Startups on AWS – October 2017

Post Syndicated from Tina Barr original https://aws.amazon.com/blogs/aws/hot-startups-on-aws-october-2017/

In 2015, the Centers for Medicare and Medicaid Services (CMS) reported that healthcare spending made up 17.8% of the U.S. GDP – that’s almost $3.2 trillion or $9,990 per person. By 2025, the CMS estimates this number will increase to nearly 20%. As cloud technology evolves in the healthcare and life science industries, we are seeing how companies of all sizes are using AWS to provide powerful and innovative solutions to customers across the globe. This month we are excited to feature the following startups:

  • ClearCare – helping home care agencies operate efficiently and grow their business.
  • DNAnexus – providing a cloud-based global network for sharing and managing genomic data.

ClearCare (San Francisco, CA)

ClearCare envisions a future where home care is the only choice for aging in place. Home care agencies play a critical role in the economy and their communities by significantly lowering the overall cost of care, reducing the number of hospital admissions, and bending the cost curve of aging. Patients receiving home care typically have multiple chronic conditions and functional limitations, driving over $190 billion in healthcare spending in the U.S. each year. To offset these costs, health insurance payers are developing in-home care management programs for patients. ClearCare’s goal is to help home care agencies leverage technology to improve costs, outcomes, and quality of life for the aging population. The company’s powerful software platform is specifically designed for use by non-medical, in-home care agencies to manage their businesses.

Founder and CEO Geoff Nudd created ClearCare because of his own grandmother’s need for care. Keeping family members and caregivers up to date on a loved one’s well-being can be difficult, so Geoff created what is now ClearCare’s Family Room, which enables caregivers and agency staff to check schedules and receive real-time updates about what’s happening in the home. Since then, agencies have provided feedback on other areas of their businesses that could be streamlined. ClearCare has now built over 20 modules to help home care agencies optimize operations with services including a telephony service, billing and payroll, and more. ClearCare now serves over 4,000 home care agencies, representing 500,000 caregivers and 400,000 seniors.

Using AWS, ClearCare is able to spin up reliable infrastructure for proofs of concept and iterate on those systems to quickly get value to market. The company runs many AWS services including Amazon Elasticsearch Service, Amazon RDS, and Amazon CloudFront. Amazon EMR and Amazon Athena have enabled ClearCare to build a Hadoop-based ETL and data warehousing system that processes terabytes of data each day. By utilizing these managed services, ClearCare has been able to go from concept to customer delivery in less than three months.

To learn more about ClearCare, check out their website.

DNAnexus (Mountain View, CA)

DNAnexus is accelerating the application of genomic data in precision medicine by providing a cloud-based platform for sharing and managing genomic and biomedical data and analysis tools. The company was founded in 2009 by Stanford graduate student Andreas Sundquist and two Stanford professors, Arend Sidow and Serafim Batzoglou, to address the need for scaling secondary analysis of next-generation sequencing (NGS) data in the cloud. The founders quickly learned that users needed a flexible solution to build complex analysis workflows and tools that enable them to share and manage large volumes of data. DNAnexus is optimized to address the challenges of security, scalability, and collaboration for organizations that are pursuing genomic-based approaches to health, both in clinics and research labs. DNAnexus has a global customer base – spanning North America, Europe, Asia-Pacific, South America, and Africa – that runs a million jobs each month and is doubling their storage year-over-year. The company currently stores more than 10 petabytes of biomedical and genomic data. That is equivalent to approximately 100,000 genomes, or in simpler terms, over 50 billion Facebook photos!

DNAnexus is working with its customers to help expand their translational informatics research, which includes expanding into clinical trial genomic services. This will help companies developing different medicines to better stratify clinical trial populations and develop companion tests that enable the right patient to get the right medicine. In collaboration with Janssen Human Microbiome Institute, DNAnexus is also launching Mosaic – a community platform for microbiome research.

AWS provides DNAnexus and its customers the flexibility to grow and scale research programs. Building the technology infrastructure required to manage these projects in-house is expensive and time-consuming. DNAnexus removes that barrier for labs of any size by using AWS scalable cloud resources. The company deploys its customers’ genomic pipelines on Amazon EC2, using Amazon S3 for high-performance, high-durability storage, and Amazon Glacier for low-cost data archiving. DNAnexus is also an AWS Life Sciences Competency Partner.

Learn more about DNAnexus here.

-Tina

Bringing Datacenter-Scale Hardware-Software Co-design to the Cloud with FireSim and Amazon EC2 F1 Instances

Post Syndicated from Mia Champion original https://aws.amazon.com/blogs/compute/bringing-datacenter-scale-hardware-software-co-design-to-the-cloud-with-firesim-and-amazon-ec2-f1-instances/

The recent addition of Xilinx FPGAs to AWS Cloud compute offerings is one way that AWS is enabling global growth in the areas of advanced analytics, deep learning and AI. The customized F1 servers use pooled accelerators, enabling interconnectivity of up to 8 FPGAs, each one including 64 GiB DDR4 ECC protected memory, with a dedicated PCIe x16 connection. That makes this a powerful engine with the capacity to process advanced analytical applications at scale, at a significantly faster rate. For example, AWS commercial partner Edico Genome is able to achieve an approximately 30X speedup in analyzing whole genome sequencing datasets using their DRAGEN platform powered with F1 instances.

While the availability of FPGA F1 compute on-demand provides clear accessibility and cost advantages, many mainstream users are still finding that the “threshold to entry” in developing or running FPGA-accelerated simulations is too high. Researchers at the UC Berkeley RISE Lab have developed “FireSim”, an open-source resource powered by Amazon EC2 F1 instances that lowers that entry bar and makes it easier for everyone to leverage the power of an FPGA-accelerated compute environment. Whether you are part of a small start-up development team or working at a large datacenter scale, hardware-software co-design enables faster time-to-deployment, lower costs, and more predictable performance. We are excited to feature FireSim in this post from Sagar Karandikar and his colleagues at UC Berkeley.

―Mia Champion, Sr. Data Scientist, AWS

Mapping an 8-node FireSim cluster simulation to Amazon EC2 F1

As traditional hardware scaling nears its end, the data centers of tomorrow are trending towards heterogeneity, employing custom hardware accelerators and increasingly high-performance interconnects. Prototyping new hardware at scale has traditionally been either extremely expensive, or very slow. In this post, I introduce FireSim, a new hardware simulation platform under development in the computer architecture research group at UC Berkeley that enables fast, scalable hardware simulation using Amazon EC2 F1 instances.

FireSim benefits both hardware and software developers working on new rack-scale systems: software developers can use the simulated nodes with new hardware features as they would use a real machine, while hardware developers have full control over the hardware being simulated and can run real software stacks while hardware is still under development. In conjunction with this post, we’re releasing the first public demo of FireSim, which lets you deploy your own 8-node simulated cluster on an F1 Instance and run benchmarks against it. This demo simulates a pre-built “vanilla” cluster, but demonstrates FireSim’s high performance and usability.

Why FireSim + F1?

FPGA-accelerated hardware simulation is by no means a new concept. However, previous attempts to use FPGAs for simulation have been fraught with usability, scalability, and cost issues. FireSim takes advantage of EC2 F1 and open-source hardware to address the traditional problems with FPGA-accelerated simulation:
Problem #1: FPGA-based simulations have traditionally been expensive, difficult to deploy, and difficult to reproduce.
FireSim uses public-cloud infrastructure like F1, which means no upfront cost to purchase and deploy FPGAs. Developers and researchers can distribute pre-built AMIs and AFIs, as in this public demo (more details later in this post), to make experiments easy to reproduce. FireSim also automates most of the work involved in deploying an FPGA simulation, essentially enabling one-click conversion from new RTL to deploying on an FPGA cluster.

Problem #2: FPGA-based simulations have traditionally been difficult (and expensive) to scale.
Because FireSim uses F1, users can scale out experiments by spinning up additional EC2 instances, rather than spending hundreds of thousands of dollars on large FPGA clusters.

Problem #3: Finding open hardware to simulate has traditionally been difficult. Finding open hardware that can run real software stacks is even harder.
FireSim simulates RocketChip, an open, silicon-proven, RISC-V-based processor platform, and adds peripherals like a NIC and disk device to build up a realistic system. Processors that implement RISC-V automatically support real operating systems (such as Linux) and even support applications like Apache and Memcached. We provide a custom Buildroot-based FireSim Linux distribution that runs on our simulated nodes and includes many popular developer tools.

Problem #4: Writing hardware in traditional HDLs is time-consuming.
Both FireSim and RocketChip use the Chisel HDL, which brings modern programming paradigms to hardware description languages. Chisel greatly simplifies the process of building large, highly parameterized hardware components.

How to use FireSim for hardware/software co-design

FireSim drastically improves the process of co-designing hardware and software by acting as a push-button interface for collaboration between hardware developers and systems software developers. The following diagram describes the workflows that hardware and software developers use when working with FireSim.

Figure 2. The FireSim custom hardware development workflow.

The hardware developer’s view:

  1. Write custom RTL for your accelerator, peripheral, or processor modification in a productive language like Chisel.
  2. Run a software simulation of your hardware design in standard gate-level simulation tools for early-stage debugging.
  3. Run FireSim build scripts, which automatically build your simulation, run it through the Vivado toolchain/AWS shell scripts, and publish an AFI.
  4. Deploy your simulation on EC2 F1 using the generated simulation driver and AFI.
  5. Run real software builds released by software developers to benchmark your hardware.

The software developer’s view:

  1. Deploy the AMI/AFI generated by the hardware developer on an F1 instance to simulate a cluster of nodes (or scale out to many F1 nodes for larger simulated core-counts).
  2. Boot the Linux distribution included with FireSim on the simulated nodes in the cluster, and connect to them using SSH. This distribution is easy to customize, and already supports many standard software packages.
  3. Directly prototype your software using the same exact interfaces that the software will see when deployed on the real future system you’re prototyping, with the same performance characteristics as observed from software, even at scale.

FireSim demo v1.0

Figure 3. Cluster topology simulated by FireSim demo v1.0.

This first public demo of FireSim focuses on the aforementioned “software developer’s view” of the custom hardware development cycle. The demo simulates a cluster of 1 to 8 RocketChip-based nodes, interconnected by a functional network simulation. The simulated nodes work just like “real” machines: they boot Linux, you can connect to them using SSH, and you can run real applications on top. The nodes can see each other (and the EC2 F1 instance on which they’re deployed) on the network and communicate with one another. While the demo currently simulates a pre-built “vanilla” cluster, the entire hardware configuration of these simulated nodes will be modifiable once FireSim is open-sourced.

In this post, I walk through bringing up a single-node FireSim simulation for experienced EC2 F1 users. For more detailed instructions for new users and instructions for running a larger 8-node simulation, see FireSim Demo v1.0 on Amazon EC2 F1. Both demos walk you through setting up an instance from a demo AMI/AFI and booting Linux on the simulated nodes. The full demo instructions also walk you through an example workload, running Memcached on the simulated nodes, with YCSB as a load generator to demonstrate network functionality.

Deploying the demo on F1

In this release, we provide pre-built binaries for driving simulation from the host and a pre-built AFI that contains the FPGA infrastructure necessary to simulate a RocketChip-based node.

Starting your F1 instances

First, launch an instance using the free FireSim Demo v1.0 product available on the AWS Marketplace on an f1.2xlarge instance. After your instance has booted, log in using the user name centos. On the first login, you should see the message “FireSim network config completed.” This network configuration sets up the necessary tap interfaces and bridge on the EC2 instance to enable communicating with the simulated nodes.

AMI contents

The AMI contains a variety of tools to help you run simulations and build software for RISC-V systems, including the riscv64 toolchain, a Buildroot-based Linux distribution that runs on the simulated nodes, and the simulation driver program. For more details, see the AMI Contents section on the FireSim website.

Single-node demo

First, you need to flash the FPGA with the FireSim AFI. To do so, run:

$ sudo fpga-load-local-image -S 0 -I agfi-00a74c2d615134b21

To start a simulation, run the following at the command line:

$ boot-firesim-singlenode

This automatically calls the simulation driver, telling it to load the Linux kernel image and root filesystem for the Linux distro. This produces output similar to the following:

Simulations Started. You can use the UART console of each simulated node by attaching to the following screens:

There is a screen on:

2492.fsim0      (Detached)

1 Socket in /var/run/screen/S-centos.

You could reach the simulated UART console by attaching to this screen, but in this walkthrough you use SSH to access the node instead.

First, ping the node to make sure it has come online. This is currently required because nodes may get stuck at Linux boot if the NIC does not receive any network traffic. For more information, see Troubleshooting/Errata. The node is always assigned the IP address 192.168.1.10:

$ ping 192.168.1.10

This should eventually produce the following output:

PING 192.168.1.10 (192.168.1.10) 56(84) bytes of data.

From 192.168.1.1 icmp_seq=1 Destination Host Unreachable

64 bytes from 192.168.1.10: icmp_seq=1 ttl=64 time=2017 ms

64 bytes from 192.168.1.10: icmp_seq=2 ttl=64 time=1018 ms

64 bytes from 192.168.1.10: icmp_seq=3 ttl=64 time=19.0 ms
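If you script your bring-up, you can automate the "ping until it answers" step. The following minimal sketch, not part of the FireSim tooling, sends one ping at a time until the node replies or a timeout elapses. Because it keeps pinging, it also supplies the network traffic that the node needs to get past Linux boot. The `-c`/`-W` flags assume Linux iputils `ping`, and the 300-second default timeout is an arbitrary choice.

```python
import subprocess
import time


def ping_once(host):
    """Send one ICMP echo with a 1-second reply timeout (Linux iputils ping flags)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "1", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def wait_for_node(host="192.168.1.10", timeout=300.0, interval=2.0, probe=ping_once):
    """Probe the node until it answers or the timeout elapses; True means it is online."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if probe(host):
            return True
        time.sleep(interval)
    return False


if __name__ == "__main__":
    print("node online" if wait_for_node() else "node did not answer in time")
```

The `probe` parameter lets you swap in a different reachability check, for example when testing the loop without a live simulation.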

At this point, you know that the simulated node is online. You can connect to it using SSH with the user name root and password firesim. It is also convenient to make sure that your TERM variable is set correctly. In this case, the simulation expects TERM=linux, so provide that:

$ TERM=linux ssh root@192.168.1.10

The authenticity of host ‘192.168.1.10 (192.168.1.10)’ can’t be established.

ECDSA key fingerprint is 63:e9:66:d0:5c:06:2c:1d:5c:95:33:c8:36:92:30:49.

Are you sure you want to continue connecting (yes/no)? yes

Warning: Permanently added ‘192.168.1.10’ (ECDSA) to the list of known hosts.

root@192.168.1.10’s password:

#

At this point, you’re connected to the simulated node. Run uname -a as an example. You should see the following output, indicating that you’re connected to a RISC-V system:

# uname -a

Linux buildroot 4.12.0-rc2 #1 Fri Aug 4 03:44:55 UTC 2017 riscv64 GNU/Linux

Now you can run programs on the simulated node, as you would with a real machine. For an example workload (running YCSB against Memcached on the simulated node) or to run a larger 8-node simulation, see the full FireSim Demo v1.0 on Amazon EC2 F1 demo instructions.

Finally, when you are finished, you can shut down the simulated node by running the following command from within the simulated node:

# poweroff

You can confirm that the simulation has ended by running screen -ls, which should now report that there are no detached screens.
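If you want to check this from a script, the following minimal sketch parses `screen -ls` output for sessions marked `(Detached)`, such as the `2492.fsim0` session shown earlier. It inspects the listing text rather than the command's exit status. The function names are hypothetical helpers, not part of FireSim.

```python
import subprocess


def detached_screens(screen_ls_output):
    """Return session names (e.g. '2492.fsim0') that screen -ls lists as Detached."""
    sessions = []
    for line in screen_ls_output.splitlines():
        line = line.strip()
        if line.endswith("(Detached)"):
            sessions.append(line.split()[0])  # first field is <pid>.<name>
    return sessions


def simulation_finished():
    """True when no detached screen sessions remain (parses the listing text)."""
    result = subprocess.run(["screen", "-ls"], capture_output=True, text=True)
    return not detached_screens(result.stdout)
```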

Future plans

At Berkeley, we’re planning to keep improving the FireSim platform to enable our own research in future data center architectures, like FireBox. The FireSim platform will eventually support more sophisticated processors, custom accelerators (such as Hwacha), network models, and peripherals, in addition to scaling to larger numbers of FPGAs. In the future, we’ll open source the entire platform, including Midas, the tool used to transform RTL into FPGA simulators, allowing users to modify any part of the hardware/software stack. Follow @firesimproject on Twitter to stay tuned to future FireSim updates.

Acknowledgements

FireSim is the joint work of many students and faculty at Berkeley: Sagar Karandikar, Donggyu Kim, Howard Mao, David Biancolin, Jack Koenig, Jonathan Bachrach, and Krste Asanović. This work is partially funded by AWS through the RISE Lab, by the Intel Science and Technology Center for Agile HW Design, and by ASPIRE Lab sponsors and affiliates Intel, Google, HPE, Huawei, NVIDIA, and SK hynix.

Introducing AWS Directory Service for Microsoft Active Directory (Standard Edition)

Post Syndicated from Peter Pereira original https://aws.amazon.com/blogs/security/introducing-aws-directory-service-for-microsoft-active-directory-standard-edition/

Today, AWS introduced AWS Directory Service for Microsoft Active Directory (Standard Edition), also known as AWS Microsoft AD (Standard Edition), which is a managed Microsoft Active Directory (AD) that is performance-optimized for small and midsize businesses. AWS Microsoft AD (Standard Edition) offers you a highly available and cost-effective primary directory in the AWS Cloud that you can use to manage users, groups, and computers. It enables you to join Amazon EC2 instances to your domain easily and supports many AWS and third-party applications and services. It also can support most of the common use cases of small and midsize businesses. When you use AWS Microsoft AD (Standard Edition) as your primary directory, you can manage access and provide single sign-on (SSO) to cloud applications such as Microsoft Office 365. If you have an existing Microsoft AD directory, you can also use AWS Microsoft AD (Standard Edition) as a resource forest that contains primarily computers and groups, allowing you to migrate your AD-aware applications to the AWS Cloud while using existing on-premises AD credentials.

In this blog post, I help you get started by answering three main questions about AWS Microsoft AD (Standard Edition):

  1. What do I get?
  2. How can I use it?
  3. What are the key features?

After answering these questions, I show how you can get started with creating and using your own AWS Microsoft AD (Standard Edition) directory.

1. What do I get?

When you create an AWS Microsoft AD (Standard Edition) directory, AWS deploys two Microsoft AD domain controllers powered by Microsoft Windows Server 2012 R2 in your Amazon Virtual Private Cloud (VPC). To help deliver high availability, the domain controllers run in different Availability Zones in the AWS Region of your choice.

As a managed service, AWS Microsoft AD (Standard Edition) configures directory replication, automates daily snapshots, and handles all patching and software updates. In addition, AWS Microsoft AD (Standard Edition) monitors and automatically recovers domain controllers in the event of a failure.

AWS Microsoft AD (Standard Edition) has been optimized as a primary directory for small and midsize businesses with the capacity to support approximately 5,000 employees. With 1 GB of directory object storage, AWS Microsoft AD (Standard Edition) has the capacity to store 30,000 or more total directory objects (users, groups, and computers). AWS Microsoft AD (Standard Edition) also gives you the option to add domain controllers to meet the specific performance demands of your applications. You also can use AWS Microsoft AD (Standard Edition) as a resource forest with a trust relationship to your on-premises directory.

2. How can I use it?

With AWS Microsoft AD (Standard Edition), you can share a single directory for multiple use cases. For example, you can share a directory to authenticate and authorize access for .NET applications, Amazon RDS for SQL Server with Windows Authentication enabled, and Amazon Chime for messaging and video conferencing.

The following diagram shows some of the use cases for your AWS Microsoft AD (Standard Edition) directory, including the ability to grant your users access to external cloud applications and allow your on-premises AD users to manage and have access to resources in the AWS Cloud.

Diagram showing some ways you can use AWS Microsoft AD (Standard Edition)

Use case 1: Sign in to AWS applications and services with AD credentials

You can enable multiple AWS applications and services such as the AWS Management Console, Amazon WorkSpaces, and Amazon RDS for SQL Server to use your AWS Microsoft AD (Standard Edition) directory. When you enable an AWS application or service in your directory, your users can access the application or service with their AD credentials.

For example, you can enable your users to sign in to the AWS Management Console with their AD credentials. To do this, you enable the AWS Management Console as an application in your directory, and then assign your AD users and groups to IAM roles. When your users sign in to the AWS Management Console, they assume an IAM role to manage AWS resources. This makes it easy for you to grant your users access to the AWS Management Console without needing to configure and manage a separate SAML infrastructure.

Use case 2: Manage Amazon EC2 instances

Using familiar AD administration tools, you can apply AD Group Policy objects (GPOs) to centrally manage your Amazon EC2 for Windows or Linux instances by joining your instances to your AWS Microsoft AD (Standard Edition) domain.

In addition, your users can sign in to your instances with their AD credentials. This eliminates the need to use individual instance credentials or distribute private key (PEM) files. This makes it easier for you to instantly grant or revoke access to users by using AD user administration tools you already use.

Use case 3: Provide directory services to your AD-aware workloads

AWS Microsoft AD (Standard Edition) is an actual Microsoft AD that enables you to run traditional AD-aware workloads such as Remote Desktop Licensing Manager, Microsoft SharePoint, and Microsoft SQL Server Always On in the AWS Cloud. AWS Microsoft AD (Standard Edition) also helps you to simplify and improve the security of AD-integrated .NET applications by using group Managed Service Accounts (gMSAs) and Kerberos constrained delegation (KCD).

Use case 4: SSO to Office 365 and other cloud applications

You can use AWS Microsoft AD (Standard Edition) to provide SSO for cloud applications. You can use Azure AD Connect to synchronize your users into Azure AD, and then use Active Directory Federation Services (AD FS) so that your users can access Microsoft Office 365 and other SAML 2.0 cloud applications by using their AD credentials.

Use case 5: Extend your on-premises AD to the AWS Cloud

If you already have an AD infrastructure and want to use it when migrating AD-aware workloads to the AWS Cloud, AWS Microsoft AD (Standard Edition) can help. You can use AD trusts to connect AWS Microsoft AD (Standard Edition) to your existing AD. This means your users can access AD-aware and AWS applications with their on-premises AD credentials, without needing you to synchronize users, groups, or passwords.

For example, your users can sign in to the AWS Management Console and Amazon WorkSpaces by using their existing AD user names and passwords. Also, when you use AD-aware applications such as SharePoint with AWS Microsoft AD (Standard Edition), your logged-in Windows users can access these applications without needing to enter credentials again.

3. What are the key features?

AWS Microsoft AD (Standard Edition) includes the features detailed in this section.

Extend your AD schema

With AWS Microsoft AD, you can run customized AD-integrated applications that require changes to your directory schema, which defines the structure of your directory. The schema is composed of object classes such as user objects, which contain attributes such as user names. AWS Microsoft AD lets you extend the schema by adding new AD attributes or object classes that are not present in the core AD attributes and classes.

For example, if you have a human resources application that uses employee badge color to assign specific benefits, you can extend the schema to include a badge color attribute in the user object class of your directory. To learn more, see How to Move More Custom Applications to the AWS Cloud with AWS Directory Service.

Create user-specific password policies

With user-specific password policies, you can apply specific restrictions and account lockout policies to different types of users in your AWS Microsoft AD (Standard Edition) domain. For example, you can enforce strong passwords and frequent password change policies for administrators, and use less-restrictive policies with moderate account lockout policies for general users.

Add domain controllers

You can increase the performance and redundancy of your directory by adding domain controllers. This can help improve application performance by enabling directory clients to load-balance their requests across a larger number of domain controllers.

Encrypt directory traffic

You can use AWS Microsoft AD (Standard Edition) to encrypt Lightweight Directory Access Protocol (LDAP) communication between your applications and your directory. By enabling LDAP over Secure Sockets Layer (SSL)/Transport Layer Security (TLS), also called LDAPS, you encrypt your LDAP communications end to end. This helps you to protect sensitive information you keep in your directory when it is accessed over untrusted networks.

Improve the security of signing in to AWS services by using multi-factor authentication (MFA)

You can improve the security of signing in to AWS services, such as Amazon WorkSpaces and Amazon QuickSight, by enabling MFA in your AWS Microsoft AD (Standard Edition) directory. With MFA, your users must enter a one-time passcode (OTP) in addition to their AD user names and passwords to access AWS applications and services you enable in AWS Microsoft AD (Standard Edition).

Get started

To get started, use the Directory Service console to create your first directory with just a few clicks. If you have not used Directory Service before, you may be eligible for a 30-day limited free trial.
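If you prefer to script directory creation, the following minimal sketch builds a request for the boto3 Directory Service client's `create_microsoft_ad` call with `Edition="Standard"`. It also checks that two distinct subnets are supplied, because the two domain controllers run in different Availability Zones. The domain name, NetBIOS name, password, and VPC and subnet IDs are hypothetical placeholders. The sketch cannot verify from the IDs alone that the subnets are in different Availability Zones.

```python
def build_create_request(name, short_name, password, vpc_id, subnet_ids,
                         description="AWS Microsoft AD (Standard Edition)"):
    """Build keyword arguments for ds.create_microsoft_ad (values are placeholders)."""
    if len(set(subnet_ids)) != 2:
        # One subnet per domain controller, each in a different Availability Zone.
        raise ValueError("provide two distinct subnets in different Availability Zones")
    return {
        "Name": name,              # fully qualified domain name, e.g. corp.example.com
        "ShortName": short_name,   # NetBIOS name, e.g. CORP
        "Password": password,      # password for the directory's Admin account
        "Description": description,
        "VpcSettings": {"VpcId": vpc_id, "SubnetIds": list(subnet_ids)},
        "Edition": "Standard",     # "Enterprise" is the other accepted value
    }


def create_directory(request):
    """Submit the request (needs boto3 and credentials); returns the new directory ID."""
    import boto3

    ds = boto3.client("ds")
    return ds.create_microsoft_ad(**request)["DirectoryId"]
```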

Summary

In this blog post, I explained what AWS Microsoft AD (Standard Edition) is and how you can use it. With a single directory, you can address many use cases for your business, making it easier to migrate and run your AD-aware workloads in the AWS Cloud, provide access to AWS applications and services, and connect to other cloud applications. To learn more about AWS Microsoft AD, see the Directory Service home page.

If you have comments about this post, submit them in the “Comments” section below. If you have questions about this blog post, start a new thread on the Directory Service forum.

– Peter

How to Compete with Giants

Post Syndicated from Gleb Budman original https://www.backblaze.com/blog/how-to-compete-with-giants/


This post by Backblaze’s CEO and co-founder Gleb Budman is the sixth in a series about entrepreneurship. You can choose posts in the series from the list below:

  1. How Backblaze got Started: The Problem, The Solution, and the Stuff In-Between
  2. Building a Competitive Moat: Turning Challenges Into Advantages
  3. From Idea to Launch: Getting Your First Customers
  4. How to Get Your First 1,000 Customers
  5. Surviving Your First Year
  6. How to Compete with Giants


Perhaps your business is competing in a brand new space free from established competitors. Most of us, though, start companies that compete with existing offerings from large, established companies. You need to come up with a better mousetrap — not the first mousetrap.

That’s the challenge Backblaze faced. In this post, I’d like to share some of the lessons I learned from that experience.

Backblaze vs. Giants

Competing with established companies that are orders of magnitude larger can be daunting. How can you succeed?

I’ll set the stage by offering a few sets of giants we compete with:

  • When we started Backblaze, we offered online backup in a market where companies had been offering “online backup” for at least a decade, and even the newer entrants had raised tens of millions of dollars.
  • When we built our storage servers, the alternatives were EMC, NetApp, and Dell — each of which had a market cap of over $10 billion.
  • When we introduced our cloud storage offering, B2, our direct competitors were Amazon, Google, and Microsoft. You might have heard of them.

What did we learn by competing with these giants on a bootstrapped budget? Let’s take a look.

Determine What Success Means

For a long time Apple considered Apple TV to be a hobby, not a real product worth focusing on, because it did not generate a billion dollars in revenue. For a $10 billion per year revenue company, a new business that generates $50 million won’t move the needle and often isn’t worth putting focus on. However, for a startup, getting to $50 million in revenue can be the start of a wildly successful business.

Lesson Learned: Don’t let the giants set your success metrics.

The Advantages Startups Have

The giants have a lot of advantages: more money, people, scale, resources, access, etc. Following their playbook and attacking head-on means you’re simply outgunned. Common paths to failure are trying to build more features, enter more markets, outspend on marketing, and other similar approaches where scale and resources are the primary determinants of success.

But being a startup affords many advantages that most giants would salivate over. As a nimble startup you can leverage those to succeed. Let’s break down nine competitive advantages we’ve used that you can use too.

1. Drive Focus

It’s hard to build a $10 billion revenue business doing just one thing, and most giants have a broad portfolio of businesses, numerous products for each, and a variety of customer segments in multiple markets. That adds complexity and divides management attention.

Startups get the benefit of having everyone in the company be extremely focused, often on a singular mission, product, customer segment, and market. While our competitors sell everything from advertising to Zantac, and are investing in groceries and shipping, Backblaze has focused exclusively on cloud storage. This means all of our best people (i.e., everyone) are focused on our cloud storage business. Where is all of your focus going?

Lesson Learned: Align everyone in your company to a singular focus to dramatically out-perform larger teams.

2. Use Lack-of-Scale as an Advantage

You may have heard Paul Graham say “Do things that don’t scale.” There are a host of things you can do specifically because you don’t have the same scale as the giants. Use that as an advantage.

When we look for data center space, we have more options than our largest competitors because there are simply more spaces available with room for 100 cabinets than for 1,000 cabinets. With some searching, we can find data center space that is better and cheaper.

When a flood in Thailand destroyed factories, causing the world’s supply of hard drives to plummet and prices to triple, we started drive farming: buying hard drives from retail stores. The giants certainly couldn’t. It was a bit crazy, but it let us keep prices unchanged for our customers.

Our Chief Cloud Officer, Tim, used to work at Adobe. Because of Adobe’s size, every new product had to launch in a multitude of languages and global markets at once. Once launched, a product had scale, but getting any new product launched at all was incredibly challenging.

Lesson Learned: Use lack-of-scale to exploit opportunities that are closed to giants.

3. Build a Better Product

This one is probably obvious. If you’re going to provide the same product, at the same price, to the same customers — why do it? Remember that better does not always mean more features. Here’s one way we built a better product that didn’t require being a bigger company.

All other online backup services required customers to choose what to include in their backup. We found that this was complicated for users, since they often didn’t know what needed to be backed up. We flipped the model: back up everything by default and let users exclude files if they want to, without requiring it. This reduced the number of features and options while making the product easier and better for the user.

This didn’t require the resources of a huge company; it just required understanding customers a bit deeper and thinking about the solution differently. Building a better product is the most classic startup competitive advantage.

Lesson Learned: Dig deep with your customers to understand and deliver a better mousetrap.

4. Provide Better Service

How can you provide better service? Use your advantages. Escalations from your customer care folks to engineering can go through fewer hoops. Fixing an issue and shipping can be quicker. Access to real answers on Twitter or Facebook can be more effective.

A strategic decision we made was to have all customer support people be full-time employees in our headquarters. This keeps them in close contact with the whole company, so feedback can flow quickly in both directions.

Having a smaller team and fewer layers enables faster internal communication, which increases customer happiness. And the option to do things that don’t scale — such as help a customer in a unique situation — can go a long way in building customer loyalty.

Lesson Learned: Service your customers better by establishing clear internal communications.

5. Remove The Unnecessary

After determining that industry-standard EMC/NetApp/Dell storage servers would be too expensive to build our own cloud storage upon, we decided to build our own infrastructure. Many said we were crazy to compete with these multi-billion dollar companies and that it would be impossible to build a lower-cost storage server. However, not only did it turn out to be possible, it wasn’t even that hard.

One key trick? Remove the unnecessary. While EMC and others built servers to sell to other companies for a wide variety of use cases, Backblaze needed servers that only Backblaze would run, and for a single use case. As a result we could tailor the servers to our needs by removing redundancy from each server (since we would run redundant servers) and using lower-performance components (since we would get high performance by running servers in parallel).

What do your customers and use cases not need? This can trim costs and complexity while often improving the product for your use case.

Lesson Learned: Don’t think “what can we add” to what the giants offer — think “what can we remove.”

6. Be Easy

How many times have you visited a large company’s website, particularly one that’s not consumer-focused, only to leave saying, “Huh? I don’t understand what you do.” Keeping your website clear, and your product and pricing simple, will dramatically increase conversion and customer satisfaction. If you’re able to make it 2x easier and thereby double your conversion rate, you’ve just halved what it costs you to acquire a customer.
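The halving follows from simple arithmetic. Here is a minimal sketch in Python, assuming the marketing spend and the traffic it buys stay fixed while only the conversion rate changes; the dollar amount, visitor count, and rates are invented for illustration and are not Backblaze data:

```python
def cost_per_customer(marketing_spend: float, visitors: int, conversion_rate: float) -> float:
    """Marketing spend divided by the number of customers that spend produces."""
    customers = visitors * conversion_rate
    return marketing_spend / customers


# Hypothetical figures: the same $10,000 of marketing brings in 50,000 visitors.
before = cost_per_customer(10_000, 50_000, 0.01)  # 1% convert: 500 customers
after = cost_per_customer(10_000, 50_000, 0.02)   # 2x easier, 2% convert: 1,000 customers
print(f"before: ${before:.2f} per customer, after: ${after:.2f} per customer")
# prints "before: $20.00 per customer, after: $10.00 per customer"
```

Because the spend is held constant, doubling the conversion rate doubles the customers it buys, so each one costs half as much.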

Providing unlimited data backup wasn’t specifically about providing more storage — it was about making it easier. Since users didn’t know how much data they needed to back up, charging per gigabyte meant they wouldn’t know the cost. Providing unlimited data backup meant they could just relax.

Customers love easy — and being smaller makes easy easier to deliver. Use that as an advantage in your website, marketing materials, pricing, product, and in every other customer interaction.

Lesson Learned: Ease-of-use isn’t a slogan: it’s a competitive advantage. Treat it as seriously as any other feature of your product.

7. Don’t Be Afraid of Risk

Obviously unnecessary risks are unnecessary, and some risks aren’t worth taking. However, large companies that have given Wall Street guidance with a $0.01 range on their earnings per share are inherently going to be very risk-averse. Use risk tolerance to open up opportunities, and adjust your tolerance level as you scale. In your first year, there are likely countless ways your business may vaporize; don’t be too worried about taking a risk that might have a 20% downside when the upside is hockey-stick growth.

Using consumer-grade hard drives in our servers could have caused pain and suffering for us years down the line, but they were priced at approximately 50% of enterprise drives. Giants wouldn’t have considered the option. As it turns out, the consumer drives performed great for us.

Lesson Learned: Use calculated risks as an advantage.

8. Be Open

The larger a company grows, the more it wants to hide information. Some of this is driven by regulatory requirements as a public company. But most of this is cultural. Sharing something might cause a problem, so let’s not. All external communication is treated as a critical press release, with rounds and rounds of editing by multiple teams and approvals. However, customers are often desperate for information. Moreover, sharing information builds trust, understanding, and advocates.

I started blogging at Backblaze before we launched. When we blogged about our Storage Pod and open-sourced the design, many thought we were crazy to share this information. But it was transformative for us, establishing Backblaze as a tech thought leader in storage and giving people a sense of how we were able to provide our service at such a low cost.

Over the years we’ve developed a culture of being open internally and externally, on our blog and with the press, and in communities such as Hacker News and Reddit. Often we’ve been asked, “why would you share that!?” — but it’s the continual openness that builds trust. And that culture of openness is incredibly challenging for the giants.

Lesson Learned: Overshare to build trust and brand where giants won’t.

9. Be Human

As companies scale, a smaller percentage of founders and executives typically interact with customers. The people who build the company become more hidden, the language feels “corporate,” and customers start to feel they’re interacting with the cliché “faceless, nameless corporation.” Use your humanity to your advantage. From day one the Backblaze About page listed all the founders and my email address. While contacting us shouldn’t be the first path for a customer support question, I wanted it to be clear that we stand behind the service we offer; if we’re doing something wrong, I want to know it.

To scale, it’s important to have processes and procedures, but sometimes a situation falls outside of a well-established process. While we want our employees to follow processes, they’re still encouraged to be human and “try to do the right thing.” How do you strike this balance? Simon Sinek gives a good talk about it: make your employees feel safe. If employees feel safe, they’ll be human.

If your customer is a consumer, they’ll appreciate being treated as a human. Even if your customer is a corporation, the purchasing decision-makers are still people.

Lesson Learned: Being human is the ultimate antithesis to the faceless corporation.

Build Culture to Sustain Your Advantages at Scale

Presumably the goal is not to always be competing with giants, but to one day become a giant. Does this mean you’ll lose all of these advantages? Some, yes — but not all. Some of these advantages are cultural, and if you build these into the culture from the beginning, and fight to keep them as you scale, you can keep them as you become a giant.

Tesla still comes across as human, with Elon Musk frequently interacting with people on Twitter. Apple continues to provide great service through their Genius Bar. And, worst case, if you lose these at scale, you’ll still have the other advantages of being a giant such as money, people, scale, resources, and access.

Of course, some new startup will be gunning for you with grand ambitions, so just be sure not to get complacent. 😉

The post How to Compete with Giants appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

[$] Point releases for the GNU C Library

Post Syndicated from corbet original https://lwn.net/Articles/736429/rss

The GNU C Library (glibc) project produces regular releases on an
approximately six-month cadence. The current release is 2.26
from early August; the 2.27 release is expected at the beginning of
February 2018. Unlike many other projects, though, glibc does not normally
create point releases for important fixes between the major releases.
The last point release from glibc was 2.14.1, which came out in 2011.
A discussion on the need for a 2.26 point release led to questions about
whether such releases have a useful place in the current
software-development environment.

Football Coach Retweets, Gets Sued for Copyright Infringement

Post Syndicated from Andy original https://torrentfreak.com/football-coach-retweets-gets-sued-for-copyright-infringement-170928/

When copyright infringement lawsuits hit the US courts, there’s often a serious case at hand. Whether that’s the sharing of a leaked movie online or indeed the mass infringement that allegedly took place on Megaupload, there’s usually something quite meaty to discuss.

A lawsuit filed this week in a Pennsylvania federal court certainly provides the latter, but without managing to be much more than a fairly trivial matter in the first instance.

The case was filed by sports psychologist and author Dr. Keith Bell. It begins by describing Bell as an “internationally recognized performance consultant” who has worked with 500 teams, including the Olympic and national teams for the United States, Canada, Australia, New Zealand, Hong Kong, Fiji, and the Cayman Islands.

Bell is further described as a successful speaker, athlete and coach; “A four-time collegiate All-American swimmer, a holder of numerous world and national masters swim records, and has coached several collegiate, high school, and private swim teams to competitive success.”

At the heart of the lawsuit is a book that Bell published in 1982, entitled Winning Isn’t Normal.

“The book has enjoyed substantial acclaim, distribution and publicity. Dr. Bell is the sole author of this work, and continues to own all rights in the work,” the lawsuit (pdf) reads.

Bell claims that on or about November 6, 2015, King’s College head football coach Jeffery Knarr retweeted a tweet that was initially posted from @NSUBaseball32, a Twitter account operated by Northeastern State University’s RiverHawks baseball team. The retweet, as shown in the lawsuit, can be seen below.

The retweet that sparked the lawsuit

“The post was made without authorization from Dr. Bell and without attribution to Dr. Bell,” the lawsuit reads.

“Neither Defendant King’s College nor Defendant Jeffery Knarr contacted Dr. Bell to request permission to use Dr. Bell’s copyrighted work. As of November 14, 2015, the post had received 206 ‘Retweets’ and 189 ‘Likes.’ Due to the globally accessible nature of Twitter, the post was accessible by Internet users across the world.”

Bell says he sent a cease and desist letter to NSU in September 2016 and shortly thereafter NSU removed the post, which removed the retweets. However, this meant that Knarr’s retweet had been online for “at least” 10 months and 21 days.

To put the icing on the cake, Bell also holds the trademark to the phrase “Winning Isn’t Normal”, so he’s suing Knarr and his King’s College employer for trademark infringement too.

“The Defendants included Plaintiff’s trademark twice in the Twitter post. The first instance was as the title of the post, with the mark shown in letters which were emphasized by being capitalized, bold, and underlined,” the lawsuit notes.

“The second instance was at the end of the post, with the mark shown in letters which were emphasized by being capitalized, bold, underlined, and followed by three exclamation points.”

Describing what appears to be a casual retweet as “willful, intentional and purposeful” infringement carried out “in disregard of and with indifference to Plaintiff’s rights,” Bell demands damages and attorneys’ fees from Knarr and his employer.

“As a direct and proximate result of said infringement by Defendants, Plaintiff is entitled to damages in an amount to be proven at trial,” the lawsuit concludes.

Since the page from the book retweeted by Knarr is a small portion of the overall work, there may be a fair use defense. Nevertheless, defending this kind of suit is never cheap, so it’s probably fair to say there will already be a considerable amount of regret among the defendants at ever having set eyes on Bell’s 35-year-old book.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

People can’t read (Equifax edition)

Post Syndicated from Robert Graham original http://blog.erratasec.com/2017/09/people-cant-read-equifax-edition.html

One of these days I’m going to write a guide for journalists reporting on the cyber. One of the items I’d stress is that they often fail to read the text of what is being said, and instead read some sort of subtext that wasn’t explicitly said. This is sometimes valid, as the subtext is what the writer intended all along, even if they didn’t explicitly write it. Other times, though, the imagined subtext is not what the writer intended at all.

A good example is the recent Equifax breach. The original statement says:

Equifax Inc. (NYSE: EFX) today announced a cybersecurity incident potentially impacting approximately 143 million U.S. consumers.

The word consumers was widely translated to customers, as in this Bloomberg story:

Equifax Inc. said its systems were struck by a cyberattack that may have affected about 143 million U.S. customers of the credit reporting agency

But these aren’t the same thing. Equifax is a credit rating agency, keeping data on people who are not its own customers. It’s an important difference.

Another good example is yesterday’s quote “confirming” that the “Apache Struts” vulnerability was to blame:

Equifax has been intensely investigating the scope of the intrusion with the assistance of a leading, independent cybersecurity firm to determine what information was accessed and who has been impacted. We know that criminals exploited a U.S. website application vulnerability. The vulnerability was Apache Struts CVE-2017-5638.

But it doesn’t confirm Struts was responsible. Blaming Struts is certainly the subtext of this paragraph, but it’s not the text. It mentions that criminals exploited the Struts vulnerability, but doesn’t actually connect the dots to the breach we are all talking about.

There are probably reasons for this. While it’s easy for forensics to find evidence of Struts exploitation in logfiles, it’s much harder to connect this to the breach. While they suspect Struts, they may not actually be able to confirm it. Or maybe they are trying to cover things up, feeling that failing to patch is a lesser crime than what they really did.

It’s at this point that journalists should earn their pay. Instead of rewriting what they read on the Internet, they could do the legwork and call up Equifax PR and ask.

The purpose of this post isn’t to discuss Equifax, but the tendency of people to “read between the lines”, to read some subtext that wasn’t actually expressed in the text. Sometimes the subtext is legitimately there, such as how Equifax clearly intends people to blame Struts though they don’t say it outright. Sometimes the subtext isn’t there, such as how Equifax doesn’t mean its own customers, only “U.S. consumers”. Journalists need to be careful about making assumptions about the subtext.


Update: The Equifax CSO has a degree in music. Some people have criticized this. Most people have defended this, pointing out that almost nobody has an “infosec” degree in our industry, and many of the top people have no degree at all. Among others, @thegrugq has pointed out that infosec degrees are only a few years old — they weren’t around 20 years ago when today’s corporate officers were getting their degrees.

Again, we have the text/subtext problem, where people interpret infosec degrees as being the same as computer-science degrees, the latter of which have existed for decades. Some, as in this case, consider them to be wildly different. Others consider them to be nearly the same.

Equifax Data Breach – Hack Due To Missed Apache Patch

Post Syndicated from Darknet original https://www.darknet.org.uk/2017/09/equifax-data-breach-hack-due-to-missed-apache-patch/?utm_source=rss&utm_medium=social&utm_campaign=darknetfeed


The Equifax data breach is pretty huge, with 143 million records leaked from the hack in the US alone and an unknown number more in Canada and the UK.

For those who aren’t up to date, the original statement about the breach came out on Sept 7th (4 months AFTER the breach happened) and reads as follows:

Equifax Inc. (NYSE: EFX) today announced a cybersecurity incident potentially impacting approximately 143 million U.S.

Read the rest of Equifax Data Breach – Hack Due To Missed Apache Patch now! Only available at Darknet.

Time Warner Hacked – AWS Config Exposes 4M Subscribers

Post Syndicated from Darknet original https://www.darknet.org.uk/2017/09/time-warner-hacked-aws-config-exposes-4m-subscribers/?utm_source=rss&utm_medium=social&utm_campaign=darknetfeed


The latest news on the web is that Time Warner has been hacked: a bad AWS S3 config has (once again) exposed the details of approximately 4 million subscribers.

This comes not long after the Instagram API leaked user contact information, and a few other recent leaks involving poorly secured Amazon AWS S3 buckets; I’d hazard a guess that it won’t be the last.

Records of roughly four million Time Warner Cable customers in the US were exposed to the public internet after a contractor failed to properly secure an Amazon cloud database.

Read the rest of Time Warner Hacked – AWS Config Exposes 4M Subscribers now! Only available at Darknet.

Perfect 10 Takes Giganews to Supreme Court, Says It’s Worse Than Megaupload

Post Syndicated from Andy original https://torrentfreak.com/perfect-10-takes-giganews-supreme-court-says-worse-megaupload-170906/

Adult publisher Perfect 10 has developed a reputation for being a serial copyright litigant.

Over the years the company targeted a number of high-profile defendants, including Google, Amazon, Mastercard, and Visa. Around two dozen of Perfect 10’s lawsuits ended in cash settlements and defaults, in the publisher’s favor.

Perhaps buoyed by this success, the company went after Usenet provider Giganews, but instead of a company willing to roll over, Perfect 10 found a highly defensive and indeed aggressive opponent. The initial copyright case filed by Perfect 10 alleged that Giganews effectively sold access to Perfect 10 content, but things went badly for the publisher.

In November 2014, the U.S. District Court for the Central District of California found that Giganews was not liable for the infringing activities of its users. Perfect 10 was ordered to pay Giganews $5.6m in attorney’s fees and costs. Perfect 10 lost again at the Court of Appeals for the Ninth Circuit.

As a result of these failed actions, Giganews is owed millions by Perfect 10, but the publisher has thus far refused to pay up. That resulted in Giganews filing a $20m lawsuit, accusing Perfect 10 and its president, Dr. Norman Zada, of fraud.

With all this litigation boiling around in the background and Perfect 10 already bankrupt as a result, one might think the story would be near to a conclusion. That doesn’t seem to be the case. In a fresh announcement, Perfect 10 says it has now appealed its case to the US Supreme Court.

“This is an extraordinarily important case, because for the first time, an appellate court has allowed defendants to copy and sell movies, songs, images, and other copyrighted works, without permission or payment to copyright holders,” says Zada.

“In this particular case, evidence was presented that defendants were copying and selling access to approximately 25,000 terabytes of unlicensed movies, songs, images, software, and magazines.”

Referencing an Amicus brief previously filed by the RIAA which described Giganews as “blatant copyright pirates,” Perfect 10 accuses the Ninth Circuit of allowing Giganews to copy and sell trillions of dollars of other people’s intellectual property “because their copying and selling was done in an automated fashion using a computer.”

Noting that “everything is done via computer” these days and with an undertone that the ruling encouraged others to infringe, Perfect 10 says there are now 88 companies similar to Giganews which rely on the automation defense to commit infringement – even involving content owned by people in the US Government.

“These exploiters of other people’s property are fearless. They are copying and selling access to pirated versions of pretty much every movie ever made, including films co-produced by treasury secretary Steven Mnuchin,” Zada says.

“You would think the justice department would do something to protect the viability of this nation’s movie and recording studios, as unfettered piracy harms jobs and tax revenues, but they have done nothing.”

But Zada doesn’t stop at blaming Usenet services, the California District Court, the Ninth Circuit, and the United States Department of Justice for his problems – Congress is to blame too.

“Copyright holders have nowhere to turn other than the Federal courts, whose judges are ridiculously overworked. For years, Congress has failed to provide the Federal courts with adequate funding. As a result, judges can make mistakes,” he adds.

For Zada, those mistakes are particularly notable, especially since at least one other super high-profile company was shut down in the most aggressive manner possible for allegedly being involved in less piracy than Giganews.

Pointing to the now-infamous Megaupload case, Perfect 10 notes that the Department of Justice completely shut that operation down, filing charges of criminal copyright infringement against Kim Dotcom and seizing $175 million “for selling access to movies and songs which they did not own.”

“Perfect 10 provided evidence that [Giganews] offered more than 200 times as many full length movies as did megaupload.com. But our evidence fell on deaf ears,” Zada complains.

In contrast, Perfect 10 adds, a California District Court found that Giganews had done nothing wrong, allowed it to continue copying and selling access to Perfect 10’s content, and awarded the Usenet provider $5.63m in attorneys’ fees.

“Prior to this case, no court had ever awarded fees to an alleged infringer, unless they were found to either own the copyrights at issue, or established a fair use defense. Neither was the case here,” Zada adds.

While Perfect 10 has filed a petition with the Supreme Court, the odds of being granted a review are particularly small. Only time will tell how this case will end, but it seems unlikely that the adult publisher will enjoy a happy ending, one in which it doesn’t have to pay Giganews millions of dollars in attorney’s fees.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

Security Flaw in Estonian National ID Card

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2017/09/security_flaw_i.html

We have no idea how bad this really is:

On 30 August, an international team of researchers informed the Estonian Information System Authority (RIA) of a vulnerability potentially affecting the digital use of Estonian ID cards. The possible vulnerability affects a total of almost 750,000 ID-cards issued starting from October 2014, including cards issued to e-residents. The ID-cards issued before 16 October 2014 use a different chip and are not affected. Mobile-IDs are also not impacted.

My guess is that it’s worse than the politicians are saying:

According to Peterkop, the current data shows this risk to be theoretical and there is no evidence of anyone’s digital identity being misused. “All ID-card operations are still valid and we will take appropriate actions to secure the functioning of our national digital-ID infrastructure. For example, we have restricted the access to Estonian ID-card public key database to prevent illegal use.”

And because this system is so important in local politics, the effects are significant:

In the light of current events, some Estonian politicians called to postpone the upcoming local elections, due to take place on 16 October. In Estonia, approximately 35% of the voters use digital identity to vote online.

But the Estonian prime minister, Jüri Ratas, said at a press conference on 5 September that “this incident will not affect the course of the Estonian e-state.” Ratas also recommended to use Mobile-IDs where possible. The prime minister said that the State Electoral Office will decide whether it will allow the usage of ID cards at the upcoming local elections.

The Estonian Police and Border Guard estimates it will take approximately two months to fix the issue with faulty cards. The authority will involve as many Estonian experts as possible in the process.

This is exactly the sort of thing I worry about as ID systems become more prevalent and more centralized. Anyone want to place bets on whether a foreign country is going to try to hack the next Estonian election?

Another article.

HDClub, Russia’s Leading HD-Only Torrent Site, Permanently Shuts Down

Post Syndicated from Andy original https://torrentfreak.com/hdclub-russias-leading-hd-torrent-site-permanently-shuts-down-170830/

While millions of users frequent popular public torrent sites such as The Pirate Bay and RARBG every day, there’s a thriving scene that’s hidden from the wider public eye.

Every week, private torrent trackers cater to tens of millions of BitTorrent users who have taken the time and effort to gain access to these more secretive communities. Often labeled as elitist and running counter to the broad sharing ethos that made file-sharing the beast it is today, private sites pride themselves on quality, order and speed, something public sites typically struggle to match.

In addition to these notable qualities, many private sites choose to focus on a particular niche. There are sites dedicated to obscure electronic music, comedy, and even magic, but HDClub’s focus was given away by its name.

Dubbing itself “The HighDefinition BitTorrent Community”, HDClub specialized in HD productions including Blu-ray and 3D content, covering movies, TV shows, music videos, and animation.

Born in 2007, HDClub celebrated its ninth birthday on March 9 last year, with 2017 heralding a full decade online for the site. Catering mainly to the Russian and Ukrainian markets, the site’s releases often preserved an English audio option, ideal for those looking for high-quality releases from an unorthodox source at decent speeds.

Of course, HDClub releases often leaked out of the site, meaning that thousands are still available on regular public trackers, as a search on any Western torrent engine reveals.

A sample of HDClub releases listed on Torrentz2

Importantly, the site offered thousands of releases completely unavailable in Russia from licensed sources, meaning it filled a niche in which official outlets either wouldn’t or couldn’t compete. This earned the site a place in Russia’s Top 1000 sites list, despite being a closed membership platform.

The site’s attention to detail and focus earned it a considerable following. For the past few years the site capped membership at 190,000 people but in practice, attendance floated around the 170,000 mark. Seeders peaked at approximately 400,000 with leechers considerably fewer, making seeding as difficult as one might expect on a ratio-based tracker.

Now, however, the decade-long run of HDClub has come to an abrupt end. Early this week the tracker went dark, reportedly without advance notice. A Russian language announcement now present on its main page explains the reasons for the site’s demise.

“Recently, we received several dozens of complaints from rightsholders weekly, and our community is subjected to attacks and espionage,” the announcement reads.

While public torrent sites are always bombarded with DMCA-style notices, private sites tend to avoid large numbers of complaints. In this case, however, HDClub were clearly feeling the pressure. The site’s main page was open to the public while featuring popular releases, so this probably didn’t help with the load.

It’s not clear what is meant by “attacks and espionage” but it’s possibly a reference to DDoS assaults and third parties attempting to monitor the site. Nevertheless, as HDClub points out, the climate for torrent, streaming, and similar sites has become increasingly hostile in the region recently.

“In parallel, there is a tightening of Internet legislation in Russia, Ukraine and EU countries,” the site says.

Interestingly, the site’s operators also suggest that interest from some quarters had waned, noting that “the time of enthusiasts irretrievably goes away.” It’s unclear whether that’s a reference to site users, the site’s operators, or both. In any event, a significant decline in any area can prove fatal, particularly when other pressures are at play.

“In the circumstances, we can no longer support the work of the club in the originally conceived format. The project is closed, but we ask you to refrain from long farewells. Thank you all and goodbye!” the message concludes.

The announcement also ends with a little teaser, which may indicate some hope for the future.

“There are talks on preserving the heritage of the club,” it reads, without adding further details.

Stay tuned, possibly…

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

How to Configure an LDAPS Endpoint for Simple AD

Post Syndicated from Cameron Worrell original https://aws.amazon.com/blogs/security/how-to-configure-an-ldaps-endpoint-for-simple-ad/

Simple AD, which is powered by Samba 4, supports basic Active Directory (AD) authentication features such as users, groups, and the ability to join domains. Simple AD also includes an integrated Lightweight Directory Access Protocol (LDAP) server. LDAP is a standard application protocol for the access and management of directory information. You can use the BIND operation from Simple AD to authenticate LDAP client sessions. This makes LDAP a common choice for centralized authentication and authorization for services such as Secure Shell (SSH), client-based virtual private networks (VPNs), and many other applications. Authentication, the process of confirming the identity of a principal, typically involves the transmission of highly sensitive information such as user names and passwords. To protect this information in transit over untrusted networks, companies often require encryption as part of their information security strategy.

In this blog post, we show you how to configure an LDAPS (LDAP over SSL/TLS) encrypted endpoint for Simple AD so that you can extend Simple AD over untrusted networks. Our solution uses Elastic Load Balancing (ELB) to send decrypted LDAP traffic to HAProxy running on Amazon EC2, which then sends the traffic to Simple AD. ELB offers integrated certificate management, SSL/TLS termination, and the ability to use a scalable EC2 backend to process decrypted traffic. ELB also tightly integrates with Amazon Route 53, enabling you to use a custom domain for the LDAPS endpoint. The solution needs the intermediate HAProxy layer because ELB can direct traffic only to EC2 instances. To simplify testing and deployment, we have provided an AWS CloudFormation template to provision the ELB and HAProxy layers.

This post assumes that you have an understanding of concepts such as Amazon Virtual Private Cloud (VPC) and its components, including subnets, routing, Internet and network address translation (NAT) gateways, DNS, and security groups. You should also be familiar with launching EC2 instances and logging in to them with SSH. If needed, familiarize yourself with these concepts, and review the solution overview and prerequisites in the following sections before proceeding with the deployment.

Note: This solution is intended for use by clients requiring an LDAPS endpoint only. If your requirements extend beyond this, you should consider accessing the Simple AD servers directly or by using AWS Directory Service for Microsoft AD.

Solution overview

The following diagram and description illustrate the Simple AD LDAPS environment. The CloudFormation template creates the items designated by the bracket (an internal ELB load balancer and two HAProxy nodes configured in an Auto Scaling group).

Diagram of the Simple AD LDAPS environment

Here is how the solution works, as shown in the preceding numbered diagram:

  1. The LDAP client sends an LDAPS request to ELB on TCP port 636.
  2. ELB terminates the SSL/TLS session and decrypts the traffic using a certificate. ELB sends the decrypted LDAP traffic to the EC2 instances running HAProxy on TCP port 389.
  3. The HAProxy servers, which run in a fixed Auto Scaling group configuration, forward the LDAP request to the Simple AD servers listening on TCP port 389.
  4. The Simple AD servers send an LDAP response through the HAProxy layer to ELB. ELB encrypts the response and sends it to the client.
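In steps 2 and 3, the HAProxy layer only relays bytes: ELB has already removed TLS, so HAProxy passes plain LDAP through to Simple AD on port 389. The following Python asyncio sketch illustrates that TCP-relay role under simplifying assumptions (round-robin backend choice, no health checks, no timeouts). It is not how HAProxy is implemented, and the names `pipe` and `start_relay` are hypothetical.

```python
import asyncio
import itertools
from typing import List, Tuple

# Illustrative TCP relay: accept already-decrypted LDAP connections and forward
# the bytes, unchanged, to one of several backends chosen round-robin.


async def pipe(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    """Copy bytes from reader to writer until EOF, then half-close the writer."""
    try:
        while data := await reader.read(65536):
            writer.write(data)
            await writer.drain()
        if writer.can_write_eof():
            writer.write_eof()  # tell the other side no more data is coming
    except OSError:
        writer.close()


async def start_relay(host: str, port: int,
                      backends: List[Tuple[str, int]]) -> asyncio.AbstractServer:
    """Listen on host:port and forward each new connection to the next backend."""
    rotation = itertools.cycle(backends)

    async def handle(client_reader, client_writer):
        backend_host, backend_port = next(rotation)
        backend_reader, backend_writer = await asyncio.open_connection(backend_host, backend_port)
        # Relay both directions at once: requests to the backend, responses back.
        await asyncio.gather(
            pipe(client_reader, backend_writer),
            pipe(backend_reader, client_writer),
        )
        backend_writer.close()
        client_writer.close()

    return await asyncio.start_server(handle, host, port)
```

In this environment, each HAProxy instance would listen on port 389, and `backends` would hold the SimpleADPriIP and SimpleADSecIP addresses, each on port 389.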

Note: Amazon VPC prevents a third party from intercepting traffic within the VPC. Because of this, the VPC protects the decrypted traffic between ELB and HAProxy and between HAProxy and Simple AD. The ELB encryption provides an additional layer of security for client connections and protects traffic coming from hosts outside the VPC.

Prerequisites

  1. Our approach requires an Amazon VPC with two public and two private subnets. The previous diagram illustrates the environment’s VPC requirements. If you do not yet have these components in place, follow these guidelines for setting up a sample environment:
    1. Identify a region that supports Simple AD, ELB, and NAT gateways. The NAT gateways are used with an Internet gateway to allow the HAProxy instances to access the internet to perform their required configuration. You also need to identify the two Availability Zones in that region for use by Simple AD. You will supply these Availability Zones as parameters to the CloudFormation template later in this process.
    2. Create or choose an Amazon VPC in the region you chose. To use Route 53 to resolve the LDAPS endpoint, make sure you enable DNS support within your VPC. Create an Internet gateway, which the NAT gateways will use to access the internet, and attach it to the VPC.
    3. Create a route table with a default route to the Internet gateway. Create two public subnets, one per Availability Zone, and associate them with this route table. Create two NAT gateways, one in the public subnet of each Availability Zone, to provide additional resiliency across the Availability Zones. Together, the route table, the NAT gateways, and the Internet gateway enable the HAProxy instances to access the internet.
    4. Create two private route tables, one per Availability Zone. Create two private subnets, one per Availability Zone. The dual route tables and subnets allow for a higher level of redundancy. Associate each subnet with the route table in the same Availability Zone, and add a default route in each route table to the NAT gateway in the same Availability Zone. The Simple AD servers will use these private subnets.
    5. The LDAP service requires a DNS domain that resolves within your VPC and from your LDAP clients. If you do not have an existing DNS domain, follow the steps to create a private hosted zone and associate it with your VPC. To avoid encryption protocol errors, you must ensure that the DNS domain name is consistent across your Route 53 zone and in the SSL/TLS certificate (see Step 2 in the “Solution deployment” section).
  2. Make sure you have completed the Simple AD Prerequisites.
  3. We will use a self-signed certificate for ELB to perform SSL/TLS decryption. You can instead use a certificate issued by your preferred certificate authority or by AWS Certificate Manager (ACM).
    Note: To prevent unauthorized connections directly to your Simple AD servers, you can modify the Simple AD security group on port 389 to block traffic from locations outside of the Simple AD VPC. You can find the security group in the EC2 console by creating a search filter for your Simple AD directory ID. It is also important to allow the Simple AD servers to communicate with each other as shown on Simple AD Prerequisites.
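To make the subnet layout in these prerequisites concrete, the following Python sketch carves one public and one private subnet per Availability Zone out of a VPC CIDR. It also checks whether a source address lies inside the VPC, which is the condition a port 389 security group rule scoped to the VPC CIDR would enforce. The 10.0.0.0/16 range and the /24 subnet size are assumptions. Only the Availability Zone names come from the example later in this post.

```python
import ipaddress

# Assumed example values: any private VPC CIDR with room for four subnets works.
VPC_CIDR = "10.0.0.0/16"
AZS = ["us-east-1a", "us-east-1c"]


def plan_subnets(vpc_cidr: str, azs: list, new_prefix: int = 24) -> dict:
    """Return {az: {"public": cidr, "private": cidr}}, allocating blocks in order."""
    blocks = ipaddress.ip_network(vpc_cidr).subnets(new_prefix=new_prefix)
    plan = {}
    for az in azs:
        # One public subnet (for the NAT gateway) and one private subnet
        # (for Simple AD and the HAProxy instances) in each Availability Zone.
        plan[az] = {"public": str(next(blocks)), "private": str(next(blocks))}
    return plan


def inside_vpc(source_ip: str, vpc_cidr: str) -> bool:
    """True if source_ip is within vpc_cidr, i.e. admitted by a VPC-scoped port 389 rule."""
    return ipaddress.ip_address(source_ip) in ipaddress.ip_network(vpc_cidr)


if __name__ == "__main__":
    for az, subnets in plan_subnets(VPC_CIDR, AZS).items():
        print(az, subnets)
```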

Solution deployment

This solution includes five main parts:

  1. Create a Simple AD directory.
  2. Create a certificate.
  3. Create the ELB and HAProxy layers by using the supplied CloudFormation template.
  4. Create a Route 53 record.
  5. Test LDAPS access using an Amazon Linux client.

1. Create a Simple AD directory

With the prerequisites completed, you will create a Simple AD directory in your private VPC subnets:

  1. In the Directory Service console navigation pane, choose Directories and then choose Set up directory.
  2. Choose Simple AD.
    Screenshot of choosing "Simple AD"
  3. Provide the following information:
    • Directory DNS – The fully qualified domain name (FQDN) of the directory, such as corp.example.com. You will use the FQDN as part of the testing procedure.
    • NetBIOS name – The short name for the directory, such as CORP.
    • Administrator password – The password for the directory administrator. The directory creation process creates an administrator account with the user name Administrator and this password. Do not lose this password because it is nonrecoverable. You also need this password for testing LDAPS access in a later step.
    • Description – An optional description for the directory.
    • Directory Size – The size of the directory.
      Screenshot of the directory details to provide
  4. Provide the following information in the VPC Details section, and then choose Next Step:
    • VPC – Specify the VPC in which to install the directory.
    • Subnets – Choose two private subnets for the directory servers. The two subnets must be in different Availability Zones. Make a note of the VPC and subnet IDs for use as CloudFormation input parameters. In the following example, the Availability Zones are us-east-1a and us-east-1c.
      Screenshot of the VPC details to provide
  5. Review the directory information and make any necessary changes. When the information is correct, choose Create Simple AD.

It takes several minutes to create the directory. From the AWS Directory Service console, refresh the screen periodically and wait until the directory Status value changes to Active before continuing. Choose your Simple AD directory and note the two IP addresses in the DNS address section. You will enter them when you run the CloudFormation template later.

Note: Full administration of your Simple AD implementation is out of scope for this blog post. See the documentation to add users, groups, or instances to your directory. Also see the previous blog post, How to Manage Identities in Simple AD Directories.

2. Create a certificate

In the previous step, you created the Simple AD directory. Next, you will generate a self-signed SSL/TLS certificate using OpenSSL and use it with ELB to secure the LDAPS endpoint. OpenSSL is a standard, open source library that supports a wide range of cryptographic functions, including the creation and signing of X.509 certificates. You will then import the certificate into ACM, which is integrated with ELB.

  1. You must have a system with OpenSSL installed to complete this step. To check whether OpenSSL is already installed, run openssl version at the command line. If it is not installed, you can install it on Amazon Linux by running sudo yum install openssl. If you do not have access to an Amazon Linux instance, you can create one with SSH access enabled to proceed with this step.
    $ openssl version
    OpenSSL 1.0.1k-fips 8 Jan 2015

  2. Create a private key using the openssl genrsa command.
    $ openssl genrsa 2048 > privatekey.pem
    Generating RSA private key, 2048 bit long modulus
    ......................................................................................................................................................................+++
    ..........................+++
    e is 65537 (0x10001)

  3. Generate a certificate signing request (CSR) using the openssl req command. Provide the requested information for each field. The Common Name is the FQDN for your LDAPS endpoint (for example, ldap.corp.example.com). The Common Name must use the domain name you will later register in Route 53. You will encounter certificate errors if the names do not match.
    $ openssl req -new -key privatekey.pem -out server.csr
    You are about to be asked to enter information that will be incorporated into your certificate request.

  4. Use the openssl x509 command to sign the certificate. The following example uses the private key from the previous step (privatekey.pem) and the signing request (server.csr) to create a public certificate named server.crt that is valid for 365 days. Renew the certificate before it expires to avoid disrupting LDAPS functionality.
    $ openssl x509 -req -sha256 -days 365 -in server.csr -signkey privatekey.pem -out server.crt
    Signature ok
    subject=/C=XX/L=Default City/O=Default Company Ltd/CN=ldap.corp.example.com
    Getting Private key

  5. You should see three files: privatekey.pem, server.crt, and server.csr.
    $ ls
    privatekey.pem server.crt server.csr

    Restrict access to the private key.

    $ chmod 600 privatekey.pem

    Keep the private key and public certificate for later use. You can discard the signing request because you are using a self-signed certificate and not using a Certificate Authority. Always store the private key in a secure location and avoid adding it to your source code.

  6. In the ACM console, choose Import a certificate.
  7. Open your server.crt file in a text editor, and paste its contents into the Certificate body box.
  8. Open your privatekey.pem file in a text editor, and paste its contents into the Certificate private key box. For a self-signed certificate, you can leave the Certificate chain box blank.
  9. Choose Review and import. Confirm the information and choose Import.
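This section stresses two easy-to-miss details: the certificate's Common Name must match the host name registered in Route 53, and the certificate stops working after 365 days. The following Python sketch checks both. It assumes the example names used in this post. The notAfter string is what `openssl x509 -in server.crt -noout -enddate` prints after `notAfter=`. Wildcard certificates and subject alternative names are deliberately not handled.

```python
import ssl
import time
from typing import Optional

# Example values matching this post; replace them with your own.
ZONE = "corp.example.com"             # Route 53 hosted zone / Simple AD DNS name
LDAPS_HOST = "ldap.corp.example.com"  # host name clients put in the ldaps:// URI
CERT_CN = "ldap.corp.example.com"     # Common Name entered at the openssl req step


def in_zone(fqdn: str, zone: str) -> bool:
    """True if fqdn is the zone apex or a name beneath it (case-insensitive)."""
    fqdn, zone = fqdn.lower().rstrip("."), zone.lower().rstrip(".")
    return fqdn == zone or fqdn.endswith("." + zone)


def names_consistent(cert_cn: str, ldaps_host: str, zone: str) -> bool:
    """Exact CN match (no wildcard handling) plus membership in the hosted zone."""
    return cert_cn.lower() == ldaps_host.lower() and in_zone(ldaps_host, zone)


def days_until_expiry(not_after: str, now: Optional[float] = None) -> int:
    """Whole days left, given OpenSSL's notAfter text such as 'Jul 21 12:32:04 2018 GMT'."""
    expires = ssl.cert_time_to_seconds(not_after)  # seconds since the epoch (UTC)
    if now is None:
        now = time.time()
    return int((expires - now) // 86400)
```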

3. Create the ELB and HAProxy layers by using the supplied CloudFormation template

Now that you have created your Simple AD directory and SSL/TLS certificate, you are ready to use the CloudFormation template to create the ELB and HAProxy layers.

  1. Load the supplied CloudFormation template to deploy an internal ELB and two HAProxy EC2 instances into a fixed Auto Scaling group. After you load the template, provide the following input parameters. Note: You can find the parameters relating to your Simple AD from the directory details page by choosing your Simple AD in the Directory Service console.
    • HAProxyInstanceSize – The EC2 instance size for the HAProxy servers. The default is t2.micro; you can choose a larger size for large Simple AD environments.
    • MyKeyPair – The SSH key pair for the EC2 instances. If you do not have an existing key pair, you must create one.
    • VPCId – The target VPC for this solution. This must be the VPC where you deployed Simple AD; its ID is available on your Simple AD directory details page.
    • SubnetId1 – The Simple AD primary subnet. This information is available on your Simple AD directory details page.
    • SubnetId2 – The Simple AD secondary subnet. This information is available on your Simple AD directory details page.
    • MyTrustedNetwork – The trusted network Classless Inter-Domain Routing (CIDR) range allowed to connect to the LDAPS endpoint. For example, use the VPC CIDR to allow clients in the VPC to connect.
    • SimpleADPriIP – The primary Simple AD server IP address. This information is available on your Simple AD directory details page.
    • SimpleADSecIP – The secondary Simple AD server IP address. This information is available on your Simple AD directory details page.
    • LDAPSCertificateARN – The Amazon Resource Name (ARN) for the SSL/TLS certificate. This information is available in the ACM console.
  2. Enter the input parameters and choose Next.
  3. On the Options page, accept the defaults and choose Next.
  4. On the Review page, confirm the details and choose Create. The stack will be created in approximately 5 minutes.
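If you prefer to launch the stack from a script rather than the console, the CloudFormation CreateStack API (for example, boto3's `create_stack(..., Parameters=...)` or `aws cloudformation create-stack --parameters`) takes the input parameters listed above as ParameterKey/ParameterValue pairs. The following Python sketch builds and sanity-checks that list. The helper name and all example values are hypothetical placeholders, and the checks are a convenience, not the template's own validation.

```python
import ipaddress

# Parameter names come from the template inputs listed above; values are placeholders.
REQUIRED = [
    "MyKeyPair", "VPCId", "SubnetId1", "SubnetId2", "MyTrustedNetwork",
    "SimpleADPriIP", "SimpleADSecIP", "LDAPSCertificateARN",
]
DEFAULTS = {"HAProxyInstanceSize": "t2.micro"}


def build_parameters(values: dict) -> list:
    """Validate inputs and return CloudFormation's ParameterKey/ParameterValue list."""
    merged = {**DEFAULTS, **values}
    missing = [name for name in REQUIRED if not merged.get(name)]
    if missing:
        raise ValueError(f"missing parameters: {', '.join(missing)}")
    # Catch obvious typos before the stack launch fails (these raise ValueError).
    ipaddress.ip_network(merged["MyTrustedNetwork"])  # must be a CIDR range
    for key in ("SimpleADPriIP", "SimpleADSecIP"):
        ipaddress.ip_address(merged[key])             # must be a single IP address
    if merged["SubnetId1"] == merged["SubnetId2"]:
        raise ValueError("SubnetId1 and SubnetId2 must be different subnets")
    return [{"ParameterKey": k, "ParameterValue": str(v)} for k, v in sorted(merged.items())]
```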

4. Create a Route 53 record

The next step is to create a Route 53 record in your private hosted zone so that clients can resolve your LDAPS endpoint.

  1. If you do not have an existing DNS domain for use with LDAP, create a private hosted zone and associate it with your VPC. The hosted zone name should be consistent with your Simple AD (for example, corp.example.com).
  2. When the CloudFormation stack is in CREATE_COMPLETE status, locate the value of the LDAPSURL on the Outputs tab of the stack. Copy this value for use in the next step.
  3. On the Route 53 console, choose Hosted Zones, and then choose the zone whose domain you used in the Common Name of your self-signed certificate. Choose Create Record Set and enter the following information:
    1. Name – The label of the record (such as ldap).
    2. Type – Leave as A – IPv4 address.
    3. Alias – Choose Yes.
    4. Alias Target – Paste the value of the LDAPSURL on the Outputs tab of the stack.
  4. Leave the defaults for Routing Policy and Evaluate Target Health, and choose Create.
    Screenshot of finishing the creation of the Route 53 record
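The same alias record can be expressed as a Route 53 ChangeBatch, the structure passed to the ChangeResourceRecordSets API (boto3's `change_resource_record_sets(HostedZoneId=..., ChangeBatch=...)`). The following Python sketch builds it. All identifiers are placeholders. For an alias that targets a load balancer, `AliasTarget.HostedZoneId` is the load balancer's own canonical hosted zone ID, not your private zone's ID. Setting `EvaluateTargetHealth` to `False` is an assumption meant to mirror leaving the console default.

```python
import json


def alias_change_batch(label: str, zone: str, elb_dns_name: str, elb_zone_id: str) -> dict:
    """Build a ChangeBatch that points label.zone at the load balancer via an A alias."""
    return {
        "Comment": "LDAPS endpoint for Simple AD",
        "Changes": [
            {
                "Action": "UPSERT",  # create the record, or update it if it already exists
                "ResourceRecordSet": {
                    "Name": f"{label}.{zone.rstrip('.')}.",
                    "Type": "A",
                    "AliasTarget": {
                        "HostedZoneId": elb_zone_id,     # the load balancer's zone ID
                        "DNSName": elb_dns_name,         # the LDAPSURL stack output
                        "EvaluateTargetHealth": False,   # assumption, see lead-in
                    },
                },
            }
        ],
    }


if __name__ == "__main__":
    batch = alias_change_batch("ldap", "corp.example.com",
                               "internal-ldaps-example.us-east-1.elb.amazonaws.com",
                               "ZEXAMPLEELBZONE")
    print(json.dumps(batch, indent=2))
```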

5. Test LDAPS access using an Amazon Linux client

At this point, you have configured your LDAPS endpoint and now you can test it from an Amazon Linux client.

  1. Create an Amazon Linux instance with SSH access enabled to test the solution. Launch the instance into one of the public subnets in your VPC. Make sure the IP address assigned to the instance is in the trusted IP range you specified in the CloudFormation parameter MyTrustedNetwork in section 3.
  2. SSH into the instance and complete the following steps to verify access.
    1. Install the openldap-clients package and any required dependencies:
      sudo yum install -y openldap-clients
    2. Add the server.crt file to the /etc/openldap/certs/ directory so that the LDAPS client will trust your SSL/TLS certificate. You can copy the file using Secure Copy (SCP) or create it using a text editor.
    3. Edit the /etc/openldap/ldap.conf file and define the BASE, URI, and TLS_CACERT options.
      • The value for BASE should be your Simple AD directory DNS name expressed as domain components (for example, dc=corp,dc=example,dc=com for corp.example.com).
      • The value for URI should be ldaps:// followed by the DNS alias you created in Route 53.
      • The value for TLS_CACERT is the path to your public certificate.

Here is an example of the contents of the file.

BASE dc=corp,dc=example,dc=com
URI ldaps://ldap.corp.example.com
TLS_CACERT /etc/openldap/certs/server.crt
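The three settings follow mechanically from the Simple AD DNS name and the Route 53 alias. The following Python sketch derives them, along with the Administrator@domain bind name used by ldapsearch below. The function names are hypothetical, and the default certificate path is the one used in this post. For corp.example.com and ldap.corp.example.com, it reproduces the three lines shown above.

```python
# Derive ldap.conf settings from the Simple AD DNS name and the Route 53 alias.


def base_dn(domain: str) -> str:
    """corp.example.com -> dc=corp,dc=example,dc=com"""
    return ",".join(f"dc={label}" for label in domain.rstrip(".").split("."))


def bind_upn(domain: str, user: str = "Administrator") -> str:
    """User principal name for ldapsearch -D, e.g. Administrator@corp.example.com."""
    return f"{user}@{domain.rstrip('.')}"


def render_ldap_conf(domain: str, endpoint: str,
                     cacert: str = "/etc/openldap/certs/server.crt") -> str:
    """Return ldap.conf text with the BASE, URI, and TLS_CACERT options."""
    return "\n".join([
        f"BASE {base_dn(domain)}",
        f"URI ldaps://{endpoint}",
        f"TLS_CACERT {cacert}",
    ]) + "\n"


if __name__ == "__main__":
    print(render_ldap_conf("corp.example.com", "ldap.corp.example.com"), end="")
```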

To test the solution, query the directory through the LDAPS endpoint, as shown in the following command. Replace corp.example.com with your domain name, and use the Administrator password that you configured for the Simple AD directory.

$ ldapsearch -D "Administrator@corp.example.com" -W sAMAccountName=Administrator

You should see a response similar to the following, which provides the directory information in LDAP Data Interchange Format (LDIF) for the administrator distinguished name (DN) from your Simple AD LDAP server.

# extended LDIF
#
# LDAPv3
# base <dc=corp,dc=example,dc=com> (default) with scope subtree
# filter: sAMAccountName=Administrator
# requesting: ALL
#

# Administrator, Users, corp.example.com
dn: CN=Administrator,CN=Users,DC=corp,DC=example,DC=com
objectClass: top
objectClass: person
objectClass: organizationalPerson
objectClass: user
description: Built-in account for administering the computer/domain
instanceType: 4
whenCreated: 20170721123204.0Z
uSNCreated: 3223
name: Administrator
objectGUID:: l3h0HIiKO0a/ShL4yVK/vw==
userAccountControl: 512
…

You can now use the LDAPS endpoint for directory operations and authentication within your environment. If you would like to learn more about how to interact with your LDAPS endpoint within a Linux environment, here are a few resources to get started:

Troubleshooting

If you receive an error such as the following when issuing the ldapsearch command, there are a few things you can do to help identify issues.

ldap_sasl_bind(SIMPLE): Can't contact LDAP server (-1)
  • You might be able to obtain additional error details by adding the -d1 debug flag to the ldapsearch command from the previous section.
    $ ldapsearch -D "Administrator@corp.example.com" -W sAMAccountName=Administrator -d1

  • Verify that the parameters in ldap.conf match your configured LDAPS URI endpoint and that all parameters can be resolved by DNS. You can use the following dig command, substituting your configured endpoint DNS name.
    $ dig ldap.corp.example.com

  • Confirm that the client instance from which you are connecting is in the CIDR range of the CloudFormation parameter, MyTrustedNetwork.
  • Confirm that the path to your public SSL/TLS certificate, configured in ldap.conf as TLS_CACERT, is correct. You configured this in step 2.3 of section 5. You can check your SSL/TLS connection with the following command, substituting your configured endpoint DNS name for the string after -connect.
    $ echo -n | openssl s_client -connect ldap.corp.example.com:636

  • Verify that your HAProxy instances have the status InService in the EC2 console: choose Load Balancers under Load Balancing in the navigation pane, highlight your LDAPS load balancer, and then choose the Instances tab.
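If you want to script the DNS and TLS checks above from the client instance, the following Python sketch approximates `dig` and `openssl s_client` with the standard library. `cafile` would be the same server.crt that TLS_CACERT points to. The function names are hypothetical. Relaxing `VERIFY_X509_STRICT` is an assumption: Python 3.13 and later enable that flag by default, and it may reject a minimal self-signed certificate like the one created earlier.

```python
import socket
import ssl
from typing import Optional, Tuple


def resolve(host: str) -> list:
    """Rough equivalent of the dig check: addresses host resolves to, or [] on failure."""
    try:
        infos = socket.getaddrinfo(host, None, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return []
    return sorted({info[4][0] for info in infos})


def tls_check(host: str, port: int = 636, cafile: Optional[str] = None,
              timeout: float = 5.0) -> Tuple[bool, str]:
    """Rough equivalent of openssl s_client: connect, verify certificate and host name.

    Returns (ok, detail) instead of raising. cafile=None uses the system trust store.
    """
    try:
        context = ssl.create_default_context(cafile=cafile)
        # Relax only the strict X.509 flag (default on Python 3.13+), see lead-in.
        context.verify_flags = int(context.verify_flags) & ~int(ssl.VERIFY_X509_STRICT)
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host) as tls:
                return True, tls.version() or "unknown protocol"
    except OSError as exc:  # DNS errors, refused or timed-out connections, ssl.SSLError
        return False, f"{type(exc).__name__}: {exc}"
```

For example, `tls_check("ldap.corp.example.com", cafile="/etc/openldap/certs/server.crt")` should return `True` and a protocol name such as TLSv1.2 once the endpoint is reachable from an address inside MyTrustedNetwork.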

Conclusion

You can use ELB and HAProxy to provide an LDAPS endpoint for Simple AD and transport sensitive authentication information over untrusted networks. You can explore using LDAPS to authenticate SSH users or integrate with other software solutions that support LDAP authentication. This solution’s CloudFormation template is available on GitHub.

If you have comments about this post, submit them in the “Comments” section below. If you have questions about or issues implementing this solution, start a new thread on the Directory Service forum.

– Cameron and Jeff

From Data Lake to Data Warehouse: Enhancing Customer 360 with Amazon Redshift Spectrum

Post Syndicated from Dylan Tong original https://aws.amazon.com/blogs/big-data/from-data-lake-to-data-warehouse-enhancing-customer-360-with-amazon-redshift-spectrum/

Achieving a 360-degree view of your customer has become increasingly challenging as companies embrace omni-channel strategies, engaging customers across websites, mobile, call centers, social media, physical sites, and beyond. The promise of a web where online and physical worlds blend makes understanding your customers more challenging, but also more important. Businesses that are successful in this medium have a significant competitive advantage.

The big data challenge requires managing data at high velocity and volume. Many customers have identified Amazon S3 as a great data lake solution that removes the complexity of managing highly durable, fault-tolerant data lake infrastructure, economically and at scale.

AWS data services substantially lessen the heavy lifting of adopting technologies, allowing you to spend more time on what matters most—gaining a better understanding of customers to elevate your business. In this post, I show how a recent Amazon Redshift innovation, Redshift Spectrum, can enhance a customer 360 initiative.

Customer 360 solution

A successful customer 360 view benefits from using a variety of technologies to deliver different forms of insights. These could range from real-time analysis of streaming data from wearable devices and mobile interactions to historical analysis that requires interactive, on-demand queries on billions of transactions. In some cases, insights can only be inferred through AI via deep learning. Finally, the value of your customer data and insights can’t be fully realized until it is operationalized at scale and readily accessible by fleets of applications. Companies are leveraging AWS for the breadth of services that cover these domains to drive their data strategy.

A number of AWS customers stream data from various sources into an S3 data lake through Amazon Kinesis. They use Kinesis and technologies in the Hadoop ecosystem like Spark running on Amazon EMR to enrich this data. High-value data is loaded into an Amazon Redshift data warehouse, which allows users to analyze and interact with data through a choice of client tools. Redshift Spectrum expands on this analytics platform by enabling Amazon Redshift to blend and analyze data beyond the data warehouse and across a data lake.

The following diagram illustrates the workflow for such a solution.

This solution delivers value by:

  • Reducing complexity and shortening the time to deeper insights. For instance, an existing data model in Amazon Redshift may provide insights across dimensions such as customer, geography, time, and product on metrics from sales and financial systems. Later, you may gain access to streaming data sources, like customer-care call logs and website activity, that you want to blend with the sales data on the same dimensions to understand how web and call center experiences may be correlated with sales performance. Redshift Spectrum can join these dimensions in Amazon Redshift with data in S3, so you can quickly gain new insights and avoid the slower, more expensive alternative of fully integrating these sources with your data warehouse.
  • Providing an additional avenue for optimizing costs and performance. In cases like call logs and clickstream data, where volumes could be many TBs to PBs, storing the data exclusively in S3 yields significant cost savings. Interactive analysis on massive datasets may now be economically viable in cases where data was previously analyzed only periodically, through static reports generated by inexpensive batch processes. In some cases, you can improve the user experience while simultaneously lowering costs. Redshift Spectrum is powered by large-scale infrastructure external to your Amazon Redshift cluster, and excels at scanning and aggregating large volumes of data. For instance, your analysts may be performing data discovery on customer interactions across millions of consumers, over years of data, across various channels. On this large dataset, certain queries could be slow if you didn’t have a large Amazon Redshift cluster. Alternatively, you could use Redshift Spectrum to achieve a better user experience with a smaller cluster.

Proof of concept walkthrough

To make evaluation easier for you, I’ve conducted a Redshift Spectrum proof-of-concept (PoC) for the customer 360 use case. For those who want to replicate the PoC, the instructions, AWS CloudFormation templates, and public data sets are available in the GitHub repository.

The remainder of this post is a journey through the project, observing best practices in action, and learning how you can achieve business value. The walkthrough involves:

  • An analysis of performance data from the PoC environment involving queries that demonstrate blending and analysis of data across Amazon Redshift and S3. Observe that great results are achievable at scale.
  • Guidance by example on query tuning, design, and data preparation to illustrate the optimization process. This includes tuning a query that combines clickstream data in S3 with customer and time dimensions in Amazon Redshift, and aggregates ~1.9 B out of 3.7 B+ records in under 10 seconds with a small cluster!
  • Guidance and measurements to help you decide between two options: accessing and analyzing data exclusively in Amazon Redshift, or using Redshift Spectrum to access data left in S3.

Stream ingestion and enrichment

The focus of this post isn’t stream ingestion and enrichment on Kinesis and EMR, but be mindful of performance best practices on S3 to ensure good streaming and query performance:

  • Use random object keys: The data files provided for this project are prefixed with SHA-256 hashes to prevent hot partitions. This helps S3 sustain high request rates, both for PUT requests from the incoming stream and for the large number of parallel GET requests that queries from large Amazon Redshift clusters can send.
  • Micro-batch your data stream: S3 isn’t optimized for small random write workloads, so your datasets should be micro-batched into large files. For instance, the “parquet-1” dataset provided batches more than 7 million records into each file. The optimal file size for Redshift Spectrum is usually in the 100 MB to 1 GB range.
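The two practices above can be sketched in a few lines of Python. The 8-character hash prefix and the 128 MiB default target are assumptions: the post only says that keys are prefixed with SHA-256 hashes and that 100 MB to 1 GB files usually work best. The `flush` callback stands in for whatever actually writes to S3, and the example key is hypothetical.

```python
import hashlib
from typing import Callable, List


def hashed_key(logical_key: str, prefix_len: int = 8) -> str:
    """Prefix an object key with part of its SHA-256 hex digest to spread keys out."""
    digest = hashlib.sha256(logical_key.encode("utf-8")).hexdigest()
    return f"{digest[:prefix_len]}/{logical_key}"


class MicroBatcher:
    """Buffer small records and hand them off as one large object once a size target is hit."""

    def __init__(self, flush: Callable[[bytes], None], target_bytes: int = 128 * 1024 * 1024):
        self._flush = flush
        self._target = target_bytes
        self._buffer: List[bytes] = []
        self._size = 0

    def add(self, record: bytes) -> None:
        self._buffer.append(record)
        self._size += len(record)
        if self._size >= self._target:
            self.flush()

    def flush(self) -> None:
        """Emit whatever is buffered; call once more at shutdown for the remainder."""
        if self._buffer:
            self._flush(b"".join(self._buffer))
            self._buffer, self._size = [], 0
```

A real pipeline would pass each flushed batch, under a `hashed_key(...)` name, to its S3 writer.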

If you have an edge case that may pose scalability challenges, AWS would love to hear about it. For further guidance, talk to your solutions architect.

Environment

The project consists of the following environment:

  • Amazon Redshift cluster: 4 x dc1.large nodes
  • Data:
    • Time and customer dimension tables are stored on all Amazon Redshift nodes (ALL distribution style):
      • The data originates from the DWDATE and CUSTOMER tables in the Star Schema Benchmark
      • The customer table contains attributes for 3 million customers.
      • The time data is at the day-level granularity, and spans 7 years, from the start of 1992 to the end of 1998.
    • The clickstream data is stored in an S3 bucket, and serves as a fact table.
      • Various copies of this dataset in CSV and Parquet format have been provided, for reasons to be discussed later.
      • The data is a modified version of the uservisits dataset from AMPLab’s Big Data Benchmark, which was generated by Intel’s Hadoop benchmark tools.
      • Changes were minimal, so that existing test harnesses for this test can be adapted:
        • Increased the 751,754,869-row dataset 5X to 3,758,774,345 rows.
        • Added surrogate keys to support joins with the customer and time dimensions. These keys were distributed evenly across the entire dataset to represent user visits from six customers over seven years.
        • Values for the visitDate column were replaced to align with the 7-year timeframe and with the added time surrogate key.
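The dataset sizes above fit together, as a quick back-of-the-envelope check shows. Six customers over 84 months give 504 customer-month partitions. If the 3.76 billion rows were spread perfectly evenly, each partition would hold about 7.46 million rows. That would be consistent with the "more than 7 million records per file" figure if each file covered roughly one partition, which is an assumption rather than something the post states.

```python
# Back-of-the-envelope check of the PoC dataset sizes described above.
ORIGINAL_ROWS = 751_754_869  # AMPLab uservisits rows
SCALE_FACTOR = 5
CUSTOMERS = 6                # customers the clickstream rows are spread across
MONTHS = 7 * 12              # start of 1992 through end of 1998

total_rows = ORIGINAL_ROWS * SCALE_FACTOR
partitions = CUSTOMERS * MONTHS
avg_rows_per_partition = total_rows / partitions

if __name__ == "__main__":
    print(f"{total_rows:,} rows across {partitions} customer-month partitions")
    print(f"~{avg_rows_per_partition:,.0f} rows per partition if evenly spread")
```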

Queries across the data lake and data warehouse 

Imagine a scenario where a business analyst plans to analyze clickstream metrics like ad revenue over time and by customer, market segment and more. The example below is a query that achieves this effect: 

The query retrieves clickstream data from S3 and joins it with the time and customer dimension tables in Amazon Redshift. It returns the total ad revenue for three customers over the last three months, along with information on their respective market segments.

Unfortunately, this query takes around three minutes to run, which doesn’t provide the interactive experience that you want. However, there are a number of performance optimizations that you can implement to achieve the desired performance.

Performance analysis

Two key utilities provide visibility into Redshift Spectrum:

  • EXPLAIN
    Provides the query execution plan, which includes information about which processing is pushed down to Redshift Spectrum. Steps in the plan that include the prefix S3 are executed on Redshift Spectrum. For instance, the plan for the previous query has the step “S3 Seq Scan clickstream.uservisits_csv10”, indicating that Redshift Spectrum performs a scan on S3 as part of the query execution.
  • SVL_S3QUERY_SUMMARY
    Statistics for Redshift Spectrum queries are stored in this table. While the execution plan presents cost estimates, this table stores actual statistics for past query runs.

You can get the statistics of your last query by inspecting the SVL_S3QUERY_SUMMARY table with the condition (query = pg_last_query_id()). Inspecting the previous query reveals that the entire dataset of nearly 3.8 billion rows was scanned to retrieve fewer than 66.3 million rows. Improving scan selectivity in your query could yield substantial performance improvements.
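To put a number on "scan selectivity", you can compare the rows Redshift Spectrum scanned with the rows it returned. The following Python sketch holds a summary query and computes that ratio. The column names s3_scanned_rows and s3query_returned_rows are taken from the SVL_S3QUERY_SUMMARY documentation and should be checked against your cluster. Run the query in the same session, right after the query you are measuring.

```python
# Summary query for the most recent query in the current session.
SUMMARY_SQL = """
SELECT query, external_table_name, s3_scanned_rows, s3query_returned_rows
FROM svl_s3query_summary
WHERE query = pg_last_query_id();
"""


def scan_selectivity(scanned_rows: int, returned_rows: int) -> float:
    """Fraction of scanned rows that Redshift Spectrum returned (1.0 means no wasted scanning)."""
    if scanned_rows <= 0:
        raise ValueError("scanned_rows must be positive")
    return returned_rows / scanned_rows
```

With the figures from this walkthrough, `scan_selectivity(3_758_774_345, 66_270_117)` is roughly 0.018. Less than 2% of the scanned rows were needed, which is what makes partition pruning worthwhile here.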

Partitioning

Partitioning is a key means of improving scan efficiency. In your environment, the data and tables have already been organized and configured to support partitions. For more information, see the PoC project setup instructions. The clickstream table was defined as:

CREATE EXTERNAL TABLE clickstream.uservisits_csv10
…
PARTITIONED BY(customer int4, visitYearMonth int4)

The entire 3.8 billion-row dataset is organized as a collection of large files where each file contains data exclusive to a particular customer and month in a year. This allows you to partition your data into logical subsets by customer and year/month. With partitions, the query engine can target a subset of files:

  • Only data for specific customers
  • Only data for specific months
  • A combination of specific customers and year/months
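To make those three cases concrete, each hypothetical filter below restricts the query to a subset of partitions, so the engine can skip the files outside them. The customer IDs and months are illustrative values, not taken from this post.

```sql
-- 1. Only data for specific customers
SELECT COUNT(*) FROM clickstream.uservisits_csv10
WHERE customer IN (1, 2, 3);

-- 2. Only data for specific months
SELECT COUNT(*) FROM clickstream.uservisits_csv10
WHERE visitYearMonth BETWEEN 199201 AND 199203;

-- 3. Specific customers and specific months
SELECT COUNT(*) FROM clickstream.uservisits_csv10
WHERE customer IN (1, 2, 3)
  AND visitYearMonth BETWEEN 199201 AND 199203;
```

After each run, checking SVL_S3QUERY_SUMMARY with pg_last_query_id() should show s3_scanned_rows shrinking as the filter narrows.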

To take advantage of partitions in your queries, join your customer data on the partition key “customer” instead of the surrogate customer key (that is, use c.c_custkey = uv.customer rather than c.c_custkey = uv.custKey):

SELECT c.c_name, c.c_mktsegment, t.prettyMonthYear, SUM(uv.adRevenue)
…
ON c.c_custkey = uv.customer
…
ORDER BY c.c_name, c.c_mktsegment, uv.yearMonthKey ASC

This query should run approximately twice as fast as the previous query. If you look at the statistics for this query in SVL_S3QUERY_SUMMARY, you see that only half the dataset was scanned. This is expected because your query targets three of the six customers in an evenly distributed dataset. However, the scan is still inefficient, and you can benefit from using your year/month partition key as well:
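To compare runs like these side by side, a sketch such as the following lists scan selectivity for your most recent Redshift Spectrum queries. The pct_rows_returned expression is an illustrative ratio, not a built-in metric.

```sql
-- Scan selectivity for the ten most recent Redshift Spectrum query steps.
SELECT query,
       segment,
       external_table_name,
       is_partitioned,
       s3_scanned_rows,
       s3query_returned_rows,
       -- Share of scanned rows actually returned; a low value means wasted scanning.
       ROUND(100.0 * s3query_returned_rows / NULLIF(s3_scanned_rows, 0), 2) AS pct_rows_returned,
       elapsed  -- reported in microseconds
FROM svl_s3query_summary
ORDER BY query DESC, segment
LIMIT 10;
```

One caveat: once aggregation is pushed down to Redshift Spectrum, a low ratio is expected, because many scanned rows collapse into a few aggregated ones.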

SELECT c.c_name, c.c_mktsegment, t.prettyMonthYear, SUM(uv.adRevenue)
…
ON c.c_custkey = uv.customer
…
ON uv.visitYearMonth = t.d_yearmonthnum
…
ORDER BY c.c_name, c.c_mktsegment, uv.visitYearMonth ASC

All joins between the tables now use partition keys. Upon reviewing the statistics for this query, you should observe that Redshift Spectrum scans and returns exactly the number of rows required, 66,270,117. If you run this query a few times, you should see execution times of around 8 seconds, a 22.5X improvement over your original query!
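The 22.5X figure follows directly from the two timings, reading “around three minutes” as 180 seconds:

```latex
\text{speedup} = \frac{t_{\text{original}}}{t_{\text{partitioned}}} \approx \frac{180\ \text{s}}{8\ \text{s}} = 22.5
```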

Predicate pushdown and storage optimizations 

Previously, I mentioned that Redshift Spectrum performs processing through large-scale infrastructure external to your Amazon Redshift cluster. It is optimized for performing large scans and aggregations on S3. In fact, with the proper optimizations, Redshift Spectrum may even outperform a medium-sized Amazon Redshift cluster on these types of workloads. There are two important variables to consider for optimizing large scans and aggregations:

  • File size and count. As a general rule, use files 100 MB–1 GB in size, as Redshift Spectrum and S3 are optimized for reading objects of this size. However, the number of files a query operates on is directly correlated with the parallelism the query can achieve. File size and count are inversely related: the bigger the files, the fewer files there are for the same dataset. Consequently, there is a trade-off between optimizing for object read performance and the amount of parallelism achievable on a particular query. Large files are best for large scans, as the query likely operates on a sufficiently large number of files. For queries that are more selective and operate on fewer files, you may find that smaller files allow for more parallelism.
  • Data format. Redshift Spectrum supports various data formats. Columnar formats like Parquet can sometimes lead to substantial performance benefits by providing compression and more efficient I/O for certain workloads. Generally, formats like Parquet should be used for query workloads involving large scans and high attribute selectivity. Again, there are trade-offs, as formats like Parquet require more compute power to process than plaintext. For queries on smaller subsets of data, the I/O efficiency benefit of Parquet is diminished, and at some point Parquet may perform the same as or slower than plaintext. Latency, compression rates, and the trade-off between user experience and cost should drive your decision.
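As a sketch of how the format choice shows up in table definitions, the two external tables below describe the same hypothetical data, once as comma-delimited text and once as Parquet. The table names, the two non-partition columns, and the S3 locations are placeholders; the real tables for this walkthrough come from the PoC setup.

```sql
-- Plain-text (CSV) variant: every row is read and parsed field by field.
CREATE EXTERNAL TABLE clickstream.uservisits_csv_sketch (
    custKey   INTEGER,
    adRevenue DOUBLE PRECISION
)
PARTITIONED BY (customer int4, visitYearMonth int4)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 's3://your-bucket/uservisits_csv_sketch/';

-- Columnar (Parquet) variant: only the columns a query references need to be read.
CREATE EXTERNAL TABLE clickstream.uservisits_parquet_sketch (
    custKey   INTEGER,
    adRevenue DOUBLE PRECISION
)
PARTITIONED BY (customer int4, visitYearMonth int4)
STORED AS PARQUET
LOCATION 's3://your-bucket/uservisits_parquet_sketch/';
```

Keeping the partition layout identical across the two variants makes it easier to attribute any timing difference to the file format rather than to partition pruning.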

To help illustrate how Redshift Spectrum performs on these large aggregation workloads, run a basic query that aggregates the entire ~3.7 billion-record dataset on Redshift Spectrum, and compare that with running the query exclusively on Amazon Redshift:

SELECT uv.custKey, COUNT(uv.custKey)
FROM <your clickstream table> as uv
GROUP BY uv.custKey
ORDER BY uv.custKey ASC

For the Amazon Redshift test case, the clickstream data is loaded and distributed evenly across all nodes (EVEN distribution style), with optimal column compression encodings as prescribed by the Amazon Redshift ANALYZE COMPRESSION command.
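A minimal sketch of that setup is shown below, assuming a hypothetical local table with only the two columns the aggregation needs, a placeholder IAM role, and a placeholder S3 path whose CSV files contain exactly these columns. Note that ANALYZE COMPRESSION only reports recommended encodings; applying them means re-creating the table with the suggested ENCODE settings.

```sql
-- Hypothetical local copy of the clickstream data, spread evenly across all slices.
CREATE TABLE uservisits_local (
    custKey   INTEGER,
    adRevenue DOUBLE PRECISION
)
DISTSTYLE EVEN;

-- Load from S3 (placeholder path and IAM role; assumes two-column CSV files).
COPY uservisits_local
FROM 's3://your-bucket/uservisits_csv/'
IAM_ROLE 'arn:aws:iam::123456789012:role/YourRedshiftRole'
CSV;

-- Report recommended column encodings (this does not modify the table).
ANALYZE COMPRESSION uservisits_local;
```

With the table loaded, the aggregation query above can run against uservisits_local in place of the clickstream table placeholder.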

The Redshift Spectrum test case uses the Parquet data format, with each file containing all the data for a particular customer in a month. This results in files mostly in the 220–280 MB range, which is effectively the largest file size for this partitioning scheme. If you run tests with the other datasets provided, you see that this data format and size is optimal and outperforms the others by ~60X.

Performance differences will vary depending on the scenario. The important takeaway is to understand the testing strategy and the workload characteristics where Redshift Spectrum is likely to yield performance benefits. 

The following chart compares the query execution time for the two scenarios. The results indicate that you would have to pay for 12 dc1.large nodes to get performance comparable to a small Amazon Redshift cluster that uses Redshift Spectrum.

Chart showing simple aggregation on ~3.7 billion records

So you’ve validated that Redshift Spectrum excels at performing large aggregations. Could you benefit by pushing more work down to Redshift Spectrum in your original query? It turns out that you can, by making the following modification:

The clickstream data is stored at day-level granularity for each customer, while your query rolls up the data to the month level per customer. In the earlier query that uses the year/month partition key, you optimized the query so that it only scans and retrieves the data required, but the day-level data is still sent back to your Amazon Redshift cluster for joining and aggregation. The modified query instead performs the monthly aggregation before the joins, and its query plan confirms that this aggregation work is pushed down to Redshift Spectrum.
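As a hedged sketch of the general pushdown-aggregation pattern (not the exact query used in this post), the clickstream rows can be summed per customer and month in a subquery before joining with the dimension tables. The partition filter mirrors the seven-year query later in this post; the customer table name, the three-month window, and using dwdate.d_yearmonth as prettyMonthYear are assumptions.

```sql
-- Hypothetical sketch of pushdown aggregation; names marked below are assumptions.
SELECT c.c_name, c.c_mktsegment, t.prettyMonthYear, uv.totalRevenue
FROM (
    -- Aggregate to one row per customer per month before the joins,
    -- so Redshift Spectrum can perform the aggregation on S3.
    SELECT customer, visitYearMonth, SUM(adRevenue) AS totalRevenue
    FROM clickstream.uservisits_csv10
    WHERE customer <= 3
      AND visitYearMonth BETWEEN 199810 AND 199812  -- placeholder three-month window
    GROUP BY customer, visitYearMonth
) AS uv
JOIN customer AS c                                   -- assumed dimension table name
  ON c.c_custkey = uv.customer
JOIN (
    -- Assumed source for the month label: one row per month in dwdate.
    SELECT DISTINCT d_yearmonthnum, d_yearmonth AS prettyMonthYear
    FROM dwdate
) AS t
  ON uv.visitYearMonth = t.d_yearmonthnum
ORDER BY c.c_name, c.c_mktsegment, uv.visitYearMonth ASC;
```

Running EXPLAIN on a query shaped like this and looking for an aggregate step with the S3 prefix is one way to confirm that the planner actually pushed the aggregation down.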

In this query, Redshift Spectrum aggregates the clickstream data to the month level before it is returned to the Amazon Redshift cluster and joined with the dimension tables. This query should complete in about 4 seconds, which is roughly twice as fast as only using the partition key. The speed increase is evident upon reviewing the SVL_S3QUERY_SUMMARY table:

  • 21.6X fewer bytes are scanned because of the Parquet data format.
  • Only 90 records are returned back to the Amazon Redshift cluster as a result of the push-down, instead of ~66.2 million, leading to substantially less join overhead, and about 530 MB less data sent back to your cluster.
  • No adverse change in average parallelism.
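To check these numbers yourself, a sketch like the following compares the two runs side by side. The query IDs are hypothetical placeholders; you can capture each real ID by running SELECT pg_last_query_id(); immediately after the corresponding query.

```sql
-- Compare Redshift Spectrum statistics for two runs
-- (replace the placeholder IDs with your partition-key and pushdown query IDs).
SELECT query,
       segment,
       external_table_name,
       s3_scanned_bytes,         -- bytes read from S3
       s3query_returned_rows,    -- rows sent back to the cluster
       s3query_returned_bytes,   -- bytes sent back to the cluster
       files,
       avg_request_parallelism
FROM svl_s3query_summary
WHERE query IN (123456, 123457)
ORDER BY query, segment;
```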

Assessing the value of Amazon Redshift vs. Redshift Spectrum

At this point, you might be asking yourself: why would I ever not use Redshift Spectrum? Well, you still get additional value for your money by loading data into Amazon Redshift and querying it there, rather than querying S3.

In fact, it turns out that the last version of our query runs even faster when executed exclusively in native Amazon Redshift, as shown in the following chart:

Chart comparing Amazon Redshift vs. Redshift Spectrum with pushdown aggregation over 3 months of data

As a general rule, queries that aren’t dominated by I/O and that involve multiple joins are better optimized in native Amazon Redshift. For instance, the performance difference between running the partition key query entirely in Amazon Redshift versus with Redshift Spectrum is twice as large as that of the pushdown aggregation query, partly because the former benefits more from better join performance.

Furthermore, the variability in latency in native Amazon Redshift is lower. For use cases where you have tight performance SLAs on queries, you may want to consider using Amazon Redshift exclusively to support those queries.
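If latency variability matters for your SLAs, one way to measure it is to tag repeated runs of a query and summarize their elapsed times from STL_QUERY. The sketch below assumes a hypothetical query_group label, latency_test; disabling the session result cache keeps repeated runs from being answered from cache.

```sql
-- Make sure repeated runs actually execute instead of hitting the result cache.
SET enable_result_cache_for_session TO off;

-- Tag the runs; the query_group value is recorded in STL_QUERY.label.
SET query_group TO 'latency_test';
-- ... run the query under test several times here ...
RESET query_group;

-- Summarize elapsed time (in milliseconds) for the tagged runs.
SELECT label,
       COUNT(*) AS runs,
       AVG(DATEDIFF(ms, starttime, endtime)::float8) AS avg_ms,
       STDDEV_SAMP(DATEDIFF(ms, starttime, endtime)::float8) AS stddev_ms,
       MIN(DATEDIFF(ms, starttime, endtime)) AS min_ms,
       MAX(DATEDIFF(ms, starttime, endtime)) AS max_ms
FROM stl_query
WHERE label = 'latency_test'
GROUP BY label;
```

Running the same procedure once against the Redshift Spectrum version and once against a native Amazon Redshift version of the query gives a like-for-like comparison of the spread, not just the average.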

On the other hand, when you perform large scans, you could benefit from the best of both worlds: higher performance at lower cost. For instance, imagine that you wanted to enable your business analysts to interactively discover insights across a vast amount of historical data. In the example below, the pushdown aggregation query is modified to analyze seven years of data instead of three months:

SELECT c.c_name, c.c_mktsegment, t.prettyMonthYear, uv.totalRevenue
…
WHERE customer <= 3 and visitYearMonth >= 199201
… 
FROM dwdate WHERE d_yearmonthnum >= 199201) as t
…
ORDER BY c.c_name, c.c_mktsegment, uv.visitYearMonth ASC

This query requires scanning and aggregating nearly 1.9 billion records. As shown in the chart below, Redshift Spectrum substantially speeds up this query. A large Amazon Redshift cluster would have to be provisioned to support this use case. With the aid of Redshift Spectrum, you could use an existing small cluster, keep a single copy of your data in S3, and benefit from economical, durable storage while only paying for what you use via the pay-per-query pricing model.

Chart comparing Amazon Redshift vs. Redshift Spectrum with pushdown aggregation over 7 years of data

Summary

Redshift Spectrum lowers the time to value for deeper insights on customer data queries spanning the data lake and data warehouse. It can enable interactive analysis on datasets in cases that weren’t economically practical or technically feasible before.

There are cases where you can get the best of both worlds from Redshift Spectrum: higher performance at lower cost. However, there are still latency-sensitive use cases where you may want native Amazon Redshift performance. For more best practice tips, see the 10 Best Practices for Amazon Redshift post.

Please visit the Amazon Redshift Spectrum PoC Environment GitHub page. If you have questions or suggestions, please comment below.

 


Additional Reading

Learn more about how Amazon Redshift Spectrum extends data warehousing out to exabytes – no loading required.


About the Author

Dylan Tong is an Enterprise Solutions Architect at AWS. He works with customers to help drive their success on the AWS platform through thought leadership and guidance on designing well-architected solutions. He has spent most of his career building on his expertise in data management and analytics by working for leaders and innovators in the space.