Tag Archives: alarms

Event-Driven Computing with Amazon SNS and AWS Compute, Storage, Database, and Networking Services

Post Syndicated from Christie Gifrin original https://aws.amazon.com/blogs/compute/event-driven-computing-with-amazon-sns-compute-storage-database-and-networking-services/

Contributed by Otavio Ferreira, Manager, Software Development, AWS Messaging

Like other developers around the world, you may be tackling increasingly complex business problems. A key success factor, in that case, is the ability to break down a large project scope into smaller, more manageable components. A service-oriented architecture guides you toward designing systems as a collection of loosely coupled, independently scaled, and highly reusable services. Microservices take this even further. To improve performance and scalability, they promote fine-grained interfaces and lightweight protocols.

However, the communication among isolated microservices can be challenging. Services are often deployed onto independent servers and don’t share any compute or storage resources. Also, you should avoid hard dependencies among microservices, to preserve maintainability and reusability.

If you apply the pub/sub design pattern, you can effortlessly decouple and independently scale out your microservices and serverless architectures. A pub/sub messaging service, such as Amazon SNS, promotes event-driven computing that statically decouples event publishers from subscribers, while dynamically allowing for the exchange of messages between them. An event-driven architecture also introduces the responsiveness needed to deal with complex problems, which are often unpredictable and asynchronous.

What is event-driven computing?

Given the context of microservices, event-driven computing is a model in which subscriber services automatically perform work in response to events triggered by publisher services. This paradigm can be applied to automate workflows while decoupling the services that collectively and independently work to fulfil these workflows. Amazon SNS is an event-driven computing hub, in the AWS Cloud, that has native integration with several AWS publisher and subscriber services.

Which AWS services publish events to SNS natively?

Several AWS services have been integrated as SNS publishers and, therefore, can natively trigger event-driven computing for a variety of use cases. In this post, I specifically cover AWS compute, storage, database, and networking services, as depicted below.

Compute services

  • Auto Scaling: Helps you ensure that you have the correct number of Amazon EC2 instances available to handle the load for your application. You can configure Auto Scaling lifecycle hooks to trigger events, as Auto Scaling resizes your EC2 cluster.As an example, you may want to warm up the local cache store on newly launched EC2 instances, and also download log files from other EC2 instances that are about to be terminated. To make this happen, set an SNS topic as your Auto Scaling group’s notification target, then subscribe two Lambda functions to this SNS topic. The first function is responsible for handling scale-out events (to warm up cache upon provisioning), whereas the second is in charge of handling scale-in events (to download logs upon termination).

  • AWS Elastic Beanstalk: An easy-to-use service for deploying and scaling web applications and web services developed in a number of programming languages. You can configure event notifications for your Elastic Beanstalk environment so that notable events can be automatically published to an SNS topic, then pushed to topic subscribers.As an example, you may use this event-driven architecture to coordinate your continuous integration pipeline (such as Jenkins CI). That way, whenever an environment is created, Elastic Beanstalk publishes this event to an SNS topic, which triggers a subscribing Lambda function, which then kicks off a CI job against your newly created Elastic Beanstalk environment.

  • Elastic Load Balancing: Automatically distributes incoming application traffic across Amazon EC2 instances, containers, or other resources identified by IP addresses.You can configure CloudWatch alarms on Elastic Load Balancing metrics, to automate the handling of events derived from Classic Load Balancers. As an example, you may leverage this event-driven design to automate latency profiling in an Amazon ECS cluster behind a Classic Load Balancer. In this example, whenever your ECS cluster breaches your load balancer latency threshold, an event is posted by CloudWatch to an SNS topic, which then triggers a subscribing Lambda function. This function runs a task on your ECS cluster to trigger a latency profiling tool, hosted on the cluster itself. This can enhance your latency troubleshooting exercise by making it timely.

Storage services

  • Amazon S3: Object storage built to store and retrieve any amount of data.You can enable S3 event notifications, and automatically get them posted to SNS topics, to automate a variety of workflows. For instance, imagine that you have an S3 bucket to store incoming resumes from candidates, and a fleet of EC2 instances to encode these resumes from their original format (such as Word or text) into a portable format (such as PDF).In this example, whenever new files are uploaded to your input bucket, S3 publishes these events to an SNS topic, which in turn pushes these messages into subscribing SQS queues. Then, encoding workers running on EC2 instances poll these messages from the SQS queues; retrieve the original files from the input S3 bucket; encode them into PDF; and finally store them in an output S3 bucket.

  • Amazon EFS: Provides simple and scalable file storage, for use with Amazon EC2 instances, in the AWS Cloud.You can configure CloudWatch alarms on EFS metrics, to automate the management of your EFS systems. For example, consider a highly parallelized genomics analysis application that runs against an EFS system. By default, this file system is instantiated on the “General Purpose” performance mode. Although this performance mode allows for lower latency, it might eventually impose a scaling bottleneck. Therefore, you may leverage an event-driven design to handle it automatically.Basically, as soon as the EFS metric “Percent I/O Limit” breaches 95%, CloudWatch could post this event to an SNS topic, which in turn would push this message into a subscribing Lambda function. This function automatically creates a new file system, this time on the “Max I/O” performance mode, then switches the genomics analysis application to this new file system. As a result, your application starts experiencing higher I/O throughput rates.

  • Amazon Glacier: A secure, durable, and low-cost cloud storage service for data archiving and long-term backup.You can set a notification configuration on an Amazon Glacier vault so that when a job completes, a message is published to an SNS topic. Retrieving an archive from Amazon Glacier is a two-step asynchronous operation, in which you first initiate a job, and then download the output after the job completes. Therefore, SNS helps you eliminate polling your Amazon Glacier vault to check whether your job has been completed, or not. As usual, you may subscribe SQS queues, Lambda functions, and HTTP endpoints to your SNS topic, to be notified when your Amazon Glacier job is done.

  • AWS Snowball: A petabyte-scale data transport solution that uses secure appliances to transfer large amounts of data.You can leverage Snowball notifications to automate workflows related to importing data into and exporting data from AWS. More specifically, whenever your Snowball job status changes, Snowball can publish this event to an SNS topic, which in turn can broadcast the event to all its subscribers.As an example, imagine a Geographic Information System (GIS) that distributes high-resolution satellite images to users via Web browser. In this example, the GIS vendor could capture up to 80 TB of satellite images; create a Snowball job to import these files from an on-premises system to an S3 bucket; and provide an SNS topic ARN to be notified upon job status changes in Snowball. After Snowball changes the job status from “Importing” to “Completed”, Snowball publishes this event to the specified SNS topic, which delivers this message to a subscribing Lambda function, which finally creates a CloudFront web distribution for the target S3 bucket, to serve the images to end users.

Database services

  • Amazon RDS: Makes it easy to set up, operate, and scale a relational database in the cloud.RDS leverages SNS to broadcast notifications when RDS events occur. As usual, these notifications can be delivered via any protocol supported by SNS, including SQS queues, Lambda functions, and HTTP endpoints.As an example, imagine that you own a social network website that has experienced organic growth, and needs to scale its compute and database resources on demand. In this case, you could provide an SNS topic to listen to RDS DB instance events. When the “Low Storage” event is published to the topic, SNS pushes this event to a subscribing Lambda function, which in turn leverages the RDS API to increase the storage capacity allocated to your DB instance. The provisioning itself takes place within the specified DB maintenance window.

  • Amazon ElastiCache: A web service that makes it easy to deploy, operate, and scale an in-memory data store or cache in the cloud.ElastiCache can publish messages using Amazon SNS when significant events happen on your cache cluster. This feature can be used to refresh the list of servers on client machines connected to individual cache node endpoints of a cache cluster. For instance, an ecommerce website fetches product details from a cache cluster, with the goal of offloading a relational database and speeding up page load times. Ideally, you want to make sure that each web server always has an updated list of cache servers to which to connect.To automate this node discovery process, you can get your ElastiCache cluster to publish events to an SNS topic. Thus, when ElastiCache event “AddCacheNodeComplete” is published, your topic then pushes this event to all subscribing HTTP endpoints that serve your ecommerce website, so that these HTTP servers can update their list of cache nodes.

  • Amazon Redshift: A fully managed data warehouse that makes it simple to analyze data using standard SQL and BI (Business Intelligence) tools.Amazon Redshift uses SNS to broadcast relevant events so that data warehouse workflows can be automated. As an example, imagine a news website that sends clickstream data to a Kinesis Firehose stream, which then loads the data into Amazon Redshift, so that popular news and reading preferences might be surfaced on a BI tool. At some point though, this Amazon Redshift cluster might need to be resized, and the cluster enters a ready-only mode. Hence, this Amazon Redshift event is published to an SNS topic, which delivers this event to a subscribing Lambda function, which finally deletes the corresponding Kinesis Firehose delivery stream, so that clickstream data uploads can be put on hold.At a later point, after Amazon Redshift publishes the event that the maintenance window has been closed, SNS notifies a subscribing Lambda function accordingly, so that this function can re-create the Kinesis Firehose delivery stream, and resume clickstream data uploads to Amazon Redshift.

  • AWS DMS: Helps you migrate databases to AWS quickly and securely. The source database remains fully operational during the migration, minimizing downtime to applications that rely on the database.DMS also uses SNS to provide notifications when DMS events occur, which can automate database migration workflows. As an example, you might create data replication tasks to migrate an on-premises MS SQL database, composed of multiple tables, to MySQL. Thus, if replication tasks fail due to incompatible data encoding in the source tables, these events can be published to an SNS topic, which can push these messages into a subscribing SQS queue. Then, encoders running on EC2 can poll these messages from the SQS queue, encode the source tables into a compatible character set, and restart the corresponding replication tasks in DMS. This is an event-driven approach to a self-healing database migration process.

Networking services

  • Amazon Route 53: A highly available and scalable cloud-based DNS (Domain Name System). Route 53 health checks monitor the health and performance of your web applications, web servers, and other resources.You can set CloudWatch alarms and get automated Amazon SNS notifications when the status of your Route 53 health check changes. As an example, imagine an online payment gateway that reports the health of its platform to merchants worldwide, via a status page. This page is hosted on EC2 and fetches platform health data from DynamoDB. In this case, you could configure a CloudWatch alarm for your Route 53 health check, so that when the alarm threshold is breached, and the payment gateway is no longer considered healthy, then CloudWatch publishes this event to an SNS topic, which pushes this message to a subscribing Lambda function, which finally updates the DynamoDB table that populates the status page. This event-driven approach avoids any kind of manual update to the status page visited by merchants.

  • AWS Direct Connect (AWS DX): Makes it easy to establish a dedicated network connection from your premises to AWS, which can reduce your network costs, increase bandwidth throughput, and provide a more consistent network experience than Internet-based connections.You can monitor physical DX connections using CloudWatch alarms, and send SNS messages when alarms change their status. As an example, when a DX connection state shifts to 0 (zero), indicating that the connection is down, this event can be published to an SNS topic, which can fan out this message to impacted servers through HTTP endpoints, so that they might reroute their traffic through a different connection instead. This is an event-driven approach to connectivity resilience.

More event-driven computing on AWS

In addition to SNS, event-driven computing is also addressed by Amazon CloudWatch Events, which delivers a near real-time stream of system events that describe changes in AWS resources. With CloudWatch Events, you can route each event type to one or more targets, including:

Many AWS services publish events to CloudWatch. As an example, you can get CloudWatch Events to capture events on your ETL (Extract, Transform, Load) jobs running on AWS Glue and push failed ones to an SQS queue, so that you can retry them later.

Conclusion

Amazon SNS is a pub/sub messaging service that can be used as an event-driven computing hub to AWS customers worldwide. By capturing events natively triggered by AWS services, such as EC2, S3 and RDS, you can automate and optimize all kinds of workflows, namely scaling, testing, encoding, profiling, broadcasting, discovery, failover, and much more. Business use cases presented in this post ranged from recruiting websites, to scientific research, geographic systems, social networks, retail websites, and news portals.

Start now by visiting Amazon SNS in the AWS Management Console, or by trying the AWS 10-Minute Tutorial, Send Fan-out Event Notifications with Amazon SNS and Amazon SQS.

 

Capturing Custom, High-Resolution Metrics from Containers Using AWS Step Functions and AWS Lambda

Post Syndicated from Nathan Taber original https://aws.amazon.com/blogs/compute/capturing-custom-high-resolution-metrics-from-containers-using-aws-step-functions-and-aws-lambda/

Contributed by Trevor Sullivan, AWS Solutions Architect

When you deploy containers with Amazon ECS, are you gathering all of the key metrics so that you can correctly monitor the overall health of your ECS cluster?

By default, ECS writes metrics to Amazon CloudWatch in 5-minute increments. For complex or large services, this may not be sufficient to make scaling decisions quickly. You may want to respond immediately to changes in workload or to identify application performance problems. Last July, CloudWatch announced support for high-resolution metrics, up to a per-second basis.

These high-resolution metrics can be used to give you a clearer picture of the load and performance for your applications, containers, clusters, and hosts. In this post, I discuss how you can use AWS Step Functions, along with AWS Lambda, to cost effectively record high-resolution metrics into CloudWatch. You implement this solution using a serverless architecture, which keeps your costs low and makes it easier to troubleshoot the solution.

To show how this works, you retrieve some useful metric data from an ECS cluster running in the same AWS account and region (Oregon, us-west-2) as the Step Functions state machine and Lambda function. However, you can use this architecture to retrieve any custom application metrics from any resource in any AWS account and region.

Why Step Functions?

Step Functions enables you to orchestrate multi-step tasks in the AWS Cloud that run for any period of time, up to a year. Effectively, you’re building a blueprint for an end-to-end process. After it’s built, you can execute the process as many times as you want.

For this architecture, you gather metrics from an ECS cluster, every five seconds, and then write the metric data to CloudWatch. After your ECS cluster metrics are stored in CloudWatch, you can create CloudWatch alarms to notify you. An alarm can also trigger an automated remediation activity such as scaling ECS services, when a metric exceeds a threshold defined by you.

When you build a Step Functions state machine, you define the different states inside it as JSON objects. The bulk of the work in Step Functions is handled by the common task state, which invokes Lambda functions or Step Functions activities. There is also a built-in library of other useful states that allow you to control the execution flow of your program.

One of the most useful state types in Step Functions is the parallel state. Each parallel state in your state machine can have one or more branches, each of which is executed in parallel. Another useful state type is the wait state, which waits for a period of time before moving to the next state.

In this walkthrough, you combine these three states (parallel, wait, and task) to create a state machine that triggers a Lambda function, which then gathers metrics from your ECS cluster.

Step Functions pricing

This state machine is executed every minute, resulting in 60 executions per hour, and 1,440 executions per day. Step Functions is billed per state transition, including the Start and End state transitions, and giving you approximately 37,440 state transitions per day. To reach this number, I’m using this estimated math:

26 state transitions per-execution x 60 minutes x 24 hours

Based on current pricing, at $0.000025 per state transition, the daily cost of this metric gathering state machine would be $0.936.

Step Functions offers an indefinite 4,000 free state transitions every month. This benefit is available to all customers, not just customers who are still under the 12-month AWS Free Tier. For more information and cost example scenarios, see Step Functions pricing.

Why Lambda?

The goal is to capture metrics from an ECS cluster, and write the metric data to CloudWatch. This is a straightforward, short-running process that makes Lambda the perfect place to run your code. Lambda is one of the key services that makes up “Serverless” application architectures. It enables you to consume compute capacity only when your code is actually executing.

The process of gathering metric data from ECS and writing it to CloudWatch takes a short period of time. In fact, my average Lambda function execution time, while developing this post, is only about 250 milliseconds on average. For every five-second interval that occurs, I’m only using 1/20th of the compute time that I’d otherwise be paying for.

Lambda pricing

For billing purposes, Lambda execution time is rounded up to the nearest 100-ms interval. In general, based on the metrics that I observed during development, a 250-ms runtime would be billed at 300 ms. Here, I calculate the cost of this Lambda function executing on a daily basis.

Assuming 31 days in each month, there would be 535,680 five-second intervals (31 days x 24 hours x 60 minutes x 12 five-second intervals = 535,680). The Lambda function is invoked every five-second interval, by the Step Functions state machine, and runs for a 300-ms period. At current Lambda pricing, for a 128-MB function, you would be paying approximately the following:

Total compute

Total executions = 535,680
Total compute = total executions x (3 x $0.000000208 per 100 ms) = $0.334 per day

Total requests

Total requests = (535,680 / 1000000) * $0.20 per million requests = $0.11 per day

Total Lambda Cost

$0.11 requests + $0.334 compute time = $0.444 per day

Similar to Step Functions, Lambda offers an indefinite free tier. For more information, see Lambda Pricing.

Walkthrough

In the following sections, I step through the process of configuring the solution just discussed. If you follow along, at a high level, you will:

  • Configure an IAM role and policy
  • Create a Step Functions state machine to control metric gathering execution
  • Create a metric-gathering Lambda function
  • Configure a CloudWatch Events rule to trigger the state machine
  • Validate the solution

Prerequisites

You should already have an AWS account with a running ECS cluster. If you don’t have one running, you can easily deploy a Docker container on an ECS cluster using the AWS Management Console. In the example produced for this post, I use an ECS cluster running Windows Server (currently in beta), but either a Linux or Windows Server cluster works.

Create an IAM role and policy

First, create an IAM role and policy that enables Step Functions, Lambda, and CloudWatch to communicate with each other.

  • The CloudWatch Events rule needs permissions to trigger the Step Functions state machine.
  • The Step Functions state machine needs permissions to trigger the Lambda function.
  • The Lambda function needs permissions to query ECS and then write to CloudWatch Logs and metrics.

When you create the state machine, Lambda function, and CloudWatch Events rule, you assign this role to each of those resources. Upon execution, each of these resources assumes the specified role and executes using the role’s permissions.

  1. Open the IAM console.
  2. Choose Roles, create New Role.
  3. For Role Name, enter WriteMetricFromStepFunction.
  4. Choose Save.

Create the IAM role trust relationship
The trust relationship (also known as the assume role policy document) for your IAM role looks like the following JSON document. As you can see from the document, your IAM role needs to trust the Lambda, CloudWatch Events, and Step Functions services. By configuring your role to trust these services, they can assume this role and inherit the role permissions.

  1. Open the IAM console.
  2. Choose Roles and select the IAM role previously created.
  3. Choose Trust RelationshipsEdit Trust Relationships.
  4. Enter the following trust policy text and choose Save.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "lambda.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    },
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "events.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    },
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "states.us-west-2.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

Create an IAM policy

After you’ve finished configuring your role’s trust relationship, grant the role access to the other AWS resources that make up the solution.

The IAM policy is what gives your IAM role permissions to access various resources. You must whitelist explicitly the specific resources to which your role has access, because the default IAM behavior is to deny access to any AWS resources.

I’ve tried to keep this policy document as generic as possible, without allowing permissions to be too open. If the name of your ECS cluster is different than the one in the example policy below, make sure that you update the policy document before attaching it to your IAM role. You can attach this policy as an inline policy, instead of creating the policy separately first. However, either approach is valid.

  1. Open the IAM console.
  2. Select the IAM role, and choose Permissions.
  3. Choose Add in-line policy.
  4. Choose Custom Policy and then enter the following policy. The inline policy name does not matter.
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [ "logs:*" ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [ "cloudwatch:PutMetricData" ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [ "states:StartExecution" ],
            "Resource": [
                "arn:aws:states:*:*:stateMachine:WriteMetricFromStepFunction"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [ "lambda:InvokeFunction" ],
            "Resource": "arn:aws:lambda:*:*:function:WriteMetricFromStepFunction"
        },
        {
            "Effect": "Allow",
            "Action": [ "ecs:Describe*" ],
            "Resource": "arn:aws:ecs:*:*:cluster/ECSEsgaroth"
        }
    ]
}

Create a Step Functions state machine

In this section, you create a Step Functions state machine that invokes the metric-gathering Lambda function every five (5) seconds, for a one-minute period. If you divide a minute (60) seconds into equal parts of five-second intervals, you get 12. Based on this math, you create 12 branches, in a single parallel state, in the state machine. Each branch triggers the metric-gathering Lambda function at a different five-second marker, throughout the one-minute period. After all of the parallel branches finish executing, the Step Functions execution completes and another begins.

Follow these steps to create your Step Functions state machine:

  1. Open the Step Functions console.
  2. Choose DashboardCreate State Machine.
  3. For State Machine Name, enter WriteMetricFromStepFunction.
  4. Enter the state machine code below into the editor. Make sure that you insert your own AWS account ID for every instance of “676655494xxx”
  5. Choose Create State Machine.
  6. Select the WriteMetricFromStepFunction IAM role that you previously created.
{
    "Comment": "Writes ECS metrics to CloudWatch every five seconds, for a one-minute period.",
    "StartAt": "ParallelMetric",
    "States": {
      "ParallelMetric": {
        "Type": "Parallel",
        "Branches": [
          {
            "StartAt": "WriteMetricLambda",
            "States": {
             	"WriteMetricLambda": {
                  "Type": "Task",
				  "Resource": "arn:aws:lambda:us-west-2:676655494xxx:function:WriteMetricFromStepFunction",
                  "End": true
                } 
            }
          },
    	  {
            "StartAt": "WaitFive",
            "States": {
            	"WaitFive": {
            		"Type": "Wait",
            		"Seconds": 5,
            		"Next": "WriteMetricLambdaFive"
          		},
             	"WriteMetricLambdaFive": {
                  "Type": "Task",
				  "Resource": "arn:aws:lambda:us-west-2:676655494xxx:function:WriteMetricFromStepFunction",
                  "End": true
                } 
            }
          },
    	  {
            "StartAt": "WaitTen",
            "States": {
            	"WaitTen": {
            		"Type": "Wait",
            		"Seconds": 10,
            		"Next": "WriteMetricLambda10"
          		},
             	"WriteMetricLambda10": {
                  "Type": "Task",
                  "Resource": "arn:aws:lambda:us-west-2:676655494xxx:function:WriteMetricFromStepFunction",
                  "End": true
                } 
            }
          },
    	  {
            "StartAt": "WaitFifteen",
            "States": {
            	"WaitFifteen": {
            		"Type": "Wait",
            		"Seconds": 15,
            		"Next": "WriteMetricLambda15"
          		},
             	"WriteMetricLambda15": {
                  "Type": "Task",
                  "Resource": "arn:aws:lambda:us-west-2:676655494xxx:function:WriteMetricFromStepFunction",
                  "End": true
                } 
            }
          },
          {
            "StartAt": "Wait20",
            "States": {
            	"Wait20": {
            		"Type": "Wait",
            		"Seconds": 20,
            		"Next": "WriteMetricLambda20"
          		},
             	"WriteMetricLambda20": {
                  "Type": "Task",
                  "Resource": "arn:aws:lambda:us-west-2:676655494xxx:function:WriteMetricFromStepFunction",
                  "End": true
                } 
            }
          },
          {
            "StartAt": "Wait25",
            "States": {
            	"Wait25": {
            		"Type": "Wait",
            		"Seconds": 25,
            		"Next": "WriteMetricLambda25"
          		},
             	"WriteMetricLambda25": {
                  "Type": "Task",
                  "Resource": "arn:aws:lambda:us-west-2:676655494xxx:function:WriteMetricFromStepFunction",
                  "End": true
                } 
            }
          },
          {
            "StartAt": "Wait30",
            "States": {
            	"Wait30": {
            		"Type": "Wait",
            		"Seconds": 30,
            		"Next": "WriteMetricLambda30"
          		},
             	"WriteMetricLambda30": {
                  "Type": "Task",
                  "Resource": "arn:aws:lambda:us-west-2:676655494xxx:function:WriteMetricFromStepFunction",
                  "End": true
                } 
            }
          },
          {
            "StartAt": "Wait35",
            "States": {
            	"Wait35": {
            		"Type": "Wait",
            		"Seconds": 35,
            		"Next": "WriteMetricLambda35"
          		},
             	"WriteMetricLambda35": {
                  "Type": "Task",
                  "Resource": "arn:aws:lambda:us-west-2:676655494xxx:function:WriteMetricFromStepFunction",
                  "End": true
                } 
            }
          },
          {
            "StartAt": "Wait40",
            "States": {
            	"Wait40": {
            		"Type": "Wait",
            		"Seconds": 40,
            		"Next": "WriteMetricLambda40"
          		},
             	"WriteMetricLambda40": {
                  "Type": "Task",
                  "Resource": "arn:aws:lambda:us-west-2:676655494xxx:function:WriteMetricFromStepFunction",
                  "End": true
                } 
            }
          },
          {
            "StartAt": "Wait45",
            "States": {
            	"Wait45": {
            		"Type": "Wait",
            		"Seconds": 45,
            		"Next": "WriteMetricLambda45"
          		},
             	"WriteMetricLambda45": {
                  "Type": "Task",
                  "Resource": "arn:aws:lambda:us-west-2:676655494xxx:function:WriteMetricFromStepFunction",
                  "End": true
                } 
            }
          },
          {
            "StartAt": "Wait50",
            "States": {
            	"Wait50": {
            		"Type": "Wait",
            		"Seconds": 50,
            		"Next": "WriteMetricLambda50"
          		},
             	"WriteMetricLambda50": {
                  "Type": "Task",
                  "Resource": "arn:aws:lambda:us-west-2:676655494xxx:function:WriteMetricFromStepFunction",
                  "End": true
                } 
            }
          },
          {
            "StartAt": "Wait55",
            "States": {
            	"Wait55": {
            		"Type": "Wait",
            		"Seconds": 55,
            		"Next": "WriteMetricLambda55"
          		},
             	"WriteMetricLambda55": {
                  "Type": "Task",
                  "Resource": "arn:aws:lambda:us-west-2:676655494xxx:function:WriteMetricFromStepFunction",
                  "End": true
                } 
            }
          }
        ],
        "End": true
      }
  }
}

Now you’ve got a shiny new Step Functions state machine! However, you might ask yourself, “After the state machine has been created, how does it get executed?” Before I answer that question, create the Lambda function that writes the custom metric, and then you get the end-to-end process moving.

Create a Lambda function

The meaty part of the solution is a Lambda function, written to consume the Python 3.6 runtime, that retrieves metric values from ECS, and then writes them to CloudWatch. This Lambda function is what the Step Functions state machine is triggering every five seconds, via the Task states. Key points to remember:

The Lambda function needs permission to:

  • Write CloudWatch metrics (PutMetricData API).
  • Retrieve metrics from ECS clusters (DescribeCluster API).
  • Write StdOut to CloudWatch Logs.

Boto3, the AWS SDK for Python, is included in the Lambda execution environment for Python 2.x and 3.x.

Because Lambda includes the AWS SDK, you don’t have to worry about packaging it up and uploading it to Lambda. You can focus on writing code and automatically take a dependency on boto3.

As for permissions, you’ve already created the IAM role and attached a policy to it that enables your Lambda function to access the necessary API actions. When you create your Lambda function, make sure that you select the correct IAM role, to ensure it is invoked with the correct permissions.

The following Lambda function code is generic. So how does the Lambda function know which ECS cluster to gather metrics for? Your Step Functions state machine automatically passes in its state to the Lambda function. When you create your CloudWatch Events rule, you specify a simple JSON object that passes the desired ECS cluster name into your Step Functions state machine, which then passes it to the Lambda function.

Use the following property values as you create your Lambda function:

Function Name: WriteMetricFromStepFunction
Description: This Lambda function retrieves metric values from an ECS cluster and writes them to Amazon CloudWatch.
Runtime: Python3.6
Memory: 128 MB
IAM Role: WriteMetricFromStepFunction

import boto3

def handler(event, context):
    cw = boto3.client('cloudwatch')
    ecs = boto3.client('ecs')
    print('Got boto3 client objects')
    
    Dimension = {
        'Name': 'ClusterName',
        'Value': event['ECSClusterName']
    }

    cluster = get_ecs_cluster(ecs, Dimension['Value'])
    
    cw_args = {
       'Namespace': 'ECS',
       'MetricData': [
           {
               'MetricName': 'RunningTask',
               'Dimensions': [ Dimension ],
               'Value': cluster['runningTasksCount'],
               'Unit': 'Count',
               'StorageResolution': 1
           },
           {
               'MetricName': 'PendingTask',
               'Dimensions': [ Dimension ],
               'Value': cluster['pendingTasksCount'],
               'Unit': 'Count',
               'StorageResolution': 1
           },
           {
               'MetricName': 'ActiveServices',
               'Dimensions': [ Dimension ],
               'Value': cluster['activeServicesCount'],
               'Unit': 'Count',
               'StorageResolution': 1
           },
           {
               'MetricName': 'RegisteredContainerInstances',
               'Dimensions': [ Dimension ],
               'Value': cluster['registeredContainerInstancesCount'],
               'Unit': 'Count',
               'StorageResolution': 1
           }
        ]
    }
    cw.put_metric_data(**cw_args)
    print('Finished writing metric data')
    
def get_ecs_cluster(client, cluster_name):
    cluster = client.describe_clusters(clusters = [ cluster_name ])
    print('Retrieved cluster details from ECS')
    return cluster['clusters'][0]

Create the CloudWatch Events rule

Now you’ve created an IAM role and policy, Step Functions state machine, and Lambda function. How do these components actually start communicating with each other? The final step in this process is to set up a CloudWatch Events rule that triggers your metric-gathering Step Functions state machine every minute. You have two choices for your CloudWatch Events rule expression: rate or cron. In this example, use the cron expression.

A couple key learning points from creating the CloudWatch Events rule:

  • You can specify one or more targets, of different types (for example, Lambda function, Step Functions state machine, SNS topic, and so on).
  • You’re required to specify an IAM role with permissions to trigger your target.
    NOTE: This applies only to certain types of targets, including Step Functions state machines.
  • Each target that supports IAM roles can be triggered using a different IAM role, in the same CloudWatch Events rule.
  • Optional: You can provide custom JSON that is passed to your target Step Functions state machine as input.

Follow these steps to create the CloudWatch Events rule:

  1. Open the CloudWatch console.
  2. Choose Events, RulesCreate Rule.
  3. Select Schedule, Cron Expression, and then enter the following rule:
    0/1 * * * ? *
  4. Choose Add Target, Step Functions State MachineWriteMetricFromStepFunction.
  5. For Configure Input, select Constant (JSON Text).
  6. Enter the following JSON input, which is passed to Step Functions, while changing the cluster name accordingly:
    { "ECSClusterName": "ECSEsgaroth" }
  7. Choose Use Existing Role, WriteMetricFromStepFunction (the IAM role that you previously created).

After you’ve completed with these steps, your screen should look similar to this:

Validate the solution

Now that you have finished implementing the solution to gather high-resolution metrics from ECS, validate that it’s working properly.

  1. Open the CloudWatch console.
  2. Choose Metrics.
  3. Choose custom and select the ECS namespace.
  4. Choose the ClusterName metric dimension.

You should see your metrics listed below.

Troubleshoot configuration issues

If you aren’t receiving the expected ECS cluster metrics in CloudWatch, check for the following common configuration issues. Review the earlier procedures to make sure that the resources were properly configured.

  • The IAM role’s trust relationship is incorrectly configured.
    Make sure that the IAM role trusts Lambda, CloudWatch Events, and Step Functions in the correct region.
  • The IAM role does not have the correct policies attached to it.
    Make sure that you have copied the IAM policy correctly as an inline policy on the IAM role.
  • The CloudWatch Events rule is not triggering new Step Functions executions.
    Make sure that the target configuration on the rule has the correct Step Functions state machine and IAM role selected.
  • The Step Functions state machine is being executed, but failing part way through.
    Examine the detailed error message on the failed state within the failed Step Functions execution. It’s possible that the
  • IAM role does not have permissions to trigger the target Lambda function, that the target Lambda function may not exist, or that the Lambda function failed to complete successfully due to invalid permissions.
    Although the above list covers several different potential configuration issues, it is not comprehensive. Make sure that you understand how each service is connected to each other, how permissions are granted through IAM policies, and how IAM trust relationships work.

Conclusion

In this post, you implemented a Serverless solution to gather and record high-resolution application metrics from containers running on Amazon ECS into CloudWatch. The solution consists of a Step Functions state machine, Lambda function, CloudWatch Events rule, and an IAM role and policy. The data that you gather from this solution helps you rapidly identify issues with an ECS cluster.

To gather high-resolution metrics from any service, modify your Lambda function to gather the correct metrics from your target. If you prefer not to use Python, you can implement a Lambda function using one of the other supported runtimes, including Node.js, Java, or .NET Core. However, this post should give you the fundamental basics about capturing high-resolution metrics in CloudWatch.

If you found this post useful, or have questions, please comment below.

Protect your Reputation with Email Pausing and Configuration Set Metrics

Post Syndicated from Brent Meyer original https://aws.amazon.com/blogs/ses/protect-your-reputation-with-email-pausing-and-configuration-set-metrics/

In August, we launched the reputation dashboard, which helps you track important metrics that could impact your ability to send emails. By monitoring the metrics in this dashboard, you can protect your sender reputation, which can increase the likelihood that the emails you send will reach your customers’ inboxes.

Today, we’re launching two features that build upon the capabilities of the reputation dashboard. The first is the ability to temporarily pause email sending, either at the configuration set level, or across your entire Amazon SES account. The second is the ability to export reputation metrics for individual configuration sets.

Email Pausing

Today’s update includes new API operations that can temporarily pause your ability to send email using Amazon SES. To disable email sending across your entire Amazon SES account, you can use the UpdateAccountSendingEnabled operation. To pause sending only for emails sent using a specific configuration set, you can use the UpdateConfigurationSetSendingEnabled operation.

Email pausing is helpful because Amazon SES uses automatic enforcement policies. If the bounce or complaint rates for your account are too high, your account is automatically placed on probation. If the bounce or complaint issues continue after the probation period has ended, your account may be suspended.

With email pausing, you can temporarily halt your ability to send email before your account is placed on probation. While your ability to send email is paused, you can identify the issues that were causing your account to register high bounce or complaint rates. You can then resume sending after the issues are resolved.

Email pausing helps ensure that your ability to send email using Amazon SES is not interrupted because of enforcement issues. It helps ensure that your sender reputation won’t be damaged by mistakes or unforeseen issues.

You can learn more about the UpdateAccountSendingEnabled and UpdateConfigurationSetSendingEnabled operations in the Amazon Simple Email Service API Reference.

Configuration Set Reputation Metrics

Amazon SES automatically publishes the bounce and complaint rates for your account to Amazon CloudWatch. In CloudWatch, you can monitor these metrics over time, and create alarms that notify you when your reputation metrics cross certain thresholds.

With today’s update, you can also publish reputation metrics for individual configuration sets to CloudWatch. This feature gives you additional information about the messages you send using Amazon SES. For example, if you send all of your marketing emails using one configuration set, and your transactional emails using a different configuration set, you can view distinct reputation metrics for each type of email.

Because we anticipate that this feature will lead to the creation of many new configuration sets, we’re increasing the maximum number of configuration sets you can create from 50 to 10,000.

For more information about exporting reputation metrics for configuration sets, see Exporting Reputation Metrics for a Configuration Set to CloudWatch in the Amazon Simple Email Service Developer Guide.

Automating These Features

You can use AWS services—including Amazon SNS, AWS Lambda, and Amazon CloudWatch—to create a solution that automatically pauses email sending for your account when your overall reputation metrics cross a certain threshold. Or, to minimize disruption to your email sending program, you can pause email sending for a specific configuration set when the metrics for that configuration set cross a threshold. The following image illustrates the processes that occur when you implement these solutions.

A flow diagram that illustrates a solution for automatically pausing Amazon SES email sending. Amazon SES provides reputation metrics to CloudWatch. If those metrics exceed a threshold, a CloudWatch alarm is triggered, which triggers an SNS topic. The SNS topic sends notifications (email, SMS), and executes a Lambda function, which pauses email sending in SES.

For more information on both of these solutions, see Automatically Pausing Email Sending in the Amazon Simple Email Service Developer Guide.

We’re always looking for ways to help safeguard the reputation you’ve worked hard to build. If you have suggestions, questions, or comments, we’d love to hear from you in the comments below, or in the Amazon SES Forum.

These features are now available in the following AWS Regions: US West (Oregon), US East (N. Virginia), and EU (Ireland).

Amazon ElastiCache Update – Online Resizing for Redis Clusters

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/amazon-elasticache-update-online-resizing-for-redis-clusters/

Amazon ElastiCache makes it easy to for you to set up a fast, in-memory data store and cache. With support for the two most popular open source offerings (Redis and Memcached), ElastiCache supports the demanding needs of game leaderboards, in-memory analytics, and large-scale messaging.

Today I would like to tell you about an important addition to Amazon ElastiCache for Redis. You can already create clusters with up to 15 shards, each responsible for storing keys and values for a specific set of slots (each cluster has exactly 16,384 slots). A single cluster can expand to store 3.55 terabytes of in-memory data while supporting up to 20 million reads and 4.5 million writes per second.

Now with Online Resizing
You can now adjust the number of shards in a running ElastiCache for Redis cluster while the cluster remains online and responding to requests. This gives you the power to respond to changes in traffic and data volume without having to take the cluster offline or to start with an empty cache. You can also rebalance a running cluster to uniformly redistribute slot space without changing the number of shards.

When you initiate a resharding or rebalancing operation, ElastiCache for Redis starts by preparing a plan that will result in an even distribution of slots across the shards in the cluster. Then it transfers slots across shards, moving many in parallel for efficiency. This all happens while the cluster continues to respond to requests, with a modest impact on write throughput for writes to a slot that is in motion. The migration rate is dependent on the instance type, network speed, read/write traffic to the slots, and is generally about 1 gigabyte per minute.

The resharding and rebalancing operations apply to Redis clusters that were created with Cluster Mode enabled:

Resharding a Cluster
In general, you will know that it is time to expand a cluster via resharding when it starts to face significant memory pressure or when individual nodes are becoming bottlenecks. You can watch the cluster’s CloudWatch metrics to identify each situation:

Memory Pressure – FreeableMemory, SwapUsage, BytesUsedForCache.

CPU Bottleneck – CPUUtilization, CurrConnections, NewConnections.

Network Bottleneck – NetworkBytesIn, NetworkBytesOut.

You can use CloudWatch Dashboards to monitor these metrics, and CloudWatch Alarms to automate the resharding process.

To reshard a Redis cluster from the ElastiCache Dashboard, click on the cluster to visit the detail page, and then click on the Add shards button:

Enter the number of shards to add and (optionally) the desired Availability Zones, then click on Add:

The status of the cluster will change to modifying and the resharding process will begin. It can take anywhere from a few minutes to several hours, as indicated above. You can track the progress on the detail page for the cluster:

You can see the slots moving from shard to shard:

You can also watch the Events for the cluster:

During the resharding you should avoid the use of the KEYS and SMEMBERS commands, as well as compute-intensive Lua scripts in order to moderate the load on the cluster shards. You should avoid the FLUSHDB and FLUSHALL commands entirely; using them will interrupt and then abort the resharding process.

The status of each shard will return to available when the process is complete:

The same process takes place when you delete shards.

Rebalancing Slots
You can perform this operation by heading to the cluster’s detail page and clicking on Rebalance Slot Distribution:

Things to Know
Here are a couple of things to keep in mind about this new feature:

Engine Version – Your cluster must be running version 3.2.10 of the Redis engine.

Migration Size – Slots that contain items that are larger than 256 megabytes after serialization are not migrated.

Cluster Endpoint – The cluster endpoint does not change as a result of a resharding or rebalancing.

Available Now
This feature is available now and you can start using it today.

Jeff;

 

New – Application Load Balancing via IP Address to AWS & On-Premises Resources

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/new-application-load-balancing-via-ip-address-to-aws-on-premises-resources/

I told you about the new AWS Application Load Balancer last year and showed you how to use it to do implement Layer 7 (application) routing to EC2 instances and to microservices running in containers.

Some of our customers are building hybrid applications as part of a longer-term move to AWS. These customers have told us that they would like to use a single Application Load Balancer to spread traffic across a combination of existing on-premises resources and new resources running in the AWS Cloud. Other customers would like to spread traffic to web or database servers that are scattered across two or more Virtual Private Clouds (VPCs), host multiple services on the same instance with distinct IP addresses but a common port number, and to offer support for IP-based virtual hosting for clients that do not support Server Name Indication (SNI). Another group of customers would like to host multiple instances of a service on the same instance (perhaps within containers), while using multiple interfaces and security groups to implement fine-grained access control.

These situations arise within a broad set of hybrid, migration, disaster recovery, and on-premises use cases and scenarios.

Route to IP Addresses
In order to address these use cases, Application Load Balancers can now route traffic directly to IP addresses. These addresses can be in the same VPC as the ALB, a peer VPC in the same region, on an EC2 instance connected to a VPC by way of ClassicLink, or on on-premises resources at the other end of a VPN connection or AWS Direct Connect connection.

Application Load Balancers already group targets in to target groups. As part of today’s launch, each target group now has a target type attribute:

instance – Targets are registered by way of EC2 instance IDs, as before.

ip – Targets are registered as IP addresses. You can use any IPv4 address from the load balancer’s VPC CIDR for targets within load balancer’s VPC and any IPv4 address from the RFC 1918 ranges (10.0.0.0/8, 172.16.0.0/12, and 192.168.0.0/16) or the RFC 6598 range (100.64.0.0/10) for targets located outside the load balancer’s VPC (this includes Peered VPC, EC2-Classic, and on-premises targets reachable over Direct Connect or VPN).

Each target group has a load balancer and health check configuration, and publishes metrics to CloudWatch, as has always been the case.

Let’s say that you are in the transition phase of an application migration to AWS or want to use AWS to augment on-premises resources with EC2 instances and you need to distribute application traffic across both your AWS and on-premises resources. You can achieve this by registering all the resources (AWS and on-premises) to the same target group and associate the target group with a load balancer. Alternatively, you can use DNS based weighted load balancing across AWS and on-premises resources using two load balancers i.e. one load balancer for AWS and other for on-premises resources. In the scenario where application-A back-ends are in VPC and application-B back-ends are in on-premises locations then you can put back-ends for each application in different target groups and use content based routing to route traffic to each target group.

Creating a Target Group
Here’s how I create a target group that sends traffic to some IP addresses as part of the process of creating an Application Load Balancer. I enter a name (ip-target-1) and select ip as the Target type:

Then I enter IP address targets. These can be from the VPC that hosts the load balancer:

Or they can be other private IP addresses within one of the private ranges listed above, for targets outside of the VPC that hosts the load balancer:

After I review the settings and create the load balancer, traffic will be sent to the designated IP addresses as soon as they pass the health checks. Each load balancer can accommodate up to 1000 targets.

I can examine my target group and edit the set of targets at any time:

As you can see, one of my targets was not healthy when I took this screen shot (this was by design). Metrics are published to CloudWatch for each target group; I can see them in the Console and I can create CloudWatch Alarms:

Available Now
This feature is available now and you can start using it today in all AWS Regions.

Jeff;

 

Announcing the Reputation Dashboard

Post Syndicated from Brent Meyer original https://aws.amazon.com/blogs/ses/announcing-the-reputation-dashboard/

The Amazon SES team is pleased to announce the addition of a reputation dashboard to the Amazon SES console. This new feature helps you track issues that could impact the sender reputation of your Amazon SES account.

What information does the reputation dashboard provide?

Amazon SES users must maintain bounce and complaint rates below a certain threshold. We put these rules in place to protect the sender reputations of all Amazon SES users, and to prevent Amazon SES from being used to deliver spam or malicious content. Users with very high rates of bounces or complaints may be put on probation. If the bounce or complaint rates are not within acceptable limits by the end of the probation period, these accounts may be shut down completely.

Previous versions of Amazon SES provided basic sending metrics, including information about bounces and complaints. However, the bounce and complaint metrics in this dashboard only included information for the past few days of email sent from your account, as opposed to an overall rate.

The new reputation dashboard provides overall bounce and complaint rates for your entire account. This enables you to more closely monitor the health of your account and adjust your email sending practices as needed.

Can’t I just calculate these values myself?

Because each Amazon SES account sends different volumes of email at different rates, we do not calculate bounce and complaint rates based on a fixed time period. Instead, we use a representative volume of email. This representative volume is the basis for the bounce and complaint rate calculations.

Why do we use representative volume in our calculations? Let’s imagine that you sent 1,000 emails one week, and 5 of them bounced. If we only considered a week of email sending, your metrics look good. Now imagine that the next week you only sent 5 emails, and one of them bounced. Suddenly, your bounce rate jumps from half a percent to 20%, and your account is automatically placed on probation. This example may be an extreme case, but it illustrates the reason that we don’t use fixed time periods when calculating bounce and complaint rates.

When you open the new reputation dashboard, you will see bounce and complaint rates calculated using the representative volume for your account. We automatically recalculate these rates every time you send email through Amazon SES.

What else can I do with these metrics?

The Bounce and Complaint Rate metrics in the reputation dashboard are automatically sent to Amazon CloudWatch. You can use CloudWatch to create dashboards that track your bounce and complaint rates over time, and to create alarms that send you notifications when these metrics cross certain thresholds. To learn more, see Creating Reputation Monitoring Alarms Using CloudWatch in the Amazon SES Developer Guide.

How can I see the reputation dashboard?

The reputation dashboard is now available to all Amazon SES users. To view the reputation dashboard, sign in to the Amazon SES console. On the left navigation menu, choose Reputation Dashboard. For more information, see Monitoring Your Sender Reputation in the Amazon SES Developer Guide.

We hope you find the information in the reputation dashboard to be useful in managing your email sending programs and campaigns. If you have any questions or comments, please leave a comment on this post, or let us know in the Amazon SES forum.

New – High-Resolution Custom Metrics and Alarms for Amazon CloudWatch

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/new-high-resolution-custom-metrics-and-alarms-for-amazon-cloudwatch/

Amazon CloudWatch has been an important part of AWS since early 2009! Launched as part of a three-pack that also included Auto Scaling and Elastic Load Balancing, CloudWatch has evolved into a very powerful monitoring service for AWS resources and the applications that you run on the AWS Cloud. CloudWatch custom metrics (launched way back in 2011) allow you to store business and application metrics in CloudWatch, view them in graphs, and initiate actions based on CloudWatch Alarms. Needless to say, we have made many enhancements to CloudWatch over the years! Some of the most recent include Extended Metrics Retention (and a User Interface Update), Dashboards, API/CloudFormation Support for Dashboards, and Alarms on Dashboards.

Originally, metrics were stored at five minute intervals; this was reduced to one minute (also known as Detailed Monitoring) in response to customer requests way back in 2010. This was a welcome change, but now it is time to do better. Our customers are streaming video, running flash sales, deploying code tens or hundreds of times per day, and running applications that scale in and out very quickly as conditions change. In all of these situations, a minute is simply too coarse of an interval. Important, transient spikes can be missed; disparate (yet related) events are difficult to correlate across time, and the MTTR (mean time to repair) when something breaks is too high.

New High-Resolution Metrics
Today we are adding support for high-resolution custom metrics, with plans to add support for AWS services over time. Your applications can now publish metrics to CloudWatch with 1-second resolution. You can watch the metrics scroll across your screen seconds after they are published and you can set up high-resolution CloudWatch Alarms that evaluate as frequently as every 10 seconds.

Imagine alarming when available memory gets low. This is often a transient condition that can be hard to catch with infrequent samples. With high-resolution metrics, you can see, detect (via an alarm), and act on it within seconds:

In this case the alarm on the right would not fire, and you would not know about the issue.

Publishing High-Resolution Metrics
You can publish high-resolution metrics in two different ways:

  • API – The PutMetricData function now accepts an optional StorageResolution parameter. Set this parameter to 1 to publish high-resolution metrics; omit it (or set it to 60) to publish at standard 1-minute resolution.
  • collectd plugin – The CloudWatch plugin for collectd has been updated to support collection and publication of high-resolution metrics. You will need to set the enable_high_definition_metrics parameter in the config file for the plugin.

CloudWatch metrics are rolled up over time; resolution effectively decreases as the metrics age. Here’s the schedule:

  • 1 second metrics are available for 3 hours.
  • 60 second metrics are available for 15 days.
  • 5 minute metrics are available for 63 days.
  • 1 hour metrics are available for 455 days (15 months).

When you call GetMetricStatistics you can specify a period of 1, 5, 10, 30 or any multiple of 60 seconds for high-resolution metrics. You can specify any multiple of 60 seconds for standard metrics.

A Quick Demo
I grabbed my nearest EC2 instance, installed the latest version of collectd and the Python plugin:

$ sudo yum install collectd collectd-python

Then I downloaded the setup script for the plugin, made it executable, and ran it:

$ wget https://raw.githubusercontent.com/awslabs/collectd-cloudwatch/master/src/setup.py
$ chmod a+x setup.py
$ sudo ./setup.py

I had already created a suitable IAM Role and added it to my instance; it was automatically detected during setup. I was asked to enable the high resolution metrics:

collectd started running and publishing metrics within seconds. I opened up the CloudWatch Console to take a look:

Then I zoomed in to see the metrics in detail:

I also created an alarm that will check the memory.percent.used metric at 10 second intervals. This will make it easier for me to detect situations where a lot of memory is being used for a short period of time:

Now Available
High-resolution custom metrics and alarms are available now in all Public AWS Regions, with support for AWS GovCloud (US) coming soon.

As was already the case, you can store 10 metrics at no charge every month; see the CloudWatch Pricing page for more information. Pricing for high-resolution metrics is identical to that for standard resolution metrics, with volume tiers that allow you to realize savings (on a per-metric) basis when you use more metrics. High-resolution alarms are priced at $0.30 per alarm per month.

New – Target Tracking Policies for EC2 Auto Scaling

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/new-target-tracking-policies-for-ec2-auto-scaling/

I recently told you about DynamoDB Auto Scaling and showed you how it uses multiple CloudWatch Alarms to automate capacity management for DynamoDB tables. Behind the scenes, this feature makes use of a more general Application Auto Scaling model that we plan to put to use across several different AWS services over time.

The new Auto Scaling model includes an important new feature that we call target tracking. When you create an Auto Scaling policy that makes use of target tracking, you choose a target value for a particular CloudWatch metric. Auto Scaling then turns the appropriate knob (so to speak) to drive the metric toward the target, while also adjusting the relevant CloudWatch Alarms. Specifying your desired target, in whatever metrics-driven units make sense for your application, is generally easier and more direct than setting up ranges and thresholds manually using the original step scaling policy type. However, you can use target tracking in conjunction with step scaling in order to implement an advanced scaling strategy. For example, you could use target tracking for scale-out operations and step scaling for scale-in.

Now for EC2
Today we are adding target tracking support to EC2 Auto Scaling. You can now create scaling policies that are driven by application load balancer request counts, CPU load, network traffic, or a custom metric (the Request Count per Target metric is new, and is also part of today’s launch):

These metrics share an important property: adding additional EC2 instances will (with no changes in overall load) drive the metric down, and vice versa.

To create an Auto Scaling Group that makes use of target tracking, you simply enter a name for the policy, choose a metric, and set the desired target value:

You have the option to disable the scale-in side of the policy. If you do this, you can scale-in manually or use a separate policy.

You can create target tracking policies using the AWS Management Console, AWS Command Line Interface (CLI), or the AWS SDKs.

Here are a couple of things to keep in mind as you look forward to using target tracking:

  • You can track more than one target in a single Auto Scaling Group as long as each one references a distinct metric. Scaling will always choose the policy that drives the highest capacity.
  • Scaling will not take place if the metric has insufficient data.
  • Auto Scaling compensates for rapid, transient fluctuations in the metrics, and strives to minimize corresponding fluctuations in capacity.
  • You can set up target tracking for a custom metric through the Auto Scaling API or the AWS Command Line Interface (CLI).
  • In most cases you should elect to scale on metrics that are published with 1-minute frequency (also known as detailed monitoring). Using 5-minute metrics as the basis for scaling will result in a slower response time.

Now Available
This new feature is available now and you can start using it today at no extra charge. To learn more, read about Target Tracking Scaling in the Auto Scaling User Guide.

Jeff;

New – API & CloudFormation Support for Amazon CloudWatch Dashboards

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/new-api-cloudformation-support-for-amazon-cloudwatch-dashboards/

We launched CloudWatch Dashboards a couple of years ago. In the post that I wrote for the launch, I showed you how to interactively create a dashboard that displayed chosen CloudWatch metrics in graphical form. After the launch, we added additional features including a full screen mode, a dark theme, control over the range of the Y axis, simplified renaming, persistent storage, and new visualization options.

New API & CLI
While console support is wonderful for interactive use, many customers have asked us to support programmatic creation and manipulation of dashboards and the widgets within. They would like to dynamically build and maintain dashboards, adding and removing widgets as the corresponding AWS resources are created and destroyed. Other customers are interested in setting up and maintaining a consistent set of dashboards across two or more AWS accounts.

I am happy to announce that API, CLI, and AWS CloudFormation support for CloudWatch Dashboards is available now and that you can start using it today!

There are four new API functions (and equivalent CLI commands):

ListDashboards / aws cloudwatch list-dashboards – Fetch a list of all dashboards within an account, or a subset that share a common prefix.

GetDashboard / aws cloudwatch get-dashboard – Fetch details for a single dashboard.

PutDashboard / aws cloudwatch put-dashboard – Create a new dashboard or update an existing one.

DeleteDashboards / aws cloudwatch delete-dashboards – Delete one or more dashboards.

Dashboard Concepts
I want to show you how to use these functions and commands. Before I dive in, I should review a couple of important dashboard concepts and attributes.

Global – Dashboards are part of an AWS account, and are not associated with a specific AWS Region. Each account can have up to 500 dashboards.

Named – Each dashboard has a name that is unique within the AWS account. Names can be up to 255 characters long.

Grid Model – Each dashboard is composed of a grid of cells. The grid is 24 cells across and as tall as necessary. Each widget on the dashboard is positioned at a particular set of grid coordinates, and has a size that spans an integral number of grid cells.

Widgets (Visualizations) – Each widget can display text or a set of CloudWatch metrics. Text is specified using Markdown; metrics can be displayed as single values, line charts, or stacked area charts. Each dashboard can have up to 100 widgets. Widgets that display metrics can also be associated with a CloudWatch Alarm.

Dashboards have a JSON representation that you can now see and edit from within the console. Simply click on the Action menu and choose View/edit source:

Here’s the source for my dashboard:

You can use this JSON as a starting point for your own applications. As you can see, there’s an entry in the widgets array for each widget on the dashboard; each entry describes one widget, starting with its type, position, and size.

Creating a Dashboard Using the API
Let’s say I want to create a dashboard that has a widget for each of my EC2 instances in a particular region. I’ll use Python and the AWS SDK for Python, and start as follows (excuse the amateur nature of my code):

import boto3
import json

cw  = boto3.client("cloudwatch")
ec2 = boto3.client("ec2")

x, y          = [0, 0]
width, height = [3, 3]
max_width     = 12
widgets       = []

Then I simply iterate over the instances, creating a widget dictionary for each one, and appending it to the widgets array:

instances = ec2.describe_instances()
for r in instances['Reservations']:
    for i in r['Instances']:

        widget = {'type'      : 'metric',
                  'x'         : x,
                  'y'         : y,
                  'height'    : height,
                  'width'     : width,
                  'properties': {'view'    : 'timeSeries',
                                 'stacked' : False,
                                 'metrics' : [['AWS/EC2', 'NetworkIn', 'InstanceId', i['InstanceId']],
                                              ['.',       'NetworkOut', '.',         '.']
                                             ],
                                 'period'  : 300,
                                 'stat'    : 'Average',
                                 'region'  : 'us-east-1',
                                 'title'   : i['InstanceId']
                                }
                 }

        widgets.append(widget)

I update the position (x and y) within the loop, and form a grid (if I don’t specify positions, the widgets will be laid out left to right, top to bottom):

        x += width
        if (x + width > max_width):
            x = 0
            y += height

After I have processed all of the instances, I create a JSON version of the widget array:

body   = {'widgets' : widgets}
body_j = json.dumps(body)

And I create or update my dashboard:

cw.put_dashboard(DashboardName = "EC2_Networking",
                 DashboardBody = body_j)

I run the code, and get the following dashboard:

The CloudWatch team recommends that dashboards created programmatically include a text widget indicating that the dashboard was generated automatically, along with a link to the source code or CloudFormation template that did the work. This will discourage users from making manual, out-of-band changers to the dashboards.

As I mentioned earlier, each metric widget can also be associated with a CloudWatch Alarm. You can create the alarms programmatically or by using a CloudFormation template such as the Sample CPU Utilization Alarm. If you decide to do this, the alarm threshold will be displayed in the widget. To learn more about this, read Tara Walker’s recent post, Amazon CloudWatch Launches Alarms on Dashboards.

Going one step further, I could use CloudWatch Events and a Lamba Function to track the creation and deletion of certain resources and update a dashboard in concert with the changes. To learn how to do this, read Keeping CloudWatch Dashboards up to Date Using AWS Lambda.

Accessing a Dashboard Using the CLI
I can also access and manipulate my dashboards from the command line. For example, I can generate a simple list:

$ aws cloudwatch list-dashboards --output table
----------------------------------------------
|               ListDashboards               |
+--------------------------------------------+
||             DashboardEntries             ||
|+-----------------+----------------+-------+|
||  DashboardName  | LastModified   | Size  ||
|+-----------------+----------------+-------+|
||  Disk-Metrics   |  1496405221.0  |  316  ||
||  EC2_Networking |  1498090434.0  |  2830 ||
||  Main-Metrics   |  1498085173.0  |  234  ||
|+-----------------+----------------+-------+|

And I can get rid of the Disk-Metrics dashboard:

$ aws cloudwatch delete-dashboards --dashboard-names Disk-Metrics

I can also retrieve the JSON that defines a dashboard:

Creating a Dashboard Using CloudFormation
Dashboards can also be specified in CloudFormation templates. Here’s a simple template in YAML (the DashboardBody is still specified in JSON):

Resources:
  MyDashboard:
    Type: "AWS::CloudWatch::Dashboard"
    Properties:
      DashboardName: SampleDashboard
      DashboardBody: '{"widgets":[{"type":"text","x":0,"y":0,"width":6,"height":6,"properties":{"markdown":"Hi there from CloudFormation"}}]}'

I place the template in a file and then create a stack using the console or the CLI:

$ aws cloudformation create-stack --stack-name MyDashboard --template-body file://dash.yaml
{
    "StackId": "arn:aws:cloudformation:us-east-1:xxxxxxxxxxxx:stack/MyDashboard/a2a3fb20-5708-11e7-8ffd-500c21311262"
}

Here’s the dashboard:

Available Now
This feature is available now and you can start using it today. You can create 3 dashboards with up to 50 metrics per dashboard at no charge; additional dashboards are priced at $3 per month, as listed on the CloudWatch Pricing page. You can make up to 1 million calls to the new API functions each month at no charge; beyond that you pay $.01 for every 1,000 calls.

Jeff;

AWS Bill Simplification – Consolidated CloudWatch Charges

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/aws-bill-simplification-consolidated-cloudwatch-charges/

The bill that you receive for your use of AWS in July will include a change in the way that Amazon CloudWatch charges are presented. The CloudWatch team made this change in order to make your bill simpler and easier to understand.

Consolidating Charges
In the past, charges for your usage of CloudWatch were split between two sections of your bill. For historical reasons, the charges for CloudWatch Alarms, CloudWatch Metrics, and calls to the CloudWatch API were reported in the Elastic Compute Cloud (EC2) detail section, while charges for CloudWatch Logs and CloudWatch Dashboards were reported in the CloudWatch detail section, like this:

We have received feedback that splitting the charges across two sections of the bill made it difficult to locate and understand the entire set of monitoring charges. In order to address this issue, we are moving the charges that were formerly listed in the Elastic Compute Cloud (EC2) detail section to the CloudWatch detail section. We are making the same change to the detailed billing report, moving the affected charges from the AmazonEC2 product code to the AmazonCloudWatch product code and changing to the AmazonCloudWatch product name. This change does not affect your overall bill; it simply consolidates all of the charges for the use of CloudWatch in one section.

Billing Metric
The CloudWatch billing metric named Estimated Charges can be viewed as a Total Estimated Charge, or broken down By Service:

The total will not change. However, as noted above, the charges that formerly had AmazonEC2 as the ServiceName dimension will now have it set to AmazonCloudWatch:

You may need to adjust thresholds on your billing alarms as a result:

Once again, your total AWS bill will not change. You will begin to see the consolidated charges for CloudWatch in your AWS bill for July 2017.

Jeff;

 

Protect Web Sites & Services Using Rate-Based Rules for AWS WAF

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/protect-web-sites-services-using-rate-based-rules-for-aws-waf/

AWS WAF (Web Application Firewall) helps to protect your application from many different types of application-layer attacks that involve requests that are malicious or malformed. As I showed you when I first wrote about this service (New – AWS WAF), you can define rules that match cross-site scripting, IP address, SQL injection, size, or content constraints:

When incoming requests match rules, actions are invoked. Actions can either allow, block, or simply count matches.

The existing rule model is powerful and gives you the ability to detect and respond to many different types of attacks. It does not, however, allow you to respond to attacks that simply consist of a large number of otherwise valid requests from a particular IP address. These requests might be a web-layer DDoS attack, a brute-force login attempt, or even a partner integration gone awry.

New Rate-Based Rules
Today we are adding Rate-based Rules to WAF, giving you control of when IP addresses are added to and removed from a blacklist, along with the flexibility to handle exceptions and special cases:

Blacklisting IP Addresses – You can blacklist IP addresses that make requests at a rate that exceeds a configured threshold rate.

IP Address Tracking– You can see which IP addresses are currently blacklisted.

IP Address Removal – IP addresses that have been blacklisted are automatically removed when they no longer make requests at a rate above the configured threshold.

IP Address Exemption – You can exempt certain IP addresses from blacklisting by using an IP address whitelist inside of the a rate-based rule. For example, you might want to allow trusted partners to access your site at a higher rate.

Monitoring & Alarming – You can watch and alarm on CloudWatch metrics that are published for each rule.

You can combine new Rate-based Rules with WAF Conditions to implement sophisticated rate-limiting strategies. For example, you could use a Rate-based Rule and a WAF Condition that matches your login pages. This would allow you to impose a modest threshold on your login pages (to avoid brute-force password attacks) and allow a more generous one on your marketing or system status pages.

Thresholds are defined in terms of the number of incoming requests from a single IP address within a 5 minute period. Once this threshold is breached, additional requests from the IP address are blocked until the request rate falls below the threshold.

Using Rate-Based Rules
Here’s how you would define a Rate-based Rule that protects the /login portion of your site. Start by defining a WAF condition that matches the desired string in the URI of the page:

Then use this condition to define a Rate-based Rule (the rate limit is expressed in terms of requests within a 5 minute interval, but the blacklisting goes in to effect as soon as the limit is breached):

With the condition and the rule in place, create a Web ACL (ProtectLoginACL) to bring it all together and to attach it to the AWS resource (a CloudFront distribution in this case):

Then attach the rule (ProtectLogin) to the Web ACL:

The resource is now protected in accord with the rule and the web ACL. You can monitor the associated CloudWatch metrics (ProtectLogin and ProtectLoginACL in this case). You could even create CloudWatch Alarms and use them to fire Lambda functions when a protection threshold is breached. The code could examine the offending IP address and make a complex, business-driven decision, perhaps adding a whitelisting rule that gives an extra-generous allowance to a trusted partner or to a user with a special payment plan.

Available Now
The new, Rate-based Rules are available now and you can start using them today! Rate-based rules are priced the same as Regular rules; see the WAF Pricing page for more info.

Jeff;

Building Loosely Coupled, Scalable, C# Applications with Amazon SQS and Amazon SNS

Post Syndicated from Tara Van Unen original https://aws.amazon.com/blogs/compute/building-loosely-coupled-scalable-c-applications-with-amazon-sqs-and-amazon-sns/

 
Stephen Liedig, Solutions Architect

 

One of the many challenges professional software architects and developers face is how to make cloud-native applications scalable, fault-tolerant, and highly available.

Fundamental to your project success is understanding the importance of making systems highly cohesive and loosely coupled. That means considering the multi-dimensional facets of system coupling to support the distributed nature of the applications that you are building for the cloud.

By that, I mean addressing not only the application-level coupling (managing incoming and outgoing dependencies), but also considering the impacts of of platform, spatial, and temporal coupling of your systems. Platform coupling relates to the interoperability, or lack thereof, of heterogeneous systems components. Spatial coupling deals with managing components at a network topology level or protocol level. Temporal, or runtime coupling, refers to the ability of a component within your system to do any kind of meaningful work while it is performing a synchronous, blocking operation.

The AWS messaging services, Amazon SQS and Amazon SNS, help you deal with these forms of coupling by providing mechanisms for:

  • Reliable, durable, and fault-tolerant delivery of messages between application components
  • Logical decomposition of systems and increased autonomy of components
  • Creating unidirectional, non-blocking operations, temporarily decoupling system components at runtime
  • Decreasing the dependencies that components have on each other through standard communication and network channels

Following on the recent topic, Building Scalable Applications and Microservices: Adding Messaging to Your Toolbox, in this post, I look at some of the ways you can introduce SQS and SNS into your architectures to decouple your components, and show how you can implement them using C#.

Walkthrough

To illustrate some of these concepts, consider a web application that processes customer orders. As good architects and developers, you have followed best practices and made your application scalable and highly available. Your solution included implementing load balancing, dynamic scaling across multiple Availability Zones, and persisting orders in a Multi-AZ Amazon RDS database instance, as in the following diagram.


In this example, the application is responsible for handling and persisting the order data, as well as dealing with increases in traffic for popular items.

One potential point of vulnerability in the order processing workflow is in saving the order in the database. The business expects that every order has been persisted into the database. However, any potential deadlock, race condition, or network issue could cause the persistence of the order to fail. Then, the order is lost with no recourse to restore the order.

With good logging capability, you may be able to identify when an error occurred and which customer’s order failed. This wouldn’t allow you to “restore” the transaction, and by that stage, your customer is no longer your customer.

As illustrated in the following diagram, introducing an SQS queue helps improve your ordering application. Using the queue isolates the processing logic into its own component and runs it in a separate process from the web application. This, in turn, allows the system to be more resilient to spikes in traffic, while allowing work to be performed only as fast as necessary in order to manage costs.


In addition, you now have a mechanism for persisting orders as messages (with the queue acting as a temporary database), and have moved the scope of your transaction with your database further down the stack. In the event of an application exception or transaction failure, this ensures that the order processing can be retired or redirected to the Amazon SQS Dead Letter Queue (DLQ), for re-processing at a later stage. (See the recent post, Using Amazon SQS Dead-Letter Queues to Control Message Failure, for more information on dead-letter queues.)

Scaling the order processing nodes

This change allows you now to scale the web application frontend independently from the processing nodes. The frontend application can continue to scale based on metrics such as CPU usage, or the number of requests hitting the load balancer. Processing nodes can scale based on the number of orders in the queue. Here is an example of scale-in and scale-out alarms that you would associate with the scaling policy.

Scale-out Alarm

aws cloudwatch put-metric-alarm --alarm-name AddCapacityToCustomerOrderQueue --metric-name ApproximateNumberOfMessagesVisible --namespace "AWS/SQS" 
--statistic Average --period 300 --threshold 3 --comparison-operator GreaterThanOrEqualToThreshold --dimensions Name=QueueName,Value=customer-orders
--evaluation-periods 2 --alarm-actions <arn of the scale-out autoscaling policy>

Scale-in Alarm

aws cloudwatch put-metric-alarm --alarm-name RemoveCapacityFromCustomerOrderQueue --metric-name ApproximateNumberOfMessagesVisible --namespace "AWS/SQS" 
 --statistic Average --period 300 --threshold 1 --comparison-operator LessThanOrEqualToThreshold --dimensions Name=QueueName,Value=customer-orders
 --evaluation-periods 2 --alarm-actions <arn of the scale-in autoscaling policy>

In the above example, use the ApproximateNumberOfMessagesVisible metric to discover the queue length and drive the scaling policy of the Auto Scaling group. Another useful metric is ApproximateAgeOfOldestMessage, when applications have time-sensitive messages and developers need to ensure that messages are processed within a specific time period.

Scaling the order processing implementation

On top of scaling at an infrastructure level using Auto Scaling, make sure to take advantage of the processing power of your Amazon EC2 instances by using as many of the available threads as possible. There are several ways to implement this. In this post, we build a Windows service that uses the BackgroundWorker class to process the messages from the queue.

Here’s a closer look at the implementation. In the first section of the consuming application, use a loop to continually poll the queue for new messages, and construct a ReceiveMessageRequest variable.

public static void PollQueue()
{
    while (_running)
    {
        Task<ReceiveMessageResponse> receiveMessageResponse;

        // Pull messages off the queue
        using (var sqs = new AmazonSQSClient())
        {
            const int maxMessages = 10;  // 1-10

            //Receiving a message
            var receiveMessageRequest = new ReceiveMessageRequest
            {
                // Get URL from Configuration
                QueueUrl = _queueUrl, 
                // The maximum number of messages to return. 
                // Fewer messages might be returned. 
                MaxNumberOfMessages = maxMessages, 
                // A list of attributes that need to be returned with message.
                AttributeNames = new List<string> { "All" },
                // Enable long polling. 
                // Time to wait for message to arrive on queue.
                WaitTimeSeconds = 5 
            };

            receiveMessageResponse = sqs.ReceiveMessageAsync(receiveMessageRequest);
        }

The WaitTimeSeconds property of the ReceiveMessageRequest specifies the duration (in seconds) that the call waits for a message to arrive in the queue before returning a response to the calling application. There are a few benefits to using long polling:

  • It reduces the number of empty responses by allowing SQS to wait until a message is available in the queue before sending a response.
  • It eliminates false empty responses by querying all (rather than a limited number) of the servers.
  • It returns messages as soon any message becomes available.

For more information, see Amazon SQS Long Polling.

After you have returned messages from the queue, you can start to process them by looping through each message in the response and invoking a new BackgroundWorker thread.

// Process messages
if (receiveMessageResponse.Result.Messages != null)
{
    foreach (var message in receiveMessageResponse.Result.Messages)
    {
        Console.WriteLine("Received SQS message, starting worker thread");

        // Create background worker to process message
        BackgroundWorker worker = new BackgroundWorker();
        worker.DoWork += (obj, e) => ProcessMessage(message);
        worker.RunWorkerAsync();
    }
}
else
{
    Console.WriteLine("No messages on queue");
}

The event handler, ProcessMessage, is where you implement business logic for processing orders. It is important to have a good understanding of how long a typical transaction takes so you can set a message VisibilityTimeout that is long enough to complete your operation. If order processing takes longer than the specified timeout period, the message becomes visible on the queue. Other nodes may pick it and process the same order twice, leading to unintended consequences.

Handling Duplicate Messages

In order to manage duplicate messages, seek to make your processing application idempotent. In mathematics, idempotent describes a function that produces the same result if it is applied to itself:

f(x) = f(f(x))

No matter how many times you process the same message, the end result is the same (definition from Enterprise Integration Patterns: Designing, Building, and Deploying Messaging Solutions, Hohpe and Wolf, 2004).

There are several strategies you could apply to achieve this:

  • Create messages that have inherent idempotent characteristics. That is, they are non-transactional in nature and are unique at a specified point in time. Rather than saying “place new order for Customer A,” which adds a duplicate order to the customer, use “place order <orderid> on <timestamp> for Customer A,” which creates a single order no matter how often it is persisted.
  • Deliver your messages via an Amazon SQS FIFO queue, which provides the benefits of message sequencing, but also mechanisms for content-based deduplication. You can deduplicate using the MessageDeduplicationId property on the SendMessage request or by enabling content-based deduplication on the queue, which generates a hash for MessageDeduplicationId, based on the content of the message, not the attributes.
var sendMessageRequest = new SendMessageRequest
{
    QueueUrl = _queueUrl,
    MessageBody = JsonConvert.SerializeObject(order),
    MessageGroupId = Guid.NewGuid().ToString("N"),
    MessageDeduplicationId = Guid.NewGuid().ToString("N")
};
  • If using SQS FIFO queues is not an option, keep a message log of all messages attributes processed for a specified period of time, as an alternative to message deduplication on the receiving end. Verifying the existence of the message in the log before processing the message adds additional computational overhead to your processing. This can be minimized through low latency persistence solutions such as Amazon DynamoDB. Bear in mind that this solution is dependent on the successful, distributed transaction of the message and the message log.

Handling exceptions

Because of the distributed nature of SQS queues, it does not automatically delete the message. Therefore, you must explicitly delete the message from the queue after processing it, using the message ReceiptHandle property (see the following code example).

However, if at any stage you have an exception, avoid handling it as you normally would. The intention is to make sure that the message ends back on the queue, so that you can gracefully deal with intermittent failures. Instead, log the exception to capture diagnostic information, and swallow it.

By not explicitly deleting the message from the queue, you can take advantage of the VisibilityTimeout behavior described earlier. Gracefully handle the message processing failure and make the unprocessed message available to other nodes to process.

In the event that subsequent retries fail, SQS automatically moves the message to the configured DLQ after the configured number of receives has been reached. You can further investigate why the order process failed. Most importantly, the order has not been lost, and your customer is still your customer.

private static void ProcessMessage(Message message)
{
    using (var sqs = new AmazonSQSClient())
    {
        try
        {
            Console.WriteLine("Processing message id: {0}", message.MessageId);

            // Implement messaging processing here
            // Ensure no downstream resource contention (parallel processing)
            // <your order processing logic in here…>
            Console.WriteLine("{0} Thread {1}: {2}", DateTime.Now.ToString("s"), Thread.CurrentThread.ManagedThreadId, message.MessageId);
            
            // Delete the message off the queue. 
            // Receipt handle is the identifier you must provide 
            // when deleting the message.
            var deleteRequest = new DeleteMessageRequest(_queueName, message.ReceiptHandle);
            sqs.DeleteMessageAsync(deleteRequest);
            Console.WriteLine("Processed message id: {0}", message.MessageId);

        }
        catch (Exception ex)
        {
            // Do nothing.
            // Swallow exception, message will return to the queue when 
            // visibility timeout has been exceeded.
            Console.WriteLine("Could not process message due to error. Exception: {0}", ex.Message);
        }
    }
}

Using SQS to adapt to changing business requirements

One of the benefits of introducing a message queue is that you can accommodate new business requirements without dramatically affecting your application.

If, for example, the business decided that all orders placed over $5000 are to be handled as a priority, you could introduce a new “priority order” queue. The way the orders are processed does not change. The only significant change to the processing application is to ensure that messages from the “priority order” queue are processed before the “standard order” queue.

The following diagram shows how this logic could be isolated in an “order dispatcher,” whose only purpose is to route order messages to the appropriate queue based on whether the order exceeds $5000. Nothing on the web application or the processing nodes changes other than the target queue to which the order is sent. The rates at which orders are processed can be achieved by modifying the poll rates and scalability settings that I have already discussed.

Extending the design pattern with Amazon SNS

Amazon SNS supports reliable publish-subscribe (pub-sub) scenarios and push notifications to known endpoints across a wide variety of protocols. It eliminates the need to periodically check or poll for new information and updates. SNS supports:

  • Reliable storage of messages for immediate or delayed processing
  • Publish / subscribe – direct, broadcast, targeted “push” messaging
  • Multiple subscriber protocols
  • Amazon SQS, HTTP, HTTPS, email, SMS, mobile push, AWS Lambda

With these capabilities, you can provide parallel asynchronous processing of orders in the system and extend it to support any number of different business use cases without affecting the production environment. This is commonly referred to as a “fanout” scenario.

Rather than your web application pushing orders to a queue for processing, send a notification via SNS. The SNS messages are sent to a topic and then replicated and pushed to multiple SQS queues and Lambda functions for processing.

As the diagram above shows, you have the development team consuming “live” data as they work on the next version of the processing application, or potentially using the messages to troubleshoot issues in production.

Marketing is consuming all order information, via a Lambda function that has subscribed to the SNS topic, inserting the records into an Amazon Redshift warehouse for analysis.

All of this, of course, is happening without affecting your order processing application.

Summary

While I haven’t dived deep into the specifics of each service, I have discussed how these services can be applied at an architectural level to build loosely coupled systems that facilitate multiple business use cases. I’ve also shown you how to use infrastructure and application-level scaling techniques, so you can get the most out of your EC2 instances.

One of the many benefits of using these managed services is how quickly and easily you can implement powerful messaging capabilities in your systems, and lower the capital and operational costs of managing your own messaging middleware.

Using Amazon SQS and Amazon SNS together can provide you with a powerful mechanism for decoupling application components. This should be part of design considerations as you architect for the cloud.

For more information, see the Amazon SQS Developer Guide and Amazon SNS Developer Guide. You’ll find tutorials on all the concepts covered in this post, and more. To can get started using the AWS console or SDK of your choice visit:

Happy messaging!

New – Auto Scaling for Amazon DynamoDB

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/new-auto-scaling-for-amazon-dynamodb/

Amazon DynamoDB has more than one hundred thousand customers, spanning a wide range of industries and use cases. These customers depend on DynamoDB’s consistent performance at any scale and presence in 16 geographic regions around the world. A recent trend we’ve been observing is customers using DynamoDB to power their serverless applications. This is a good match: with DynamoDB, you don’t have to think about things like provisioning servers, performing OS and database software patching, or configuring replication across availability zones to ensure high availability – you can simply create tables and start adding data, and let DynamoDB handle the rest.

DynamoDB provides a provisioned capacity model that lets you set the amount of read and write capacity required by your applications. While this frees you from thinking about servers and enables you to change provisioning for your table with a simple API call or button click in the AWS Management Console, customers have asked us how we can make managing capacity for DynamoDB even easier.

Today we are introducing Auto Scaling for DynamoDB to help automate capacity management for your tables and global secondary indexes. You simply specify the desired target utilization and provide upper and lower bounds for read and write capacity. DynamoDB will then monitor throughput consumption using Amazon CloudWatch alarms and then will adjust provisioned capacity up or down as needed. Auto Scaling will be on by default for all new tables and indexes, and you can also configure it for existing ones.

Even if you’re not around, DynamoDB Auto Scaling will be monitoring your tables and indexes to automatically adjust throughput in response to changes in application traffic. This can make it easier to administer your DynamoDB data, help you maximize availability for your applications, and help you reduce your DynamoDB costs.

Let’s see how it works…

Using Auto Scaling
The DynamoDB Console now proposes a comfortable set of default parameters when you create a new table. You can accept them as-is or you can uncheck Use default settings and enter your own parameters:

Here’s how you enter your own parameters:

Target utilization is expressed in terms of the ratio of consumed capacity to provisioned capacity. The parameters above would allow for sufficient headroom to allow consumed capacity to double due to a burst in read or write requests (read Capacity Unit Calculations to learn more about the relationship between DynamoDB read and write operations and provisioned capacity). Changes in provisioned capacity take place in the background.

Auto Scaling in Action
In order to see this important new feature in action, I followed the directions in the Getting Started Guide. I launched a fresh EC2 instance, installed (sudo pip install boto3) and configured (aws configure) the AWS SDK for Python. Then I used the code in the Python and DynamoDB section to create and populate a table with some data, and manually configured the table for 5 units each of read and write capacity.

I took a quick break in order to have clean, straight lines for the CloudWatch metrics so that I could show the effect of Auto Scaling. Here’s what the metrics look like before I started to apply a load:

I modified the code in Step 3 to continually issue queries for random years in the range of 1920 to 2007, ran a single copy of the code, and checked the read metrics a minute or two later:

The consumed capacity is higher than the provisioned capacity, resulting in a large number of throttled reads. Time for Auto Scaling!

I returned to the console and clicked on the Capacity tab for my table. Then I clicked on Read capacity, accepted the default values, and clicked on Save:

DynamoDB created a new IAM role (DynamoDBAutoscaleRole) and a pair of CloudWatch alarms to manage the Auto Scaling of read capacity:

DynamoDB Auto Scaling will manage the thresholds for the alarms, moving them up and down as part of the scaling process. The first alarm was triggered and the table state changed to Updating while additional read capacity was provisioned:

The change was visible in the read metrics within minutes:

I started a couple of additional copies of my modified query script and watched as additional capacity was provisioned, as indicated by the red line:

I killed all of the scripts and turned my attention to other things while waiting for the scale-down alarm to trigger. Here’s what I saw when I came back:

The next morning I checked my Scaling activities and saw that the alarm had triggered several more times overnight:

This was also visible in the metrics:

Until now, you would prepare for this situation by setting your read capacity well about your expected usage, and pay for the excess capacity (the space between the blue line and the red line). Or, you might set it too low, forget to monitor it, and run out of capacity when traffic picked up. With Auto Scaling you can get the best of both worlds: an automatic response when an increase in demand suggests that more capacity is needed, and another automated response when the capacity is no longer needed.

Things to Know
DynamoDB Auto Scaling is designed to accommodate request rates that vary in a somewhat predictable, generally periodic fashion. If you need to accommodate unpredictable bursts of read activity, you should use Auto Scaling in combination with DAX (read Amazon DynamoDB Accelerator (DAX) – In-Memory Caching for Read-Intensive Workloads to learn more). Also, the AWS SDKs will detect throttled read and write requests and retry them after a suitable delay.

I mentioned the DynamoDBAutoscaleRole earlier. This role provides Auto Scaling with the privileges that it needs to have in order for it to be able to scale your tables and indexes up and down. To learn more about this role and the permissions that it uses, read Grant User Permissions for DynamoDB Auto Scaling.

Auto Scaling has complete CLI and API support, including the ability to enable and disable the Auto Scaling policies. If you have some predictable, time-bound spikes in traffic, you can programmatically disable an Auto Scaling policy, provision higher throughput for a set period of time, and then enable Auto Scaling again later.

As noted on the Limits in DynamoDB page, you can increase provisioned capacity as often as you would like and as high as you need (subject to per-account limits that we can increase on request). You can decrease capacity up to nine times per day for each table or global secondary index.

You pay for the capacity that you provision, at the regular DynamoDB prices. You can also purchase DynamoDB Reserved Capacity to further savings.

Available Now
This feature is available now in all regions and you can start using it today!

Jeff;

Build a Serverless Architecture to Analyze Amazon CloudFront Access Logs Using AWS Lambda, Amazon Athena, and Amazon Kinesis Analytics

Post Syndicated from Rajeev Srinivasan original https://aws.amazon.com/blogs/big-data/build-a-serverless-architecture-to-analyze-amazon-cloudfront-access-logs-using-aws-lambda-amazon-athena-and-amazon-kinesis-analytics/

Nowadays, it’s common for a web server to be fronted by a global content delivery service, like Amazon CloudFront. This type of front end accelerates delivery of websites, APIs, media content, and other web assets to provide a better experience to users across the globe.

The insights gained by analysis of Amazon CloudFront access logs helps improve website availability through bot detection and mitigation, optimizing web content based on the devices and browser used to view your webpages, reducing perceived latency by caching of popular object closer to its viewer, and so on. This results in a significant improvement in the overall perceived experience for the user.

This blog post provides a way to build a serverless architecture to generate some of these insights. To do so, we analyze Amazon CloudFront access logs both at rest and in transit through the stream. This serverless architecture uses Amazon Athena to analyze large volumes of CloudFront access logs (on the scale of terabytes per day), and Amazon Kinesis Analytics for streaming analysis.

The analytic queries in this blog post focus on three common use cases:

  1. Detection of common bots using the user agent string
  2. Calculation of current bandwidth usage per Amazon CloudFront distribution per edge location
  3. Determination of the current top 50 viewers

However, you can easily extend the architecture described to power dashboards for monitoring, reporting, and trigger alarms based on deeper insights gained by processing and analyzing the logs. Some examples are dashboards for cache performance, usage and viewer patterns, and so on.

Following we show a diagram of this architecture.

Prerequisites

Before you set up this architecture, install the AWS Command Line Interface (AWS CLI) tool on your local machine, if you don’t have it already.

Setup summary

The following steps are involved in setting up the serverless architecture on the AWS platform:

  1. Create an Amazon S3 bucket for your Amazon CloudFront access logs to be delivered to and stored in.
  2. Create a second Amazon S3 bucket to receive processed logs and store the partitioned data for interactive analysis.
  3. Create an Amazon Kinesis Firehose delivery stream to batch, compress, and deliver the preprocessed logs for analysis.
  4. Create an AWS Lambda function to preprocess the logs for analysis.
  5. Configure Amazon S3 event notification on the CloudFront access logs bucket, which contains the raw logs, to trigger the Lambda preprocessing function.
  6. Create an Amazon DynamoDB table to look up partition details, such as partition specification and partition location.
  7. Create an Amazon Athena table for interactive analysis.
  8. Create a second AWS Lambda function to add new partitions to the Athena table based on the log delivered to the processed logs bucket.
  9. Configure Amazon S3 event notification on the processed logs bucket to trigger the Lambda partitioning function.
  10. Configure Amazon Kinesis Analytics application for analysis of the logs directly from the stream.

ETL and preprocessing

In this section, we parse the CloudFront access logs as they are delivered, which occurs multiple times in an hour. We filter out commented records and use the user agent string to decipher the browser name, the name of the operating system, and whether the request has been made by a bot. For more details on how to decipher the preceding information based on the user agent string, see user-agents 1.1.0 in the Python documentation.

We use the Lambda preprocessing function to perform these tasks on individual rows of the access log. On successful completion, the rows are pushed to an Amazon Kinesis Firehose delivery stream to be persistently stored in an Amazon S3 bucket, the processed logs bucket.

To create a Firehose delivery stream with a new or existing S3 bucket as the destination, follow the steps described in Create a Firehose Delivery Stream to Amazon S3 in the S3 documentation. Keep most of the default settings, but select an AWS Identity and Access Management (IAM) role that has write access to your S3 bucket and specify GZIP compression. Name the delivery stream CloudFrontLogsToS3.

Another pre-requisite for this setup is to create an IAM role that provides the necessary permissions our AWS Lambda function to get the data from S3, process it, and deliver it to the CloudFrontLogsToS3 delivery stream.

Let’s use the AWS CLI to create the IAM role using the following the steps:

  1. Create the IAM policy (lambda-exec-policy) for the Lambda execution role to use.
  2. Create the Lambda execution role (lambda-cflogs-exec-role) and assign the service to use this role.
  3. Attach the policy created in step 1 to the Lambda execution role.

To download the policy document to your local machine, type the following command.

aws s3 cp s3://aws-bigdata-blog/artifacts/Serverless-CF-Analysis/preprocessiong-lambda/lambda-exec-policy.json  <path_on_your_local_machine>

To download the assume policy document to your local machine, type the following command.

aws s3 cp s3://aws-bigdata-blog/artifacts/Serverless-CF-Analysis/preprocessiong-lambda/assume-lambda-policy.json  <path_on_your_local_machine>

Following is the lambda-exec-policy.json file, which is the IAM policy used by the Lambda execution role.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "CloudWatchAccess",
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            "Resource": "arn:aws:logs:*:*:*"
        },
        {
            "Sid": "S3Access",
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:PutObject"
            ],
            "Resource": [
                "arn:aws:s3:::*"
            ]
        },
        {
            "Sid": "FirehoseAccess",
            "Effect": "Allow",
            "Action": [
                "firehose:ListDeliveryStreams",
                "firehose:PutRecord",
                "firehose:PutRecordBatch"
            ],
            "Resource": [
                "arn:aws:firehose:*:*:deliverystream/CloudFrontLogsToS3"
            ]
        }
    ]
}

To create the IAM policy used by Lambda execution role, type the following command.

aws iam create-policy --policy-name lambda-exec-policy --policy-document file://<path>/lambda-exec-policy.json

To create the AWS Lambda execution role and assign the service to use this role, type the following command.

aws iam create-role --role-name lambda-cflogs-exec-role --assume-role-policy-document file://<path>/assume-lambda-policy.json

Following is the assume-lambda-policy.json file, to grant Lambda permission to assume a role.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "lambda.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

To attach the policy (lambda-exec-policy) created to the AWS Lambda execution role (lambda-cflogs-exec-role), type the following command.

aws iam attach-role-policy --role-name lambda-cflogs-exec-role --policy-arn arn:aws:iam::<your-account-id>:policy/lambda-exec-policy

Now that we have created the CloudFrontLogsToS3 Firehose delivery stream and the lambda-cflogs-exec-role IAM role for Lambda, the next step is to create a Lambda preprocessing function.

This Lambda preprocessing function parses the CloudFront access logs delivered into the S3 bucket and performs a few transformation and mapping operations on the data. The Lambda function adds descriptive information, such as the browser and the operating system that were used to make this request based on the user agent string found in the logs. The Lambda function also adds information about the web distribution to support scenarios where CloudFront access logs are delivered to a centralized S3 bucket from multiple distributions. With the solution in this blog post, you can get insights across distributions and their edge locations.

Use the Lambda Management Console to create a new Lambda function with a Python 2.7 runtime and the s3-get-object-python blueprint. Open the console, and on the Configure triggers page, choose the name of the S3 bucket where the CloudFront access logs are delivered. Choose Put for Event type. For Prefix, type the name of the prefix, if any, for the folder where CloudFront access logs are delivered, for example cloudfront-logs/. To invoke Lambda to retrieve the logs from the S3 bucket as they are delivered, select Enable trigger.

Choose Next and provide a function name to identify this Lambda preprocessing function.

For Code entry type, choose Upload a file from Amazon S3. For S3 link URL, type https.amazonaws.com//preprocessing-lambda/pre-data.zip. In the section, also create an environment variable with the key KINESIS_FIREHOSE_STREAM and a value with the name of the Firehose delivery stream as CloudFrontLogsToS3.

Choose lambda-cflogs-exec-role as the IAM role for the Lambda function, and type prep-data.lambda_handler for the value for Handler.

Choose Next, and then choose Create Lambda.

Table creation in Amazon Athena

In this step, we will build the Athena table. Use the Athena console in the same region and create the table using the query editor.

CREATE EXTERNAL TABLE IF NOT EXISTS cf_logs (
  logdate date,
  logtime string,
  location string,
  bytes bigint,
  requestip string,
  method string,
  host string,
  uri string,
  status bigint,
  referrer string,
  useragent string,
  uriquery string,
  cookie string,
  resulttype string,
  requestid string,
  header string,
  csprotocol string,
  csbytes string,
  timetaken bigint,
  forwardedfor string,
  sslprotocol string,
  sslcipher string,
  responseresulttype string,
  protocolversion string,
  browserfamily string,
  osfamily string,
  isbot string,
  filename string,
  distribution string
)
PARTITIONED BY(year string, month string, day string, hour string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LOCATION 's3://<pre-processing-log-bucket>/prefix/';

Creation of the Athena partition

A popular website with millions of requests each day routed using Amazon CloudFront can generate a large volume of logs, on the order of a few terabytes a day. We strongly recommend that you partition your data to effectively restrict the amount of data scanned by each query. Partitioning significantly improves query performance and substantially reduces cost. The Lambda partitioning function adds the partition information to the Athena table for the data delivered to the preprocessed logs bucket.

Before delivering the preprocessed Amazon CloudFront logs file into the preprocessed logs bucket, Amazon Kinesis Firehose adds a UTC time prefix in the format YYYY/MM/DD/HH. This approach supports multilevel partitioning of the data by year, month, date, and hour. You can invoke the Lambda partitioning function every time a new processed Amazon CloudFront log is delivered to the preprocessed logs bucket. To do so, configure the Lambda partitioning function to be triggered by an S3 Put event.

For a website with millions of requests, a large number of preprocessed logs can be delivered multiple times in an hour—for example, at the interval of one each second. To avoid querying the Athena table for partition information every time a preprocessed log file is delivered, you can create an Amazon DynamoDB table for fast lookup.

Based on the year, month, data and hour in the prefix of the delivered log, the Lambda partitioning function checks if the partition specification exists in the Amazon DynamoDB table. If it doesn’t, it’s added to the table using an atomic operation, and then the Athena table is updated.

Type the following command to create the Amazon DynamoDB table.

aws dynamodb create-table --table-name athenapartitiondetails \
--attribute-definitions AttributeName=PartitionSpec,AttributeType=S \
--key-schema AttributeName=PartitionSpec,KeyType=HASH \
--provisioned-throughput ReadCapacityUnits=100,WriteCapacityUnits=100

Here the following is true:

  • PartitionSpec is the hash key and is a representation of the partition signature—for example, year=”2017”; month=”05”; day=”15”; hour=”10”.
  • Depending on the rate at which the processed log files are delivered to the processed log bucket, you might have to increase the ReadCapacityUnits and WriteCapacityUnits values, if these are throttled.

The other attributes besides PartitionSpec are the following:

  • PartitionPath – The S3 path associated with the partition.
  • PartitionType – The type of partition used (Hour, Month, Date, Year, or ALL). In this case, ALL is used.

Next step is to create the IAM role to provide permissions for the Lambda partitioning function. You require permissions to do the following:

  1. Look up and write partition information to DynamoDB.
  2. Alter the Athena table with new partition information.
  3. Perform Amazon CloudWatch logs operations.
  4. Perform Amazon S3 operations.

To download the policy document to your local machine, type following command.

aws s3 cp s3://aws-bigdata-blog/artifacts/Serverless-CF-Analysis/partitioning-lambda/lambda-partition-function-execution-policy.json  <path_on_your_local_machine>

To download the assume policy document to your local machine, type the following command.

aws s3 cp s3://aws-bigdata-blog/artifacts/Serverless-CF-Analysis/partitioning-lambda/assume-lambda-policy.json <path_on_your_local_machine>

To create the Lambda execution role and assign the service to use this role, type the following command.

aws iam create-role --role-name lambda-cflogs-exec-role --assume-role-policy-document file://<path>/assume-lambda-policy.json

Let’s use the AWS CLI to create the IAM role using the following three steps:

  1. Create the IAM policy(lambda-partition-exec-policy) used by the Lambda execution role.
  2. Create the Lambda execution role (lambda-partition-execution-role)and assign the service to use this role.
  3. Attach the policy created in step 1 to the Lambda execution role.

To create the IAM policy used by Lambda execution role, type the following command.

aws iam create-policy --policy-name lambda-partition-exec-policy --policy-document file://<path>/lambda-partition-function-execution-policy.json

To create the Lambda execution role and assign the service to use this role, type the following command.

aws iam create-role --role-name lambda-partition-execution-role --assume-role-policy-document file://<path>/assume-lambda-policy.json

To attach the policy (lambda-partition-exec-policy) created to the AWS Lambda execution role (lambda-partition-execution-role), type the following command.

aws iam attach-role-policy --role-name lambda-partition-execution-role --policy-arn arn:aws:iam::<your-account-id>:policy/lambda-partition-exec-policy

Following is the lambda-partition-function-execution-policy.json file, which is the IAM policy used by the Lambda execution role.

{
    "Version": "2012-10-17",
    "Statement": [
      	{
            	"Sid": "DDBTableAccess",
            	"Effect": "Allow",
            	"Action": "dynamodb:PutItem"
            	"Resource": "arn:aws:dynamodb*:*:table/athenapartitiondetails"
        	},
        	{
            	"Sid": "S3Access",
            	"Effect": "Allow",
            	"Action": [
                		"s3:GetBucketLocation",
                		"s3:GetObject",
                		"s3:ListBucket",
                		"s3:ListBucketMultipartUploads",
                		"s3:ListMultipartUploadParts",
                		"s3:AbortMultipartUpload",
                		"s3:PutObject"
            	],
          		"Resource":"arn:aws:s3:::*"
		},
	              {
		      "Sid": "AthenaAccess",
      		"Effect": "Allow",
      		"Action": [ "athena:*" ],
      		"Resource": [ "*" ]
	      },
        	{
            	"Sid": "CloudWatchLogsAccess",
            	"Effect": "Allow",
            	"Action": [
                		"logs:CreateLogGroup",
                		"logs:CreateLogStream",
             	   	"logs:PutLogEvents"
            	],
            	"Resource": "arn:aws:logs:*:*:*"
        	}
    ]
}

Download the .jar file containing the Java deployment package to your local machine.

aws s3 cp s3://aws-bigdata-blog/artifacts/Serverless-CF-Analysis/partitioning-lambda/aws-lambda-athena-1.0.0.jar <path_on_your_local_machine>

From the AWS Management Console, create a new Lambda function with Java8 as the runtime. Select the Blank Function blueprint.

On the Configure triggers page, choose the name of the S3 bucket where the preprocessed logs are delivered. Choose Put for the Event Type. For Prefix, type the name of the prefix folder, if any, where preprocessed logs are delivered by Firehose—for example, out/. For Suffix, type the name of the compression format that the Firehose stream (CloudFrontLogToS3) delivers the preprocessed logs —for example, gz. To invoke Lambda to retrieve the logs from the S3 bucket as they are delivered, select Enable Trigger.

Choose Next and provide a function name to identify this Lambda partitioning function.

Choose Java8 for Runtime for the AWS Lambda function. Choose Upload a .ZIP or .JAR file for the Code entry type, and choose Upload to upload the downloaded aws-lambda-athena-1.0.0.jar file.

Next, create the following environment variables for the Lambda function:

  • TABLE_NAME – The name of the Athena table (for example, cf_logs).
  • PARTITION_TYPE – The partition to be created based on the Athena table for the logs delivered to the sub folders in S3 bucket based on Year, Month, Date, Hour, or Set this to ALL to use Year, Month, Date, and Hour.
  • DDB_TABLE_NAME – The name of the DynamoDB table holding partition information (for example, athenapartitiondetails).
  • ATHENA_REGION – The current AWS Region for the Athena table to construct the JDBC connection string.
  • S3_STAGING_DIR – The Amazon S3 location where your query output is written. The JDBC driver asks Athena to read the results and provide rows of data back to the user (for example, s3://<bucketname>/<folder>/).

To configure the function handler and IAM, for Handler copy and paste the name of the handler: com.amazonaws.services.lambda.CreateAthenaPartitionsBasedOnS3EventWithDDB::handleRequest. Choose the existing IAM role, lambda-partition-execution-role.

Choose Next and then Create Lambda.

Interactive analysis using Amazon Athena

In this section, we analyze the historical data that’s been collected since we added the partitions to the Amazon Athena table for data delivered to the preprocessing logs bucket.

Scenario 1 is robot traffic by edge location.

SELECT COUNT(*) AS ct, requestip, location FROM cf_logs
WHERE isbot='True'
GROUP BY requestip, location
ORDER BY ct DESC;

Scenario 2 is total bytes transferred per distribution for each edge location for your website.

SELECT distribution, location, SUM(bytes) as totalBytes
FROM cf_logs
GROUP BY location, distribution;

Scenario 3 is the top 50 viewers of your website.

SELECT requestip, COUNT(*) AS ct  FROM cf_logs
GROUP BY requestip
ORDER BY ct DESC;

Streaming analysis using Amazon Kinesis Analytics

In this section, you deploy a stream processing application using Amazon Kinesis Analytics to analyze the preprocessed Amazon CloudFront log streams. This application analyzes directly from the Amazon Kinesis Stream as it is delivered to the preprocessing logs bucket. The stream queries in section are focused on gaining the following insights:

  • The IP address of the bot, identified by its Amazon CloudFront edge location, that is currently sending requests to your website. The query also includes the total bytes transferred as part of the response.
  • The total bytes served per distribution per population for your website.
  • The top 10 viewers of your website.

To download the firehose-access-policy.json file, type the following.

aws s3 cp s3://aws-bigdata-blog/artifacts/Serverless-CF-Analysis/kinesisanalytics/firehose-access-policy.json  <path_on_your_local_machine>

To download the kinesisanalytics-policy.json file, type the following.

aws s3 cp s3://aws-bigdata-blog/artifacts/Serverless-CF-Analysis/kinesisanalytics/assume-kinesisanalytics-policy.json <path_on_your_local_machine>

Before we create the Amazon Kinesis Analytics application, we need to create the IAM role to provide permission for the analytics application to access Amazon Kinesis Firehose stream.

Let’s use the AWS CLI to create the IAM role using the following three steps:

  1. Create the IAM policy(firehose-access-policy) for the Lambda execution role to use.
  2. Create the Lambda execution role (ka-execution-role) and assign the service to use this role.
  3. Attach the policy created in step 1 to the Lambda execution role.

Following is the firehose-access-policy.json file, which is the IAM policy used by Kinesis Analytics to read Firehose delivery stream.

{
    "Version": "2012-10-17",
    "Statement": [
      	{
    	"Sid": "AmazonFirehoseAccess",
    	"Effect": "Allow",
    	"Action": [
       	"firehose:DescribeDeliveryStream",
        	"firehose:Get*"
    	],
    	"Resource": [
              "arn:aws:firehose:*:*:deliverystream/CloudFrontLogsToS3”
       ]
     }
}

Following is the assume-kinesisanalytics-policy.json file, to grant Amazon Kinesis Analytics permissions to assume a role.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "kinesisanalytics.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

To create the IAM policy used by Analytics access role, type the following command.

aws iam create-policy --policy-name firehose-access-policy --policy-document file://<path>/firehose-access-policy.json

To create the Analytics execution role and assign the service to use this role, type the following command.

aws iam attach-role-policy --role-name ka-execution-role --policy-arn arn:aws:iam::<your-account-id>:policy/firehose-access-policy

To attach the policy (irehose-access-policy) created to the Analytics execution role (ka-execution-role), type the following command.

aws iam attach-role-policy --role-name ka-execution-role --policy-arn arn:aws:iam::<your-account-id>:policy/firehose-access-policy

To deploy the Analytics application, first download the configuration file and then modify ResourceARN and RoleARN for the Amazon Kinesis Firehose input configuration.

"KinesisFirehoseInput": { 
    "ResourceARN": "arn:aws:firehose:<region>:<account-id>:deliverystream/CloudFrontLogsToS3", 
    "RoleARN": "arn:aws:iam:<account-id>:role/ka-execution-role"
}

To download the Analytics application configuration file, type the following command.

aws s3 cp s3://aws-bigdata-blog/artifacts/Serverless-CF-Analysis//kinesisanalytics/kinesis-analytics-app-configuration.json <path_on_your_local_machine>

To deploy the application, type the following command.

aws kinesisanalytics create-application --application-name "cf-log-analysis" --cli-input-json file://<path>/kinesis-analytics-app-configuration.json

To start the application, type the following command.

aws kinesisanalytics start-application --application-name "cf-log-analysis" --input-configuration Id="1.1",InputStartingPositionConfiguration={InputStartingPosition="NOW"}

SQL queries using Amazon Kinesis Analytics

Scenario 1 is a query for detecting bots for sending request to your website detection for your website.

-- Create output stream, which can be used to send to a destination
CREATE OR REPLACE STREAM "BOT_DETECTION" (requesttime TIME, destribution VARCHAR(16), requestip VARCHAR(64), edgelocation VARCHAR(64), totalBytes BIGINT);
-- Create pump to insert into output 
CREATE OR REPLACE PUMP "BOT_DETECTION_PUMP" AS INSERT INTO "BOT_DETECTION"
--
SELECT STREAM 
    STEP("CF_LOG_STREAM_001"."request_time" BY INTERVAL '1' SECOND) as requesttime,
    "distribution_name" as distribution,
    "request_ip" as requestip, 
    "edge_location" as edgelocation, 
    SUM("bytes") as totalBytes
FROM "CF_LOG_STREAM_001"
WHERE "is_bot" = true
GROUP BY "request_ip", "edge_location", "distribution_name",
STEP("CF_LOG_STREAM_001"."request_time" BY INTERVAL '1' SECOND),
STEP("CF_LOG_STREAM_001".ROWTIME BY INTERVAL '1' SECOND);

Scenario 2 is a query for total bytes transferred per distribution for each edge location for your website.

-- Create output stream, which can be used to send to a destination
CREATE OR REPLACE STREAM "BYTES_TRANSFFERED" (requesttime TIME, destribution VARCHAR(16), edgelocation VARCHAR(64), totalBytes BIGINT);
-- Create pump to insert into output 
CREATE OR REPLACE PUMP "BYTES_TRANSFFERED_PUMP" AS INSERT INTO "BYTES_TRANSFFERED"
-- Bytes Transffered per second per web destribution by edge location
SELECT STREAM 
    STEP("CF_LOG_STREAM_001"."request_time" BY INTERVAL '1' SECOND) as requesttime,
    "distribution_name" as distribution,
    "edge_location" as edgelocation, 
    SUM("bytes") as totalBytes
FROM "CF_LOG_STREAM_001"
GROUP BY "distribution_name", "edge_location", "request_date",
STEP("CF_LOG_STREAM_001"."request_time" BY INTERVAL '1' SECOND),
STEP("CF_LOG_STREAM_001".ROWTIME BY INTERVAL '1' SECOND);

Scenario 3 is a query for the top 50 viewers for your website.

-- Create output stream, which can be used to send to a destination
CREATE OR REPLACE STREAM "TOP_TALKERS" (requestip VARCHAR(64), requestcount DOUBLE);
-- Create pump to insert into output 
CREATE OR REPLACE PUMP "TOP_TALKERS_PUMP" AS INSERT INTO "TOP_TALKERS"
-- Top Ten Talker
SELECT STREAM ITEM as requestip, ITEM_COUNT as requestcount FROM TABLE(TOP_K_ITEMS_TUMBLING(
  CURSOR(SELECT STREAM * FROM "CF_LOG_STREAM_001"),
  'request_ip', -- name of column in single quotes
  50, -- number of top items
  60 -- tumbling window size in seconds
  )
);

Conclusion

Following the steps in this blog post, you just built an end-to-end serverless architecture to analyze Amazon CloudFront access logs. You analyzed these both in interactive and streaming mode, using Amazon Athena and Amazon Kinesis Analytics respectively.

By creating a partition in Athena for the logs delivered to a centralized bucket, this architecture is optimized for performance and cost when analyzing large volumes of logs for popular websites that receive millions of requests. Here, we have focused on just three common use cases for analysis, sharing the analytic queries as part of the post. However, you can extend this architecture to gain deeper insights and generate usage reports to reduce latency and increase availability. This way, you can provide a better experience on your websites fronted with Amazon CloudFront.

In this blog post, we focused on building serverless architecture to analyze Amazon CloudFront access logs. Our plan is to extend the solution to provide rich visualization as part of our next blog post.


About the Authors

Rajeev Srinivasan is a Senior Solution Architect for AWS. He works very close with our customers to provide big data and NoSQL solution leveraging the AWS platform and enjoys coding . In his spare time he enjoys riding his motorcycle and reading books.

 

Sai Sriparasa is a consultant with AWS Professional Services. He works with our customers to provide strategic and tactical big data solutions with an emphasis on automation, operations & security on AWS. In his spare time, he follows sports and current affairs.

 

 


Related

Analyzing VPC Flow Logs with Amazon Kinesis Firehose, Amazon Athena, and Amazon QuickSight

Amazon DynamoDB Accelerator (DAX) – In-Memory Caching for Read-Intensive Workloads

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/amazon-dynamodb-accelerator-dax-in-memory-caching-for-read-intensive-workloads/

I’m fairly sure that you already know about Amazon DynamoDB. As you probably know, it is a managed NoSQL database that scales to accommodate as much table space, read capacity, and write capacity as you need. With response times measured in single-digit milliseconds, our customers are using DynamoDB for many types of applications including adtech, IoT, gaming, media, online learning, travel, e-commerce, and finance. Some of these customers store more than 100 terabytes in a single DynamoDB table and make millions of read or write requests per second. The Amazon retail site relies on DynamoDB and uses it to withstand the traffic surges associated with brief, high-intensity events such as Black Friday, Cyber Monday, and Prime Day.

While DynamoDB’s ability to deliver fast, consistent performance benefits just about any application and workload, there’s always room to do even better. The business value of some workloads (gaming and adtech come to mind, but there are many others) is driven by low-latency, high-performance database reads. The ability to pull data from DynamoDB as quickly as possible leads to faster & more responsive games or ads that drive the highest click-through rates.

Amazon DynamoDB Accelerator
In order to support demanding, read-heavy workloads, we are launching a public preview of the Amazon DynamoDB Accelerator, otherwise known as DAX.

DAX is a fully managed caching service that sits (logically) in front of your DynamoDB tables. It operates in write-through mode, and is API-compatible with DynamoDB. Responses are returned from the cache in microseconds, making DAX a great fit for eventually-consistent read-intensive workloads. DAX is seamless and easy to use. As a managed service, you simply create your DAX cluster and use it as the target for your existing reads and writes. You don’t have to worry about patching, cluster maintenance, replication, or fault management.

Each DAX cluster can contain 1 to 10 nodes; you can add nodes in order to increase overall read throughput. The cache size (also known as the working set) is based on the node size (dax.r3.large to dax.r3.8xlarge) that you choose when you create the cluster. Clusters run within a VPC, with nodes spread across Availability Zones.

You will need to use the DAX SDK for Java to communicate with DAX. This SDK communicates with your cluster using a low-level TCP interface that is fine-tuned for low latency and high throughput (we’ll support access to DAX through other languages as quickly as possible).

Creating a DAX Cluster
Let’s create a DAX cluster from the DynamoDB Console (API and CLI support is also available). I open up the console and click on Create cluster to get started:

I enter a name and description, choose a node type, and set the initial size of my cluster. Then I create an IAM role and policy that gives DAX permission to access my DynamoDB tables (I can also choose an existing role):

The console allows me to create a policy that grants access to a single table. I am add additional tables to the policy using the IAM Console.

Next, I create a subnet group that DAX uses to place cluster nodes. I name the group and choose the desired subnets:

I accept the default settings and then click on Launch cluster:

My cluster is ready to use within minutes:

The next step is to update my application to use the DAX SDK for Java and to configure it to use the endpoint of my cluster (dax1.seutl3.clustercfg.dax.use1.cache.amazonaws.com:8111 in this case).

Once my application is up and running, I can visit the Metrics tab to see how well the cache is performing. The Amazon CloudWatch metrics include cache hits and misses, request counts, error counts, and so forth:

I can use the Alarms tab to create a CloudWatch Alarm for any of the metrics. Perhaps I want to know if an excessive number of cache misses are taking place:

I can use the Nodes tab to see the nodes in my cluster. I can also add new nodes or delete existing ones:

In order to see how DAX works, I installed the DAX Sample Application and ran it twice. The first run accessed DynamoDB directly and demonstrated the non-cached, baseline performance:

As you can see from the middle group of results, the queries ran in 2.9 to 11.3 milliseconds. The second run used DAX and showed the effect of caching on performance:

The first iteration of each test results in a cache miss. The subsequent iterations retrieve the results from the cache, and are (as you can see) quite a bit faster.

Things to Know
Here are a few things to keep in mind as you think about how to put DAX to use in your environment:

Java API – As I mentioned earlier, we are launching this public preview with support for Java. with plans to add support for other languages. DAX is API-compatible with DynamoDB so there’s no need to write your own caching logic or make changes to your code.

Consistency – DAX offers the best opportunity for performance gains when you are using eventually consistent reads that can be served from the in-memory cache (DAX always refers back to the DynamoDB table when processing consistent reads).

Write-Throughs – DAX is a write-through cache. However, if there is a weak correlation between what you read and what you write, you may want to direct your writes to DynamoDB. This will allow DAX to be of greater assistance for your reads.

Deprovisioning – After you have put DAX to use in your environment, you should be able to reduce the amount of read capacity provisioned for the underlying tables. This will reduce your costs (dramatically in many cases), while allowing DAX to provide spare capacity for sudden surges in usage.

Available Now
The public preview of DAX is available today in the US East (Northern Virginia), US West (Oregon), and EU (Ireland) Regions and you can sign up today. You can use the public preview at no charge and you can also learn more by reading the DAX Developer Guide.

Jeff;

 

 

Introducing Amazon CloudWatch Logs Integration for AWS OpsWorks Stacks

Post Syndicated from Kai Rubarth original https://aws.amazon.com/blogs/devops/introducing-amazon-cloudwatch-logs-integration-for-aws-opsworks-stacks/

AWS OpsWorks Stacks now supports Amazon CloudWatch Logs. This benefits all users who want to stream their log files from OpsWorks instances to CloudWatch. This enables you to take advantage of CloudWatch Logs features such as centralized log archival, real-time monitoring of log data, or generating CloudWatch alarms. Until now, OpsWorks customers had to manually install and configure the CloudWatch Logs agent on every instance they wanted to ship log data from. Now, with the built-in CloudWatch Logs integration these features are made available with just a few clicks across all instances in a layer.

The OpsWorks CloudWatch logs integration is compatible with all Chef 11 and Chef 12 Linux stacks. To get started, simply click the new CloudWatch Logs tab in your layer settings screen. OpsWorks agent command logs, which include Chef recipe output, are enabled by default; to enable logging for custom log files, simply input an absolute path or file pattern for every log file you want shipped.

It’s that simple. OpsWorks creates log groups and log streams for you, and starts streaming your system and application logs within a few minutes.

For more information, see Using Amazon CloudWatch Logs with AWS OpsWorks Stacks in the AWS OpsWorks User Guide.

AWS Big Data Blog Month in Review: March 2017

Post Syndicated from Derek Young original https://aws.amazon.com/blogs/big-data/aws-big-data-blog-month-in-review-march-2017/

Another month of big data solutions on the Big Data Blog. Please take a look at our summaries below and learn, comment, and share. Thank you for reading!

Analyze Security, Compliance, and Operational Activity Using AWS CloudTrail and Amazon Athena
In this blog post, walk through how to set up and use the recently released Amazon Athena CloudTrail SerDe to query CloudTrail log files for EC2 security group modifications, console sign-in activity, and operational account activity.  

Big Updates to the Big Data on AWS Training Course!
AWS offers a range of training resources to help you advance your knowledge with practical skills so you can get more out of the cloud. We’ve updated Big Data on AWS, a three-day, instructor-led training course to keep pace with the latest AWS big data innovations. This course allows you to hear big data best practices from an expert, get answers to your questions in person, and get hands-on practice using AWS big data services. 

Analyzing VPC Flow Logs with Amazon Kinesis Firehose, Amazon Athena, and Amazon QuickSight
In this blog post, build a serverless architecture using Amazon Kinesis Firehose, AWS Lambda, Amazon S3, Amazon Athena, and Amazon QuickSight to collect, store, query, and visualize flow logs. In building this solution, you also learn how to implement Athena best practices with regard to compressing and partitioning data so as to reduce query latencies and drive down query costs. 

Amazon Redshift Monitoring Now Supports End User Queries and Canaries
The serverless Amazon Redshift Monitoring utility lets you gather important performance metrics from your Redshift cluster’s system tables and persists the results in Amazon CloudWatch. You can now create your own diagnostic queries and plug-in “canaries” that monitor the runtime of your most vital end user queries. These user-defined metrics can be used to create dashboards and trigger Alarms and should improve visibility into workloads running on a Cluster.  

Running R on Amazon Athena
In this blog post, connect R/RStudio running on an Amazon EC2 instance with Athena. You’ll learn to build a simple interactive application with Athena and R. Athena can be used to store and query the underlying data for your big data applications using standard SQL, while R can be used to interactively query Athena and generate analytical insights using the powerful set of libraries that R provides. This post has been translated into Japanese. 

Top 10 Performance Tuning Tips for Amazon Athena
In this blog post, we review the top 10 tips that can improve query performance. We focus on aspects related to storing data in Amazon S3 and tuning specific to queries. Amazon Athena uses Presto to run SQL queries and hence some of the advice will work if you are running Presto on Amazon EMR. This post has been translated into Japanese. 

Big Data Resources on the AWS Knowledge Center
The AWS Knowledge Center answers the questions we receive most frequently from AWS customers. It is a resource for you that is distinct from AWS Documentation, the AWS Discussion Forums, and the AWS Support Center. It covers questions from across every AWS service. This post is an introduction to Big Data resources on the AWS Knowledge Center. 

Encrypt and Decrypt Amazon Kinesis Records Using AWS KMS
In this bog post, learn to build encryption and decryption into sample Kinesis producer and consumer applications using the Amazon Kinesis Producer Library (KPL), the Amazon Kinesis Consumer Library (KCL), AWS KMS, and the aws-encryption-sdk. The methods and the techniques used in this post to encrypt and decrypt Kinesis records can be easily replicated into your architecture.

Want to learn more about Big Data or Streaming Data? Check out our Big Data and Streaming data educational pages.

Leave a comment below to let us know what big data topics you’d like to see next on the AWS Big Data Blog.

Build your own Crystal Maze at Home

Post Syndicated from Laura Sach original https://www.raspberrypi.org/blog/build-crystal-maze/

I recently discovered a TV channel which shows endless re-runs of the game show The Crystal Maze, and it got me thinking: what resources are available to help the younger generation experience the wonder of this iconic show? Well…

Enter the Crystal Maze

If you’re too young to remember The Crystal Maze, or if you come from a country lacking this nugget of TV gold, let me explain. A band of fairly useless contestants ran around a huge warehouse decked out to represent four zones: Industrial, Aztec, Futuristic, and Medieval. They were accompanied by a wisecracking host in a fancy coat, Richard O’Brien.

A GIF of Crystal Maze host Richard O'Brien having fun on set. Build your own Raspberry Pi Crystal Maze

Richard O’Brien also wrote The Rocky Horror Picture Show so, y’know, he was interesting to watch if nothing else.

The contestants would enter rooms to play themed challenges – the categories were mental, physical, mystery, and skill – with the aim of winning crystals. If they messed up, they were locked in the room forever (well, until the end of the episode). For every crystal they collected, they’d be given a bit more time in a giant crystal dome at the end of the programme. And what did they do in the dome? They tried to collect pieces of gold paper while being buffeted by a wind machine, of course!

A GIF of a boring prize being announced to the competing team. Build your own Raspberry Pi Crystal Maze

Collect enough gold paper and you win a mediocre prize. Fail to collect enough gold paper and you win a mediocre prize. Like I said: TV gold.

Sounds fun, doesn’t it? Here are some free resources that will help you recreate the experience of The Crystal Maze in your living room…without the fear of being locked in.

Marble maze

Image of Crystal Maze Board Game

Photo credit: Board Game Geek

Make the classic Crystal Maze game, but this time with a digital marble! Use your Sense HAT to detect pitch, roll, and yaw as you guide the marble to its destination.

Bonus fact: marble mazes featured in the Crystal Maze board game from the 1990s.

Buzz Wire

Crystal Maze Buzz Wire game screengrab

Photo credit: Board Game Geek

Guide the hook along the wire and win the crystal! Slip up and buzz three times, though, and it’s an automatic lock-in. The beauty of this make is that you can play any fail sound you like: burp wire, anyone? Follow the tutorial by community member David Pride, which he created for the Cotswold Jam.

Laser tripwire

Crystal Maze laser trip wire screengrab

Photo credit: Marc Gerrish

Why not recreate the most difficult game of all? Can you traverse a room without setting off the laser alarms, and grab the crystal? Try your skill with our laser tripwire resource!

Forget the crystal! Get out!

I would love to go to a school fête where kids build their own Crystal Maze-style challenges. I’m sure there are countless other events which you could jazz up with a fun digital making challenge, though the bald dude in a fur coat remains optional. So if you have made your own Crystal Maze challenge, or you try out one of ours, we’d love to hear about it!

Here at the Raspberry Pi Foundation, we take great pride in the wonderful free resources we produce for you to use in classes, at home, and in coding clubs. We publish them under a Creative Commons licence, and they’re an excellent way to develop your digital making skills. And massive thanks to David Pride and the Cotswold Jam for creating and sharing your great resources for free.

The post Build your own Crystal Maze at Home appeared first on Raspberry Pi.

Scaling Your Desktop Application Streams with Amazon AppStream 2.0

Post Syndicated from Bryan Liston original https://aws.amazon.com/blogs/compute/scaling-your-desktop-application-streams-with-amazon-appstream-2-0/

Want to stream desktop applications to a web browser, without rewriting them? Amazon AppStream 2.0 is a fully managed, secure, application streaming service. An easy way to learn what the service does is to try out the end-user experience, at no cost.

In this post, I describe how you can scale your AppStream 2.0 environment, and achieve some cost optimizations. I also add some setup and monitoring tips.

AppStream 2.0 workflow

You import your applications into AppStream 2.0 using an image builder. The image builder allows you to connect to a desktop experience from within the AWS Management Console, and then install and test your apps. Then, create an image that is a snapshot of the image builder.

After you have an image containing your applications, select an instance type and launch a fleet of streaming instances. Each instance in the fleet is used by only one user, and you match the instance type used in the fleet to match the needed application performance. Finally, attach the fleet to a stack to set up user access. The following diagram shows the role of each resource in the workflow.

Figure 1: Describing an AppStream 2.0 workflow

appstreamscaling_1.png

Setting up AppStream 2.0

To get started, set up an example AppStream 2.0 stack or use the Quick Links on the console. For this example, I named my stack ds-sample, selected a sample image, and chose the stream.standard.medium instance type. You can explore the resources that you set up in the AWS console, or use the describe-stacks and describe-fleets commands as follows:

Figure 2: Describing an AppStream 2.0 stack

appstreamscaling_1.png

Figure 3: Describing an AppStream 2.0 fleet

appstreamscaling_2.43%20AM

To set up user access to your streaming environment, you can use your existing SAML 2.0 compliant directory. Your users can then use their existing credentials to log in. Alternatively, to quickly test a streaming connection, or to start a streaming session from your own website, you can create a streaming URL. In the console, choose Stacks, Actions, Create URL, or call create-streaming-url as follows:

Figure 4: Creating a streaming URL

appstreamscaling_3.png

You can paste the streaming URL into a browser, and open any of the displayed applications.

appstreamscaling_4.30%20PM

Now that you have a sample environment set up, here are a few tips on scaling.

Scaling and cost optimization for AppStream 2.0

To provide an instant-on streaming connection, the instances in an AppStream 2.0 fleet are always running. You are charged for running instances, and each running instance can serve exactly one user at any time. To optimize your costs, match the number of running instances to the number of users who want to stream apps concurrently. This section walks through three options for doing this:

  • Fleet Auto Scaling
  • Fixed fleets based on a schedule
  • Fleet Auto Scaling with schedules

Fleet Auto Scaling

To dynamically update the number of running instances, you can use Fleet Auto Scaling. This feature allows you to scale the size of the fleet automatically between a minimum and maximum value based on demand. This is useful if you have user demand that changes constantly, and you want to scale your fleet automatically to match this demand. For examples about setting up and managing scaling policies, see Fleet Auto Scaling.

You can trigger changes to the fleet through the available Amazon CloudWatch metrics:

  • CapacityUtilization – the percentage of running instances already used.
  • AvailableCapacity – the number of instances that are unused and can receive connections from users.
  • InsufficientCapacityError – an error that is triggered when there is no available running instance to match a user’s request.

You can create and attach scaling policies using the AWS SDK or AWS Management Console. I find it convenient to set up the policies using the console. Use the following steps:

  1. In the AWS Management Console, open AppStream 2.0.
  2. Choose Fleets, select a fleet, and choose Scaling Policies.
  3. For Minimum capacity and Maximum capacity, enter values for the fleet.

Figure 5: Fleets tab for setting scaling policies

appstreamscaling_5.png

  1. Create scale out and scale in policies by choosing Add Policy in each section.

Figure 6: Adding a scale out policy

appstreamscaling_6.png

Figure 7: Adding a scale in policy

appstreamscaling_7.png

After you create the policies, they are displayed as part of your fleet details.

appstreamscaling_8.png

The scaling policies are triggered by CloudWatch alarms. These alarms are automatically created on your behalf when you create the scaling policies using the console. You can view and modify the alarms via the CloudWatch console.

Figure 8: CloudWatch alarms for triggering fleet scaling

appstreamscaling_9.png

Fixed fleets based on a schedule

An alternative option to optimize costs and respond to predictable demand is to fix the number of running instances based on the time of day or day of the week. This is useful if you have a fixed number of users signing in at different times of the day― scenarios such as a training classes, call center shifts, or school computer labs. You can easily set the number of instances that are running using the AppStream 2.0 update-fleet command. Update the Desired value for the compute capacity of your fleet. The number of Running instances changes to match the Desired value that you set, as follows:

Figure 9: Updating desired capacity for your fleet

appstreamscaling_10.png

Set up a Lambda function to update your fleet size automatically. Follow the example below to set up your own functions. If you haven’t used Lambda before, see Step 2: Create a HelloWorld Lambda Function and Explore the Console.

To create a function to change the fleet size

  1. In the Lambda console, choose Create a Lambda function.
  2. Choose the Blank Function blueprint. This gives you an empty blueprint to which you can add your code.
  3. Skip the trigger section for now. Later on, you can add a trigger based on time, or any other input.
  4. In the Configure function section:
    1. Provide a name and description.
    2. For Runtime, choose Node.js 4.3.
    3. Under Lambda function handler and role, choose Create a custom role.
    4. In the IAM wizard, enter a role name, for example Lambda-AppStream-Admin. Leave the defaults as is.
    5. After the IAM role is created, attach an AppStream 2.0 managed policy “AmazonAppStreamFullAccess” to the role. For more information, see Working with Managed Policies. This allows Lambda to call the AppStream 2.0 API on your behalf. You can edit and attach your own IAM policy, to limit access to only actions you would like to permit. To learn more, see Controlling Access to Amazon AppStream 2.0.
    6. Leave the default values for the rest of the fields, and choose Next, Create function.
  5. To change the AppStream 2.0 fleet size, choose Code and add some sample code, as follows:
    'use strict';
    
    /**
    This AppStream2 Update-Fleet blueprint sets up a schedule for a streaming fleet
    **/
    
    const AWS = require('aws-sdk');
    const appstream = new AWS.AppStream();
    const fleetParams = {
      Name: 'ds-sample-fleet', /* required */
      ComputeCapacity: {
        DesiredInstances: 1 /* required */
    
      }
    };
    
    exports.handler = (event, context, callback) => {
        console.log('Received event:', JSON.stringify(event, null, 2));
    
        var resource = event.resources[0];
        var increase = resource.includes('weekday-9am-increase-capacity')
    
        try {
            if (increase) {
                fleetParams.ComputeCapacity.DesiredInstances = 3
            } else {
                fleetParams.ComputeCapacity.DesiredInstances = 1
            }
            appstream.updateFleet(fleetParams, (error, data) => {
                if (error) {
                    console.log(error, error.stack);
                    return callback(error);
                }
                console.log(data);
                return callback(null, data);
            });
        } catch (error) {
            console.log('Caught Error: ', error);
            callback(error);
        }
    };

  6. Test the code. Choose Test and use the “Hello World” test template. The first time you do this, choose Save and Test. Create a test input like the following to trigger the scaling update.

    appstreamscaling_11.png

  7. You see output text showing the result of the update-fleet call. You can also use the CLI to check the effect of executing the Lambda function.

Next, to set up a time-based schedule, set a trigger for invoking the Lambda function.

To set a trigger for the Lambda function

  1. Choose Triggers, Add trigger.
  2. Choose CloudWatch Events – Schedule.
  3. Enter a rule name, such as “weekday-9am-increase-capacity”, and a description. For Schedule expression, choose cron. You can edit the value for the cron later.
  4. After the trigger is created, open the event weekday-9am-increase-capacity.
  5. In the CloudWatch console, edit the event details. To scale out the fleet at 9 am on a weekday, you can adjust the time to be: 00 17 ? * MON-FRI *. (If you’re not in Seattle (Pacific Time Zone), change this to another specific time zone).
  6. You can also add another event that triggers at the end of a weekday.

appstreamscaling_12.png

This setup now triggers scale-out and scale-in automatically, based on the time schedule that you set.

Fleet Auto Scaling with schedules

You can choose to combine both the fleet scaling and time-based schedule approaches to manage more complex scenarios. This is useful to manage the number of running instances based on business and non-business hours, and still respond to changes in demand. You could programmatically change the minimum and maximum sizes for your fleet based on time of day or day of week, and apply the default scale-out or scale-in policies. This allows you to respond to predictable minimum demand based on a schedule.

For example, at the start of a work day, you might expect a certain number of users to request streaming connections at one time. You wouldn’t want to wait for the fleet to scale out and meet this requirement. However, during the course of the day, you might expect the demand to scale in or out, and would want to match the fleet size to this demand.

To achieve this, set up the scaling polices via the console, and create a Lambda function to trigger changes to the minimum, maximum, and desired capacity for your fleet based on a schedule. Replace the code for the Lambda function that you created earlier with the following code:

'use strict';

/**
This AppStream2 Update-Fleet function sets up a schedule for a streaming fleet
**/

const AWS = require('aws-sdk');
const appstream = new AWS.AppStream();
const applicationAutoScaling = new AWS.ApplicationAutoScaling();

const fleetParams = {
  Name: 'ds-sample-fleet', /* required */
  ComputeCapacity: {
    DesiredInstances: 1 /* required */
  }
};

var scalingParams = {
  ResourceId: 'fleet/ds-sample-fleet', /* required - fleet name*/
  ScalableDimension: 'appstream:fleet:DesiredCapacity', /* required */
  ServiceNamespace: 'appstream', /* required */
  MaxCapacity: 1,
  MinCapacity: 6,
  RoleARN: 'arn:aws:iam::659382443255:role/service-role/ApplicationAutoScalingForAmazonAppStreamAccess'
};

exports.handler = (event, context, callback) => {
    
    console.log('Received this event now:', JSON.stringify(event, null, 2));
    
    var resource = event.resources[0];
    var increase = resource.includes('weekday-9am-increase-capacity')

    try {
        if (increase) {
            //usage during business hours - start at capacity of 10 and scale
            //if required. This implies at least 10 users can connect instantly. 
            //More users can connect as the scaling policy triggers addition of
            //more instances. Maximum cap is 20 instances - fleet will not scale
            //beyond 20. This is the cap for number of users.
            fleetParams.ComputeCapacity.DesiredInstances = 10
            scalingParams.MinCapacity = 10
            scalingParams.MaxCapacity = 20
        } else {
            //usage during non-business hours - start at capacity of 1 and scale
            //if required. This implies only 1 user can connect instantly. 
            //More users can connect as the scaling policy triggers addition of
            //more instances. 
            fleetParams.ComputeCapacity.DesiredInstances = 1
            scalingParams.MinCapacity = 1
            scalingParams.MaxCapacity = 10
        }
        
        //Update minimum and maximum capacity used by the scaling policies
        applicationAutoScaling.registerScalableTarget(scalingParams, (error, data) => {
             if (error) console.log(error, error.stack); 
             else console.log(data);                     
            });
            
        //Update the desired capacity for the fleet. This sets 
        //the number of running instances to desired number of instances
        appstream.updateFleet(fleetParams, (error, data) => {
            if (error) {
                console.log(error, error.stack);
                return callback(error);
            }

            console.log(data);
            return callback(null, data);
        });
            
    } catch (error) {
        console.log('Caught Error: ', error);
        callback(error);
    }
};

Note: To successfully execute this code, you need to add IAM policies to the role used by the Lambda function. The policies allow Lambda to call the Application Auto Scaling service on your behalf.

Figure 10: Inline policies for using Application Auto Scaling with Lambda

{
"Version": "2012-10-17",
"Statement": [
   {
      "Effect": "Allow", 
         "Action": [
            "iam:PassRole"
         ],
         "Resource": "*"
   }
]
}
{
"Version": "2012-10-17",
"Statement": [
   {
      "Effect": "Allow", 
         "Action": [
            "application-autoscaling:*"
         ],
         "Resource": "*"
   }
]
}

Monitoring usage

After you have set up scaling for your fleet, you can use CloudWatch metrics with AppStream 2.0, and create a dashboard for monitoring. This helps optimize your scaling policies over time based on the amount of usage that you see.

For example, if you were very conservative with your initial set up and over-provisioned resources, you might see long periods of low fleet utilization. On the other hand, if you set the fleet size too low, you would see high utilization or errors from insufficient capacity, which would block users’ connections. You can view CloudWatch metrics for up to 15 months, and drive adjustments to your fleet scaling policy.

Figure 11: Dashboard with custom Amazon CloudWatch metrics

appstreamscaling_13.53%20PM

Summary

These are just a few ideas for scaling AppStream 2.0 and optimizing your costs. Let us know if these are useful, and if you would like to see similar posts. If you have comments about the service, please post your feedback on the AWS forum for AppStream 2.0.