In the past I’ve talked about several agents, daemons, and scripts that you could use to collect system metrics and log files for your Windows and Linux instances and on-premises services and publish them to Amazon CloudWatch. The data collected by this somewhat disparate collection of tools gave you visibility into the status and behavior of your compute resources, along with the power to take action when a value went out of range and indicated a potential issue. You can graph any desired metrics on CloudWatch Dashboards, initiate actions via CloudWatch Alarms, and search CloudWatch Logs to find error messages, while taking advantage of our support for custom high-resolution metrics.
New Unified Agent
Today we are taking a nice step forward and launching a new, unified CloudWatch Agent. It runs in the cloud and on-premises, on Linux and Windows instances and servers, and handles metrics and log files. You can deploy it using AWS Systems Manager (SSM) Run Command, SSM State Manager, or from the CLI. Here are some of the most important features:
Single Agent – A single agent now collects both metrics and logs. This simplifies the setup process and reduces complexity.
Cross-Platform / Cross-Environment – The new agent runs in the cloud and on-premises, on 64-bit Linux and 64-bit Windows, and includes HTTP proxy server support.
Configurable – The new agent captures the most useful system metrics automatically. It can be configured to collect hundreds of others, including fine-grained metrics on sub-resources such as CPU threads, mounted filesystems, and network interfaces.
CloudWatch-Friendly – The new agent supports standard 1-minute metrics and the newer 1-second high-resolution metrics. It automatically includes EC2 dimensions such as Instance Id, Image Id, and Auto Scaling Group Name, and also supports the use of custom dimensions. All of the dimensions can be used for custom aggregation across Auto Scaling Groups, applications, and so forth.
Migration – You can easily migrate existing AWS SSM and EC2Config configurations for use with the new agent.
Installing the Agent
The CloudWatch Agent uses an IAM role when running on an EC2 instance, and an IAM user when running on an on-premises server. The role or the user must include the AmazonSSMFullAccess and AmazonEC2ReadOnlyAccess policies. Here’s my role:
I can easily add it to a running instance (this is a relatively new and very handy EC2 feature):
Next, I install the CloudWatch Agent using the AWS Systems Manager:
This takes just a few seconds. Now I can use a simple wizard to set up the configuration file for the agent:
The wizard also lets me set up the log files to be monitored:
The wizard generates a JSON-format config file and stores it on the instance. It also offers me the option to upload the file to my Parameter Store so that I can deploy it to my other instances (I can also do fine-grained customization of the metrics and log collection configuration by editing the file):
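To make the shape of that file a little more concrete, here is a minimal sketch that builds a configuration in Python and prints it as JSON. The section and key names (agent, metrics, logs, metrics_collected, collect_list) follow the agent's configuration schema as I understand it. The build_agent_config helper, the log group name, the file path, and the chosen measurements are illustrative assumptions, so compare the output against the file the wizard writes on your own instance.

```python
import json


def build_agent_config(log_path, log_group, interval_seconds=60):
    """Return a dict shaped like a CloudWatch Agent configuration file."""
    return {
        "agent": {"metrics_collection_interval": interval_seconds},
        "metrics": {
            # Attach EC2 dimensions to every metric the agent publishes.
            "append_dimensions": {
                "InstanceId": "${aws:InstanceId}",
                "AutoScalingGroupName": "${aws:AutoScalingGroupName}",
            },
            "metrics_collected": {
                # "*" asks for one series per mounted filesystem.
                "disk": {"measurement": ["used_percent"], "resources": ["*"]},
                "mem": {"measurement": ["mem_used_percent"]},
            },
        },
        "logs": {
            "logs_collected": {
                "files": {
                    "collect_list": [
                        {
                            "file_path": log_path,
                            "log_group_name": log_group,
                            # One log stream per instance.
                            "log_stream_name": "{instance_id}",
                        }
                    ]
                }
            }
        },
    }


# Hypothetical path and log group name, used only for illustration.
example_config = build_agent_config("/var/log/messages", "example-system-logs")
print(json.dumps(example_config, indent=2))
```

Lowering interval_seconds is how I would expect to request finer-grained collection, but check the agent documentation for the values it accepts.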
Now I can start the CloudWatch Agent using Run Command, supplying the name of my configuration in the Parameter Store:
This runs in a few seconds and the agent begins to publish metrics right away. As I mentioned earlier, the agent can publish fine-grained metrics on the resources inside of or attached to an instance. For example, here are the metrics for each filesystem:
There’s a separate log stream for each monitored log file on each instance:
I can view and search it, just like I can do for any other log stream:
Now Available
The new CloudWatch Agent is available now and you can start using it today in all public AWS Regions, with AWS GovCloud (US) and the Regions in China to follow.
There’s no charge for the agent; you pay the usual CloudWatch prices for logs and custom metrics.
AWS Systems Manager is a new way to manage your cloud and hybrid IT environments. AWS Systems Manager provides a unified user interface that simplifies resource and application management, shortens the time to detect and resolve operational problems, and makes it easy to operate and manage your infrastructure securely at scale. This service is absolutely packed full of features. It defines a new experience around grouping, visualizing, and reacting to problems using features from products like Amazon EC2 Systems Manager (SSM) to enable rich operations across your resources.
As I said above, there are a lot of powerful features in this service and we won’t be able to dive deep on all of them but it’s easy to go to the console and get started with any of the tools.
You start by defining a group based on tag filters. From there you can view all of the resources in a centralized console. You would typically use these groupings to differentiate between applications, application layers, and environments like production or dev – but you can make your own rules about how to use them as well. If you imagine a typical three-tier web app, you might have a few EC2 instances, an ELB, a few S3 buckets, and an RDS instance. You can define a grouping for that application and work with all of those different resources simultaneously.
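To make the grouping idea concrete, here is a small sketch in plain Python. It does not call any AWS API: the resource IDs, the tag keys, and the group_by_tags helper are all hypothetical, and the real service evaluates the tag filters for you.

```python
def group_by_tags(resources, tag_filters):
    """Return the resources whose tags satisfy every filter.

    tag_filters maps a tag key to the set of accepted values.
    """
    return [
        r for r in resources
        if all(r["tags"].get(key) in values for key, values in tag_filters.items())
    ]


# Hypothetical inventory: a three-tier web app plus unrelated resources.
resources = [
    {"id": "i-web-1", "type": "ec2", "tags": {"App": "shop", "Env": "production"}},
    {"id": "elb-shop", "type": "elb", "tags": {"App": "shop", "Env": "production"}},
    {"id": "db-shop", "type": "rds", "tags": {"App": "shop", "Env": "production"}},
    {"id": "i-web-dev", "type": "ec2", "tags": {"App": "shop", "Env": "dev"}},
    {"id": "bucket-logs", "type": "s3", "tags": {"App": "analytics"}},
]

production_shop = group_by_tags(resources, {"App": {"shop"}, "Env": {"production"}})
print([r["id"] for r in production_shop])
```

The group spans several resource types at once, which is the point: the tags define the group, not the service that owns each resource.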
AWS Systems Manager automatically aggregates and displays operational data for each resource group through a dashboard. You no longer need to navigate through multiple AWS consoles to view all of your operational data. You can easily integrate your existing Amazon CloudWatch dashboards, AWS Config rules, AWS CloudTrail trails, AWS Trusted Advisor notifications, and AWS Personal Health Dashboard performance and availability alerts. You can also easily view your software inventories across your fleet. AWS Systems Manager also provides a compliance dashboard allowing you to see the state of various security controls and patching operations across your fleets.
Acting on Insights
Building on the success of EC2 Systems Manager (SSM), AWS Systems Manager takes all of the features of SSM and provides a central place to access them. These are all the same experiences you would have through SSM with a more accessible console and centralized interface. You can use the resource groups you’ve defined in Systems Manager to visualize and act on groups of resources.
Automations allow you to define common IT tasks as a JSON document that specifies a list of tasks. You can also use community published documents. These documents can be executed through the Console, CLIs, SDKs, scheduled maintenance windows, or triggered based on changes in your infrastructure through CloudWatch Events. You can track and log the execution of each step in the documents and prompt for additional approvals. It also allows you to incrementally roll out changes and automatically halt when errors occur. You can start executing an automation directly on a resource group and it will be able to apply itself to the resources that it understands within the group.
Run Command is a superior alternative to enabling SSH on your instances. It provides safe, secure remote management of your instances at scale without logging into your servers, replacing the need for SSH bastions or remote PowerShell. It has granular IAM permissions that allow you to restrict which roles or users can run certain commands.
Patch Manager, Maintenance Windows, and State Manager
I’ve written about Patch Manager before and if you manage fleets of Windows and Linux instances it’s a great way to maintain a common baseline of security across your fleet.
Maintenance windows allow you to schedule instance maintenance and other disruptive tasks for a specific time window.
State Manager allows you to control various server configuration details like anti-virus definitions, firewall settings, and more. You can define policies in the console or run existing scripts, PowerShell modules, or even Ansible playbooks directly from S3 or GitHub. You can query State Manager at any time to view the status of your instance configurations.
Things To Know
There’s some interesting terminology here. We haven’t done the best job of naming things in the past so let’s take a moment to clarify. EC2 Systems Manager (sometimes called SSM) is what you used before today. You can still invoke aws ssm commands. However, AWS Systems Manager builds on and enhances many of the tools provided by EC2 Systems Manager and allows those same tools to be applied to more than just EC2. When you see the phrase “Systems Manager” in the future you should think of AWS Systems Manager and not EC2 Systems Manager.
AWS Systems Manager with all of this useful functionality is provided at no additional charge. It is immediately available in all public AWS regions.
The best part about these services is that even with their tight integrations each one is designed to be used in isolation as well. If you only need one component of these services it’s simple to get started with only that component.
There’s a lot more than I could ever document in this post so I encourage you all to jump into the console and documentation to figure out where you can start using AWS Systems Manager.
Amazon ElastiCache makes it easy for you to set up a fast, in-memory data store and cache. With support for the two most popular open source offerings (Redis and Memcached), ElastiCache supports the demanding needs of game leaderboards, in-memory analytics, and large-scale messaging.
Today I would like to tell you about an important addition to Amazon ElastiCache for Redis. You can already create clusters with up to 15 shards, each responsible for storing keys and values for a specific set of slots (each cluster has exactly 16,384 slots). A single cluster can expand to store 3.55 terabytes of in-memory data while supporting up to 20 million reads and 4.5 million writes per second.
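For readers curious about how a key ends up in a slot: open source Redis Cluster computes a CRC16 of the key modulo 16,384, and hashes only the text inside the first {...} when the key contains a non-empty hash tag. The sketch below follows that published rule. The hash_slot name is my own, and I am assuming that ElastiCache for Redis maps keys to slots the same way open source Redis does.

```python
import binascii

TOTAL_SLOTS = 16384


def hash_slot(key: str) -> int:
    """Map a key to one of the 16,384 cluster slots."""
    data = key.encode("utf-8")
    # Hash tags: if the key contains "{...}" with a non-empty body,
    # only that body is hashed, so related keys can share a slot.
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        if end != -1 and end != start + 1:
            data = data[start + 1:end]
    # crc_hqx with an initial value of 0 is the CRC16 (XMODEM) variant.
    return binascii.crc_hqx(data, 0) % TOTAL_SLOTS


print(hash_slot("{user1000}.following") == hash_slot("{user1000}.followers"))
```

Because both keys share the hash tag user1000, they land in the same slot and therefore on the same shard, before and after a resharding operation.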
Now with Online Resizing
You can now adjust the number of shards in a running ElastiCache for Redis cluster while the cluster remains online and responding to requests. This gives you the power to respond to changes in traffic and data volume without having to take the cluster offline or to start with an empty cache. You can also rebalance a running cluster to uniformly redistribute slot space without changing the number of shards.
When you initiate a resharding or rebalancing operation, ElastiCache for Redis starts by preparing a plan that will result in an even distribution of slots across the shards in the cluster. Then it transfers slots across shards, moving many in parallel for efficiency. This all happens while the cluster continues to respond to requests, with a modest impact on write throughput for writes to a slot that is in motion. The migration rate is dependent on the instance type, network speed, read/write traffic to the slots, and is generally about 1 gigabyte per minute.
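Here is a rough sketch of what an even plan and a duration estimate could look like. This is not the algorithm ElastiCache uses internally, which is not described here. The helper names are hypothetical, the plan simply splits the slot space as evenly as integer arithmetic allows, and the estimate takes the "about 1 gigabyte per minute" figure at face value.

```python
TOTAL_SLOTS = 16384


def even_slot_counts(shard_count):
    """Split the slot space as evenly as possible across shard_count shards."""
    base, extra = divmod(TOTAL_SLOTS, shard_count)
    # The first `extra` shards each take one additional slot.
    return [base + 1 if i < extra else base for i in range(shard_count)]


def slots_to_move(old_shards, new_shards):
    """Slots that must leave the existing shards when the cluster grows."""
    if new_shards < old_shards:
        raise ValueError("this sketch only covers adding shards")
    old = even_slot_counts(old_shards)
    new = even_slot_counts(new_shards)
    # zip stops at the existing shards; each gives up its surplus slots.
    return sum(o - n for o, n in zip(old, new))


def estimated_minutes(gigabytes_to_move, gb_per_minute=1.0):
    """Rough duration, assuming a steady migration rate."""
    return gigabytes_to_move / gb_per_minute


print(even_slot_counts(3))
print(slots_to_move(3, 5))
```

Going from three shards to five moves roughly 40% of the slots, so a cluster holding 100 GB would need on the order of 40 minutes at the quoted rate, with the caveats about instance type, network speed, and traffic still applying.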
The resharding and rebalancing operations apply to Redis clusters that were created with Cluster Mode enabled:
Resharding a Cluster
In general, you will know that it is time to expand a cluster via resharding when it starts to face significant memory pressure or when individual nodes are becoming bottlenecks. You can watch the cluster’s CloudWatch metrics to identify each situation:
To reshard a Redis cluster from the ElastiCache Dashboard, click on the cluster to visit the detail page, and then click on the Add shards button:
Enter the number of shards to add and (optionally) the desired Availability Zones, then click on Add:
The status of the cluster will change to modifying and the resharding process will begin. It can take anywhere from a few minutes to several hours, as indicated above. You can track the progress on the detail page for the cluster:
You can see the slots moving from shard to shard:
You can also watch the Events for the cluster:
During the resharding you should avoid the use of the KEYS and SMEMBERS commands, as well as compute-intensive Lua scripts in order to moderate the load on the cluster shards. You should avoid the FLUSHDB and FLUSHALL commands entirely; using them will interrupt and then abort the resharding process.
The status of each shard will return to available when the process is complete:
The same process takes place when you delete shards.
As part of the AWS Shared Responsibility Model, you are responsible for monitoring and managing your resources at the operating system and application level. When you monitor your application servers, for example, you can measure, visualize, react to, and improve the security of those servers. You probably already do this on premises or in other environments, and you can adapt your existing processes, tools, and methodologies for use in the AWS Cloud. For more details about best practices for monitoring your AWS resources, see the “Manage Security Monitoring, Alerting, Audit Trail, and Incident Response” section in the AWS Security Best Practices whitepaper.
This blog post focuses on how to log and create alarms on invalid Secure Shell (SSH) access attempts. Implementing live monitoring and session recording facilitates the identification of unauthorized activity and can help confirm that remote users access only those systems they are authorized to use. With SSH log information in hand (such as invalid access type, bad private keys, and remote IP addresses), you can take proactive actions to protect your servers. For example, you can use an AWS Lambda function to adjust your server’s security rules when an alarm is triggered that indicates an invalid SSH access attempt.
In this post, I demonstrate how to use Amazon CloudWatch Logs to monitor SSH access to your application servers (Amazon EC2 Linux instances) so that you can monitor rejected SSH connection requests and take action. I also show how to configure CloudWatch Logs to send SSH access logs from application servers that reside in a public subnet. Last, I demonstrate how to visualize how many attempts are made to SSH into your application servers with bad private keys and invalid user names. Using these techniques and tools can help you improve the security of your application servers.
AWS services and terminology I use in this post
In this post, I use the following AWS services and terminology:
Amazon CloudWatch – A monitoring service for the resources and applications you run on the AWS Cloud. You can use CloudWatch to collect and track metrics, collect and monitor log files, set alarms, and automatically react to changes in your AWS resources.
CloudWatch namespaces – Containers for metrics. Metrics in different namespaces are isolated from each other so that metrics from different applications are not mistakenly aggregated into the same statistics. You also can create custom metrics for which you must specify namespaces as containers.
CloudWatch Logs – A feature of CloudWatch that allows you to monitor, store, and access your log files from EC2 instances, AWS CloudTrail, and other sources. Additionally, you can use CloudWatch Logs to monitor applications and systems by using log data and create alarms. For example, you can choose to search for a phrase in logs and then create an alarm if the phrase you are looking for is found in the log more than 5 times in the last 10 minutes. You can then take action on these alarms, if necessary.
Log stream – A log stream represents the sequence of events coming from an application instance or resource that you are monitoring. In this post, I use the EC2 instance ID as the log stream identifier so that I can easily map log entries to the instances that produced the log entries.
Log group – In CloudWatch Logs, a group of log streams that share the same retention time, monitoring, and access control settings. Each log stream must belong to one log group.
Metric – A specific term or value that you can monitor and extract from log events.
Metric filter – A metric filter describes how Amazon CloudWatch Logs extracts information from logs and transforms it into CloudWatch metrics. It defines the terms and patterns to look for in log data as the data is sent to CloudWatch Logs. Metric filters are assigned to log groups, and all metric filters assigned to a given log group are applied to their log stream—see the following diagram for more details.
SSH logs – Reside on EC2 instances and capture all SSH activities. The logs include successful attempts as well as unsuccessful attempts. Debian Linux SSH logs reside in /var/log/auth.log, and stock CentOS SSH logs are written to /var/log/secure. This blog post uses an Amazon Linux AMI, which also logs SSH sessions to /var/log/secure.
AWS Identity and Access Management (IAM) – IAM enables you to securely control access to AWS services and resources for your users. In the solution in this post, you create an IAM policy and configure an EC2 instance that assumes a role. The IAM policy allows the EC2 instance to create log events and save them in an Amazon S3 bucket (in other words, CloudWatch Logs log files are saved in the S3 bucket).
CloudWatch dashboards – Amazon CloudWatch dashboards are customizable home pages in the CloudWatch console that you can use to monitor your resources in a single view, even those resources that are spread across different regions. You can use CloudWatch dashboards to create customized views of the metrics and alarms for your AWS resources.
The following diagram depicts the services and flow of information between the different AWS services used in this post’s solution.
Here is how the process works, as illustrated and numbered in the preceding diagram:
A CloudWatch Logs agent runs on each EC2 instance. The agents are configured to send SSH logs from the EC2 instance to a log stream identified by an instance ID.
Log streams are aggregated into a log group. As a result, one log group contains all the logs you want to analyze from one or more instances.
You apply metric filters to a log group in order to search for specific keywords. When the metric filter finds specific keywords, the filter counts the occurrences of the keywords in a time-based sliding window. If the occurrence of a keyword exceeds the CloudWatch alarm threshold, an alarm is triggered.
An IAM policy defines a role that gives the EC2 servers permission to create logs in a log group and send log events (new log entries) from EC2 to log groups. This role is then assumed by the application servers.
CloudWatch alarms notify users when a specified threshold has been crossed. For example, you can set an alarm to trigger when more than 2 failed SSH connections happen in a 5-minute period.
The CloudWatch dashboard is used to visualize data and alarms from the monitoring process.
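The counting and threshold steps in the list above can be sketched in a few lines of plain Python. This is an illustration rather than how CloudWatch is implemented: it uses fixed, aligned periods as a simplification of the sliding window, and the function names, sample messages, and timestamps (seconds since an arbitrary start) are made up.

```python
from collections import Counter


def count_matches_per_period(events, keyword, period_seconds=300):
    """Count events whose message contains keyword, keyed by period start."""
    counts = Counter()
    for timestamp, message in events:
        if keyword in message:
            counts[timestamp - timestamp % period_seconds] += 1
    return counts


def periods_in_alarm(counts, threshold=2):
    """Return the period starts whose count is strictly above the threshold."""
    return sorted(start for start, n in counts.items() if n > threshold)


# Made-up log events: (seconds since start, message).
events = [
    (0, "Invalid user admin from 203.0.113.5"),
    (40, "Invalid user test from 203.0.113.5"),
    (90, "Accepted publickey for ec2-user from 198.51.100.7"),
    (200, "Invalid user oracle from 203.0.113.5"),
    (400, "Invalid user guest from 203.0.113.9"),
]

counts = count_matches_per_period(events, "Invalid user")
print(periods_in_alarm(counts))
```

With a threshold of 2 and a 5-minute period, only the first period (three matches) would raise the alarm; the single match in the second period would not.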
Deploy and test the solution
1. Deploy the solution by using CloudFormation
Now that I have explained how the solution works, I will show how to use AWS CloudFormation to create a stack with the desired solution configuration. CloudFormation allows you to create a stack of resources in your AWS account.
Sign in to the AWS Management Console, choose CloudFormation, choose Create Stack, choose Specify an Amazon S3 template URL and paste the following link in the box: https://s3.amazonaws.com/awsiammedia/public/sample/MonitorSSHActivities/CloudWatchLogs_ssh.yaml
Choose Launch to deploy the stack.
On the Specify Details page, enter the Stack name. Then enter the KeyName, which is the SSH key pair for the region you use. I use this key pair later in this post; if you don’t have a key pair for your current region, follow these instructions to create one. The OperatorEmail is the CloudWatch alarm recipient email address (this field is mandatory to launch the stack), which is the email address to which SSH activity alarms will be sent. You can use the SSHLocation box to limit the IP address range that can access your instances; the default is 0.0.0.0/0, which means that any IP can access the instance. After specifying these variables, click Next.
On the Options page, tag your instance, and click Next. Tags allow you to assign metadata to AWS resources. For example, you can tag a project’s resources and then use the tag to manage, search for, and filter resources. For more information about tagging, see Tagging Your Amazon EC2 Resources.
Wait until the CloudFormation template shows CREATE_COMPLETE, as shown in the following screenshot. This means your stack was created successfully.
After the stack is created successfully, you have two distinct application servers running, each with a CloudWatch agent. These servers represent a fleet of servers in your infrastructure. Choose the Outputs tab to see more details about the resources, such as the public IP addresses of the servers. You will need to use these IP addresses later in this post in order to trigger log events and alarms.
The CloudWatch log agent on each server is installed at startup and configured to stream SSH log entries from /var/log/secure to CloudWatch via a log stream. CloudWatch aggregates the log streams (ssh.log) from the application servers and saves them in a CloudWatch Logs log group. Each log stream is identified by an instance-ID, as shown in the following screenshot.
The application servers assume a role that gives them permissions to create CloudWatch Logs log files and events. CloudFormation also configures two metrics: ssh/InvalidUser and ssh/Disconnect. The ssh/InvalidUser metric sends an alarm when there are more than 2 SSH attempts into any server that include an invalid user name. Similarly, the ssh/Disconnect metric creates an alarm when more than 10 SSH disconnect requests come from users within 5 minutes.
To review the metrics created by CloudFormation, choose Metrics in the CloudWatch console. A new SSH custom namespace has been created, which contains the two metrics described in the previous paragraph.
You should now have two application servers running and two custom CloudWatch metrics and alarms configured. Now, it’s time to generate log events, trigger alarms, and test the configurations.
2. Test SSH metrics and alarms
Now, let’s try to trigger an alarm by trying to SSH with an invalid user name into one of the servers. Use the key pair you specified when launching the stack and connect to one of the Linux instances from a terminal window (replace the placeholder values in the following command).
ssh -i <the-key-pair-you-specified-in-the-CloudFormation-template> ec2-user@<ec2 DNS or IP address>
Now, exit the session and try to sign in as bad-user, as shown in the following command.
ssh -i <the-key-pair-you-specified-in-the-CloudFormation-template> bad-user@<ec2 DNS or IP address>
The following command is the same as the previous command, but with the placeholder values replaced by actual values.
Because the alarm triggers after two or more unsuccessful SSH login attempts with an invalid user name in 1 minute, run the preceding command a few times. The server’s log captures the bad SSH login attempts, and after a minute, you should see InvalidUserAlarm in the CloudWatch console, as shown in the following screenshot. Choose Alarms to see more details. The alarm should disappear after another minute if there are no more SSH login attempts.
You can also view the history of your alarms by choosing the History tab. CloudWatch metrics are saved for 15 months.
When the CloudFormation stack launches, a topic-registration email is sent to the email address you specified in the template. After you accept the topic registration, you will receive an alarm email with details about the alarm. The email looks like what is shown in the following screenshot.
3. Understanding CloudWatch metric filters and their transformation
The CloudFormation template includes two alarms, InvalidUserAlarm and SSHReceiveddisconnectAlarm, and two metric filters. As I mentioned previously, the metric filters define the pattern you want to match in a CloudWatch Logs log group. When a pattern is found, it is transformed into an Amazon CloudWatch metric as defined in the MetricTransformations section of the metric filter.
The following is a snippet of the InvalidUser metric filter. Each pattern match, denoted by FilterPattern, is counted as one metric value as specified in the MetricValue parameter in the MetricTransformations section. The CloudWatch alarm associated with this metric filter will be triggered when the metric value crosses a specified threshold.
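For orientation, here is a sketch of the general shape of a metric filter resource, written as a Python dictionary and printed as JSON. It is not copied from the template used in this post: the log group name, the filter pattern, and the metric name and namespace values are hypothetical stand-ins. Only the property names (FilterPattern, MetricTransformations, MetricValue, and so on) are the ones CloudFormation defines for AWS::Logs::MetricFilter.

```python
import json

# Hypothetical values throughout: the log group name, pattern, and metric
# names are stand-ins, not the contents of the template used in this post.
invalid_user_filter = {
    "Type": "AWS::Logs::MetricFilter",
    "Properties": {
        "LogGroupName": "example-ssh-log-group",
        # Space-delimited pattern: match lines whose sixth and seventh
        # words are "Invalid" and "user".
        "FilterPattern": "[month, day, time, host, process, word1 = Invalid, word2 = user, ...]",
        "MetricTransformations": [
            {
                "MetricNamespace": "SSH",
                "MetricName": "InvalidUser",
                # Each matching log line contributes 1 to the metric.
                "MetricValue": "1",
            }
        ],
    },
}

print(json.dumps(invalid_user_filter, indent=2))
```

Because every match contributes a value of 1, summing the metric over a period gives the number of matching log lines in that period.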
When a CloudWatch alarm is triggered, the service sends an email to an Amazon SNS topic with more details about the alarm type, trigger parameters, and status.
4. Create a CloudWatch metric filter to identify attempts to SSH into your servers with bad private keys
You can create additional metric filters in CloudWatch Logs to provide better visibility into the SSH activity on your servers. Let’s assume you want to know if there are too many attempts to SSH into your servers with bad private keys. If an attempt is made with a bad private key, a line like the following is logged in the SSH log file.
You can produce this log line by modifying the pem file you are using (a pem file holds your private key). In a terminal window, modify your private key by copying and pasting the following lines in the same directory where your key resides.
$ cat <valid-pem-file> | sed 's/./A/25' > bad-keys.tmp
$ sed 's/./A/26' bad-keys.tmp > bad-keys.pem
These lines simply change the characters at positions 25 and 26 from their current value to the character A, keeping the original pem file intact. Alternatively, you can use nano <valid-keys>.pem from the command line or any other editor, change a character, save the file as bad-keys.pem, and exit the file.
Now, try to use bad-keys.pem to access one of the application servers.
The SSH attempt should fail because you are using a bad private key.
Permission denied (publickey).
Now, let’s look at the server’s ssh.log file from the CloudWatch Logs console and analyze the error log messages. I need to understand the log format in order to configure a new filter. To review the logs, choose Logs in the navigation pane, and select the log group that was created by CloudFormation (it starts with the name you specified when launching the CloudFormation template).
In particular, notice the following line when you try to SSH with a bad private key.
Let’s add a metric filter to capture this line so that we can use this metric later when we build an SSH Dashboard. Copy the following line to the Filter events search box at the top of the console screen and press Enter.
You can now see only the lines that match the pattern you specified. These are the lines you want to count and transform into metrics. Each string in the message is represented by a word in the filter. In our example, we are looking for a pattern where the sixth word is Connection and the seventh word is closed. Other words in the log line are not important in this context. The following image depicts the mapping between a string in a log file and a metric filter.
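The word-position idea can be mimicked in a few lines of Python. This is a simplified stand-in that splits on whitespace, not the CloudWatch Logs pattern engine itself, and the two sample log lines are made up in the usual sshd syslog layout.

```python
def matches_connection_closed(log_line):
    """True when the sixth word is 'Connection' and the seventh is 'closed'."""
    words = log_line.split()
    return len(words) >= 7 and words[5] == "Connection" and words[6] == "closed"


# Made-up sample lines; the addresses come from documentation-only ranges.
bad_key = "Dec 10 12:34:56 ip-10-0-0-12 sshd[2211]: Connection closed by 203.0.113.7 [preauth]"
bad_user = "Dec 10 12:35:10 ip-10-0-0-12 sshd[2215]: Invalid user bad-user from 203.0.113.7"

print(matches_connection_closed(bad_key))   # True
print(matches_connection_closed(bad_user))  # False
```

The first five words (month, day, time, host, and process) are ignored, just as the other words in the log line are not important to the filter.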
To create the metric filter, choose Logs in the navigation pane of the CloudWatch console. Choose the log groups to which you want to apply the new metric filter and then choose Create Metric Filter. Choose Next.
Paste the filter pattern we used previously (the sixth word equals Connection and the seventh word equals closed) in the Filter Pattern box. Under Select Log Data to Test, select the server you tried to sign in to with the bad private key, and click Test Pattern. You should see the results that are shown in the following screenshot. When completed, click Assign Metric.
Type SSH for the Metric Namespace and sshClosedConnection-InvalidKeysFilter for Filter Name. Choose Create Filter to see your new metric filter listed. You can use the newly created metric filter to graph the metrics and set alarms. The alarms can be used to inform your administrator via email of any event you specify. In addition, alarms on these metrics can send an SNS notification that triggers an AWS Lambda function in order to take proactive actions, such as blocking suspicious IP addresses in a security group.
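As a starting point for such a function, here is a minimal sketch of a Lambda handler that reads an alarm notification delivered through SNS. It assumes the usual event layout (Records, then Sns, then a JSON string in Message with AlarmName and NewStateValue fields), and it deliberately stops short of changing any security group. Note that the alarm notification does not carry the offending IP address, so a real function would have to look that up in the log data; the helper name is my own.

```python
import json


def alarms_in_alarm_state(event):
    """Return the names of alarms reported in the ALARM state."""
    names = []
    for record in event.get("Records", []):
        # CloudWatch alarm notifications arrive as a JSON string in Sns.Message.
        message = json.loads(record["Sns"]["Message"])
        if message.get("NewStateValue") == "ALARM":
            names.append(message.get("AlarmName"))
    return names


def lambda_handler(event, context):
    alarms = alarms_in_alarm_state(event)
    # A real function would act here, for example by updating a security group.
    return {"alarms": alarms}
```

Keeping the parsing in its own function makes it easy to test locally with a hand-built event before wiring the function to the SNS topic.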
Choose Create Alarm next to Filter Name and follow the instructions to create a CloudWatch alarm.
Back at the Metrics view, you should now have three SSH metric filters under Custom Namespaces. Note that it can take a few minutes for the number of SSH metrics to update from two to three.
5. Create a graph by using a CloudWatch dashboard
After you have configured the metrics, you can display SSH metrics in a graph. CloudWatch dashboards allow you to create reusable graphs of AWS resources and custom metrics so that you can quickly monitor your operational status and identify issues at a glance. As noted earlier, CloudWatch metrics are saved for 15 months.
In the CloudWatch console, choose Dashboards in the navigation pane, and then choose Create dashboard to create a new graph in a dashboard. Name your dashboard SSH-Dashboard and choose Create dashboard. Choose Line Graph from the graph options and choose Configure.
In the Add metric graph window under Custom Namespace, choose SSH > Metrics with no dimensions. Select all three metrics you have configured (the CloudFormation template configured two metrics and you manually added one more metric).
By default, the metrics are displayed on the graph as an average. However, you configured metrics that are based on summary metrics (for example, the total number of alarms in two minutes). To change the default, choose the Graphed metrics tab, and change the statistic from Average to Sum, as shown in the following screenshot. Also, change the time period from 5 minutes to 1 minute.
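A tiny example shows why Sum is the statistic you want here. It assumes each log line matched by a metric filter is recorded as a data point with the value 1, which is how the filters in this post are described.

```python
from statistics import mean

# Four matching log lines in one period, each recorded with the value 1.
data_points_in_period = [1, 1, 1, 1]

print(mean(data_points_in_period))  # 1, no matter how many lines matched
print(sum(data_points_in_period))   # 4, the number of matching lines
```

The average of a series of ones is always one, so an Average graph would look flat even during a burst of failed logins, while Sum shows the actual count.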
Your graphed metrics should look like the following screenshot. When you have provided all the necessary information, choose Create Widget.
You can rename the graph and add static text to give the console more context. To add a text widget, choose Widget and select text. Then edit the widget with markdown language. Your dashboard may then look like the following screenshot.
The consolidated metrics graph displays the number of SSH attempts with bad private keys, invalid user names, and too many disconnects.
In this blog post, I demonstrated how to automate the deployment of the CloudWatch Logs agent, create filters and alarms, and write, test, and apply metrics on the fly from the AWS Management Console. You can then visualize the metrics with the AWS Management Console. The solution described in this post gives you monitoring and alarming capabilities that can help you understand the status of and potential risks to your instances and applications. You can easily aggregate logs from many servers and different applications, create alarms, and display logs’ metrics on a graph.
If you have comments about this post, submit them in the “Comments” section below. If you have questions about the solution in this post, start a new thread on the CloudWatch forum.