Tag Archives: Amazon ES

Improve the Operational Efficiency of Amazon Elasticsearch Service Domains with Automated Alarms Using Amazon CloudWatch

Post Syndicated from Veronika Megler original https://aws.amazon.com/blogs/big-data/improve-the-operational-efficiency-of-amazon-elasticsearch-service-domains-with-automated-alarms-using-amazon-cloudwatch/

A customer has been successfully creating and running multiple Amazon Elasticsearch Service (Amazon ES) domains to support their business users’ search needs across products, orders, support documentation, and a growing suite of similar needs. The service has become heavily used across the organization.  This led to some domains running at 100% capacity during peak times, while others began to run low on storage space. Because of this increased usage, the technical teams were in danger of missing their service level agreements.  They contacted me for help.

This post shows how you can set up automated alarms to warn when domains need attention.

Solution overview

Amazon ES is a fully managed service that delivers Elasticsearch’s easy-to-use APIs and real-time analytics capabilities along with the availability, scalability, and security that production workloads require.  The service offers built-in integrations with a number of other components and AWS services, enabling customers to go from raw data to actionable insights quickly and securely.

One of these other integrated services is Amazon CloudWatch. CloudWatch is a monitoring service for AWS Cloud resources and the applications that you run on AWS. You can use CloudWatch to collect and track metrics, collect and monitor log files, set alarms, and automatically react to changes in your AWS resources.

CloudWatch collects metrics for Amazon ES. You can use these metrics to monitor the state of your Amazon ES domains, and set alarms to notify you about high utilization of system resources.  For more information, see Amazon Elasticsearch Service Metrics and Dimensions.

While the metrics are automatically collected, the missing piece is how to set alarms on these metrics at appropriate levels for each of your domains. This post includes sample Python code to evaluate the current state of your Amazon ES environment, and to set up alarms according to AWS recommendations and best practices.

There are two components to the sample solution:

  • es-check-cwalarms.py: This Python script checks the CloudWatch alarms that have been set, for all Amazon ES domains in a given account and region.
  • es-create-cwalarms.py: This Python script sets up a set of CloudWatch alarms for a single given domain.

The sample code can also be found in the amazon-es-check-cw-alarms GitHub repo. The scripts are easy to extend or combine, as described in the section “Extensions and Adaptations”.

Assessing the current state

The first script, es-check-cwalarms.py, is used to give an overview of the configurations and alarm settings for all the Amazon ES domains in the given region. The script takes the following parameters:

python es-checkcwalarms.py -h
usage: es-checkcwalarms.py [-h] [-e ESPREFIX] [-n NOTIFY] [-f FREE] [-p PROFILE] [-r REGION]
Checks a set of recommended CloudWatch alarms for Amazon Elasticsearch Service domains (optionally, those beginning with a given prefix).
optional arguments:
  -h, --help                         show this help message and exit
  -e ESPREFIX, --esprefix ESPREFIX   Only check Amazon Elasticsearch Service domains that begin with this prefix.
  -n NOTIFY, --notify NOTIFY         List of CloudWatch alarm actions; e.g. ['arn:aws:sns:xxxx']
  -f FREE, --free FREE               Minimum free storage (MB) on which to alarm
  -p PROFILE, --profile PROFILE      IAM profile name to use
  -r REGION, --region REGION         AWS region for the domain. Default: us-east-1

The script first identifies all the domains in the given region (or, optionally, limits them to the subset that begins with a given prefix). It then starts running a set of checks against each one.

The script can be run from the command line or set up as a scheduled Lambda function. For example, one customer runs it regularly to confirm that alarms are correctly set for all domains. In addition, because configuration changes (increasing cluster size to accommodate a larger workload is a common one) often require alarms to be updated, this approach automatically identifies alarms that are no longer set appropriately as domain configurations change.
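
For the scheduled Lambda approach, the handler below is a minimal, hypothetical sketch: it only verifies that each domain has a ClusterStatus.red alarm, standing in for the fuller checks that es-check-cwalarms.py performs. The function and variable names are assumptions, not part of the published script.

import boto3

es = boto3.client("es")
cloudwatch = boto3.client("cloudwatch")
account_id = boto3.client("sts").get_caller_identity()["Account"]

def lambda_handler(event, context):
    """Scheduled entry point, e.g. invoked by a daily CloudWatch Events rule."""
    findings = []
    for item in es.list_domain_names()["DomainNames"]:
        domain = item["DomainName"]
        # Amazon ES publishes metrics with both DomainName and ClientId dimensions.
        alarms = cloudwatch.describe_alarms_for_metric(
            MetricName="ClusterStatus.red",
            Namespace="AWS/ES",
            Dimensions=[
                {"Name": "DomainName", "Value": domain},
                {"Name": "ClientId", "Value": account_id},
            ],
        )["MetricAlarms"]
        if not alarms:
            findings.append("{}: no ClusterStatus.red alarm found".format(domain))
    print("\n".join(findings) if findings else "All domains have a ClusterStatus.red alarm")
    return {"findings": findings}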

The following is the output for one domain in my account.

Starting checks for Elasticsearch domain iotfleet , version is 53
Iotfleet Automated snapshot hour (UTC): 0
Iotfleet Instance configuration: 1 instances; type:m3.medium.elasticsearch
Iotfleet Instance storage definition is: 4 GB; free storage calced to: 819.2 MB
iotfleet Desired free storage set to (in MB): 819.2
iotfleet WARNING: Not using VPC Endpoint
iotfleet WARNING: Does not have Zone Awareness enabled
iotfleet WARNING: Instance count is ODD. Best practice is for an even number of data nodes and zone awareness.
iotfleet WARNING: Does not have Dedicated Masters.
iotfleet WARNING: Neither index nor search slow logs are enabled.
iotfleet WARNING: EBS not in use. Using instance storage only.
iotfleet Alarm ok; definition matches. Test-Elasticsearch-iotfleet-ClusterStatus.yellow-Alarm ClusterStatus.yellow
iotfleet Alarm ok; definition matches. Test-Elasticsearch-iotfleet-ClusterStatus.red-Alarm ClusterStatus.red
iotfleet Alarm ok; definition matches. Test-Elasticsearch-iotfleet-CPUUtilization-Alarm CPUUtilization
iotfleet Alarm ok; definition matches. Test-Elasticsearch-iotfleet-JVMMemoryPressure-Alarm JVMMemoryPressure
iotfleet WARNING: Missing alarm!! ('ClusterIndexWritesBlocked', 'Maximum', 60, 5, 'GreaterThanOrEqualToThreshold', 1.0)
iotfleet Alarm ok; definition matches. Test-Elasticsearch-iotfleet-AutomatedSnapshotFailure-Alarm AutomatedSnapshotFailure
iotfleet Alarm: Threshold does not match: Test-Elasticsearch-iotfleet-FreeStorageSpace-Alarm Should be:  819.2 ; is 3000.0

The output messages fall into the following categories:

  • System overview, Informational: The Amazon ES version and configuration, including instance type and number, storage, automated snapshot hour, etc.
  • Free storage: A calculation of the appropriate amount of free storage, based on the recommended 20% of total storage (a short worked example follows this list).
  • Warnings: best practices that are not being followed for this domain. (For more about this, read on.)
  • Alarms: An assessment of the CloudWatch alarms currently set for this domain, against a recommended set.
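
The 819.2 MB figure in the output above comes directly from the 20% recommendation. A minimal sketch of the arithmetic, assuming the 4 GB of instance storage reported for the iotfleet domain:

# 20% of total storage, expressed in MB (the unit the FreeStorageSpace metric reports).
total_storage_gb = 4                              # m3.medium instance storage, from the output above
desired_free_mb = total_storage_gb * 1024 * 0.20  # 4096 MB * 0.20
print(desired_free_mb)                            # 819.2, matching "Desired free storage set to (in MB)"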

The script contains an array of recommended CloudWatch alarms, based on best practices for these metrics and statistics. Using the array allows alarm parameters (such as free space) to be updated within the code based on current domain statistics and configurations.
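
The layout below is a hypothetical sketch of such an array, modeled on the tuple printed in the "Missing alarm" warning shown earlier (metric, statistic, period, evaluation periods, comparison operator, threshold). The actual field order in the scripts may differ, and the CPUUtilization and JVMMemoryPressure thresholds here are illustrative rather than values taken from the scripts.

# (metric, statistic, period in seconds, evaluation periods, comparison operator, threshold)
esAlarms = [
    ("ClusterStatus.yellow",      "Maximum", 60, 5, "GreaterThanOrEqualToThreshold", 1.0),
    ("ClusterStatus.red",         "Maximum", 60, 5, "GreaterThanOrEqualToThreshold", 1.0),
    ("CPUUtilization",            "Average", 60, 5, "GreaterThanOrEqualToThreshold", 80.0),   # illustrative
    ("JVMMemoryPressure",         "Maximum", 60, 5, "GreaterThanOrEqualToThreshold", 85.0),   # illustrative
    ("ClusterIndexWritesBlocked", "Maximum", 60, 5, "GreaterThanOrEqualToThreshold", 1.0),
    ("AutomatedSnapshotFailure",  "Maximum", 60, 5, "GreaterThanOrEqualToThreshold", 1.0),
    ("FreeStorageSpace",          "Minimum", 60, 5, "LessThanOrEqualToThreshold",    None),   # filled in per domain
]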

For a given domain, the script checks if each alarm has been set. If the alarm is set, it checks whether the values match those in the array esAlarms. In the output above, you can see three different situations being reported:

  • Alarm ok; definition matches. The alarm set for the domain matches the settings in the array.
  • Alarm: Threshold does not match. An alarm exists, but the threshold value at which the alarm is triggered does not match.
  • WARNING: Missing alarm!! The recommended alarm is missing.

All in all, the list above shows that this domain does not have a configuration that adheres to best practices, nor does it have all the recommended alarms.

Setting up alarms

Now that you know that the domains in their current state are missing critical alarms, you can correct the situation.

To demonstrate the script, set up a new domain named “ver” in us-west-2 with one node and a 10-GB EBS volume. Also create an SNS topic named “sendnotification” in us-west-2, with an email subscription so that it can notify you by email.

Run the second script, es-create-cwalarms.py, from the command line. This script creates (or updates) the desired CloudWatch alarms for the specified Amazon ES domain, “ver”.

python es-create-cwalarms.py -r us-west-2 -e test -c ver -n "['arn:aws:sns:us-west-2:xxxxxxxxxx:sendnotification']"
EBS enabled: True type: gp2 size (GB): 10 No Iops 10240  total storage (MB)
Desired free storage set to (in MB): 2048.0
Creating  Test-Elasticsearch-ver-ClusterStatus.yellow-Alarm
Creating  Test-Elasticsearch-ver-ClusterStatus.red-Alarm
Creating  Test-Elasticsearch-ver-CPUUtilization-Alarm
Creating  Test-Elasticsearch-ver-JVMMemoryPressure-Alarm
Creating  Test-Elasticsearch-ver-FreeStorageSpace-Alarm
Creating  Test-Elasticsearch-ver-ClusterIndexWritesBlocked-Alarm
Creating  Test-Elasticsearch-ver-AutomatedSnapshotFailure-Alarm
Successfully finished creating alarms!

As with the first script, this script contains an array of recommended CloudWatch alarms, based on best practices for these metrics and statistics. This approach allows you to add or modify alarms based on your use case (more on that below).
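
Under the hood, each entry in that array ultimately drives a standard CloudWatch put_metric_alarm call. The sketch below shows what that call might look like for the ClusterIndexWritesBlocked alarm on the “ver” domain; the alarm name follows the pattern visible in the script output, the SNS ARN uses a placeholder account ID, and the exact code in es-create-cwalarms.py may be structured differently.

import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-west-2")
account_id = boto3.client("sts").get_caller_identity()["Account"]

cloudwatch.put_metric_alarm(
    AlarmName="Test-Elasticsearch-ver-ClusterIndexWritesBlocked-Alarm",
    Namespace="AWS/ES",
    MetricName="ClusterIndexWritesBlocked",
    # Amazon ES metrics carry both the domain name and the account (ClientId) as dimensions.
    Dimensions=[
        {"Name": "DomainName", "Value": "ver"},
        {"Name": "ClientId", "Value": account_id},
    ],
    Statistic="Maximum",
    Period=60,
    EvaluationPeriods=5,
    ComparisonOperator="GreaterThanOrEqualToThreshold",
    Threshold=1.0,
    AlarmActions=["arn:aws:sns:us-west-2:123456789012:sendnotification"],  # placeholder ARN
)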

After running the script, navigate to Alarms on the CloudWatch console. You can see the set of alarms set up on your domain.

Because the “ver” domain has only a single node, the cluster status is yellow, and that alarm is in the “ALARM” state. It has already sent a notification that the alarm was triggered.

What to do when an alarm triggers

After alarms are set up, you need to identify the correct action to take for each alarm, which depends on the alarm triggered. For ideas, guidance, and additional pointers to supporting documentation, see Get Started with Amazon Elasticsearch Service: Set CloudWatch Alarms on Key Metrics. For information about common errors and recovery actions to take, see Handling AWS Service Errors.

In most cases, the alarm triggers due to an increased workload. The likely action is to reconfigure the system to handle the increased workload, rather than reducing the incoming workload. Reconfiguring any backend store—a category of systems that includes Elasticsearch—is best performed when the system is quiescent or lightly loaded. Reconfigurations such as setting zone awareness or modifying the disk type cause Amazon ES to enter a “processing” state, potentially disrupting client access.

Other changes, such as increasing the number of data nodes, may cause Elasticsearch to begin moving shards, potentially impacting search performance on these shards while this is happening. These actions should be considered in the context of your production usage. For the same reason I also do not recommend running a script that resets all domains to match best practices.

Avoid the need to reconfigure during heavy workload by setting alarms at a level that allows a considered approach to making the needed changes. For example, if you identify that each weekly peak is increasing, you can reconfigure during a weekly quiet period.

While Elasticsearch can be reconfigured without being quiesced, it is not a best practice to automatically scale it up and down based on usage patterns. Unlike some other AWS services, I recommend against setting a CloudWatch action that automatically reconfigures the system when alarms are triggered.

There are other situations where the planned reconfiguration approach may not work, such as low or zero free disk space causing the domain to reject writes. If the business is dependent on the domain continuing to accept incoming writes and deleting data is not an option, the team may choose to reconfigure immediately.

Extensions and adaptations

You may wish to modify the best practices encoded in the scripts for your own environment or workloads. It’s always better to avoid situations where alerts are generated but routinely ignored. All alerts should trigger a review and one or more actions, either immediately or at a planned date. The following is a list of common situations where you may wish to set different alarms for different domains:

  • Dev/test vs. production
    You may have a different set of configuration rules and alarms for your dev and test environments than for production. For example, you may require zone awareness and dedicated masters for your production environment, but not for your development domains. Or you may not set any alarms in dev at all. For test environments that mirror your potential peak load, test to ensure that the alarms are appropriately triggered.
  • Differing workloads or SLAs for different domains
    You may have one domain with a requirement for superfast search performance, and another domain with a heavy ingest load that tolerates slower search response. Your reaction to slow response for these two workloads is likely to be different, so perhaps the thresholds for these two domains should be set at a different level. In this case, you might add a “max CPU utilization” alarm at 100% for 1 minute for the fast search domain, while the other domain only triggers an alarm when the average has been higher than 60% for 5 minutes. You might also add a “free space” rule with a higher threshold to reflect the need for more space for the heavy ingest load if there is danger that it could fill the available disk quickly.
  • “Normal” alarms versus “emergency” alarms
    If, for example, free disk space drops to 25% of total capacity, an alarm is triggered that indicates action should be taken as soon as possible, such as cleaning up old indexes or reconfiguring at the next quiet period for this domain. However, if free space drops below a critical level (20% free space), action must be taken immediately in order to prevent Amazon ES from setting the domain to read-only. Similarly, if the “ClusterIndexWritesBlocked” alarm triggers, the domain has already stopped accepting writes, so immediate action is needed. In this case, you may wish to set “laddered” alarms, where one threshold causes an alarm to be triggered to review the current workload for a planned reconfiguration, but a different threshold raises a “DefCon 3” alarm that immediate action is required.
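
A hedged sketch of what “laddered” entries could look like, reusing the tuple layout from the check script’s output; the percentages mirror the 25% and 20% levels discussed above, and the variable names are illustrative only:

total_mb = 10 * 1024  # e.g., the 10-GB "ver" domain created earlier

ladderedFreeStorageAlarms = [
    # Early warning at 25% free: review the workload and plan a reconfiguration.
    ("FreeStorageSpace", "Minimum", 60, 5, "LessThanOrEqualToThreshold", total_mb * 0.25),
    # Critical at 20% free: act immediately, before Amazon ES blocks writes.
    ("FreeStorageSpace", "Minimum", 60, 1, "LessThanOrEqualToThreshold", total_mb * 0.20),
]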

The sample scripts provided here are a starting point, intended for you to adapt to your own environment and needs.

Running the scripts once identifies how far your current state is from your desired state and creates an initial set of alarms. Regularly re-running them captures changes in your environment over time and keeps your alarms aligned with your current configurations. One customer has set them up to run nightly, and to automatically create and update alarms to match their preferred settings.

Removing unwanted alarms

Each CloudWatch alarm costs approximately $0.10 per month. You can remove unwanted alarms in the CloudWatch console, under Alarms. If you set up the “ver” domain above, remember to delete the domain and its alarms to avoid ongoing charges.

Conclusion

Setting CloudWatch alarms appropriately for your Amazon ES domains can help you avoid suboptimal performance and allow you to respond to workload growth or configuration issues well before they become urgent. This post gives you a starting point for doing so. The additional sleep you’ll get knowing you don’t need to be concerned about Elasticsearch domain performance will allow you to focus on building creative solutions for your business and solving problems for your customers.

Enjoy!


Additional Reading

If you found this post useful, be sure to check out Analyzing Amazon Elasticsearch Service Slow Logs Using Amazon CloudWatch Logs Streaming and Kibana and Get Started with Amazon Elasticsearch Service: How Many Shards Do I Need?

 


About the Author

Dr. Veronika Megler is a senior consultant at Amazon Web Services. She works with our customers to implement innovative big data, AI and ML projects, helping them accelerate their time-to-value when using AWS.

 

 

 

Optimize Delivery of Trending, Personalized News Using Amazon Kinesis and Related Services

Post Syndicated from Yukinori Koide original https://aws.amazon.com/blogs/big-data/optimize-delivery-of-trending-personalized-news-using-amazon-kinesis-and-related-services/

This is a guest post by Yukinori Koide, the head of development for the Newspass department at Gunosy.

Gunosy is a news curation application that covers a wide range of topics, such as entertainment, sports, politics, and gourmet news. The application has been installed more than 20 million times.

Gunosy aims to provide people with the content they want without the stress of dealing with a large influx of information. We analyze user attributes, such as gender and age, and past activity logs like click-through rate (CTR). We combine this information with article attributes to provide trending, personalized news articles to users.

In this post, I show you how to process user activity logs in real time using Amazon Kinesis Data Firehose, Amazon Kinesis Data Analytics, and related AWS services.

Why does Gunosy need real-time processing?

Users need fresh and personalized news. There are two constraints to consider when delivering appropriate articles:

  • Time: Articles have freshness—that is, they lose value over time. New articles need to reach users as soon as possible.
  • Frequency (volume): Only a limited number of articles can be shown. It’s unreasonable to display all articles in the application, and users can’t read all of them anyway.

To deliver fresh articles with a high probability that the user is interested in them, it’s necessary to include not only past user activity logs and some feature values of articles, but also the most recent (real-time) user activity logs.

We optimize the delivery of articles with these two steps.

  1. Personalization: Deliver articles based on each user’s attributes, past activity logs, and feature values of each article—to account for each user’s interests.
  2. Trends analysis/identification: Optimize delivering articles using recent (real-time) user activity logs—to incorporate the latest trends from all users.

Optimizing the delivery of articles is always a cold-start problem. Initially, we deliver articles based on past logs, and then use real-time data to optimize as quickly as possible. In addition, news has a short shelf life: day-old news is old news, and even news that is three hours old is already stale. Therefore, shortening the time between step 1 and step 2 is important.

To tackle this issue, we chose AWS for processing streaming data because of its fully managed services and cost-effectiveness.

Solution

The following diagrams depict the architecture for optimizing article delivery by processing real-time user activity logs.

There are three processing flows:

  1. Process real-time user activity logs.
  2. Store and process all user-based and article-based logs.
  3. Execute ad hoc or heavy queries.

In this post, I focus on the first processing flow and explain how it works.

Process real-time user activity logs

The following are the steps for processing user activity logs in real time using Kinesis Data Streams and Kinesis Data Analytics.

  1. The Fluentd server sends the following user activity logs to Kinesis Data Streams:
{"article_id": 12345, "user_id": 12345, "action": "click"}
{"article_id": 12345, "user_id": 12345, "action": "impression"}
...
  2. Map rows of logs to columns in Kinesis Data Analytics.

  3. Set the reference data to Kinesis Data Analytics from Amazon S3.

a. Gunosy has user attributes such as gender, age, and segment. Prepare the following CSV file (user_id, gender, segment_id) and put it in Amazon S3:

101,female,1
102,male,2
103,female,3
...

b. Add the application reference data source to Kinesis Data Analytics using the AWS CLI:

$ aws kinesisanalytics add-application-reference-data-source \
  --application-name <my-application-name> \
  --current-application-version-id <version-id> \
  --reference-data-source '{
  "TableName": "REFERENCE_DATA_SOURCE",
  "S3ReferenceDataSource": {
    "BucketARN": "arn:aws:s3:::<my-bucket-name>",
    "FileKey": "mydata.csv",
    "ReferenceRoleARN": "arn:aws:iam::<account-id>:role/..."
  },
  "ReferenceSchema": {
    "RecordFormat": {
      "RecordFormatType": "CSV",
      "MappingParameters": {
        "CSVMappingParameters": {"RecordRowDelimiter": "\n", "RecordColumnDelimiter": ","}
      }
    },
    "RecordEncoding": "UTF-8",
    "RecordColumns": [
      {"Name": "USER_ID", "Mapping": "0", "SqlType": "INTEGER"},
      {"Name": "GENDER",  "Mapping": "1", "SqlType": "VARCHAR(32)"},
      {"Name": "SEGMENT_ID", "Mapping": "2", "SqlType": "INTEGER"}
    ]
  }
}'

This application reference data source can then be referenced from Kinesis Data Analytics.

  4. Run a query against the source data stream on Kinesis Data Analytics with the application reference data source.

a. Define the temporary stream named TMP_SQL_STREAM.

CREATE OR REPLACE STREAM "TMP_SQL_STREAM" (
  GENDER VARCHAR(32), SEGMENT_ID INTEGER, ARTICLE_ID INTEGER, ACTION VARCHAR(16)
);

b. Insert the joined source stream and application reference data source into the temporary stream.

CREATE OR REPLACE PUMP "TMP_PUMP" AS
INSERT INTO "TMP_SQL_STREAM"
SELECT STREAM
  R.GENDER, R.SEGMENT_ID, S.ARTICLE_ID, S.ACTION
FROM      "SOURCE_SQL_STREAM_001" S
LEFT JOIN "REFERENCE_DATA_SOURCE" R
  ON S.USER_ID = R.USER_ID;

c. Define the destination stream named DESTINATION_SQL_STREAM.

CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
  TIME TIMESTAMP, GENDER VARCHAR(32), SEGMENT_ID INTEGER, ARTICLE_ID INTEGER, 
  IMPRESSION INTEGER, CLICK INTEGER
);

d. Insert the processed temporary stream, using a tumbling window, into the destination stream per minute.

CREATE OR REPLACE PUMP "STREAM_PUMP" AS
INSERT INTO "DESTINATION_SQL_STREAM"
SELECT STREAM
  ROW_TIME AS TIME,
  GENDER, SEGMENT_ID, ARTICLE_ID,
  SUM(CASE ACTION WHEN 'impression' THEN 1 ELSE 0 END) AS IMPRESSION,
  SUM(CASE ACTION WHEN 'click' THEN 1 ELSE 0 END) AS CLICK
FROM "TMP_SQL_STREAM"
GROUP BY
  GENDER, SEGMENT_ID, ARTICLE_ID,
  FLOOR("TMP_SQL_STREAM".ROWTIME TO MINUTE);

The results look like the following:

  5. Insert the results into Amazon Elasticsearch Service (Amazon ES).
  6. Batch servers get results from Amazon ES every minute. They then optimize delivering articles with other data sources using a proprietary optimization algorithm.
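
The schema of those stored results isn’t shown in this post, so the following is only a rough sketch of what a once-a-minute batch poll against Amazon ES might look like. The index name, field name, and endpoint are placeholders, and it assumes an access policy that allows unsigned requests from the caller (otherwise the request would need to be SigV4-signed).

import datetime
import requests

ES_ENDPOINT = "https://search-my-domain.us-west-2.es.amazonaws.com"  # placeholder endpoint

def fetch_last_minute_aggregates():
    """Pull the per-minute aggregates written by the Kinesis Data Analytics pump."""
    since = (datetime.datetime.utcnow() - datetime.timedelta(minutes=1)).isoformat()
    body = {
        "size": 10000,
        "query": {"range": {"TIME": {"gte": since}}},  # TIME column from DESTINATION_SQL_STREAM
    }
    resp = requests.post(ES_ENDPOINT + "/article_stats/_search", json=body, timeout=10)
    resp.raise_for_status()
    return [hit["_source"] for hit in resp.json()["hits"]["hits"]]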

How to connect a stream to another stream in another AWS Region

When we built the solution, Kinesis Data Analytics was not available in the Asia Pacific (Tokyo) Region, so we used the US West (Oregon) Region. The following shows how we connected a data stream to another data stream in the other Region.

There is no need to keep all components in a single AWS Region, unless a response-time difference at the millisecond level is critical to your service.

Benefits

The solution provides benefits for both our company and for our users. Benefits for the company are cost savings—including development costs, operational costs, and infrastructure costs—and reducing delivery time. Users can now find articles of interest more quickly. The solution can process more than 500,000 records per minute, and it enables fast and personalized news curating for our users.

Conclusion

In this post, I showed you how we at Gunosy optimize the delivery of trending, personalized news by processing user activity logs in real time using Amazon Kinesis Data Firehose, Amazon Kinesis Data Analytics, and related AWS services.

AWS gives us a quick and economical solution and a good experience.

If you have questions or suggestions, please comment below.


Additional Reading

If you found this post useful, be sure to check out Implement Serverless Log Analytics Using Amazon Kinesis Analytics and Joining and Enriching Streaming Data on Amazon Kinesis.


About the Authors

Yukinori Koide is the head of development for the Newspass department at Gunosy. He is working on standardization of provisioning and deployment flow, promoting the utilization of serverless and containers for machine learning and AI services. His favorite AWS services are DynamoDB, Lambda, Kinesis, and ECS.

 

 

 

Akihiro Tsukada is a start-up solutions architect with AWS. He supports start-up companies in Japan technically at many levels, ranging from seed to later-stage.

 

 

 

 

Yuta Ishii is a solutions architect with AWS. He works with our customers to provide architectural guidance for building media & entertainment services, helping them improve the value of their services when using AWS.

 

 

 

 

 

Amazon Elasticsearch Service now supports VPC

Post Syndicated from Randall Hunt original https://aws.amazon.com/blogs/aws/amazon-elasticsearch-service-now-supports-vpc/

Starting today, you can connect to your Amazon Elasticsearch Service domains from within an Amazon VPC without the need for NAT instances or Internet gateways. VPC support for Amazon ES is easy to configure, reliable, and offers an extra layer of security. With VPC support, traffic between other services and Amazon ES stays entirely within the AWS network, isolated from the public Internet. You can manage network access using existing VPC security groups, and you can use AWS Identity and Access Management (IAM) policies for additional protection. VPC support for Amazon ES domains is available at no additional charge.

Getting Started

Creating an Amazon Elasticsearch Service domain in your VPC is easy. Follow all the steps you would normally follow to create your cluster and then select “VPC access”.

That’s it. There are no additional steps. You can now access your domain from within your VPC!

Things To Know

To support VPCs, Amazon ES places an endpoint into at least one subnet of your VPC. Amazon ES places an Elastic Network Interface (ENI) into the VPC for each data node in the cluster. Each ENI uses a private IP address from the IPv4 range of your subnet and receives a public DNS hostname. If you enable zone awareness, Amazon ES creates endpoints in two subnets in different availability zones, which provides greater data durability.

You need to set aside three times as many IP addresses as the number of nodes in your cluster; you can divide that number by two if zone awareness is enabled. For example, a 12-node cluster needs 36 reserved IP addresses, or 18 with zone awareness. Ideally, you would create separate subnets just for Amazon ES.

A few notes:

  • Currently, you cannot move existing domains to a VPC or vice-versa. To take advantage of VPC support, you must create a new domain and migrate your data.
  • Currently, Amazon ES does not support Amazon Kinesis Firehose integration for domains inside a VPC.

To learn more, see the Amazon ES documentation.

Randall

Perform Near Real-time Analytics on Streaming Data with Amazon Kinesis and Amazon Elasticsearch Service

Post Syndicated from Tristan Li original https://aws.amazon.com/blogs/big-data/perform-near-real-time-analytics-on-streaming-data-with-amazon-kinesis-and-amazon-elasticsearch-service/

Nowadays, streaming data is seen and used everywhere—from social networks, to mobile and web applications, IoT devices, instrumentation in data centers, and many other sources. As the speed and volume of this type of data increases, the need to perform data analysis in real time with machine learning algorithms and extract a deeper understanding from the data becomes ever more important. For example, you might want a continuous monitoring system to detect sentiment changes in a social media feed so that you can react to the sentiment in near real time.

In this post, we use Amazon Kinesis Streams to collect and store streaming data. We then use Amazon Kinesis Analytics to process and analyze the streaming data continuously. Specifically, we use the Kinesis Analytics built-in RANDOM_CUT_FOREST function, a machine learning algorithm, to detect anomalies in the streaming data. Finally, we use Amazon Kinesis Firehose to export the anomalies data to Amazon Elasticsearch Service (Amazon ES). We then build a simple dashboard in the open source tool Kibana to visualize the result.

Solution overview

The following diagram depicts a high-level overview of this solution.

Amazon Kinesis Streams

You can use Amazon Kinesis Streams to build your own streaming application. This application can process and analyze streaming data by continuously capturing and storing terabytes of data per hour from hundreds of thousands of sources.

Amazon Kinesis Analytics

Kinesis Analytics provides an easy and familiar standard SQL language to analyze streaming data in real time. One of its most powerful features is that there are no new languages, processing frameworks, or complex machine learning algorithms that you need to learn.

Amazon Kinesis Firehose

Kinesis Firehose is the easiest way to load streaming data into AWS. It can capture, transform, and load streaming data into Amazon S3, Amazon Redshift, and Amazon Elasticsearch Service.

Amazon Elasticsearch Service

Amazon ES is a fully managed service that makes it easy to deploy, operate, and scale Elasticsearch for log analytics, full text search, application monitoring, and more.

Solution summary

The following is a quick walkthrough of the solution that’s presented in the diagram:

  1. IoT sensors send streaming data into Kinesis Streams. In this post, you use a Python script to simulate an IoT temperature sensor device that sends the streaming data.
  2. By using the built-in RANDOM_CUT_FOREST function in Kinesis Analytics, you can detect anomalies in real time with the sensor data that is stored in Kinesis Streams. RANDOM_CUT_FOREST is also an appropriate algorithm for many other kinds of anomaly-detection use cases—for example, the media sentiment example mentioned earlier in this post.
  3. The processed anomaly data is then loaded into the Kinesis Firehose delivery stream.
  4. By using the built-in integration that Kinesis Firehose has with Amazon ES, you can easily export the processed anomaly data into the service and visualize it with Kibana.

Implementation steps

The following sections walk through the implementation steps in detail.

Creating the delivery stream

  1. Open the Amazon Kinesis Streams console.
  2. Create a new Kinesis stream. Give it a name that indicates it’s for raw incoming stream data—for example, RawStreamData. For Number of shards, type 1.
  3. The Python code provided below simulates a streaming application, such as an IoT device, and generates random data and anomalies into a Kinesis stream. The code generates two temperature ranges, where the first range is the hypothetical sensor’s normal operating temperature range (10–20), and the second is the anomaly temperature range (100–120). Make sure to change the stream name in the two put_record calls and the Region in connect_to_region to match your configuration. (A boto3 variant of this script appears after this list.) Alternatively, you can download the Amazon Kinesis Data Generator from this repository and use it to generate the data.
    import json
    import random
    from boto import kinesis
    
    # Connect to Kinesis Streams in the Region where you created the stream.
    kinesis = kinesis.connect_to_region("us-east-1")
    
    def getData(iotName, lowVal, highVal):
       # Build one sensor reading with a random value in the given range.
       data = {}
       data["iotName"] = iotName
       data["iotValue"] = random.randint(lowVal, highVal)
       return data
    
    while 1:
       rnd = random.random()
       if (rnd < 0.01):
          # Roughly 1% of records fall in the anomaly range (100-120).
          data = json.dumps(getData("DemoSensor", 100, 120))
          kinesis.put_record("RawStreamData", data, "DemoSensor")
          print '***************************** anomaly ************************* ' + data
       else:
          # Normal operating range (10-20).
          data = json.dumps(getData("DemoSensor", 10, 20))
          kinesis.put_record("RawStreamData", data, "DemoSensor")
          print data

  4. Open the Amazon Elasticsearch Service console and create a new domain.
    1. Give the domain a unique name. In the Configure cluster screen, use the default settings.
    2. In the Set up access policy screen, in the Set the domain access policy list, choose Allow access to the domain from specific IP(s).
    3. Enter the public IP address of your computer.
      Note: If you’re working behind a proxy or firewall, see the “Use a proxy to simplify request signing” section in this AWS Database blog post to learn how to work with a proxy. For additional information about securing access to your Amazon ES domain, see How to Control Access to Your Amazon Elasticsearch Domain in the AWS Security Blog.
  5. After the Amazon ES domain is up and running, you can set up and configure Kinesis Firehose to export results to Amazon ES:
    1. Open the Amazon Kinesis Firehose console and choose Create Delivery Stream.
    2. In the Destination dropdown list, choose Amazon Elasticsearch Service.
    3. Type a stream name, and choose the Amazon ES domain that you created in Step 4.
    4. Provide an index name and ES type. In the S3 bucket dropdown list, choose Create New S3 bucket. Choose Next.
    5. In the configuration, change the Elasticsearch Buffer size to 1 MB and the Buffer interval to 60s. Use the default settings for all other fields. This shortens the time for the data to reach the ES cluster.
    6. Under IAM Role, choose Create/Update existing IAM role.
      The best practice is to create a new role every time. Otherwise, the console keeps adding policy documents to the same role. Eventually the size of the attached policies causes IAM to reject the role, but it does it in a non-obvious way, where the console basically quits functioning.
    7. Choose Next to move to the Review page.
  6. Review the configuration, and then choose Create Delivery Stream.
  7. Run the Python file for 1–2 minutes, and then press Ctrl+C to stop the execution. This loads some data into the stream for you to visualize in the next step.
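
As a side note, the generator in step 3 uses the legacy boto library. If you prefer boto3, the following is a rough equivalent under the same assumptions (stream name RawStreamData in us-east-1, partition key DemoSensor):

import json
import random
import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")

def get_data(iot_name, low, high):
    return {"iotName": iot_name, "iotValue": random.randint(low, high)}

while True:
    if random.random() < 0.01:
        record = get_data("DemoSensor", 100, 120)   # anomaly range
    else:
        record = get_data("DemoSensor", 10, 20)     # normal range
    kinesis.put_record(StreamName="RawStreamData",
                       Data=json.dumps(record),
                       PartitionKey="DemoSensor")
    print(record)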

Analyzing the data

Now it’s time to analyze the IoT streaming data using Amazon Kinesis Analytics.

  1. Open the Amazon Kinesis Analytics console and create a new application. Give the application a name, and then choose Create Application.
  2. On the next screen, choose Connect to a source. Choose the raw incoming data stream that you created earlier. (Note the in-application stream name SOURCE_SQL_STREAM_001 because you will need it later.)
  3. Use the default settings for everything else. When the schema discovery process is complete, it displays a success message with the formatted stream sample in a table as shown in the following screenshot. Review the data, and then choose Save and continue.
  4. Next, choose Go to SQL editor. When prompted, choose Yes, start application.
  5. Copy the following SQL code and paste it into the SQL editor window.
    CREATE OR REPLACE STREAM "TEMP_STREAM" (
       "iotName"        varchar (40),
       "iotValue"   integer,
       "ANOMALY_SCORE"  DOUBLE);
    -- Creates an output stream and defines a schema
    CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
       "iotName"       varchar(40),
       "iotValue"       integer,
       "ANOMALY_SCORE"  DOUBLE,
       "created" TimeStamp);
     
    -- Compute an anomaly score for each record in the source stream
    -- using Random Cut Forest
    CREATE OR REPLACE PUMP "STREAM_PUMP_1" AS INSERT INTO "TEMP_STREAM"
    SELECT STREAM "iotName", "iotValue", ANOMALY_SCORE FROM
      TABLE(RANDOM_CUT_FOREST(
        CURSOR(SELECT STREAM * FROM "SOURCE_SQL_STREAM_001")
      )
    );
    
    -- Sort records by descending anomaly score, insert into output stream
    CREATE OR REPLACE PUMP "OUTPUT_PUMP" AS INSERT INTO "DESTINATION_SQL_STREAM"
    SELECT STREAM "iotName", "iotValue", ANOMALY_SCORE, ROWTIME FROM "TEMP_STREAM"
    ORDER BY FLOOR("TEMP_STREAM".ROWTIME TO SECOND), ANOMALY_SCORE DESC;

 

  6. Choose Save and run SQL.
    As the application runs, it displays the results as stream data arrives. If you don’t see any data coming in, run the Python script again to generate some fresh data. When there is data, it appears in a grid as shown in the following screenshot. Note that you are selecting data from the source stream named SOURCE_SQL_STREAM_001 that you created previously. Also note the ANOMALY_SCORE column: this is the value that the RANDOM_CUT_FOREST function calculates based on the temperature ranges provided by the Python script, and higher (anomaly) temperature ranges have a higher score. Looking at the SQL code, the first two blocks create two new streams to store temporary data and the final result. The third block (the pump STREAM_PUMP_1) analyzes the raw source data using the RANDOM_CUT_FOREST function, calculates an anomaly score (ANOMALY_SCORE), and inserts it into the TEMP_STREAM stream. The final block loads the result stored in TEMP_STREAM into DESTINATION_SQL_STREAM.
  7. Choose Exit (done editing) next to the Save and run SQL button to return to the application configuration page.

Load processed data into the Kinesis Firehose delivery stream

Now, you can export the result from DESTINATION_SQL_STREAM into the Amazon Kinesis Firehose stream that you created previously.

  1. On the application configuration page, choose Connect to a destination.
  2. Choose the stream name that you created earlier, and use the default settings for everything else. Then choose Save and Continue.
  3. On the application configuration page, choose Exit to Kinesis Analytics applications to return to the Amazon Kinesis Analytics console.
  4. Run the Python script again for 4–5 minutes to generate enough data to flow through Amazon Kinesis Streams, Kinesis Analytics, Kinesis Firehose, and finally into the Amazon ES domain.
  5. Open the Kinesis Firehose console, choose the stream, and then choose the Monitoring tab.
  6. As the processed data flows into Kinesis Firehose and Amazon ES, the metrics appear on the Delivery Stream metrics page. Keep in mind that the metrics page takes a few minutes to refresh with the latest data.
  7. Open the Amazon Elasticsearch Service dashboard in the AWS Management Console. The count in the Searchable documents column increases as shown in the following screenshot. In addition, the domain shows a cluster health of Yellow. This is because, by default, it needs two instances to deploy redundant copies of the index. To fix this, you can deploy two instances instead of one.

Visualize the data using Kibana

Now it’s time to launch Kibana and visualize the data.

  1. Use the ES domain link to go to the cluster detail page, and then choose the Kibana link as shown in the following screenshot.

    If you’re working behind a proxy or firewall, see the “Use a proxy to simplify request signing” section in this blog post to learn how to work with a proxy.
  2. In the Kibana dashboard, choose the Discover tab to perform a query.
  3. You can also visualize the data using the different types of charts offered by Kibana. For example, by going to the Visualize tab, you can quickly create a split bar chart that aggregates by ANOMALY_SCORE per minute.


Conclusion

In this post, you learned how to use Amazon Kinesis to collect, process, and analyze real-time streaming data, and then export the results to Amazon ES for analysis and visualization with Kibana. If you have comments about this post, add them to the “Comments” section below. If you have questions or issues with implementing this solution, please open a new thread on the Amazon Kinesis or Amazon ES discussion forums.


Next Steps

Take your skills to the next level. Learn real-time clickstream anomaly detection with Amazon Kinesis Analytics.

 


About the Author

Tristan Li is a Solutions Architect with Amazon Web Services. He works with enterprise customers in the US, helping them adopt cloud technology to build scalable and secure solutions on AWS.

 

 

 

 

Visualize and Monitor Amazon EC2 Events with Amazon CloudWatch Events and Amazon Kinesis Firehose

Post Syndicated from Karan Desai original https://aws.amazon.com/blogs/big-data/visualize-and-monitor-amazon-ec2-events-with-amazon-cloudwatch-events-and-amazon-kinesis-firehose/

Monitoring your AWS environment is important for security, performance, and cost control purposes. For example, by monitoring and analyzing API calls made to your Amazon EC2 instances, you can trace security incidents and gain insights into administrative behaviors and access patterns. The kinds of events you might monitor include console logins, Amazon EBS snapshot creation/deletion/modification, VPC creation/deletion/modification, and instance reboots.

In this post, I show you how to build a near real-time API monitoring solution for EC2 events using Amazon CloudWatch Events and Amazon Kinesis Firehose. Please be sure to have Amazon CloudTrail enabled in your account.

  • CloudWatch Events offers a near real-time stream of system events that describe changes in AWS resources. CloudWatch Events now supports Kinesis Firehose as a target.
  • Kinesis Firehose is a fully managed service for continuously capturing, transforming, and delivering data in minutes to storage and analytics destinations such as Amazon S3, Amazon Kinesis Analytics, Amazon Redshift, and Amazon Elasticsearch Service.

Walkthrough

For this walkthrough, you create a CloudWatch event rule that matches specific EC2 events such as:

  • Starting, stopping, and terminating an instance
  • Creating and deleting VPC route tables
  • Creating and deleting a security group
  • Creating, deleting, and modifying instance volumes and snapshots

Your CloudWatch event target is a Kinesis Firehose delivery stream that delivers this data to an Elasticsearch cluster, where you set up Kibana for visualization. Using this solution, you can easily load and visualize EC2 events in minutes without setting up complicated data pipelines.

Set up the Elasticsearch cluster

Create the Amazon ES domain in the Amazon ES console, or by using the create-elasticsearch-domain command in the AWS CLI.

This example uses the following configuration:

  • Domain Name: esLogSearch
  • Elasticsearch Version: 1
  • Instance Count: 2
  • Instance type:elasticsearch
  • Enable dedicated master: true
  • Enable zone awareness: true
  • Restrict Amazon ES to an IP-based access policy

Other settings are left as the defaults.

Create a Kinesis Firehose delivery stream

In the Kinesis Firehose console, create a new delivery stream with Amazon ES as the destination. For detailed steps, see Create a Kinesis Firehose Delivery Stream to Amazon Elasticsearch Service.

Set up CloudWatch Events

Create a rule, and configure the event source and target. You can choose to configure multiple event sources with several AWS resources, along with options to specify specific or multiple event types.

In the CloudWatch console, choose Events.

For Service Name, choose EC2.

In Event Pattern Preview, choose Edit and copy the pattern below. For this walkthrough, I selected events that are specific to the EC2 API, but you can modify it to include events for any of your AWS resources.

 

{
  "source": [
    "aws.ec2"
  ],
  "detail-type": [
    "AWS API Call via CloudTrail"
  ],
  "detail": {
    "eventSource": [
      "ec2.amazonaws.com"
    ],
    "eventName": [
      "RunInstances",
      "StopInstances",
      "StartInstances",
      "CreateFlowLogs",
      "CreateImage",
      "CreateNatGateway",
      "CreateVpc",
      "DeleteKeyPair",
      "DeleteNatGateway",
      "DeleteRoute",
      "DeleteRouteTable",
      "CreateSnapshot",
      "DeleteSnapshot",
      "DeleteVpc",
      "DeleteVpcEndpoints",
      "DeleteSecurityGroup",
      "ModifyVolume",
      "ModifyVpcEndpoint",
      "TerminateInstances"
    ]
  }
}

The following screenshot shows what your event looks like in the console.

Next, choose Add target and select the delivery stream that you just created.
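
If you prefer to script this instead of using the console, the following is a rough boto3 sketch of the same rule and target setup. The rule name, ARNs, and the trimmed event-name list are placeholders; the target role must allow CloudWatch Events to write to your Firehose delivery stream.

import json
import boto3

events = boto3.client("events", region_name="us-east-1")

pattern = {
    "source": ["aws.ec2"],
    "detail-type": ["AWS API Call via CloudTrail"],
    "detail": {
        "eventSource": ["ec2.amazonaws.com"],
        "eventName": ["RunInstances", "StopInstances", "StartInstances", "TerminateInstances"],
    },
}

# Create (or update) the rule with the event pattern shown above.
events.put_rule(Name="ec2-api-activity", EventPattern=json.dumps(pattern), State="ENABLED")

# Attach the Firehose delivery stream as the rule's target.
events.put_targets(
    Rule="ec2-api-activity",
    Targets=[{
        "Id": "firehose-delivery-stream",
        "Arn": "arn:aws:firehose:us-east-1:123456789012:deliverystream/my-delivery-stream",
        "RoleArn": "arn:aws:iam::123456789012:role/cwe-to-firehose",  # placeholder role
    }],
)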

Set up Kibana on the Elasticsearch cluster

Amazon ES provides a default installation of Kibana with every Amazon ES domain. You can find the Kibana endpoint on your domain dashboard in the Amazon ES console. You can restrict Amazon ES access to an IP-based access policy.

In the Kibana console, for Index name or pattern, type log. This is the name of the Elasticsearch index.

For Time-field name, choose @time.

To view the events, choose Discover.

The following chart demonstrates the API operations and the number of times that they have been triggered in the past 12 hours.

Summary

In this post, you created a continuous, near real-time solution to monitor various EC2 events such as starting and shutting down instances, creating VPCs, etc. Likewise, you can build a continuous monitoring solution for all the API operations that are relevant to your daily AWS operations and resources.

With Kinesis Firehose as a new target for CloudWatch Events, you can retrieve, transform, and load system events to the storage and analytics destination of your choice in minutes, without setting up complicated data pipelines.

If you have any questions or suggestions, please comment below.


Additional Reading

Learn how to build a serverless architecture to analyze Amazon CloudFront access logs using AWS Lambda, Amazon Athena, and Amazon Kinesis Analytics

 

 

 

AWS Big Data Blog Month in Review: April 2017

Post Syndicated from Derek Young original https://aws.amazon.com/blogs/big-data/aws-big-data-blog-month-in-review-april-2017/

Another month of big data solutions on the Big Data Blog. Please take a look at our summaries below and learn, comment, and share. Thank you for reading!

NEW POSTS

Amazon QuickSight Spring Announcement: KPI Charts, Export to CSV, AD Connector, and More! 
In this blog post, we share a number of new features and enhancements in Amazon QuickSight. You can now create key performance indicator (KPI) charts, define custom ranges when importing Microsoft Excel spreadsheets, export data to comma-separated value (CSV) format, and create aggregate filters for SPICE data sets. In the Enterprise Edition, we added an additional option to connect to your on-premises Active Directory using AD Connector.

Securely Analyze Data from Another AWS Account with EMRFS
Sometimes, data to be analyzed is spread across buckets owned by different accounts. In order to ensure data security, appropriate credentials management needs to be in place. This is especially true for large enterprises storing data in different Amazon S3 buckets for different departments. This post shows how you can use a custom credentials provider to access S3 objects that cannot be accessed by the default credentials provider of EMRFS.

Querying OpenStreetMap with Amazon Athena
This post explains how anyone can use Amazon Athena to quickly query publicly available OSM data stored in Amazon S3 (updated weekly) as an AWS Public Dataset. Imagine that you work for an NGO interested in improving knowledge of and access to health centers in Africa. You might want to know what’s already been mapped, to facilitate the production of maps of surrounding villages, and to determine where infrastructure investments are likely to be most effective.

Build a Real-time Stream Processing Pipeline with Apache Flink on AWS
This post outlines a reference architecture for a consistent, scalable, and reliable stream processing pipeline that is based on Apache Flink using Amazon EMR, Amazon Kinesis, and Amazon Elasticsearch Service. An AWSLabs GitHub repository provides the artifacts that are required to explore the reference architecture in action. Resources include a producer application that ingests sample data into an Amazon Kinesis stream and a Flink program that analyses the data in real time and sends the result to Amazon ES for visualization.

Manage Query Workloads with Query Monitoring Rules in Amazon Redshift
Amazon Redshift is a powerful, fully managed data warehouse that can offer significantly increased performance and lower cost in the cloud. However, queries which hog cluster resources (rogue queries) can affect your experience. In this post, you learn how query monitoring rules can help spot and act against such queries. This, in turn, can help you to perform smooth business operations in supporting mixed workloads to maximize cluster performance and throughput.

Amazon QuickSight Now Supports Audit Logging with AWS CloudTrail
In this post, we announce support for AWS CloudTrail in Amazon QuickSight, which allows logging of QuickSight events across an AWS account. Whether you have an enterprise setting or a small team scenario, this integration will allow QuickSight administrators to accurately answer questions such as who last changed an analysis, or who has connected to sensitive data. With CloudTrail, administrators have better governance, auditing and risk management of their QuickSight usage.

Near Zero Downtime Migration from MySQL to DynamoDB
This post introduces two methods of seamlessly migrating data from MySQL to DynamoDB, minimizing downtime and converting the MySQL key design into one more suitable for NoSQL.


Want to learn more about Big Data or Streaming Data? Check out our Big Data and Streaming data educational pages.

Leave a comment below to let us know what big data topics you’d like to see next on the AWS Big Data Blog.

How to Visualize and Refine Your Network’s Security by Adding Security Group IDs to Your VPC Flow Logs

Post Syndicated from Guy Denney original https://aws.amazon.com/blogs/security/how-to-visualize-and-refine-your-networks-security-by-adding-security-group-ids-to-your-vpc-flow-logs/

Many organizations begin their cloud journey to AWS by moving a few applications to demonstrate the power and flexibility of AWS. This initial application architecture includes building security groups that control the network ports, protocols, and IP addresses that govern access and traffic to their AWS Virtual Private Cloud (VPC). When the architecture process is complete and an application is fully functional, some organizations forget to revisit their security groups to optimize rules and help ensure the appropriate level of governance and compliance. Not optimizing security groups can create less-than-optimal security, with ports open that may not be needed or source IP ranges set that are broader than required.

Last year, I published an AWS Security Blog post that showed how to optimize and visualize your security groups. Today’s post continues in the vein of that post by using Amazon Kinesis Firehose and AWS Lambda to enrich the VPC Flow Logs dataset and enhance your ability to optimize security groups. The capabilities in this post’s solution are based on the Lambda functions available in this VPC Flow Log Appender GitHub repository.

Solution overview

Removing unused rules or limiting source IP addresses requires either an in-depth knowledge of an application’s active ports on Amazon EC2 instances or analysis of active network traffic. In this blog post, I discuss a method to:

  • Use VPC Flow Logs to capture information about the IP traffic in an Amazon VPC.
  • Enrich the VPC Flow Logs dataset with security group IDs by using Firehose and Lambda.
  • Demonstrate how to visualize and analyze network traffic from VPC Flow Logs by using Amazon Elasticsearch Service (Amazon ES).

Using this approach can help you remediate security group rules to necessary source IPs, ports, and nested security groups, helping to improve the security of your AWS resources while minimizing the potential risk to production environments.

Solution diagram

As illustrated in the preceding diagram, this is how the data flows in this model:

  1. The VPC posts its flow log data to Amazon CloudWatch Logs.
  2. The Lambda ingestor function passes the data to Firehose.
  3. Firehose then passes the data to the Lambda decorator function.
  4. The Lambda decorator function performs a number of lookups for each record and returns the data to Firehose with additional fields.
  5. Firehose then posts the enhanced dataset to the Amazon ES endpoint and any errors to Amazon S3.

The solution

Step 1: Set up your Amazon ES cluster and VPC Flow Logs

Create an Amazon ES cluster

The first step in this solution is to create an Amazon ES cluster. Do this first because it takes some time for the cluster to become available. If you are new to Amazon ES, you can learn more about it in the Amazon ES documentation.

To create an Amazon ES cluster:

  1. In the AWS Management Console, choose Elasticsearch Service under Analytics.
  2. Choose Create a new domain or Get started.
  3. Type es-flowlogs for the Elasticsearch domain name.
  4. Set Version to 1 in the drop-down list. Choose Next.
  5. Set Instance count to 2 and select the Enable zone awareness check box. (This ensures cluster stability in the event of an Availability Zone outage.) Accept the defaults for the rest of the page.
    • [Optional] If you use this domain for production purposes, I recommend using dedicated master nodes. Select the Enable dedicated master check box and select medium.elasticsearch from the Instance type drop-down list. Leave the Instance count at 3, which is the default.
  6. Choose Next.
  7. From the Set the domain access policy to drop-down list on the next page, select Allow access to the domain from specific IP(s). In the dialog box, type or paste the comma-separated list of valid IPv4 addresses or Classless Inter-Domain Routing (CIDR) blocks you would like to be able to access the Amazon ES domain.
  8. Choose Next.
  9. On the next page, choose Confirm and create.

It will take a few minutes for the cluster to be available. In the meantime, you can begin enabling VPC Flow Logs.

Enable VPC Flow Logs

VPC Flow Logs is a feature that lets you capture information about the IP traffic going to and from network interfaces in your VPC. Flow log data is stored using Amazon CloudWatch Logs. For more information about VPC Flow Logs, see VPC Flow Logs and CloudWatch Logs.

To enable VPC Flow Logs:

  1. In the AWS Management Console, choose CloudWatch under Management Tools.
  2. Click Logs in the navigation pane.
  3. From the Actions drop-down list, choose Create log group.
  4. Type Flowlogs as the Log Group Name.
  5. In the AWS Management Console, choose VPC under Networking & Content Delivery.
  6. Choose Your VPCs in the navigation pane, and select the VPC you would like to analyze. (You can also enable VPC Flow Logs on only a subnet if you do not want to enable it on the entire VPC.)
  7. Choose the Flow Logs tab in the bottom pane, and then choose Create Flow Log.
  8. In the text beneath the Role box, choose Set Up Permissions (this will open an IAM management page).
  9. Choose Allow on the IAM management page. Return to the VPC Flow Logs setup page.
  10. Choose All from the Filter drop-down list.
  11. Choose flowlogsRole from the Role drop-down list (you created this role in steps 8 and 9 of this procedure).
  12. Choose Flowlogs from the Destination Log Group drop-down list.
  13. Choose Create Flow Log.
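
If you prefer to script these steps, the following is a rough boto3 equivalent of creating the log group and enabling flow logs on the VPC. The VPC ID and role ARN are placeholders; the role is the same flowlogsRole that the console's Set Up Permissions step creates.

import boto3

logs = boto3.client("logs", region_name="us-east-1")
ec2 = boto3.client("ec2", region_name="us-east-1")

# Destination log group for the flow log records.
logs.create_log_group(logGroupName="Flowlogs")

# Enable flow logs for all traffic on the VPC (equivalent to choosing the "All" filter).
ec2.create_flow_logs(
    ResourceIds=["vpc-0123456789abcdef0"],            # placeholder VPC ID
    ResourceType="VPC",
    TrafficType="ALL",
    LogGroupName="Flowlogs",
    DeliverLogsPermissionArn="arn:aws:iam::123456789012:role/flowlogsRole",  # placeholder ARN
)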

Step 2: Set up AWS Lambda to enrich the VPC Flow Logs dataset with security group IDs

If you completed Step 1, VPC Flow Logs data is now streaming to CloudWatch Logs. Next, you will deploy two Lambda functions. The first, the ingestor function, moves the data into Firehose, and the second, the decorator function, adds three new fields to the VPC Flow Logs dataset and returns records to Firehose for delivery to Amazon ES.

The new fields added by the decorator function are:

  1. Direction – By comparing the primary IP address of the elastic network interface (ENI) with the destination IP address, you can set the direction of the IP connection.
  2. Security group IDs – Each ENI can be associated with as many as five security groups. The security group IDs are added as an array in the record.
  3. Source – This includes a number of fields that result from looking up srcaddr from a free service for geographical lookups.
    The Source includes:
      • source-country-code
      • source-country-name
      • source-region-code
      • source-region-name
      • source-city
      • source-location, latitude, and longitude.

Follow the instructions in this GitHub repository to deploy the two Lambda functions and the associated permissions that are required.
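
To show the shape of that code without reproducing the repository, here is a minimal sketch of a Firehose data-transformation handler following the record contract (recordId, result, base64-encoded data) that the decorator function works against. The actual decorator in the vpc-flow-log-appender repository does the real work of parsing the flow log record, looking up the ENI's security group IDs, and geolocating srcaddr; the fields set here are placeholders only.

import base64
import json

def handler(event, context):
    """Firehose invokes this with a batch of records; each must be returned with a status."""
    output = []
    for record in event["records"]:
        flow = json.loads(base64.b64decode(record["data"]))
        flow["direction"] = "inbound"        # placeholder for the real ENI/destination comparison
        flow["security-group-ids"] = []      # placeholder for the real ENI lookup
        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": base64.b64encode(json.dumps(flow).encode("utf-8")).decode("utf-8"),
        })
    return {"records": output}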

Step 3: Set up Firehose

Firehose is a fully managed service that allows you to transform flow log data and stream it into Amazon ES. The service scales automatically with load, and you only pay for the data transmitted through the service.

To create a Firehose delivery stream:

  1. In the AWS Management Console, choose Kinesis under Analytics.
  2. Choose Go to Firehose and then choose Create Delivery Stream.

Step 3.1: Define the destination

  1. Choose Amazon Elasticsearch Service from the Destination drop-down list.
  2. For Delivery stream name, type VPCFlowLogsToElasticSearch (the name must match the default environment variable in the ingestion Lambda function).
  3. Choose es-flowlogs from the Elasticsearch domain drop-down list. (The Amazon ES cluster configuration state needs to be Active for es-flowlogs to be available in the drop-down list.)
  4. For Index, type cwl.
  5. Choose OneDay from the Index rotation drop-down list.
  6. For Type, type log.
  7. For Backup mode, select Failed Documents Only.
  8. For S3 bucket, select New S3 bucket in the drop-down list and type a bucket name of your choice. Choose Create bucket.
  9. Choose Next.

Step 3.2: Configure Lambda

  1. Choose Enable for Data transformation.
  2. Choose vpc-flow-log-appender-dev-FlowLogDecoratorFunction-xxxxx from the Lambda function drop-down list (make sure you select the Decorator function).
  3. Choose Create/Update existing IAM role, Firehose delivery IAM role from the IAM role drop-down list.
  4. Choose Allow. This takes you back to the Firehose Configuration.
  5. Choose Next and then choose Create Delivery Stream.
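
For reference, the delivery stream configured in Steps 3.1 and 3.2 can also be created with boto3, as shown in the following hedged sketch. All ARNs are placeholders; the console flow above creates the IAM role and S3 bucket for you, so treat this only as an illustration of the chosen settings.

import boto3

firehose = boto3.client("firehose")

firehose.create_delivery_stream(
    DeliveryStreamName="VPCFlowLogsToElasticSearch",
    ElasticsearchDestinationConfiguration={
        "RoleARN": "arn:aws:iam::111122223333:role/firehose_delivery_role",   # placeholder
        "DomainARN": "arn:aws:es:us-east-1:111122223333:domain/es-flowlogs",  # placeholder
        "IndexName": "cwl",
        "TypeName": "log",
        "IndexRotationPeriod": "OneDay",
        "S3BackupMode": "FailedDocumentsOnly",
        "S3Configuration": {
            "RoleARN": "arn:aws:iam::111122223333:role/firehose_delivery_role",  # placeholder
            "BucketARN": "arn:aws:s3:::my-flowlogs-backup-bucket",               # placeholder
        },
        "ProcessingConfiguration": {
            "Enabled": True,
            "Processors": [{
                "Type": "Lambda",
                "Parameters": [{
                    "ParameterName": "LambdaArn",
                    "ParameterValue": "arn:aws:lambda:us-east-1:111122223333:function:vpc-flow-log-appender-dev-FlowLogDecoratorFunction-xxxxx",  # placeholder
                }],
            }],
        },
    },
)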

Step 4: Stream data to Firehose

The next step is to enable the data to stream from CloudWatch Logs to Firehose. You will use the Lambda ingestion function you deployed earlier: vpc-flow-log-appender-dev-FlowLogIngestionFunction-xxxxxxx.

  1. In the AWS Management Console, choose CloudWatch under Management Tools.
  2. Choose Logs in the navigation pane, and select the check box next to Flowlogs under Log Groups.
  3. From the Actions menu, choose Stream to AWS Lambda. Choose vpc-flow-log-appender-dev-FlowLogIngestionFunction-xxxxxxx (select the Ingestion function). Choose Next.
  4. Choose Amazon VPC Flow Logs from the Log Format drop-down list. Choose Next.
    Screenshot of Log Format drop-down list
  5. Choose Start Streaming.

VPC Flow Logs will now be forwarded to Firehose, capturing information about the IP traffic going to and from network interfaces in your VPC. Firehose invokes the decorator function to append the additional data fields and then delivers the enriched records to your Amazon ES cluster.
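
The ingestion function deployed from the GitHub repository is conceptually similar to the following hedged Python sketch, which unpacks the CloudWatch Logs subscription payload and forwards each flow log event to the Firehose delivery stream. The environment variable name and the batching details are illustrative assumptions, not the repository's exact code.

import base64
import gzip
import json
import os

import boto3

firehose = boto3.client("firehose")
DELIVERY_STREAM = os.environ.get("DELIVERY_STREAM_NAME", "VPCFlowLogsToElasticSearch")

def handler(event, context):
    # CloudWatch Logs delivers a base64-encoded, gzip-compressed payload.
    payload = json.loads(gzip.decompress(base64.b64decode(event["awslogs"]["data"])))

    records = [
        {"Data": (log_event["message"] + "\n").encode("utf-8")}
        for log_event in payload["logEvents"]
    ]

    # PutRecordBatch accepts at most 500 records per call.
    for start in range(0, len(records), 500):
        firehose.put_record_batch(
            DeliveryStreamName=DELIVERY_STREAM,
            Records=records[start:start + 500],
        )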

Data is now flowing to your Amazon ES cluster, but be patient because it can take up to 30 minutes for the data to begin appearing in your Amazon ES cluster.

Step 5: Verify that the flow log data is streaming through Firehose to the Amazon ES cluster

You should see VPC Flow Logs with ENI IDs under Log Streams (see the following screenshot) and Stored Bytes greater than zero in the CloudWatch log group.

Do you have logs from the Lambda ingestion function in the CloudWatch log group? As shown in the following screenshot, you should see START, END and REPORT records. These show that the ingestion function is running and streaming data to Firehose.

Screenshot showing logs from the Lambda ingestion function

Do you have logs from the Lambda decorator function in the CloudWatch log group? You should see START, END, and REPORT records as well as entries similar to: “Processing completed. Successful records XXX, Failed records 0.”

Screenshot showing logs from the Lambda decorator function

Do you have cwl-* indexes in the Amazon ES dashboard, as shown in the following screenshot? If you do, you are successfully streaming through Firehose and populating the Amazon ES cluster, and you are ready to proceed to Step 6. Remember, it can take up to 30 minutes for the flow logs from your workloads to begin flowing to the Amazon ES cluster.

Screenshot showing cwl-* indexes in the Amazon ES dashboard

Step 6: Using the SGDashboard to analyze VPC network traffic

You now need to set up a Kibana dashboard to monitor the traffic in your VPC.

To find the Kibana URL:

  1. In the AWS Management Console, click Elasticsearch Service under Analytics.
  2. Choose es-flowlogs under Elasticsearch domain name.
  3. Click the link next to Kibana, as shown in the following screenshot.
    Screenshot showing the Kibana link

The first time you access Kibana, you will be asked to set the default index. To set the default index in the Amazon ES cluster:

  1. Set the Index name or pattern to cwl-*.
    Screenshot of configuring an index pattern
  2. For Time-field name, type @timestamp.
  3. Choose Create.

Load the SGDashboard:

  1. Download this JSON file and save it to your computer. The file includes a dashboard and visualizations I created for this blog post’s purposes.
  2. In Kibana, choose Management in the navigation pane, choose Saved Objects, and then import the file you just downloaded.
  3. Choose Dashboard and Open to load the SGDashboard you just imported. (You might have to press Enter in the top search box to have the dashboard load the first time.)

The following screenshot shows the SGDashboard after it has loaded.

Screenshot showing the dashboard after it has loaded

The SGDashboard is composed of a set of visualizations. Each visualization contains a view or summary of the underlying data contained in the Amazon ES cluster, as shown in the preceding screenshot. You can control the timeframe for the dashboard in the upper right corner. By clicking the timeframe, the dashboard exposes alternative timeframes that you can select.

The SGDashboard includes a list of security groups, destination ports, source IP addresses, actions, protocols, and connection directions as well as raw VPC Flow Log records. This information is useful because you can compare this to your security group configurations. Ports might be open in the security group but have no network traffic flowing to the instances on those ports, which means the corresponding rules can probably be removed. Also, by evaluating IP ranges in use, you can narrow the ranges to only those IP addresses required for the application. The following screenshot on the left shows a view of the SGDashboard for a specific security group. By comparing its accepted inbound IP addresses with the security group rules in the following screenshot on the right, you can ensure the source IP ranges are sufficiently restrictive.

Screenshot showing a view of the SGDashboard for a specific security group   Screenshot showing security group rules

Analyze VPC Flow Logs data

Amazon ES allows you to quickly view and filter VPC Flow Logs data to determine what network traffic is flowing in your VPC. This analysis requires an understanding of security groups and elastic network interfaces (ENIs). If two security groups are associated with the same ENI, traffic flowing to that ENI is recorded against both groups. As a result, when you filter on a single security group, additional groups might still appear in the dashboard because they are attached to the same ENIs and therefore appear in the same VPC Flow Logs records.

The following screenshot on the left is a view of the SGDashboard with a security group selected (sg-978414e8). Even though that security group has a filter, two additional security groups remain in the dashboard. The following screenshot on the right shows the raw log data where each record contains all three security groups and demonstrates that all three security groups share a common set of flow log records.

Screenshot showing the SGDashboard with a security group selected   Screenshot showing raw log data

Also, note that security groups are stateful, so if the instance itself is initiating traffic to a different location, the return traffic will be displayed in the Kibana dashboard. The best example of this is port 123 Network Time Protocol (NTP). This type of traffic can be easily removed from the display by choosing the port on the right side of the dashboard, and then reversing the filter, as shown in the following screenshot. By reversing the filter, you can exclude data from the view.

Screenshot of reversing the filter on a port

Example: Unused security groups

Let’s say that some security groups are no longer in use. First, I change the time range by clicking the current time range in the top right corner of the dashboard, as shown in the following screenshot. I select Week to date.

Screenshot of changing the time range

As the following screenshot shows, the dashboard has identified five security groups that have had traffic during the week to date.

Screenshot showing five security groups that have had traffic during the week to date

As you can see in the following screenshot, I have many security groups in my test account that are not in use. Any security groups not in the SGDashboard are candidates for removal.

Example: Unused inbound rules

Let’s take a look at security group sg-63ed8c1c from the preceding screenshot. When I click sg-63ed8c1c (the security group ID) in the dashboard, a filter is applied that reduces the security groups displayed to only the records with that security group included. We can compare the traffic associated with this security group in the SGDashboard (shown in the following screenshot) to the security group rules in the EC2 console.

Screenshot showing the traffic of the sg-63ed8c1c security group

As the following screenshot of the EC2 console shows, this security group has only two inbound rules: one for HTTP on port 80 and one for RDP. The SGDashboard shows that traffic is not flowing on port 80, so I can safely remove that rule from the security group.

Screenshot showing this security group has only 2 inbound rules
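
If you confirm that a rule is unused, you can remove it in the EC2 console or with a short script. The following hedged boto3 sketch revokes the HTTP rule identified above; the CIDR range shown is a placeholder and must match the existing rule exactly for the call to succeed.

import boto3

ec2 = boto3.client("ec2")

# Remove the unused inbound HTTP rule from the security group.
ec2.revoke_security_group_ingress(
    GroupId="sg-63ed8c1c",
    IpPermissions=[{
        "IpProtocol": "tcp",
        "FromPort": 80,
        "ToPort": 80,
        "IpRanges": [{"CidrIp": "0.0.0.0/0"}],  # placeholder; use the rule's actual source range
    }],
)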

Summary

It can be challenging to help ensure that your AWS Cloud environment allows only intended traffic and is as secure and manageable as possible. In this post, I have shown how to enable VPC Flow Logs. I then showed how to use Firehose and Lambda to add security group IDs, directions, and locations to the VPC Flow Logs dataset. The SGDashboard then enables you to analyze the flow log data and compare it with your security group configurations to improve your cloud security.

If you have comments about this blog post, submit them in the “Comments” section below. If you have implementation or troubleshooting questions about the solution in this post, please start a new thread on the AWS WAF forum.

– Guy

How to Monitor Host-Based Intrusion Detection System Alerts on Amazon EC2 Instances

Post Syndicated from Cameron Worrell original https://aws.amazon.com/blogs/security/how-to-monitor-host-based-intrusion-detection-system-alerts-on-amazon-ec2-instances/

To help you secure your AWS resources, we recommend that you adopt a layered approach that includes the use of preventative and detective controls. For example, incorporating host-based controls for your Amazon EC2 instances can restrict access and provide appropriate levels of visibility into system behaviors and access patterns. These controls often include a host-based intrusion detection system (HIDS) that monitors and analyzes network traffic, log files, and file access on a host. A HIDS typically integrates with alerting and automated remediation solutions to detect and address attacks, unauthorized or suspicious activities, and general errors in your environment.

In this blog post, I show how you can use Amazon CloudWatch Logs to collect and aggregate alerts from an open-source security (OSSEC) HIDS. I use a CloudWatch Logs subscription to deliver the alerts to Amazon Elasticsearch Service (Amazon ES) for analysis and visualization with Kibana – a popular open-source visualization tool. To make it easier for you to see this solution in action, I provide a CloudFormation template to handle most of the deployment work. You can use this solution to gain improved visibility and insights across your EC2 fleet and help drive security remediation activities. For example, if specific hosts are scanning your EC2 instances and triggering OSSEC alerts, you can implement a VPC network access control list (ACL) or AWS WAF rule to block those source IP addresses or CIDR blocks.

Solution overview

The following diagram depicts a high-level overview of this post’s solution.

Diagram showing a high-level overview of this post's solution

Here is how the solution works:

  1. On the target EC2 instances, the OSSEC HIDS generates alerts that the CloudWatch Logs agent captures. The HIDS performs log analysis, integrity checking, Windows registry monitoring, rootkit detection, real-time alerting, and active response. For more information, see Getting started with OSSEC.
  2. The CloudWatch Logs group receives the alerts as events.
  3. A CloudWatch Logs subscription is applied to the target log group to forward the events through AWS Lambda to Amazon ES (a simplified sketch of this consumer function follows this list).
  4. Amazon ES loads the logged alert data.
  5. Kibana visualizes the alerts in near-real time. Amazon ES provides a default installation of Kibana with every Amazon ES domain.
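
The Lambda function in step 3 is provisioned for you by the CloudFormation template described later in this post. Conceptually, it behaves like the following hedged Python sketch, which unpacks the CloudWatch Logs subscription payload and bulk-indexes each OSSEC alert into Amazon ES. Request signing and error handling are omitted, and the endpoint value is a placeholder; the deployed function also adds the @-prefixed metadata fields described later in this post.

import base64
import gzip
import json
import os
from datetime import datetime, timezone

import urllib3

ES_ENDPOINT = os.environ.get("ES_ENDPOINT", "search-hids-xxxx.us-east-1.es.amazonaws.com")  # placeholder
http = urllib3.PoolManager()

def handler(event, context):
    payload = json.loads(gzip.decompress(base64.b64decode(event["awslogs"]["data"])))
    index = "cwl-" + datetime.now(timezone.utc).strftime("%Y.%m.%d")

    bulk_lines = []
    for log_event in payload["logEvents"]:
        doc = json.loads(log_event["message"])  # each alerts.json entry is a JSON document
        doc["@log_group"] = payload["logGroup"]
        doc["@log_stream"] = payload["logStream"]
        doc["@owner"] = payload["owner"]
        doc["@timestamp"] = datetime.now(timezone.utc).isoformat()
        bulk_lines.append(json.dumps({"index": {"_index": index, "_type": "alert"}}))
        bulk_lines.append(json.dumps(doc))

    # Send the batch to the Amazon ES bulk API.
    http.request(
        "POST",
        "https://{}/_bulk".format(ES_ENDPOINT),
        body="\n".join(bulk_lines) + "\n",
        headers={"Content-Type": "application/x-ndjson"},
    )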

Deployment considerations

For the purposes of this post, the primary OSSEC HIDS deployment consists of a Linux-based installation for which the alerts are generated locally within each system. Note that this solution depends on Amazon ES and Lambda in the target region for deployment. You can find the latest information about AWS service availability in the Region table. You also must identify an Amazon Virtual Private Cloud (VPC) subnet that has Internet access and DNS resolution for your EC2 instances to provision the required components properly.

To simplify the deployment process, I created a test environment AWS CloudFormation template. You can use this template to provision a test environment stack automatically into an existing Amazon VPC subnet. You will use CloudFormation to provision the core components of this solution and then configure Kibana for alert analysis. The source code for this solution is available on GitHub.

This post’s template performs the following high-level steps in the region you choose:

  1. Creates two EC2 instances running Amazon Linux with an AWS Identity and Access Management (IAM) role for CloudWatch Logs access. Note: To provide sample HIDS alert data, the two EC2 instances are configured automatically to generate simulated HIDS alerts locally.
  2. Installs and configures OSSEC, the CloudWatch Logs agent, and additional packages used for the test environment.
  3. Creates the target HIDS Amazon ES domain.
  4. Creates the target HIDS CloudWatch Logs group.
  5. Creates the Lambda function and CloudWatch Logs subscription to send HIDS alerts to Amazon ES.

After the CloudFormation stack has been deployed, you can access the Kibana instance on the Amazon ES domain to complete the final steps of the setup for the test environment, which I show later in the post.

Although out of scope for this blog post, when deploying OSSEC into your existing EC2 environment, you should determine the desired configuration, including target log files for monitoring, directories for integrity checking, and active response. This typically also requires time for testing and tuning of the system to optimize it for your environment. The OSSEC documentation is a good place to start to familiarize yourself with this process. You could take another approach to OSSEC deployment, which involves an agent installation and a separate OSSEC manager to process events centrally before exporting them to CloudWatch Logs. This deployment requires an additional server component and network communication between the agent and the manager. Note that although Windows Server is supported by OSSEC, it requires an agent-based installation and therefore requires an OSSEC manager to be present. Review OSSEC Architecture for additional information about OSSEC architecture and deployment options.

Deploy the solution

This solution’s high-level steps are:

  1. Launch the CloudFormation stack.
  2. Configure a Kibana index pattern and begin exploring alerts.
  3. Configure a Kibana HIDS dashboard and visualize alerts.

1. Launch the CloudFormation stack

You will launch your test environment by using a CloudFormation template that automates the provisioning process. For the following input parameters, you must identify a target VPC and subnet (which requires Internet access) for deployment. If the target subnet uses an Internet gateway, set the AssignPublicIP parameter to true. If the target subnet uses a NAT gateway, you can leave the default setting of AssignPublicIP as false.

First, you will need to stage the Lambda function deployment package in an S3 bucket located in the region into which you are deploying. To do this, download the zipped deployment package and upload it to your in-region bucket. For additional information about uploading objects to S3, see Uploading Objects into Amazon S3.

You also must provide a trusted source IP address or CIDR block for access to the environment following the creation of the stack and an EC2 key pair to associate with the instances. For information about creating an EC2 key pair, see Creating a Key Pair Using Amazon EC2. Note that the trusted IP address or CIDR block also is used to create the Amazon ES access policy automatically for Kibana access. We recommend that you use a specific IP address or CIDR range rather than using 0.0.0.0/0, which would allow all IPv4 addresses to access your instances. For more information about authorizing inbound traffic to your instances, see Authorizing Inbound Traffic for Your Linux Instances.

After you have confirmed the input parameters (see the following screenshot and table for more details), create the CloudFormation stack.

Numbered screenshot showing input parameters

Input parameter – Description
  1. HIDSInstanceSize – EC2 instance size for the test server
  2. ESInstanceSize – Amazon ES instance size
  3. MyKeyPair – A public/private key pair that allows you to connect securely to your instance after it launches
  4. MyS3Bucket – In-region S3 bucket with the zipped deployment package
  5. MyS3Key – In-region S3 key for the zipped deployment package
  6. VPCId – An Amazon VPC into which to deploy the solution
  7. SubnetId – A subnet with outbound connectivity within the VPC you selected (requires Internet access)
  8. AssignPublicIP – Set to true if your subnet is configured to connect through an Internet gateway; set to false if your subnet is configured to connect through a NAT gateway
  9. MyTrustedNetwork – Your trusted source IP or CIDR block that is used to whitelist access to the EC2 instances and the Amazon ES endpoint

To finish creating the CloudFormation stack:

  1. Enter the input parameters and choose Next.
  2. On the Options page, accept the defaults and choose Next.
  3. On the Review page, confirm the details, select the I acknowledge that AWS CloudFormation might create IAM resources check box, and then choose Create. (The stack will be created in approximately 10 minutes.)
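
If you prefer to launch the stack from a script instead of the console, the following hedged boto3 sketch passes the same input parameters. The stack name, template URL, and every parameter value shown are placeholders; substitute your own values.

import boto3

cloudformation = boto3.client("cloudformation", region_name="us-east-1")

cloudformation.create_stack(
    StackName="hids-test-environment",                                      # placeholder
    TemplateURL="https://s3.amazonaws.com/YOUR_BUCKET/hids-template.yaml",  # placeholder
    Capabilities=["CAPABILITY_IAM"],  # the template creates IAM resources
    Parameters=[
        {"ParameterKey": "HIDSInstanceSize", "ParameterValue": "t2.micro"},
        {"ParameterKey": "ESInstanceSize", "ParameterValue": "t2.small.elasticsearch"},
        {"ParameterKey": "MyKeyPair", "ParameterValue": "my-ec2-key-pair"},
        {"ParameterKey": "MyS3Bucket", "ParameterValue": "my-in-region-bucket"},
        {"ParameterKey": "MyS3Key", "ParameterValue": "lambda/deployment-package.zip"},
        {"ParameterKey": "VPCId", "ParameterValue": "vpc-0123456789abcdef0"},
        {"ParameterKey": "SubnetId", "ParameterValue": "subnet-0123456789abcdef0"},
        {"ParameterKey": "AssignPublicIP", "ParameterValue": "true"},
        {"ParameterKey": "MyTrustedNetwork", "ParameterValue": "203.0.113.10/32"},
    ],
)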

After the stack has been created, note the HIDSESKibanaURL on the CloudFormation Outputs tab. Then, proceed to the Kibana configuration instructions in the next section.

2. Configure a Kibana index pattern and begin exploring alerts

In this section, you perform the initial setup of Kibana. To access Kibana, find the HIDSESKibanaURL in the CloudFormation stack outputs (see the previous section) and choose it. This will bring you to the Kibana instance, which is automatically provisioned to your Amazon ES instance. The source IP you provided in the CloudFormation input parameters is used to automatically populate the Amazon ES access policy. If you receive an error similar to the following error, you must confirm that your Amazon ES access policy is correct.

{"Message":"User: anonymous is not authorized to perform: es:ESHttpGet on resource: hids-alerts"}

For additional information about securing access to your Amazon ES domain, see How to Control Access to Your Amazon Elasticsearch Service Domain.
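
For illustration, an IP-based access policy similar to the one the template creates from the MyTrustedNetwork parameter can be applied with boto3, as shown in the following hedged sketch. The domain name, account ID, region, and CIDR block are placeholders.

import json

import boto3

es = boto3.client("es")

policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": "*"},
        "Action": "es:*",
        "Resource": "arn:aws:es:us-east-1:111122223333:domain/hids-alerts/*",  # placeholder
        "Condition": {"IpAddress": {"aws:SourceIp": "203.0.113.10/32"}},        # placeholder
    }],
}

es.update_elasticsearch_domain_config(
    DomainName="hids-alerts",  # placeholder
    AccessPolicies=json.dumps(policy),
)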

The OSSEC HIDS alerts now are being processed into Amazon ES. To use Kibana to analyze the alert data interactively, you must configure an index pattern that identifies the data you wish to analyze in Amazon ES. You can read additional information about index patterns in the Kibana documentation.

In the Index name or pattern box, type cwl-2017.*. The index pattern is generated within the Lambda function as cwl-YYYY.MM.DD, so you can use a wildcard character for the month and day to match data from 2017. From the Time-field name drop-down list, choose @timestamp, and then choose Create.

Screenshot of the "Configure an index pattern" screen

In Kibana, you should now be able to choose the Discover pane and see alerts being populated. To set the refresh rate for the display of near-real-time alerts, choose your desired time range in the top right (such as Last 15 minutes).

Screenshot of setting the refresh rate of near-real-time alerts

Choose Auto-refresh, and then choose an interval, such as 5 seconds.

Screenshot of auto-refresh of 5 seconds

Kibana should now be configured to auto-refresh at a 5-second interval within the timeframe you configured. You should now see your alerts updating along with a count graph, as shown in the following screenshot.

Screenshot of the alerts updating with a count graph

The EC2 instances are automatically configured by CloudFormation to simulate activity to display several types of alerts, including:

  • Successful sudo to ROOT executed – The Linux sudo command was successfully executed.
  • Web server 400 error code – The server cannot process the request due to an apparent client error (such as malformed request syntax, too large size, invalid request message framing, or deceptive request routing).
  • SSH insecure connection attempt (scan) – Invalid connection attempt to the SSH listener.
  • Login session opened – Opened login session on the system.
  • Login session closed – Closed login session on the system.
  • New Yum package installed – Package installed on the system.
  • Yum package deleted – Package deleted from the system.

Let’s take a closer look at some of the alert fields, as shown in the following screenshot.

Screenshot highlighting some of the alert fields

The numbered alert fields in the preceding screenshot are defined as follows:

  1. @log_group – The source CloudWatch Logs group
  2. @log_stream – The CloudWatch Logs stream name (InstanceID)
  3. @message – The JSON payload from the source alerts.json OSSEC log
  4. @owner – The AWS account ID where the alert originated
  5. @timestamp – The time stamp applied by the consumer Lambda function
  6. full_log – The log event from the source file
  7. location – The source log file path and file name
  8. rule.comment – A brief description of the OSSEC rule that was matched
  9. rule.level – The OSSEC rule classification from 0 to 16 (see Rules Classification for more information)
  10. rule.sidid – The rule ID of the OSSEC rule that was matched
  11. srcip – The source IP address that triggered the alert; in this case, the simulated alerts contain the local IP of the server

You can enter search criteria in the Kibana query bar to explore HIDS alert data interactively. For example, you can run the following query to see all the rule.level 6 alerts for the EC2 InstanceID i-0e427a8594852eca2 where the source IP is 10.10.10.10.

rule.level: 6 AND @log_stream: "i-0e427a8594852eca2" AND srcip: 10.10.10.10

You can perform searches including simple text, Lucene query syntax, or use the full JSON-based Elasticsearch Query DSL. You can find additional information on searching your data in the Elasticsearch documentation.
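
The same search can be expressed with the Query DSL and sent directly to the search API, as in the following hedged Python sketch. The endpoint is a placeholder, the request must be permitted by the domain access policy (or signed), and depending on your field mappings you may need match queries instead of term filters for the string fields.

import json

import urllib3

http = urllib3.PoolManager()
ES_ENDPOINT = "search-hids-xxxx.us-east-1.es.amazonaws.com"  # placeholder

query = {
    "query": {
        "bool": {
            "filter": [
                {"term": {"rule.level": 6}},
                {"match": {"@log_stream": "i-0e427a8594852eca2"}},
                {"match": {"srcip": "10.10.10.10"}},
            ]
        }
    }
}

response = http.request(
    "POST",
    "https://{}/cwl-2017.*/_search".format(ES_ENDPOINT),
    body=json.dumps(query),
    headers={"Content-Type": "application/json"},
)
print(json.loads(response.data))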

3. Configure a Kibana HIDS dashboard and visualize alerts

To analyze alert trends and patterns over time, it can be helpful to use charts and graphs to represent the alert data. I have configured a basic dashboard template that you can import into your Kibana instance.

To add the template of a sample HIDS dashboard to your Kibana instance:

  1. Save the template locally and then choose Management in the Kibana navigation pane.
  2. Choose Saved Objects, Import, and the HIDS dashboard template.
  3. Choose the eye icon to the right of the HIDS Alerts dashboard entry. This will take you to the imported dashboard.
    Screenshot of the "Edit Saved Objects" screen

After importing the Kibana dashboard template and selecting it, you will see the HIDS dashboard, as shown in the following screenshot. This sample HIDS dashboard includes Alerts Over Time, Top 20 Alert Types, Rule Level Breakdown, Top 10 Rule Source ID, and Top 10 Source IPs.

Screenshot of the HIDS dashboard

To explore the alert data in more detail, you can choose an alert type on which to filter, as shown in the following two screenshots.

Alert showing SSH insecure connection attempts

Alert showing @timestamp per 30 seconds

You can see more details about the alerts based on criteria such as source IP address or time range. For more information about using Kibana to visualize alert data, see the Kibana User Guide.

Summary

In this blog post, I showed how to use CloudWatch Logs to collect alerts in near-real time from an OSSEC HIDS and use a CloudWatch Logs subscription to pass the alerts into Amazon ES for analysis and visualization with Kibana. The dashboard deployed by this solution can help you improve the security monitoring of your EC2 fleet as part of a defense-in-depth security strategy in your AWS environment.

You can use this solution to help detect attacks, anomalous activities, and error trends across your EC2 fleet. You can also use it to help prioritize remediation efforts for your systems or help determine where to introduce additional security controls such as VPC security group rules, VPC network ACLs, or AWS WAF rules.

If you have comments about this post, add them to the “Comments” section below. If you have questions about or issues implementing this solution, start a new thread on the CloudWatch or Amazon ES forum. The source code for this solution is available on GitHub. If you need OSSEC-specific support, see OSSEC Support Options.

– Cameron