Tag Archives: OpenSearch

How to deploy an Amazon OpenSearch cluster to ingest logs from Amazon Security Lake

2024-07-30 Kevin Low

Post Syndicated from Kevin Low original https://aws.amazon.com/blogs/security/how-to-deploy-an-amazon-opensearch-cluster-to-ingest-logs-from-amazon-security-lake/

January 30, 2025: This post was republished to make the instructions clearer and compatible with OCSF 1.1.

Customers often require multiple log sources across their AWS environment to empower their teams to respond and investigate security events. In part one of this two-part blog post, I show you how you can use Amazon OpenSearch Service to ingest logs collected by Amazon Security Lake to facilitate near real-time monitoring.

Many customers use Security Lake to automatically centralize security data from Amazon Web Services (AWS) environments, software as a service (SaaS) providers, on-premises workloads, and cloud sources into a purpose-built data lake in their AWS environment. OpenSearch Service is a managed service that customers can use to deploy, operate, and scale OpenSearch clusters in the AWS Cloud. It natively integrates with Security Lake to enable customers to perform interactive log analytics and searches across large datasets, create enterprise visualization and dashboards, and perform analysis across disparate applications and logs. With Amazon OpenSearch Security Analytics, customers can also gain visibility into the security posture of their organization’s infrastructure, monitor for anomalous activity, detect potential security threats in near real time, and initiate alerts to pre-configured destinations.

Without using Amazon OpenSearch Service, customers would need to build, deploy and manage infrastructure for an analytics solution, such as an ELK stack.

Prerequisites

Security Lake should already be deployed. For details on how to deploy Security Lake, see Getting started with Amazon Security Lake. You will need AWS Identity and Access Management (IAM) permissions to manage Security Lake, OpenSearch Service, Amazon Cognito, AWS Secrets Manager, and Amazon Elastic Compute Cloud (Amazon EC2), and to create IAM roles to follow along with this post. The solution can be deployed in any AWS Region that has at least 3 Availability Zones, supports Security Lake, OpenSearch, and OpenSearch Ingestion.

Solution overview

The architecture diagram in Figure 1 shows the completed architecture of the solution.

The OpenSearch Service cluster is deployed within a virtual private cloud (VPC) across three Availability Zones for high availability.
The OpenSearch Service cluster ingests logs from Security Lake using an OpenSearch Ingestion pipeline.
The cluster is accessed by end users through a public-facing proxy hosted on an Amazon EC2 instance.
1. To reduce costs, the template doesn’t deploy a dead letter queue (DLQ) for the OpenSearch Ingestion pipeline. You can add one later if you want.
2. Instead of a public facing proxy, you can deploy a VPN to access your cluster.
Authentication to the cluster is managed with Amazon Cognito.

Figure 1: Solution architecture

Planning the deployment

This section will help you plan your OpenSearch service deployment, including what nodes you should choose, the amount of storage to allocate, and where to deploy the cluster.

Deciding instances for the OpenSearch Service master and data nodes

First, determine what instance type to use for the master and data nodes. If your workload generates less than 100 GB of Security Lake logs per day, we recommend using three m6g.large.search master nodes and three r6g.large.search data nodes. You can start small and scale up or scale out later. For more information about deciding the size and number of instances, see Get started with Amazon OpenSearch Service. Note the instance types that you have selected on a text editor because you will use this as an input for the AWS CloudFormation template that you will deploy later.

Configuring storage

To optimize your storage costs, you need to plan your data strategy. In this architecture, Security Lake is used for long-term log storage. Because Security Lake uses Amazon Simple Storage Service (Amazon S3), you can optimize long-term storage costs. You can configure OpenSearch Service to ingest priority logs based on the recent data that you can use for near-real time detection and alerting. Your team can query logs in Security Lake using its Zero-ETL integration with OpenSearch Service to analyze older logs.

Therefore, Security Lake should serve as your primary long-term log storage, with OpenSearch Service storing only the most recent logs.

The number of days of logs in OpenSearch Service will depend on how many days’ worth of data you need to investigate at a given time. I recommend storing 15 days of data in OpenSearch Service. This allows you to react to and investigate the most immediate security events while optimizing storage costs for older logs.

The next step is to determine the volume of logs generated by Security Lake.

Sign in to the Security Lake delegated administrator account.
Go to the AWS Management Console for Security Lake. Choose Usage in the navigation pane.
On the Usage screen, select Last 30 days as the range of usage.
Add the total Actual usage for the last 30 days for the data sources that you intend to send to OpenSearch. If you have used Security Lake for less than 30 days, you can use the Total predicted usage per month. Divide this figure by 30 to get the daily data volume.

Figure 2: Select range of usage

To determine the total storage needed, multiply the data generated by Security Lake per day by the retention period you chose, then by 1.1 to account for the indexes, then multiply that number by 1.15 for overhead storage. For more information about calculating storage, see Get started with Amazon OpenSearch Service.

To determine the amount of Amazon Elastic Block Store (Amazon EBS) storage that you need per node, take the total amount of storage and divide it by the number of nodes that you have. Round that number up to the nearest whole number. You can increase the amount of storage after deployment when you have a better understanding of your workload. Make a note of this number in a text editor because you’ll use it as an input in the CloudFormation template later.

Example 1: 10 GB of Security Lake logs generated per day, stored for 30 days in OpenSearch Service in three nodes

10 GB of Security Lake logs stored for 30 days = 10 GB * 30 = 300 GB
Account for additional space for indexes and overhead space = 300 GB * 1.1 * 1.15 = 379.5 GB
Divide the storage required across three nodes, rounded up = 379.5/3 ≈ 127 GB per node
You would need 127 GB per node in OpenSearch Service

Example 2: 200 GB of Security Lake logs generated per day, stored for 15 days in OpenSearch Service across six nodes

200 GB of Security Lake logs stored for 15 days = 200 GB * 15 = 3000 GB
Account for additional space for indexes and overhead space = 3000 GB * 1.1 * 1.15 = 3795 GB
Divide the storage required across three nodes, rounded up = 3795/6 ≈ 633 GB per node
You would need 633 GB per node in OpenSearch Service

Where to deploy the cluster?

If you have an AWS Control Tower deployment or have a deployment modelled after the AWS Security Reference Architecture (AWS SRA), Security Lake should be deployed in the Log Archive account. Because security best practices recommend that the Log Archive account should not be frequently accessed, the OpenSearch Service cluster should be deployed into your Audit account or Security Tooling account.

You need to deploy your Security Lake subscriber in the same Region as your Security Lake roll-up Region. If you have more than one roll-up Region, choose the Region that collects logs from the Regions you want to monitor.

Your cluster needs to be deployed in the same Region as your Security Lake subscriber be able to access data.

Setting up the Security Lake subscriber

Before deploying the solution, create a Security Lake subscriber in your Security Lake roll-up Region so that OpenSearch Service can access data from Amazon Security Lake.

Access the Security Lake console in your Log Archive account.
Choose Subscribers in the navigation pane.
Choose Create subscriber.
On the Create subscriber page, enter a name, such as OpenSearch-subscriber.
Under Data Access, select Under S3 notification type, select SQS queue.
Under Subscriber credentials, enter the AWS account ID for the account you plan to deploy the OpenSearch cluster to, which should be your Security Tooling
Enter OpenSearchIngestion-<AWS account ID> under External ID.

Figure 3: Configuring the Security Lake subscriber
Leave All log and event sources selected and choose Create.

After the subscriber has been created, you will need to collect information to facilitate the deployment.

To gather necessary information:

Select the subscriber that you just created.
Derive the S3 bucket name from the S3 bucket ARN and store it in a text editor. The Amazon Resource Name (ARN) is formatted as arn:aws:s3:::<bucket name>. The bucket name should look like aws-security-data-lake-<region>-xxxxx.

Figure 4: Derive the S3 bucket name from the Subscriber details page
Go to the Amazon Simple Queue Service (Amazon SQS) console and select the SQS queue created as part of the Security Lake subscriber. It should look like AmazonSecurityLake-xxxxxxxxx-Main-Queue. Note the queue’s ARN and URL in your text editor.

Figure 5: Relevant details from the SQS queue

Deploy the solution

To deploy the solution in your Security Tooling account, use a CloudFormation template. This template deploys the OpenSearch Service cluster, OpenSearch Ingestion pipeline, and an AWS Lambda function to initialize the cluster.

To deploy the OpenSearch cluster:

To deploy the CloudFormation template that builds the OpenSearch service cluster, select the Launch Stack button.
In the CloudFormation console, make sure that you are in the correct AWS account. You should be in your Security Tooling account. Also make sure that you have selected the same Region as your Security Lake subscriber.
Enter a name for your stack. A name like os-stack-<day>-<month> can help you keep track of deployments.
Enter the instance types and Amazon EBS volume size that you noted earlier.
Enter the IP address range that you want to allow to access the proxy’s security group. You should limit this to your corporate IP range. You can set it as 0.0.0/0 if you want to expose it to the public internet.
Fill in the details of the Security Lake bucket and the subscriber Amazon SQS queue ARN, URL, and Region.

Figure 6: Add stack parameters
Check the acknowledgements in the Capabilities section.
Choose Create stack to begin deploying the resources.
It will take 20–30 minutes to deploy the multiple nested templates. Wait for the main stack (not the nested ones) to achieve the CREATE_COMPLETE status before proceeding to the next step.

Note: If you encounter failures while deployment, you can download the CloudFormation file here and select Preserve successfully provisioned resources under Stack failure options while deploying. This will allow you to troubleshoot the stack deployment.
Go to the Outputs pane of the main CloudFormation stack. Save the DashboardsProxyURL, OpenSearchInitRoleARN, and PipelineRole values in a text editor to refer to later.

Figure 7: The stacks in the CREATE_COMPLETE state with the outputs panel shown
Open the DashboardsProxyURL value in a new tab.

Note: Because the proxy relies on a self-signed certificate, you will get an insecure certificate warning. You can safely ignore this warning and proceed. For a production workload, you should issue a trusted private certificate from your internal public key infrastructure or use AWS Private Certificate Authority.
You will be presented with the Amazon Cognito sign-in page. Use administrator as the username.
Access Secrets Manager to find the password. Select the secret that was created as part of the stack.

Figure 8: The Cognito password in Secrets Manager
Choose Retrieve secret value to get the password.

Figure 9: Retrieve the secret value
After signing in, you will be prompted to change your password and will be redirected to the OpenSearch dashboard.
If you see a pop-up that states Start by adding your own data, select Explore on my own. On the next page, Introducing new OpenSearch Dashboards look & feel, choose Dismiss.
If you see a pop-up that states Select your tenant, select Global, and then choose Confirm.

Figure 10: Select and confirm your tenant

To initialize the OpenSearch cluster:

Choose the menu icon (three stacked horizontal lines) on the top left and select Security under the Management section.

Figure 11: Navigating to the Security page in the OpenSearch console
Select Roles. On the Roles page, search for the all_access role and select it.
Select Mapped users, and then select Manage mapping.
On the Map user screen, choose Add another backend role. Paste the value for the OpenSearchInitRoleARN from the list of CloudFormation outputs. Choose Map.

Figure 12: Mapping the role on the Security page in the OpenSearch console
Leave this tab open and return to the AWS Management console. Go to the AWS Lambda console and select the function named xxxxxx-OS_INIT.
In the function screen, choose Test, and then Create new test event.

Figure 13: Creating the test event in the Lambda console
Choose Invoke. The function should run for about 30 seconds. The execution results should show the component templates that have been created. This Lambda function creates the component and index templates to ingest Open Cybersecurity Framework (OCSF) formatted data, a set of indices and aliases that correspond with the OCSF classes generated by Security Lake, and a rollover policy that will rollover the index daily or if it becomes larger than 40 GB.

Figure 14: Invoking the Lambda function in the Lambda console

To set up the pipeline

Return to the Map user page on the OpenSearch console.
Choose Add another backend role. Paste the value of the PipelineRole from the CloudFormation template output. Choose This will allow the OpenSearch Ingestion to write to the cluster.

Figure 15: Mapping the OpenSearch Ingestion role
Access the Amazon S3 console in the Log Archive account where Security Lake is hosted.
Select the Security Lake bucket in your roll-up Region. It should look like aws-security-data-lake-region-xxxxxxxxxx.
Choose Permissions, then Edit under Bucket policy.

Add this policy to the end of the existing bucket policy. Replace the Principal with the ARN of the PipelineRole and the name of your Security Lake bucket in the Resource section.

{
            "Sid": "Cross Account Permissions",
            "Effect": "Allow",
            "Principal": {
                "AWS": "<Pipeline role ARN>"
            },
            "Action": "s3:*",
            "Resource": [
                "arn:aws:s3:::<Security Lake bucket name>/*",
                "arn:aws:s3:::<Security Lake bucket name>"
            ]
        }

Figure 16: The modified S3 bucket access policy

Choose Save changes.

To upload the index patterns and dashboards

Download the Security-lake-objects.ndjson file by right-clicking on this link and selecting Save link as.
Access the Dashboards Management page through the navigation menu.
Choose Saved objects in the navigation pane.
On the Saved Objects page, choose Import on the right side of the screen.

Figure 17: Import saved objects
Choose Import and select the Security-lake-objects.ndjson file that you downloaded previously.
Leave Create new objects with unique IDs selected and choose Import.
You can now view the ingested logs on the Discover page and visualizations on the Dashboards page, which you can find on the navigation bar.

Figure 18: The Discover page displaying ingested logs

Clean up

To avoid unwanted charges, delete the main CloudFormation template, named os-stack-<day>-<month> (not the nested stacks).

Figure 19: Select the main stack in the CloudFormation console

Modify the Security Lake bucket policy in the logging account to remove the section you added that trusted the PipelineRole. Be careful not to modify the rest of the policy because it could impact the functioning of Security Lake and other subscribers.

Figure 20: The S3 bucket policy with the relevant sections that needed to be deleted

Conclusion

In this post, you learned how to plan an OpenSearch deployment with Amazon OpenSearch Service to ingest logs from Amazon Security Lake. With this solution, you’re able to aggregate and manage logs with Security Lake and visualize and monitor those logs with OpenSearch Service. After deployment, monitor the OpenSearch Service metrics to determine if you need to scale this up or out for improved performance. In part 2, I will show you how to set up the Security Analytics detector to generate alerts to security findings in near-real time.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

Three ways to accelerate incident response in the cloud: insights from re:Inforce 2023

2023-06-30 Anne Grahn

Post Syndicated from Anne Grahn original https://aws.amazon.com/blogs/security/three-ways-to-accelerate-incident-response-in-the-cloud-insights-from-reinforce-2023/

AWS re:Inforce took place in Anaheim, California, on June 13–14, 2023. AWS customers, partners, and industry peers participated in hundreds of technical and non-technical security-focused sessions across six tracks, an Expo featuring AWS experts and AWS Security Competency Partners, and keynote and leadership sessions.

The threat detection and incident response track showcased how AWS customers can get the visibility they need to help improve their security posture, identify issues before they impact business, and investigate and respond quickly to security incidents across their environment.

With dozens of service and feature announcements—and innumerable best practices shared by AWS experts, customers, and partners—distilling highlights is a challenge. From an incident response perspective, three key themes emerged.

Proactively detect, contextualize, and visualize security events

When it comes to effectively responding to security events, rapid detection is key. Among the launches announced during the keynote was the expansion of Amazon Detective finding groups to include Amazon Inspector findings in addition to Amazon GuardDuty findings.

Detective, GuardDuty, and Inspector are part of a broad set of fully managed AWS security services that help you identify potential security risks, so that you can respond quickly and confidently.

Using machine learning, Detective finding groups can help you conduct faster investigations, identify the root cause of events, and map to the MITRE ATT&CK framework to quickly run security issues to ground. The finding group visualization panel shown in the following figure displays findings and entities involved in a finding group. This interactive visualization can help you analyze, understand, and triage the impact of finding groups.

Figure 1: Detective finding groups visualization panel

With the expanded threat and vulnerability findings announced at re:Inforce, you can prioritize where to focus your time by answering questions such as “was this EC2 instance compromised because of a software vulnerability?” or “did this GuardDuty finding occur because of unintended network exposure?”

In the session Streamline security analysis with Amazon Detective, AWS Principal Product Manager Rich Vorwaller, AWS Senior Security Engineer Rima Tanash, and AWS Program Manager Jordan Kramer demonstrated how to use graph analysis techniques and machine learning in Detective to identify related findings and resources, and investigate them together to accelerate incident analysis.

In addition to Detective, you can also use Amazon Security Lake to contextualize and visualize security events. Security Lake became generally available on May 30, 2023, and several re:Inforce sessions focused on how you can use this new service to assist with investigations and incident response.

As detailed in the following figure, Security Lake automatically centralizes security data from AWS environments, SaaS providers, on-premises environments, and cloud sources into a purpose-built data lake stored in your account. Security Lake makes it simpler to analyze security data, gain a more comprehensive understanding of security across an entire organization, and improve the protection of workloads, applications, and data. Security Lake automates the collection and management of security data from multiple accounts and AWS Regions, so you can use your preferred analytics tools while retaining complete control and ownership over your security data. Security Lake has adopted the Open Cybersecurity Schema Framework (OCSF), an open standard. With OCSF support, the service normalizes and combines security data from AWS and a broad range of enterprise security data sources.

Figure 2: How Security Lake works

To date, 57 AWS security partners have announced integrations with Security Lake, and we now have more than 70 third-party sources, 16 analytics subscribers, and 13 service partners.

In Gaining insights from Amazon Security Lake, AWS Principal Solutions Architect Mark Keating and AWS Security Engineering Manager Keith Gilbert detailed how to get the most out of Security Lake. Addressing questions such as, “How do I get access to the data?” and “What tools can I use?,” they demonstrated how analytics services and security information and event management (SIEM) solutions can connect to and use data stored within Security Lake to investigate security events and identify trends across an organization. They emphasized how bringing together logs in multiple formats and normalizing them into a single format empowers security teams to gain valuable context from security data, and more effectively respond to events. Data can be queried with Amazon Athena, or pulled by Amazon OpenSearch Service or your SIEM system directly from Security Lake.

Build your security data lake with Amazon Security Lake featured AWS Product Manager Jonathan Garzon, AWS Product Solutions Architect Ross Warren, and Global CISO of Interpublic Group (IPG) Troy Wilkinson demonstrating how Security Lake helps address common challenges associated with analyzing enterprise security data, and detailing how IPG is using the service. Wilkinson noted that IPG’s objective is to bring security data together in one place, improve searches, and gain insights from their data that they haven’t been able to before.

“With Security Lake, we found that it was super simple to bring data in. Not just the third-party data and Amazon data, but also our on-premises data from custom apps that we built.” — Troy Wilkinson, global CISO, Interpublic Group

Use automation and machine learning to reduce mean time to response

Incident response automation can help free security analysts from repetitive tasks, so they can spend their time identifying and addressing high-priority security issues.

In How LLA reduces incident response time with AWS Systems Manager, telecommunications provider Liberty Latin America (LLA) detailed how they implemented a security framework to detect security issues and automate incident response in more than 180 AWS accounts accessed by internal stakeholders and third-party partners by using AWS Systems Manager Incident Manager, AWS Organizations, Amazon GuardDuty, and AWS Security Hub.

LLA operates in over 20 countries across Latin America and the Caribbean. After completing multiple acquisitions, LLA needed a centralized security operations team to handle incidents and notify the teams responsible for each AWS account. They used GuardDuty, Security Hub, and Systems Manager Incident Manager to automate and streamline detection and response, and they configured the services to initiate alerts whenever there was an issue requiring attention.

Speaking alongside AWS Principal Solutions Architect Jesus Federico and AWS Principal Product Manager Sarah Holberg, LLA Senior Manager of Cloud Services Joaquin Cameselle noted that when GuardDuty identifies a critical issue, it generates a new finding in Security Hub. This finding is then forwarded to Systems Manager Incident Manager through an Amazon EventBridge rule. This configuration helps ensure the involvement of the appropriate individuals associated with each account.

“We have deployed a security framework in Liberty Latin America to identify security issues and streamline incident response across over 180 AWS accounts. The framework that leverages AWS Systems Manager Incident Manager, Amazon GuardDuty, and AWS Security Hub enabled us to detect and respond to incidents with greater efficiency. As a result, we have reduced our reaction time by 90%, ensuring prompt engagement of the appropriate teams for each AWS account and facilitating visibility of issues for the central security team.” — Joaquin Cameselle, senior manager, cloud services, Liberty Latin America

How Citibank (Citi) advanced their containment capabilities through automation outlined how the National Institute of Standards and Technology (NIST) Incident Response framework is applied to AWS services, and highlighted Citi’s implementation of a highly scalable cloud incident response framework designed to support the 28 AWS services in their cloud environment.

After describing the four phases of the incident response process — preparation and prevention; detection and analysis; containment, eradication, and recovery; and post-incident activity—AWS ProServe Global Financial Services Senior Engagement Manager Harikumar Subramonion noted that, to fully benefit from the cloud, you need to embrace automation. Automation benefits the third phase of the incident response process by speeding up containment, and reducing mean time to response.

Citibank Head of Cloud Security Operations Elvis Velez and Vice President of Cloud Security Damien Burks described how Citi built the Cloud Containment Automation Framework (CCAF) from the ground up by using AWS Step Functions and AWS Lambda, enabling them to respond to events 24/7 without human error, and reduce the time it takes to contain resources from 4 hours to 15 minutes. Velez described how Citi uses adversary emulation exercises that use the MITRE ATT&CK Cloud Matrix to simulate realistic attacks on AWS environments, and continuously validate their ability to effectively contain incidents.

Innovate and do more with less

Security operations teams are often understaffed, making it difficult to keep up with alerts. According to data from CyberSeek, there are currently 69 workers available for every 100 cybersecurity job openings.

Effectively evaluating security and compliance posture is critical, despite resource constraints. In Centralizing security at scale with Security Hub and Intuit’s experience, AWS Senior Solutions Architect Craig Simon, AWS Senior Security Hub Product Manager Dora Karali, and Intuit Principal Software Engineer Matt Gravlin discussed how to ease security management with Security Hub. Fortune 500 financial software provider Intuit has approximately 2,000 AWS accounts, 10 million AWS resources, and receives 20 million findings a day from AWS services through Security Hub. Gravlin detailed Intuit’s Automated Compliance Platform (ACP), which combines Security Hub and AWS Config with an internal compliance solution to help Intuit reduce audit timelines, effectively manage remediation, and make compliance more consistent.

“By using Security Hub, we leveraged AWS expertise with their regulatory controls and best practice controls. It helped us keep up to date as new controls are released on a regular basis. We like Security Hub’s aggregation features that consolidate findings from other AWS services and third-party providers. I personally call it the super aggregator. A key component is the Security Hub to Amazon EventBridge integration. This allowed us to stream millions of findings on a daily basis to be inserted into our ACP database.” — Matt Gravlin, principal software engineer, Intuit

At AWS re:Inforce, we launched a new Security Hub capability for automating actions to update findings. You can now use rules to automatically update various fields in findings that match defined criteria. This allows you to automatically suppress findings, update the severity of findings according to organizational policies, change the workflow status of findings, and add notes. With automation rules, Security Hub provides you a simplified way to build automations directly from the Security Hub console and API. This reduces repetitive work for cloud security and DevOps engineers and can reduce mean time to response.

In Continuous innovation in AWS detection and response services, AWS Worldwide Security Specialist Senior Manager Himanshu Verma and GuardDuty Senior Manager Ryan Holland highlighted new features that can help you gain actionable insights that you can use to enhance your overall security posture. After mapping AWS security capabilities to the core functions of the NIST Cybersecurity Framework, Verma and Holland provided an overview of AWS threat detection and response services that included a technical demonstration.

Bolstering incident response with AWS Wickr enterprise integrations highlighted how incident responders can collaborate securely during a security event, even on a compromised network. AWS Senior Security Specialist Solutions Architect Wes Wood demonstrated an innovative approach to incident response communications by detailing how you can integrate the end-to-end encrypted collaboration service AWS Wickr Enterprise with GuardDuty and AWS WAF. Using Wickr Bots, you can build integrated workflows that incorporate GuardDuty and third-party findings into a more secure, out-of-band communication channel for dedicated teams.

Evolve your incident response maturity

AWS re:Inforce featured many more highlights on incident response, including How to run security incident response in your Amazon EKS environment and Investigating incidents with Amazon Security Lake and Jupyter notebooks code talks, as well as the announcement of our Cyber Insurance Partners program. Content presented throughout the conference made one thing clear: AWS is working harder than ever to help you gain the insights that you need to strengthen your organization’s security posture, and accelerate incident response in the cloud.

To watch AWS re:Inforce sessions on demand, see the AWS re:Inforce playlists on YouTube.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

Want more AWS Security news? Follow us on Twitter.

Analyze AWS WAF logs using Amazon OpenSearch Service anomaly detection built on Random Cut Forests

2022-01-24 Umesh Ramesh

Post Syndicated from Umesh Ramesh original https://aws.amazon.com/blogs/security/analyze-aws-waf-logs-using-amazon-opensearch-service-anomaly-detection-built-on-random-cut-forests/

This blog post shows you how to use the machine learning capabilities of Amazon OpenSearch Service (successor to Amazon Elasticsearch Service) to detect and visualize anomalies in AWS WAF logs. AWS WAF logs are streamed to Amazon OpenSearch Service using Amazon Kinesis Data Firehose. Kinesis Data Firehose invokes an AWS Lambda function to transform incoming source data and deliver the transformed data to Amazon OpenSearch Service. You can implement this solution without any machine learning expertise. AWS WAF logs capture a number of attributes about the incoming web request, and you can analyze these attributes to detect anomalous behavior. This blog post focuses on the following two scenarios:

Identifying anomalous behavior based on a high number of web requests coming from an unexpected country (Country Code is one of the request fields captured in AWS WAF logs).
Identifying anomalous behavior based on HTTP method for a read-heavy application like a content media website that receives unexpected write requests.

Log analysis is essential for understanding the effectiveness of any security solution. It helps with day-to-day troubleshooting, and also with long-term understanding of how your security environment is performing.

AWS WAF is a web application firewall that helps protect your web applications from common web exploits which could affect application availability, compromise security, or consume excessive resources. AWS WAF gives you control over which traffic sent to your web applications is allowed or blocked, by defining customizable web security rules. AWS WAF lets you define multiple types of rules to block unauthorized traffic.

Machine learning can assist in identifying unusual or unexpected behavior. Amazon OpenSearch Service is one of the commonly used services which offer log analytics for monitoring service logs, using dashboards and alerting mechanisms. Static, rule‑based analytics approaches are slow to adapt to evolving workloads, and can miss critical issues. With the announcement of real-time anomaly detection support in Amazon OpenSearch Service, you can use machine learning to detect anomalies in real‑time streaming data, and identify issues as they evolve so you can mitigate them quickly. Real‑time anomaly detection support uses Random Cut Forest (RCF), an unsupervised algorithm, which continuously adapts to evolving data patterns. Simply stated, RCF takes a set of random data points, divides them into multiple groups, each with the same number of points, and then builds a collection of models. As an unsupervised algorithm, RCF uses cluster analysis to detect spikes in time series data, breaks in periodicity or seasonality, and data point exceptions. The anomaly detection feature is lightweight, with the computational load distributed across Amazon OpenSearch Service nodes. Figure 1 shows the architecture of the solution described in this blog post.

Figure 1: End-to-end architecture

The architecture flow shown in Figure 1 includes the following high-level steps:

AWS WAF streams logs to Kinesis Data Firehose.
Kinesis Data Firehose invokes a Lambda function to add attributes to the AWS WAF logs.
Kinesis Data Firehose sends the transformed source records to Amazon OpenSearch Service.
Amazon OpenSearch Service automatically detects anomalies.
Amazon OpenSearch Service delivers anomaly alerts via Amazon Simple Notification Service (Amazon SNS).

Solution

Figure 2 shows examples of both an original and a modified AWS WAF log. The solution in this blog post focuses on Country and httpMethod. It uses a Lambda function to transform the AWS WAF log by adding fields, as shown in the snippet on the right side. The values of the newly added fields are evaluated based on the values of country and httpMethod in the AWS WAF log.

Figure 2: Sample processing done by a Lambda function

In this solution, you will use a Lambda function to introduce new fields to the incoming AWS WAF logs through Kinesis Data Firehose. You will introduce additional fields by using one-hot encoding to represent the incoming linear values as a “1” or “0”.

Scenario 1

In this scenario, the goal is to detect traffic from unexpected countries when serving user traffic expected to be from the US and UK. The function adds three new fields:

usTraffic
ukTraffic
otherTraffic

As shown in the lambda function inline code, we use the traffic_from_country function, in which we only want actions that ALLOW the traffic. Once we have that, we use conditions to check the country code. If the value of the country field in the web request captured in AWS WAF log is US, the usTraffic field in the transformed data will be assigned the value 1 while otherTraffic and ukTraffic will be assigned the value 0. The other two fields are transformed as shown in Table 1.

Original AWS WAF log	Transformed AWS WAF log with new fields after one-hot encoding
Country	usTraffic	ukTraffic	otherTraffic
US	1	0	0
UK	0	1	0
All other country codes	0	0	1

Table 1: One-hot encoding field mapping for country

Scenario 2

In the second scenario, you detect anomalous requests that use POST HTTP method.

As shown in the lambda function inline code, we use the filter_http_request_method function, in which we only want actions that ALLOW the traffic. Once we have that, we use conditions to check the HTTP _request method. If the value of the HTTP method in the AWS WAF log is GET, the getHttpMethod field is assigned the value 1 while headHttpMethod and postHttpMethod are assigned the value 0. The other two fields are transformed as shown in Table 2.

Original AWS WAF log	Transformed AWS WAF log with new fields after one-hot encoding
HTTP method	getHttpMethod	headHttpMethod	postHttpMethod
GET	1	0	0
HEAD	0	1	0
POST	0	0	1

Table 2: One-hot encoding field mapping for HTTP method

After adding these new fields, the transformed record from Lambda must contain the following parameters before the data is sent back to Kinesis Data Firehose:

recordId	The transformed record must contain the same original record ID as is received from the Kinesis Data Firehose.
result	The status of the data transformation of the record (the status can be OK or Dropped).
data	The transformed data payload.

AWS WAF logs are JSON files, and this anomaly detection feature works only on numeric data. This means that to use this feature for detecting anomalies in logs, you must pre-process your logs using a Lambda function.

Lambda function for one-hot encoding

Use the following Lambda function to transform the AWS WAF log by adding new attributes, as explained in Scenario 1 and Scenario 2.

import base64
import json

def lambda_handler(event,context):
    output = []
    
    try:
        # loop through records in incoming Event
        for record in event["records"]:
            # extract message
            message = json.loads(base64.b64decode(event["records"][0]["data"]))
            
            print('Country: ', message["httpRequest"]["country"])
            print('Action: ', message["action"])
            print('User Agent: ', message["httpRequest"]["headers"][1]["value"])
             
            timestamp = message["timestamp"]
            action = message["action"]
            country = message["httpRequest"]["country"]
            user_agent = message["httpRequest"]["headers"][1]["value"]
            http_method = message["httpRequest"]["httpMethod"]
            
            mobileUserAgent, browserUserAgent = filter_user_agent(user_agent)
            usTraffic, ukTraffic, otherTraffic = traffic_from_country(country, action)
            getHttpMethod, headHttpMethod, postHttpMethod = filter_http_request_method(http_method, action)
            
            # append new fields in message dict
            message["usTraffic"] = usTraffic
            message["ukTraffic"] = ukTraffic
            message["otherTraffic"] = otherTraffic
            message["mobileUserAgent"] = mobileUserAgent
            message["browserUserAgent"] = browserUserAgent
            message["getHttpMethod"] = getHttpMethod
            message["headHttpMethod"] = headHttpMethod
            message["postHttpMethod"] = postHttpMethod
            
            # base64-encoding
            data = base64.b64encode(json.dumps(message).encode('utf-8'))
            
            output_record = {
                "recordId": record['recordId'], # retain same record id from the Kinesis data Firehose
                "result": "Ok",
                "data": data.decode('utf-8')
            }
            output.append(output_record)
        return {"records": output}
    except Exception as e:
        print(e)
        
def filter_user_agent(user_agent):
    # returns one hot encoding based on user agent
    if "Mobile" in user_agent:
        mobile_user_agent = True
        return (1, 0)
    else:
        mobile_user_agent = False
        return (0, 1) # anomaly recorded
        
def traffic_from_country(country_code, action):
    # returns one hot encoding based on allowed traffic from countries
    if action == "ALLOW":
        if "US" in country_code:
            allowed_country_traffic = True
            return (1, 0, 0)
        elif "UK" in country_code:
            allowed_country_traffic = True
            return (0, 1, 0)
        else:
            allowed_country_traffic = False
            return (0, 0, 1) # anomaly recorded
            
def filter_http_request_method(http_method, action):
    # returns one hot encoding based on allowed http method type
    if action == "ALLOW":
        if "GET" in http_method:
            return (1, 0, 0)
        elif "HEAD" in http_method:
            return (0, 1, 0)
        elif "POST" in http_method:
            return (0, 0, 1) # anomaly recorded

After the transformation, the data that’s delivered to Amazon OpenSearch Service will have additional fields, as described in Table 1 and Table 2 above. You can configure an anomaly detector in Amazon OpenSearch Service to monitor these additional fields. The algorithm computes an anomaly grade and confidence score value for each incoming data point. Anomaly detection uses these values to differentiate an anomaly from normal variations in your data. Anomaly detection and alerting are plugins that are included in the available set of Amazon OpenSearch Service plugins. You can use these two plugins to generate a notification as soon as an anomaly is detected.

Deployment steps

In this section, you complete five high-level steps to deploy the solution. In this blog post, we are deploying this solution in the us-east-1 Region. The solution assumes you already have an active web application protected by AWS WAF rules. If you’re looking for details on creating AWS WAF rules, refer to Working with web ACLs and sample examples for more information.

Note: When you associate a web ACL with Amazon CloudFront as a protected resource, make sure that the Kinesis Firehose Delivery Stream is deployed in the us-east-1 Region.

The steps are:

Deploy an AWS CloudFormation template
Enable AWS WAF logs
Create an anomaly detector
Set up alerts in Amazon OpenSearch Service
Create a monitor for the alerts

Deploy a CloudFormation template

To start, deploy a CloudFormation template to create the following AWS resources:

Amazon OpenSearch Service and Kibana (versions 1.5 to 7.10) with built-in AWS WAF dashboards.
Kinesis Data Firehose streams
A Lambda function for data transformation and an Amazon SNS topic with email subscription.

To deploy the CloudFormation template

Download the CloudFormation template and save it locally as Amazon-ES-Stack.yaml.
Go to the AWS Management Console and open the CloudFormation console.
Choose Create Stack.
On the Specify template page, choose Upload a template file. Then select Choose File, and select the template file that you downloaded in step 1.
Choose Next.
Provide the Parameters:
1. Enter a unique name for your CloudFormation stack.
2. Update the email address for UserEmail with the address you want alerts sent to.
3. Choose Next.
Review and choose Create stack.
When the CloudFormation stack status changes to CREATE_COMPLETE, go to the Outputs tab and make note of the DashboardLinkOutput value. Also note the credentials you’ll receive by email (Subject: Your temporary password) and subscribe to the SNS topic for which you’ll also receive an email confirmation request.

Enable AWS WAF logs

Before enabling the AWS WAF logs, you should have AWS WAF web ACLs set up to protect your web application traffic. From the console, open the AWS WAF service and choose your existing web ACL. Open your web ACL resource, which can either be deployed on an Amazon CloudFront distribution or on an Application Load Balancer.

To enable AWS WAF logs

From the AWS WAF home page, choose Create web ACL.
From the AWS WAF home page, choose Logging and metrics
From the AWS WAF home page, choose the web ACL for which you want to enable logging, as shown in Figure 3:

Figure 3 – Enabling WAF logging
Go to the Logging and metrics tab, and then choose Enable Logging. The next page displays all the delivery streams that start with aws-waf-logs. Choose the Kinesis Data Firehose delivery stream that was created by the Cloud Formation template, as shown in Figure 3 (in this example, aws-waf-logs-useast1). Don’t redact any fields or add filters. Select Save.

Create an Index template

Index templates lets you initialize new indices with predefined mapping. For example, in this case you predefined mapping for timestamp.

To create an Index template

Log into the Kibana dashboard. You can find the Kibana dashboard link in the Outputs tab of the CloudFormation stack. You should have received the username and temporary password (Ignore the period (.) at the end of the temporary password) by email, at the email address you entered as part of deploying the CloudFormation template. You will be logged in to the Kibana dashboard after setting a new password.
Choose Dev Tools in the left menu panel to access Kibana’s console.
The left pane in the console is the request pane, and the right pane is the response pane.

Select the green arrow at the end of the command line to execute the following PUT command.

PUT  _template/awswaf
{
    "index_patterns": ["awswaf-*"],
    "settings": {
    "number_of_shards": 1
    },
    "mappings": {
       "properties": {
          "timestamp": {
            "type": "date",
            "format": "epoch_millis"
          }
      }
  }
}

You should see the following response:
```
{
  "acknowledged": true
}
```

The command creates a template named awswaf and applies it to any new index name that matches the regular expression awswaf-*

Create an anomaly detector

A detector is an individual anomaly detection task. You can create multiple detectors, and all the detectors can run simultaneously, with each analyzing data from different sources.

To create an anomaly detector

Select Anomaly Detection from the menu bar, select Detectors and Create Detector.

Figure 4- Home page view with menu bar on the left
To create a detector, enter the following values and features:
Name and description

Name: aws-waf-country

Description: Detect anomalies on other country values apart from “US” and “UK“

Data Source

Index: awswaf*

Timestamp field: timestamp

Data filter: Visual editor

Figure 5 – Detector features and their values
For Detector operation settings, enter a value in minutes for the Detector interval to set the time interval at which the detector collects data. To add extra processing time for data collection, set a Window delay value (also in minutes). This tells the detector that the data isn’t ingested into Amazon OpenSearch Service in real time, but with a delay. The example in Figure 6 uses a 1-minute interval and a 2-minute delay.

Figure 6 – Detector operation settings
Next, select Create.
Once you create a detector, select Configure Model and add the following values to Model configuration:

Feature Name: waf-country-other

Feature State: Enable feature

Find anomalies based on: Field value

Aggregation method: sum()

Field: otherTraffic

The aggregation method determines what constitutes an anomaly. For example, if you choose min(), the detector focuses on finding anomalies based on the minimum values of your feature. If you choose average(), the detector finds anomalies based on the average values of your feature. For this scenario, you will use sum().The value otherTraffic for Field is the transformed field in the Amazon OpenSearch Service logs that was added by the Lambda function.

Figure 7 – Detector Model configuration
Under Advanced Settings on the Model configuration page, update the Window size to an appropriate interval (1 equals 1 minute) and choose Save and Start detector and Automatically start detector.
We recommend you choose this value based on your actual data. If you expect missing values in your data, or if you want the anomalies based on the current value, choose 1. If your data is continuously ingested and you want the anomalies based on multiple intervals, choose a larger window size.

Note: The detector takes 4 to 5 minutes to start.

Figure 8 – Detector window size

Set up alerts

You’ll use Amazon SNS as a destination for alerts from Amazon OpenSearch Service.

Note: A destination is a reusable location for an action.

To set up alerts:

Go to the Kibana main menu bar and select Alerting, and then navigate to the Destinations tab.
Select Add destination and enter a unique name for the destination.
For Type, choose Amazon SNS and provide the topic ARN that was created as part of the CloudFormation resources (captured in the Outputs tab).
Provide the ARN for an IAM role that was created as part of the CloudFormation outputs (SNSAccessIAMRole-********) that has the following trust relationship and permissions (at a minimum):
```
{"Version": "2012-10-17",
  "Statement": [{"Effect": "Allow",
    "Principal": {"Service": "es.amazonaws.com"
    },
    "Action": "sts:AssumeRole"
  }]
}
{"Version": "2012-10-17",
  "Statement": [{"Effect": "Allow",
    "Action": "sns:Publish",
    "Resource": "sns-topic-arn"
  }]
}
```
Figure 9 – Destination

Note: For more information, see Adding IAM Identity Permissions in the IAM user guide.
Choose Create.

Create a monitor

A monitor can be defined as a job that runs on a defined schedule and queries Amazon OpenSearch Service. The results of these queries are then used as input for one or more triggers.

To create a monitor for the alert

Select Alerting on the Kibana main menu and navigate to the Monitors tab. Select Create monitor
Create a new record with the following values:

Monitor Name: aws-waf-country-monitor

Method of definition: Define using anomaly detector

Detector: aws-waf-country

Monitor schedule: Every 2 minutes
Select Create.

Figure 10 – Create monitor

Choose Create Trigger to connect monitoring alert with the Amazon SNS topic using the below values:

Trigger Name:	SNS_Trigger
Severity Level:	1
Trigger Type:	Anomaly Detector grade and confidence

Under Configure Actions, set the following values:

Action Name:	SNS-alert
Destination:	select the destination name you chose when you created the Alert above
Message Subject:	“Anomaly detected – Country”
Message:	<Use the default message displayed>

Select Create to create the trigger.

Figure 11 – Create trigger

Figure 12 – Configure actions

Test the solution

Now that you’ve deployed the solution, the AWS WAF logs will be sent to Amazon OpenSearch Service.

Kinesis Data Generator sample template

When testing the environment covered in this blog outside a production context, we used Kinesis Data Generator to generate sample user traffic with the template below, changing the country strings in different runs to reflect expected records or anomalous ones. Other tools are also available.

{
"timestamp":"[{{date.now("DD/MMM/YYYY:HH:mm:ss Z")}}]",
"formatVersion":1,
"webaclId":"arn:aws:wafv2:us-east-1:066931718055:regional/webacl/FMManagedWebACLV2test-policy1596636761038/3b9e0dde-812c-447f-afe7-2dd16658e746",
"terminatingRuleId":"Default_Action",
"terminatingRuleType":"REGULAR",
"action":"ALLOW",
"terminatingRuleMatchDetails":[
],
"httpSourceName":"ALB",
"httpSourceId":"066931718055-app/Webgoat-ALB/d1b4a2c257e57f2f",
"ruleGroupList":[
{
"ruleGroupId":"AWS#AWSManagedRulesAmazonIpReputationList",
"terminatingRule":null,
"nonTerminatingMatchingRules":[
],
"excludedRules":null
}
],
"rateBasedRuleList":[
],
"nonTerminatingMatchingRules":[
],
"httpRequest":{
"clientIp":"{{internet.ip}}",
"country":"{{random.arrayElement(
["US","UK"]
)}}",
"headers":[
{
"name":"Host",
"value":"34.225.62.38"
},
{
"name":"User-Agent",
"value":"{{internet.userAgent}}"
},
{
"name":"Accept",
"value":"text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8"
},
{
"name":"Accept-Language",
"value":"en-GB,en;q=0.5"
},
{
"name":"Accept-Encoding",
"value":"gzip, deflate"
},
{
"name":"Upgrade-Insecure-Requests",
"value":"1"
}
],
"uri":"/config/getuser",
"args":"index=0",
"httpVersion":"HTTP/1.1",
"httpMethod":"{{random.arrayElement(
["GET","HEAD"]
)}}",
"requestId":null
}
}

You will receive an email alert via Amazon SNS if the traffic contains any anomalous data. You should also be able to view the anomalies recorded in Amazon OpenSearch Service by selecting the detector and choosing Anomaly results for the detector, as shown in Figure 13.

Figure 13 – Anomaly results

Conclusion

In this post, you learned how you can discover anomalies in AWS WAF logs across parameters like Country and httpMethod defined by the attribute values. You can further expand your anomaly detection use cases with application logs and other AWS Service logs. To learn more about this feature with Amazon OpenSearch Service, we suggest reading the Amazon OpenSearch Service documentation. We look forward to hearing your questions, comments, and feedback.

If you found this post interesting and useful, you may be interested in https://aws.amazon.com/blogs/security/how-to-improve-visibility-into-aws-waf-with-anomaly-detection/ and https://aws.amazon.com/blogs/big-data/analyzing-aws-waf-logs-with-amazon-es-amazon-athena-and-amazon-quicksight/ as further alternative approaches.

If you have feedback about this post, submit comments in the Comments section below.

Want more AWS Security news? Follow us on Twitter.

Noise

Tag Archives: OpenSearch

Three ways to accelerate incident response in the cloud: insights from re:Inforce 2023

Proactively detect, contextualize, and visualize security events

Use automation and machine learning to reduce mean time to response

Innovate and do more with less

Evolve your incident response maturity

Analyze AWS WAF logs using Amazon OpenSearch Service anomaly detection built on Random Cut Forests

Solution

Scenario 1

Scenario 2

Lambda function for one-hot encoding

Deployment steps

Deploy a CloudFormation template

To deploy the CloudFormation template

Enable AWS WAF logs

To enable AWS WAF logs

Create an Index template

To create an Index template

Create an anomaly detector

To create an anomaly detector

Set up alerts

To set up alerts:

Create a monitor

To create a monitor for the alert

Test the solution

Kinesis Data Generator sample template

Conclusion

The collective thoughts of the interwebz

Name:	aws-waf-country
Description:	Detect anomalies on other country values apart from “US” and “UK“

Index:	awswaf*
Timestamp field:	timestamp
Data filter:	Visual editor

Feature Name:	waf-country-other
Feature State:	Enable feature
Find anomalies based on:	Field value
Aggregation method:	sum()
Field:	otherTraffic

Monitor Name:	aws-waf-country-monitor
Method of definition:	Define using anomaly detector
Detector:	aws-waf-country
Monitor schedule:	Every 2 minutes