Tag Archives: AWS Incident Response

How to develop an Amazon Security Lake POC

Post Syndicated from Anna McAbee original https://aws.amazon.com/blogs/security/how-to-develop-an-amazon-security-lake-poc/

You can use Amazon Security Lake to simplify log data collection and retention for Amazon Web Services (AWS) and non-AWS data sources. Getting the most out of your implementation requires proper planning.

In this post, we will show you how to plan and implement a proof of concept (POC) for Security Lake to help you determine the functionality and value of Security Lake in your environment, so that your team can confidently design and implement in production. We will walk you through the following steps:

  1. Understand the functionality and value of Security Lake
  2. Determine success criteria for the POC
  3. Define your Security Lake configuration
  4. Prepare for deployment
  5. Enable Security Lake
  6. Validate deployment

Understand the functionality of Security Lake

Figure 1 summarizes the main features of Security Lake and the context of how to use it:

Figure 1: Overview of Security Lake functionality

As shown in the figure, Security Lake ingests and normalizes logs from data sources such as AWS services, AWS Partner sources, and custom sources. Security Lake also manages the lifecycle, orchestration, and subscribers. Subscribers can be AWS services, such as Amazon Athena, or AWS Partner subscribers.

There are four primary functions that Security Lake provides:

  • Centralize visibility to your data from AWS environments, SaaS providers, on-premises, and other cloud data sources — You can collect log sources from AWS services such as AWS CloudTrail management events, Amazon Simple Storage Service (Amazon S3) data events, AWS Lambda data events, Amazon Route 53 Resolver logs, VPC Flow Logs, and AWS Security Hub findings, in addition to log sources from on-premises, other cloud services, SaaS applications, and custom sources. Security Lake automatically aggregates the security data across AWS Regions and accounts.
  • Normalize your security data to an open standard — Security Lake normalizes log sources into a common schema, the Open Cybersecurity Schema Framework (OCSF), and stores them in compressed Apache Parquet files.
  • Use your preferred analytics tools to analyze your security data — You can use AWS tools, such as Athena and Amazon OpenSearch Service, or you can utilize external security tools to analyze the data in Security Lake.
  • Optimize and manage your security data for more efficient storage and query — Security Lake manages the lifecycle of your data with customizable retention settings with automated storage tiering to help provide more cost-effective storage.

Determine success criteria

By establishing success criteria, you can assess whether Security Lake has helped address the challenges that you are facing. Some example success criteria include:

  • I need to centrally set up and store AWS logs across my organization in AWS Organizations for multiple log sources.
  • I need to more efficiently collect VPC Flow Logs in my organization and analyze them in my security information and event management (SIEM) solution.
  • I want to use OpenSearch Service to replace my on-premises SIEM.
  • I want to collect AWS log sources and custom sources for machine learning with Amazon SageMaker.
  • I need to establish a dashboard in Amazon QuickSight to visualize my Security Hub findings and custom log source data.

Review your success criteria to make sure that your goals are realistic given your timeframe and potential constraints that are specific to your organization. For example, do you have full control over the creation of AWS services that are deployed in an organization? Do you have resources that can dedicate time to implement and test? Is this time convenient for relevant stakeholders to evaluate the service?

The timeframe of your POC will depend on your answers to these questions.

Important: Security Lake includes a 15-day free trial for each account, starting from the time that you enable Security Lake in that account. The trial is the best way to estimate the cost of Security Lake for each Region, which is an important consideration when you configure your POC.

Define your Security Lake configuration

After you establish your success criteria, you should define your desired Security Lake configuration. Some important decisions include the following:

  • Determine AWS log sources — Decide which AWS log sources to collect. For information about the available options, see Collecting data from AWS services.
  • Determine third-party log sources — Decide if you want to include non-AWS service logs as sources in your POC. For more information about your options, see Third-party integrations with Security Lake; the integrations listed as “Source” can send logs to Security Lake.

    Note: You can add third-party integrations after the POC or in a second phase of the POC. Pre-planning will be required to make sure that you can get these set up during the 15-day free trial. Third-party integrations usually take more time to set up than AWS service logs.

  • Select a delegated administrator – Identify which account will serve as the delegated administrator. Make sure that you have the appropriate permissions from the organization admin account to identify and enable the account that will be your Security Lake delegated administrator. This account will be the location for the S3 buckets with your security data and where you centrally configure Security Lake. The AWS Security Reference Architecture (AWS SRA) recommends that you use the AWS logging account for this purpose. In addition, make sure to review Important considerations for delegated Security Lake administrators.
  • Select accounts in scope — Define which accounts to collect data from. To get the most realistic estimate of the cost of Security Lake, enable all accounts across your organization during the free trial.
  • Determine analytics tool — Determine if you want to use native AWS analytics tools, such as Athena and OpenSearch Service, or an existing SIEM, where the SIEM is a subscriber to Security Lake.
  • Define log retention and Regions — Define your log retention requirements and Regional restrictions or considerations.

Prepare for deployment

After you determine your success criteria and your Security Lake configuration, you should have an idea of your stakeholders, desired state, and timeframe. Now you need to prepare for deployment. In this step, you should complete as much as possible before you deploy Security Lake. The following are some steps to take:

  • Create a project plan and timeline so that everyone involved understands what success looks like and what the scope and timeline are.
  • Define the relevant stakeholders and consumers of the Security Lake data. Some common stakeholders include security operations center (SOC) analysts, incident responders, security engineers, cloud engineers, finance, and others.
  • Define who is responsible, accountable, consulted, and informed during the deployment. Make sure that team members understand their roles.
  • Make sure that you have access in your organization management account to designate the delegated administrator. For further details, see IAM permissions required to designate the delegated administrator.
  • Consider other technical prerequisites that you need to accomplish. For example, if you need roles in addition to what Security Lake creates for custom extract, transform, and load (ETL) pipelines for custom sources, can you work with the team in charge of that process before the POC?

Enable Security Lake

The next step is to enable Security Lake in your environment and configure your sources and subscribers.

  1. Deploy Security Lake across the Regions, accounts, and AWS log sources that you previously defined (a boto3 sketch of this step follows the list).
  2. Configure custom sources that are in scope for your POC.
  3. Configure analytics tools in scope for your POC.
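If you want to script the deployment, the following is a minimal boto3 sketch of step 1. The Security Lake operation and parameter shapes shown here are assumptions based on the service API at the time of writing, and the account IDs, Region, role ARN, and source names are placeholders; verify the details against the current boto3 reference for the securitylake client before you use it.

import boto3

# Run from the delegated administrator account. All identifiers below are
# placeholders; the API parameter shapes are assumptions to confirm against
# the current documentation.
securitylake = boto3.client("securitylake", region_name="us-east-1")

# Enable the data lake in the Regions in scope for the POC.
securitylake.create_data_lake(
    metaStoreManagerRoleArn=(
        "arn:aws:iam::111122223333:role/AmazonSecurityLakeMetaStoreManager"
    ),
    configurations=[{"region": "us-east-1"}],
)

# Add the AWS log sources that you defined, for the accounts in scope.
securitylake.create_aws_log_source(
    sources=[
        {
            "accounts": ["111122223333", "444455556666"],
            "regions": ["us-east-1"],
            "sourceName": "CLOUD_TRAIL_MGMT",
            "sourceVersion": "2.0",
        }
    ]
)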

Validate deployment

The final step is to confirm that you have configured Security Lake and additional components, validate that everything is working as intended, and evaluate the solution against your success criteria.

  • Validate log collection — Verify that you are collecting the log sources that you configured. To do this, check the S3 buckets in the delegated administrator account for the logs (a boto3 validation sketch follows this list).
  • Validate analytics tool — Verify that you can analyze the log sources in your analytics tool of choice. If you don’t want to configure additional analytics tooling, you can use Athena, which is configured when you set up Security Lake. For sample Athena queries, see Amazon Security Lake Example Queries on GitHub and Security Lake queries in the documentation.
  • Obtain a cost estimate — In the Security Lake console, you can review a usage page to verify that the cost of Security Lake in your environment aligns with your expectations and budgets.
  • Assess success criteria — Determine if you achieved the success criteria that you defined at the beginning of the project.
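As a quick programmatic check of the first two validation items, you can list a few newly written objects in the Security Lake bucket and run a small Athena query from the delegated administrator account. The following sketch assumes typical Security Lake naming patterns; the bucket, database, table, and output location names are placeholders to replace with the values from your own deployment.

import boto3

# List a sample of recently written objects in the Security Lake S3 bucket
# (bucket name is a placeholder).
s3 = boto3.client("s3")
response = s3.list_objects_v2(
    Bucket="aws-security-data-lake-us-east-1-examplesuffix",
    MaxKeys=10,
)
for obj in response.get("Contents", []):
    print(obj["Key"], obj["LastModified"])

# Run a sample query against a Security Lake table through Athena
# (database, table, and output location are placeholders).
athena = boto3.client("athena")
athena.start_query_execution(
    QueryString=(
        "SELECT * FROM amazon_security_lake_table_us_east_1_cloud_trail_mgmt_2_0 "
        "LIMIT 10"
    ),
    QueryExecutionContext={"Database": "amazon_security_lake_glue_db_us_east_1"},
    ResultConfiguration={"OutputLocation": "s3://amzn-s3-demo-athena-results/"},
)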

Next steps

Next steps will largely depend on whether you decide to move forward with Security Lake.

  • Determine if you have the approval and budget to use Security Lake.
  • Expand to other data sources that can help you provide more security outcomes for your business.
  • Configure S3 lifecycle policies to efficiently store logs long term based on your requirements (see the example after this list).
  • Let other teams know that they can subscribe to Security Lake to use the log data for their own purposes. For example, a development team that gets access to CloudTrail through Security Lake can analyze the logs to understand the permissions needed for an application.
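For the lifecycle bullet above, the following is a minimal boto3 sketch of a lifecycle configuration; the bucket name, transition age, and expiration are placeholders to adapt to your retention requirements. Note that Security Lake manages retention settings for its own buckets, so a policy like this applies to log buckets that you manage yourself.

import boto3

s3 = boto3.client("s3")

# Transition logs to Glacier after 90 days and expire them after roughly
# 7 years (bucket name and ages are placeholders).
s3.put_bucket_lifecycle_configuration(
    Bucket="amzn-s3-demo-log-archive-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-then-expire",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},
                "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
                "Expiration": {"Days": 2557},
            }
        ]
    },
)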

Conclusion

In this blog post, we showed you how to plan and implement a Security Lake POC. You learned how to do so through phases, including defining success criteria, configuring Security Lake, and validating that Security Lake meets your business needs.

This guide will help you, as a customer, run a successful proof of value (POV) with Security Lake. It guides you in assessing the value of the service and the factors to consider when deciding whether to implement its current features.

Further resources

 
If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

Anna McAbee

Anna is a Security Specialist Solutions Architect focused on threat detection and incident response at AWS. Before AWS, she worked as an AWS customer in financial services on both the offensive and defensive sides of security. Outside of work, Anna enjoys cheering on the Florida Gators football team, wine tasting, and traveling the world.

Author

Marshall Jones

Marshall is a Worldwide Security Specialist Solutions Architect at AWS. His background is in AWS consulting and security architecture, focused on a variety of security domains including edge, threat detection, and compliance. Today, he is focused on helping enterprise AWS customers adopt and operationalize AWS security services to increase security effectiveness and reduce risk.

Marc Luescher

Marc is a Senior Solutions Architect helping enterprise customers be successful, focusing strongly on threat detection, incident response, and data protection. His background is in networking, security, and observability. Previously, he worked in technical architecture and security hands-on positions within the healthcare sector as an AWS customer. Outside of work, Marc enjoys his 3 dogs, 4 cats, and 20+ chickens.

Two real-life examples of why limiting permissions works: Lessons from AWS CIRT

Post Syndicated from Richard Billington original https://aws.amazon.com/blogs/security/two-real-life-examples-of-why-limiting-permissions-works-lessons-from-aws-cirt/

Welcome to another blog post from the AWS Customer Incident Response Team (CIRT)! For this post, we’re looking at two events that the team was involved in from the viewpoint of a regularly discussed but sometimes misunderstood subject: least privilege. Specifically, we consider the idea that you can realize the benefit of reducing permissions in real-life use cases without always using the absolute minimum set of privileges. Instead, you need to weigh the cost and effort of creating and maintaining privileges against the risk reduction that you achieve, to make sure that your permissions are appropriate for your needs.

To quote VP and Distinguished Engineer at Amazon Security, Eric Brandwine, “Least privilege equals maximum effort.” This is the idea that creating and maintaining the smallest possible set of privileges needed to perform a given task will require the largest amount of effort, especially as customer needs and service features change over time. However, the correlation between effort and permission reduction is not linear. So, the question you should be asking is: How do you balance the effort of privilege reduction with the benefits it provides?

Unfortunately, this is not an easy question to answer. You need to consider the likelihood of an unwanted issue happening, the impact if that issue did happen, and the cost and effort to prevent (or detect and recover from) that issue. You also need to factor considerations such as your business goals and regulatory requirements into your decision process. Of course, you won’t need to consider just one potential issue, but many. Often it can be useful to start with a rough set of permissions and refine them as you develop your knowledge of what security level is required. You can also use service control policies (SCPs) to provide a set of permission guardrails if you’re using AWS Organizations. In this post, we tell two real-world stories where limiting AWS Identity and Access Management (IAM) permissions worked by limiting the impact of a security event, but where the permission set did not involve maximum effort.

Story 1: On the hunt for credentials

In this AWS CIRT story, we see how a threat actor was unable to achieve their goal because the access they obtained — a database administrator’s — did not have the IAM permissions they were after.

Background and AWS CIRT engagement

A customer came to us after they discovered unauthorized activity in their on-premises systems and in some of their AWS accounts. They had incident response capability and were looking for an additional set of hands with AWS knowledge to help them with their investigation. This helped to free up the customer’s staff to focus on the on-premises analysis.

Before our engagement, the customer had already performed initial containment activities. This included rotating credentials, revoking temporary credentials, and isolating impacted systems. They also had a good idea of which federated user accounts had been accessed by the threat actor.

The key part of every AWS CIRT engagement is the customer’s ask. Everything our team does falls on the customer side of the AWS Shared Responsibility Model, so we want to make sure that we are aligned to the customer’s desired outcome. The ask was clear—review the potential unauthorized federated users’ access, and investigate the unwanted AWS actions that were taken by those users during the known timeframe. To get a better idea of what was “unwanted,” we talked to the customer to understand at a high level what a typical day would entail for these users, to get some context around what sort of actions would be expected. The users were primarily focused on working with Amazon Relational Database Service (RDS).

Analysis and findings

For this part of the story, we’ll focus on a single federated user whose apparent actions we investigated, because the other federated users had not been impersonated by the threat actor in a meaningful way. We knew the approximate start and end dates to focus on and had discovered that the threat actor had performed a number of unwanted actions.

After reviewing the actions, it was clear that the threat actor had performed a console sign-in on three separate occasions within a 2-hour window. Each time, the threat actor attempted to perform a subset of the following actions:

Note: This list includes only the actions that are displayed as readOnly = false in AWS CloudTrail, because these actions are often (but not always) the more impactful ones, with the potential to change the AWS environment.

This is the point where the benefit of permission restriction became clear. As soon as this list was compiled, we noticed that two fields were present for all of the actions listed:

"errorCode": "Client.UnauthorizedOperation",
"errorMessage": "You are not authorized to perform this operation. [rest of message]"

As this reveals, every single non-readOnly action that was attempted by the threat actor was denied because the federated user account did not have the required IAM permissions.

Customer communication and result

After we confirmed the findings, we had a call with the customer to discuss the results. As you can imagine, they were happy that the results showed no material impact to their data, and said no further investigation or actions were required at that time.

What were the IAM permissions the federated user had, which prevented the set of actions the threat actor attempted?

The answer did not actually involve the absolute minimal set of permissions required by the user to do their job. It’s simply that the federated user had a role that didn’t have an Allow statement for the IAM actions the threat actor tried — their job did not require them. Without an explicit Allow statement, the IAM actions attempted were denied because IAM policies are Deny by default. In this instance, simply not having the desired IAM permissions meant that the threat actor wasn’t able to achieve their goal, and stopped using the access. We’ll never know what their goal actually was, but given that they tried to create access keys and passwords and to update policies, a fair guess is that they were attempting to create another way to access that AWS account.
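If you want to check ahead of time whether a role’s policies would allow a given set of actions, the IAM policy simulator can answer that question without anyone attempting the calls. The following boto3 sketch uses a hypothetical role name and a few illustrative IAM actions; with no matching Allow statement, each evaluation comes back as an implicit deny.

import boto3

iam = boto3.client("iam")

# Role ARN and action names are illustrative placeholders.
response = iam.simulate_principal_policy(
    PolicySourceArn="arn:aws:iam::111122223333:role/ExampleDatabaseAdminRole",
    ActionNames=[
        "iam:CreateAccessKey",
        "iam:CreateLoginProfile",
        "iam:PutRolePolicy",
    ],
)

for result in response["EvaluationResults"]:
    # Expect "implicitDeny" when no policy contains a matching Allow statement.
    print(result["EvalActionName"], result["EvalDecision"])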

Story 2: More instances for crypto mining

In this AWS CIRT story, we see how a threat actor’s inability to create additional Amazon Elastic Compute Cloud (Amazon EC2) instances resulted in the threat actor leaving without achieving their goal.

Background and AWS CIRT engagement

Our second story involves a customer who had an AWS account they were using to test some new third-party software that uses Amazon Elastic Container Service (Amazon ECS). This customer had Amazon GuardDuty turned on, and found that they were getting GuardDuty alerts for CryptoCurrency:EC2/BitcoinTool related findings.

Because this account was new and currently only used for testing their software, the customer saw that the detection was related to the Amazon ECS cluster and decided to delete all the resources in the account and rebuild. Not too long after doing this, they received a similar GuardDuty alert for the new Amazon ECS cluster they had set up. The second finding resulted in the customer’s security team and AWS being brought in to try to understand what was causing this. The customer’s security team was focused on reviewing the tasks that were being run on the cluster, while AWS CIRT reviewed the AWS account actions and provided insight about the GuardDuty finding and what could have caused it.

Analysis and findings

Working with the customer, it wasn’t long before we discovered that the third-party Amazon ECS task definition that the customer was using was unintentionally exposing a web interface to the internet. That interface allowed unauthenticated users to run commands on the system. This explained why the same alert was also received shortly after the new install had been completed.

This is where the story takes a turn for the better. The AWS CIRT analysis of the AWS CloudTrail logs found that there were a number of attempts to use the credentials of the Task IAM role that was associated with the Amazon ECS task. The majority of actions were attempting to launch multiple Amazon EC2 instances via RunInstances calls. Every one of these actions, along with the other actions attempted, failed with either a Client.UnauthorizedOperation or an AccessDenied error message.

Next, we worked with the customer to understand the permissions provided by the Task IAM role. Once again, the permissions could have been limited more tightly. However, this time the goal of the threat actor — running a number of Amazon EC2 instances (most likely for surreptitious crypto mining) — did not align with the policy given to the role:

{
    "Version": "2012-10-17",
    "Statement": [
        {
          "Effect": "Allow",
          "Action": "s3:*",
          "Resource": "*"
        }
    ]
}

AWS CIRT recommended creating policies to restrict the allowed actions further, providing specific resources where possible, and potentially also adding in some conditions to limit other aspects of the access (such as the two Condition keys launched recently to limit where Amazon EC2 instance credentials can be used from). However, simply having the policy limit access to Amazon Simple Storage Service (Amazon S3) meant that the threat actor decided to leave with just the one Amazon ECS task running crypto mining rather than a larger number of Amazon EC2 instances.

Customer communication and result

After reporting these findings to the customer, there were two clear next steps: First, remove the now unwanted and untrusted Amazon ECS resource from their AWS account. Second, review and re-architect the Amazon ECS task so that access to the web interface was only provided to appropriate users. As part of that re-architecting, an Amazon S3 policy similar to “Allows read and write access to objects in an S3 bucket” was recommended. This separates Amazon S3 bucket actions from Amazon S3 object actions. When applications have a need to read and write objects in Amazon S3, they don’t normally have a need to create or delete entire buckets (or versioning on those buckets).
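As an illustration of that separation, the following boto3 sketch creates a customer managed policy that allows only object reads and writes in a single bucket, leaving out bucket-level actions such as deleting the bucket or changing its configuration. The policy and bucket names are placeholders.

import json
import boto3

iam = boto3.client("iam")

# Allow object-level read and write access to one bucket only; bucket-level
# actions such as s3:DeleteBucket or s3:PutBucketPolicy are not granted.
object_access_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject"],
            "Resource": "arn:aws:s3:::amzn-s3-demo-bucket/*",
        }
    ],
}

iam.create_policy(
    PolicyName="EcsTaskS3ObjectAccess",
    PolicyDocument=json.dumps(object_access_policy),
)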

Some tools to help

We’ve just looked at how limiting privileges helped during two different security events. Now, let’s consider what can help you decide how to reduce your IAM permissions to an appropriate level. There are a number of resources that talk about different approaches:

The first approach is to use Access Analyzer to help generate IAM policies based on access activity from log data. This can then be refined further with the addition of Condition elements as desired. We already have a couple of blog posts about that exact topic:

The second approach is similar, and that is to reduce policy scope based on the last-accessed information:

The third approach is a manual method of creating and refining policies to reduce the amount of work required. For this, you can begin with an appropriate AWS managed IAM policy or an AWS provided example policy as a starting point. Following this, you can add or remove Actions, Resources, and Conditions — using wildcards as desired — to balance your effort and permission reduction.

An example of balancing effort and permission is in the IAM tutorial Create and attach your first customer managed policy. In it, the authors create a policy that uses iam:Get* and iam:List* in the Actions section. Although not all iam:Get* and iam:List* Actions may be required, this is a good way to group similar Actions together while minimizing Actions that allow unwanted access — for example, iam:Create* or iam:Delete*. Another example of this balancing was mentioned earlier relating to Amazon S3, allowing access to create, delete, and read objects, but not to change the configuration of the bucket those objects are in.

In addition to limiting permissions, we also recommend that you set up appropriate detection and response capability. This will enable you to know when an issue has occurred and provide the tools to contain and recover from the issue. Further details about performing incident response in an AWS account can be found in the AWS Security Incident Response Guide.

There are also two services that were used to help in the stories we presented here — Amazon GuardDuty and AWS CloudTrail. GuardDuty is a threat detection service that continuously monitors your AWS accounts and workloads for malicious activity. It’s a great way to monitor for unwanted activity within your AWS accounts. CloudTrail records account activity across your AWS infrastructure and provides the logs that were used for the analysis that AWS CIRT performed for both these stories. Making sure that both of these are set up correctly is a great first step towards improving your threat detection and incident response capability in AWS.

Conclusion

In this post, we looked at two examples where limiting privilege provided positive results during a security event. In the second case, the policy used should probably have restricted permissions further, but even as it stood, it was an effective preventative control in stopping the unauthorized user from achieving their assumed goal.

Hopefully these stories will provide new insight into the way your organization thinks about setting permissions, while taking into account the effort of creating the permissions. These stories are a good example of how starting a journey towards least privilege can help stop unauthorized users. Neither of the scenarios had policies that were least privilege, but the policies were restrictive enough that the unauthorized users were prevented from achieving their goals this time, resulting in minimal impact to the customers. However, in both cases, AWS CIRT recommended further reducing the scope of the IAM policies being used.

Finally, we provided a few ways to go about reducing permissions—first, by using tools to assist with policy creation, and second, by editing existing policies so they better fit your specific needs. You can get started by checking your existing policies against what Access Analyzer would recommend, by looking for and removing overly permissive wildcard characters (*) in some of your existing IAM policies, or by implementing and refining your SCPs.

 
If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

Want more AWS Security news? Follow us on Twitter.

Richard Billington

Richard is the Incident Response Watch Lead for the Asia-Pacific region of the AWS Customer Incident Response Team (a team that supports AWS Customers during active security events). He also helps customers prepare for security events using event simulations. Outside of work, he loves wildlife photography and Dr Pepper (which is hard to find in meaningful quantities within Australia).

Logging strategies for security incident response

Post Syndicated from Anna McAbee original https://aws.amazon.com/blogs/security/logging-strategies-for-security-incident-response/

Effective security incident response depends on adequate logging, as described in the AWS Security Incident Response Guide. If you have the proper logs and the ability to query them, you can respond more rapidly and effectively to security events. If a security event occurs, you can use various log sources to validate what occurred and understand the scope. Then, you can use the results of your analysis to take remediation actions. To learn more about logging best practices, see Configure service and application logging and Analyze logs, findings, and metrics centrally.

In this blog post, we will show you how to achieve an effective strategy for logging for security incident response. We will share logging options across the typical cloud application stack, log analysis options, and sample queries. AWS offers managed services, such as Amazon GuardDuty for threat detection and Amazon Detective for incident analysis. If you want to collect additional logs or perform custom analysis, then you should consider the options described in this blog post.

Selection of logs

To select the appropriate logs for security incident response, you should start with the common cloud application stack, which consists of the components and layers of your application deployed on AWS. For each component, we will describe the logging sources that you have. For each log source, we will describe why you should log it for security incident response, how to enable the logs, and what your log storage options are.

To select the logs for security incident response, first consider the following questions:

  • What are your compliance and regulatory requirements for logging?

    Note: Make sure that you comply with the log retention requirements of compliance standards relevant to your organization, as well as your organization’s incident response strategy.

  • What AWS services do you commonly use?
  • What AWS services have access to or contain sensitive data?
  • What threats are most relevant to you?

    Note: Performing a threat model of your cloud architectures can help you answer this question. For more information, see How to approach threat modelling.

Considering these questions can help you develop requirements for logging that will guide your selection of the following log sources.

AWS account logs

An AWS account is the first, fundamental component of an application deployed on AWS. The account is a container for your AWS resources. You create and manage your AWS resources in this account, and the account provides administrative capabilities for access and billing.

AWS CloudTrail

Within an account, each action performed is an API call. From a console sign-in to the deployment of each resource in an AWS CloudFormation stack, events are generated to provide transparency on what has occurred in the account. With AWS CloudTrail, you can log, continuously monitor, and retain account activity related to actions across supported AWS services. CloudTrail provides the event history of your account activity, including actions taken through the AWS Management Console, AWS SDKs, command line tools, and other AWS services. CloudTrail logs API calls as three types of events:

  • Management events (also known as control plane operations) show management operations that are performed on resources in your account. This includes actions like creating an Amazon Simple Storage Service (Amazon S3) bucket and setting up logging.
  • Data events (also known as data plane operations) show the resource operations performed on or within resources in your account. These operations are often high-volume activities, such as Amazon S3 object-level API activity (for example, GetObject, DeleteObject, and PutObject API operations) and AWS Lambda function invocation activity.
  • Insights events capture unusual API call rate or error rate activity in your account. You must enable these events on a trail in order to capture them, and they are logged to a different folder prefix in the destination S3 bucket for your trail. Insights events provide you with information such as the type of event, the incident time period, the associated API, the error code, and statistics to help you understand and respond effectively to unusual activity.

For security investigations, CloudTrail provides context on the creation, modification, and deletion of AWS resources. Therefore, CloudTrail is one of your most important log sources for security incident response in an AWS environment. You have four primary ways to set up and access CloudTrail events:

  • CloudTrail Event history — CloudTrail is enabled by default with 90-day retention of management events that you can retrieve through the CloudTrail Event history facility using the console, AWS Command Line Interface (AWS CLI), or AWS SDK. You don’t need to take any action to get started using the Event history feature.
  • CloudTrail trail — For longer retention and visibility of data events, you need to create a CloudTrail trail and associate it with an S3 bucket and optionally with an Amazon CloudWatch log group. If you use AWS Organizations, you can create an organization trail that will log events for each account in the organization. By default, trails are multi-Region, so you don’t need to enable CloudTrail logs in each AWS Region. (A boto3 sketch of creating a trail follows this list.)
  • AWS CloudTrail Lake — You can create a CloudTrail lake, which retains CloudTrail logs for up to seven years and provides a SQL-based querying facility. You don’t need to have a trail configured in your account to use CloudTrail Lake.
  • Amazon Security Lake — You can use Security Lake to ingest CloudTrail events, which include management and data events. You can further analyze these events with Amazon QuickSight or another third-party security information and event management (SIEM) tool.
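To illustrate the trail option, the following boto3 sketch creates an organization trail, optionally adds S3 data events for a specific bucket, and starts logging. The trail, bucket, and Region names are placeholders, and the destination S3 bucket is assumed to already have a bucket policy that allows CloudTrail to write to it.

import boto3

cloudtrail = boto3.client("cloudtrail", region_name="us-east-1")

# Create an organization trail that delivers management events to an existing
# S3 bucket (names are placeholders).
cloudtrail.create_trail(
    Name="org-trail",
    S3BucketName="amzn-s3-demo-cloudtrail-bucket",
    IsMultiRegionTrail=True,
    IsOrganizationTrail=True,
    EnableLogFileValidation=True,
)

# Optionally add S3 data events for one bucket; data events are not captured
# by default and are billed separately.
cloudtrail.put_event_selectors(
    TrailName="org-trail",
    EventSelectors=[
        {
            "ReadWriteType": "All",
            "IncludeManagementEvents": True,
            "DataResources": [
                {
                    "Type": "AWS::S3::Object",
                    "Values": ["arn:aws:s3:::amzn-s3-demo-data-bucket/"],
                }
            ],
        }
    ],
)

# Trails do not record events until logging is started.
cloudtrail.start_logging(Name="org-trail")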

AWS Config

Creating and modifying resources is an integral part of your account use. Tracking resource configuration changes made by calling the AWS API helps you review changes throughout the resource lifecycle. AWS Config provides a detailed view of the configuration of AWS resources in your account, examines the resource configurations periodically, and tracks configuration changes that were not initiated by the API. This includes how the resources are related to one another and how they were configured in the past so that you can see how configurations and relationships change over time.

You should enable AWS Config in each Region where you have resources deployed, and you should configure an S3 bucket to receive configuration history and configuration snapshot files, which contain details on the resources that AWS Config records. You can then review configuration compliance and analyze activities performed before, during, and after an event using the configuration history in S3. You should centralize AWS Config resource tracking across multiple accounts in the same organization by setting up an aggregator. You can use AWS Control Tower to automate the setup.
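The following boto3 sketch shows one way to turn on AWS Config in a Region; the role ARN and bucket name are placeholders, and the IAM role and the bucket policy that allow AWS Config to deliver files are assumed to exist already.

import boto3

config = boto3.client("config", region_name="us-east-1")

# Record all supported resource types, including global resources such as IAM
# (role ARN is a placeholder).
config.put_configuration_recorder(
    ConfigurationRecorder={
        "name": "default",
        "roleARN": "arn:aws:iam::111122223333:role/aws-config-role",
        "recordingGroup": {
            "allSupported": True,
            "includeGlobalResourceTypes": True,
        },
    }
)

# Deliver configuration history and snapshot files to an S3 bucket
# (bucket name is a placeholder).
config.put_delivery_channel(
    DeliveryChannel={
        "name": "default",
        "s3BucketName": "amzn-s3-demo-config-bucket",
    }
)

config.start_configuration_recorder(ConfigurationRecorderName="default")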

During a security investigation, you might want to understand how a resource configuration has changed over time. For example, you might want to investigate the changes to an S3 bucket policy before and after a security event that involves an S3 bucket. AWS Config provides a configuration history for resources that can help you track activities performed during a security event.

Operating system and application logs

To record interactions with applications, you must capture operating system (OS) and application logs, especially custom logs generated by the application development framework. OS and local application logs are relevant for security events that involve an Amazon Elastic Compute Cloud (Amazon EC2) instance. These instances could be standalone, in an auto scaling group behind a load balancer, or compute workloads for Amazon Elastic Container Service (Amazon ECS) or an Amazon Elastic Kubernetes Service (Amazon EKS) cluster. OS logs track privileged use, processes, login events, access to directory services, and file system activity on a server. To analyze a potential compromise to an EC2 instance, you will want to review the security event logs for Windows OS and the system logs for Linux-based OS.

With the unified CloudWatch agent, you can collect metrics and logs from EC2 instances and on-premises servers. The CloudWatch agent aggregates log data into CloudWatch Logs, which can then be exported to Amazon S3 for long-term retention and analyzed with a SIEM tool of your choice or Amazon Athena, as shown in Figure 1.

Figure 1: Aggregate OS and application logs using CloudWatch Logs
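If you need a one-off copy of a log group in S3, for example to hand logs to an investigation team, you can create an export task. The following boto3 sketch exports the last 24 hours of a log group; the log group name, bucket, and prefix are placeholders, and the destination bucket policy must allow CloudWatch Logs to write to it.

import time

import boto3

logs = boto3.client("logs")

now_ms = int(time.time() * 1000)

# Export the last 24 hours of an OS log group to S3 (names are placeholders).
logs.create_export_task(
    taskName="os-log-export",
    logGroupName="/ec2/linux/var/log/secure",
    fromTime=now_ms - 24 * 60 * 60 * 1000,
    to=now_ms,
    destination="amzn-s3-demo-log-archive-bucket",
    destinationPrefix="ec2-os-logs",
)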

Database logs

With SQL databases, you can log transactions to help track modifications to the databases, such as additions or deletions. After an engine or system failure, you will need transaction logs to restore a database to a consistent state. Transaction logs are designed to be secure, and they require additional processing to access valuable information. It’s important that you understand data interactions during a security investigation, especially if your databases hold personally identifiable information (PII), financial and payments information, or other information subject to regulatory controls.

When you use Amazon Relational Database Service (Amazon RDS), you can publish database logs to Amazon CloudWatch Logs. For NoSQL databases, tracking atomic interactions is useful. You can find logs for managed NoSQL databases like Amazon DynamoDB in CloudTrail. DynamoDB integrates with CloudTrail, providing a record of actions taken by a user, role, or service. These events are classified as data events in CloudTrail.

Network logs

The goal of logging network activity is to gain insight into the communications that traverse your network. You might need this data for a variety of reasons, such as network troubleshooting or for use in a forensic investigation of suspected malware activity within your network.

In the AWS Cloud, you can log network activity by creating a proxy that logs network traffic or by using Traffic Mirroring to send a copy of network traffic to a logging server. You can adopt cloud-native approaches to capture this type of data using Amazon Route 53 DNS query logs and Amazon VPC Flow Logs.

There are also a variety of third-party networking solutions available like Palo Alto Networks and Fortinet, so you can continue to use the network logging mechanisms that you might have used in an on-premises environment.

Route 53 DNS query logs

You can configure Amazon Route 53 to log Domain Name System (DNS) queries. These logs are categorized into two groups:

  • Public DNS query logging
  • Resolver query logging

Logging public DNS queries against domains that you have hosted in Route 53 provides query information, such as the domain or subdomain requested, date and time stamp of the request, DNS record type, Route 53 edge location that responded, and response code.

The Amazon Route 53 Resolver comes with Amazon Virtual Private Cloud (Amazon VPC) by default. Capturing Resolver query logs provides the same information as public queries, as well as additional information such as the Instance ID of the resource that the query originated from. You can also capture Resolver query logs against different types of queries.

VPC Flow Logs

You can configure VPC Flow Logs for a VPC in your account to capture traffic that enters and moves around your VPC network, without the addition of instances or products. From these logs, you can review information, such as source and destination IP, ports, timestamps, protocol, account ID, and whether the traffic was accepted or rejected. For a complete list of the fields available for flow log records, see Available fields. You can create a flow log for a VPC, a subnet, or a network interface. If you create a flow log for a subnet or VPC, IP traffic going to and from each network interface in that subnet or VPC will be logged. For more details on VPC Flow Logs, see Logging IP traffic using VPC Flow Logs.

You can forward flow logs to Amazon CloudWatch Logs to create CloudWatch alarms based on metric filters. You can also forward flow logs to an S3 bucket for long-term retention and further analysis. Figure 2 demonstrates these configurations.

Figure 2: Sending VPC Flow logs to CloudWatch Logs and S3
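As a concrete example of the S3 option, the following boto3 sketch enables flow logs for a VPC; the VPC ID and destination bucket ARN are placeholders.

import boto3

ec2 = boto3.client("ec2")

# Capture accepted and rejected traffic for the VPC and deliver it to S3
# (VPC ID and bucket ARN are placeholders).
ec2.create_flow_logs(
    ResourceIds=["vpc-0abc1234def567890"],
    ResourceType="VPC",
    TrafficType="ALL",
    LogDestinationType="s3",
    LogDestination="arn:aws:s3:::amzn-s3-demo-flow-log-bucket/flow-logs/",
)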

Access logs

To identify access patterns for accessible endpoints, especially public endpoints, you should use access logs. Access logs capture detailed information about requests sent to your load balancer. Each log contains information such as the time the request was received, the client’s IP address, latencies, request paths, and server responses. With services built in layers behind a load balancer, unless you track the X-Forwarded-For request header, the requestor’s context is lost. Access logs help bridge that gap during investigations and analysis.

Amazon S3 server access logs

Access logs are critical for tracking object-level access when using S3 buckets to store confidential or sensitive data. You can also turn on CloudTrail to capture S3 data events. You can store access logs in S3 buckets for long-term storage for compliance purposes and to run analyses during and after an event.
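The following boto3 sketch turns on server access logging for a source bucket, delivering the logs to a separate logging bucket in the same Region; both bucket names are placeholders, and the logging bucket is assumed to already grant the S3 logging service permission to write to it.

import boto3

s3 = boto3.client("s3")

# Deliver server access logs for a sensitive-data bucket to a dedicated
# logging bucket (bucket names are placeholders).
s3.put_bucket_logging(
    Bucket="amzn-s3-demo-sensitive-data-bucket",
    BucketLoggingStatus={
        "LoggingEnabled": {
            "TargetBucket": "amzn-s3-demo-access-log-bucket",
            "TargetPrefix": "s3-access-logs/",
        }
    },
)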

Load balancing logs

Elastic Load Balancing provides access logs that capture detailed information about requests sent to load balancers. Each log contains information such as the time the request was received, the client’s IP address, latencies, request paths, and server responses. You can use this log to analyze traffic patterns and to troubleshoot issues.

Access logging is an optional feature of Elastic Load Balancing that is turned off by default. To enable access logs for load balancers, see Access logs for your Application Load Balancer.
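The following boto3 sketch enables access logs on an Application Load Balancer; the load balancer ARN and bucket name are placeholders, and the bucket policy is assumed to already allow Elastic Load Balancing to write to the bucket.

import boto3

elbv2 = boto3.client("elbv2")

# Enable access logs on an Application Load Balancer (ARN and bucket name
# are placeholders).
elbv2.modify_load_balancer_attributes(
    LoadBalancerArn=(
        "arn:aws:elasticloadbalancing:us-east-1:111122223333:"
        "loadbalancer/app/example-alb/50dc6c495c0c9188"
    ),
    Attributes=[
        {"Key": "access_logs.s3.enabled", "Value": "true"},
        {"Key": "access_logs.s3.bucket", "Value": "amzn-s3-demo-alb-log-bucket"},
        {"Key": "access_logs.s3.prefix", "Value": "alb-access-logs"},
    ],
)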

If you implement your own reverse proxy for load balancing needs, make sure that you capture the reverse proxy access logs. You can use the unified CloudWatch agent to forward the logs to CloudWatch. As with OS logs, you can export CloudWatch logs to an S3 bucket for long-term retention and analysis.

If you use an Amazon CloudFront distribution as the public endpoint for end users with load balancers as the custom origin, then load balancing access logs will represent the CloudFront distribution as the requestor, rather than the actual end user. If this information doesn’t add value to your incident handling process, then you can use CloudFront access logs as the log source that provides end user request details.

CloudFront access logs

You should enable standard logs, also known as access logs, when using CloudFront. Specify an S3 bucket where you want CloudFront to save the files.

CloudFront access logs are delivered on a best-effort basis. For information about requests made to a distribution in real time, use real-time logs that are delivered within seconds of receiving the requests. You should use real-time logs to monitor, analyze, and take action based on content delivery performance. For more details on the fields available from these logs, see the CloudFront standard log file format.

AWS WAF logs

When associated with a supported resource like a CloudFront distribution, Amazon API Gateway REST API, Application Load Balancer, AWS AppSync GraphQL API, Amazon Cognito user pool, or AWS App Runner, AWS WAF can help you monitor HTTP and HTTPS requests that are forwarded to the resource. You should configure web access control lists (ACLs) to gain fine-grained control over the requests, and enable logging for such ACLs to get detailed information about traffic that is analyzed by AWS WAF. Log information includes time of the request being received by AWS WAF from the AWS resource, details about the request, and the AWS WAF rules that the request matched. You can use this log information to monitor access patterns of public endpoints and configure rules to inspect requests in detail. For more information about AWS WAF logging, see Logging web ACL traffic.

Serverless logs

Serverless computing has become increasingly popular in the cloud-computing space. It provides on-demand compute power in a relatively short burst, meaning that cloud-based instances don’t need to be provisioned and kept around, idle, when there are no tasks to be completed. Although more and more compute tasks are being moved to serverless solutions, the need to log has not changed, but how the logs are generated has. In a serverless environment, security investigations benefit not only from logs that demonstrate the interactions and changes made by the deployed code, but also from logs that document changes to the deployed code itself and to the access permissions of the Lambda execution role that grants privileged access.

AWS Lambda

The logging of Lambda functions involves two components: how the function itself is operating, and what is happening inside the function (what your code is actually doing).

The logging of a Lambda function itself occurs through data events captured by CloudTrail. As noted earlier in this post, you must configure data events on a trail created in CloudTrail. During configuration, you will need to specify the function from which logs will be captured by your trail, and the destination S3 bucket where they will be stored. These logs contain details on the invocation of the function and help identify the IAM principals that called the Invoke API for Lambda.

AWS Lambda automatically monitors Lambda functions on your behalf and sends logs to CloudWatch. Your Lambda function comes with a CloudWatch Logs log group and a log stream for each instance of your function. The Lambda runtime environment sends details about each invocation to the log stream, and relays logs and other output from your function’s code. For more details on how to monitor Lambda functions, see Accessing Amazon CloudWatch logs for AWS Lambda.

Log analysis

For incident response, you need to be able to analyze and query your logs to validate what occurred and to understand the scope.

To begin, you can aggregate logs from various sources in S3 buckets for long-term storage, and you can integrate that data with query tools for further investigation. Logs can be exported and either parsed through directly, or ingested by another tool to help with the analysis. The following are some options that you can use to query these logs:

  • Amazon Athena — You can directly query CloudTrail events stored in S3 with Athena using SQL commands, specifying the LOCATION of the log files. You would generally use this approach if you have advanced queries to run, and you don’t have a SIEM. To set up Athena to query logs, you can use this open-source solution from AWS.
  • Amazon OpenSearch Service — OpenSearch is a distributed search and log analytics suite. Because it’s open source, it can ingest logs from more than just AWS log sources. To set this up, you can use this open-source SIEM solution from AWS.
  • CloudTrail Event History — Either from the console, or programmatically, you can query CloudTrail management events from the last 90-day period. This is ideal for when you have simple queries to make within the last 90 days, and you don’t need stored logs or more complex queries.
  • AWS CloudTrail Lake — Either from the console, or programmatically, you can query stored events in your configured CloudTrail Lake from the time of its configuration, up until the maximum storage duration of 2,557 days (7 years) from the time that you make your query. This approach allows for SQL-based queries, and it is ideal for when you need to make more complex queries against events, but don’t require the additional features of a SIEM solution.
  • Parse through raw JSON using the CLI — With this approach, you parse the raw JSON log files programmatically with terminal commands. It’s more of a legacy method of parsing logs. You might choose this approach for analysis if another service or solution isn’t feasible (for example, if you can’t use the service due to your corporate security policy).
  • Third-party SIEM — A third-party SIEM might be ideal if you already have a SIEM solution on AWS or elsewhere, and you don’t need a duplicated solution elsewhere. Typically, SIEM solutions will import logs from an S3 bucket and process and index events for analysis. To learn more about SIEM options, see the SIEM solutions in the AWS Marketplace, or the AWS Security Competency Partners for a partner local to you with threat detection and incident response (TDIR) expertise.

Sample queries

In this section, we provide samples of SQL queries. Both Athena and CloudTrail Lake accept SQL queries, but the following samples have been tested for use in Athena only. This is because some samples are for VPC Flow Logs, which you can’t query from CloudTrail Lake. To query CloudTrail logs in Athena, you must first create a table definition that points to the location of your logs stored in S3. You can do this from the CloudTrail Events console by using a hyperlinked suggestion, or from the Athena console directly. Alternatively, for Athena, you can use the AWS Security Analytics Bootstrap.

For each of these queries, you might need to modify some of the fields, such as the time frame that you are investigating, the IAM entity involved, and the account and Region in scope. For example, you might want to modify the time frame based on the current time and when you believe the security event began. This often involves expanding the time frame after running additional queries and learning more about the scope and timeline.

By using partitions for tables, you can restrict the amount of data scanned by each Athena query, helping to improve performance and reduce cost. For example, you can partition your CloudTrail Athena table manually or by using partition projection. You can include the partition column (for example, the timestamp) in your queries to limit the amount of data scanned.

Unauthorized attempts

When a security event occurs, you might want to review API calls that were attempted but failed due to the IAM principal not having access to perform the action on that resource. To discover this activity, run the following query (be sure to modify the time window first):

SELECT *
FROM cloudtrail
WHERE errorcode IN ('Client.UnauthorizedOperation','Client.InvalidPermission.NotFound','Client.OperationNotPermitted','AccessDenied')
AND useridentity.arn LIKE '%iam%'
AND eventtime >= '2023-01-01T00:00:00Z'
AND eventtime < '2023-03-01T00:00:00Z'
ORDER BY eventtime desc

This sample query can help you identify whether certain IAM principals have a significant amount of unauthorized API calls, which can indicate that an IAM principal is compromised.

Rejected TCP connections

During a security event, the unauthorized user that is interacting with the resources in your account is probably trying to establish persistence through the network layer. To get a list of rejected TCP connections and extract from it the day that these events occurred, run the following query:

SELECT day_of_week(date) AS day, date, interface_id, srcaddr, action, protocol
FROM vpc_flow_logs
WHERE action = 'REJECT' AND protocol = 6
LIMIT 100;

Connections over older TLS versions

You might want to see how many calls to AWS APIs were made using older versions of the TLS protocol, as part of a forensic follow-up or a discovery job after a risk analysis. You can get this data by querying CloudTrail logs.

SELECT eventSource,
COUNT(*) AS numOutdatedTlsCalls
FROM cloudtrail
WHERE tlsDetails.tlsVersion IN ('TLSv1', 'TLSv1.1')
AND eventTime > '2023-01-01 00:00:00'
GROUP BY eventSource
ORDER BY numOutdatedTlsCalls DESC

Filter connections from an IP

With an IP address that you’d like to investigate, as a part of your forensic analysis, you might want to see the connections made to resources in a VPC. You can obtain this information by querying VPC Flow Logs. As with the server access logs, if you’re using Athena, you will first need to create a new table.

SELECT day_of_week(date) AS day, date, srcaddr, dstaddr, action, protocol
FROM vpc_flow_logs
WHERE day >= '2023/01/01' AND day < '2023/03/01' AND srcaddr LIKE '172.50.%'
ORDER BY day DESC
LIMIT 100

Investigate user actions

If you have identified a user who has been compromised, or that you suspect has been compromised, you might want to know the API calls that they made over a specific time period. Understanding the activity of a user can help you understand the scope of impact during an incident, as well as the reach of user permissions when you design your access management strategy.

SELECT eventID, eventName, eventSource, eventTime, userIdentity.arn
AS user
FROM cloudtrail
WHERE userIdentity.arn LIKE '%<username>%' AND eventTime > '2022-12-05 00:00:00' AND eventTime < '2022-12-08 00:00:00'

Conclusion

It is essential that you capture logs from various layers within your application architecture, so that you can effectively respond to a security event at various layers of the application stack. If a security event occurs, logs can help provide a clear picture of what happened and the scope of the affected resources. This post helps you build a logging strategy for security incident response by understanding what logs you want to analyze, where you want to store those logs, and how you will analyze them.

Further resources

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on the AWS Security, Identity, & Compliance re:Post or contact AWS Support.

Want more AWS Security news? Follow us on Twitter.

Author

Anna McAbee

Anna is a Security Specialist Solutions Architect focused on threat detection and incident response at AWS. Before AWS, she worked as an AWS customer in financial services on both the offensive and defensive sides of security. Outside of work, Anna enjoys cheering on the Florida Gators football team, wine tasting, and traveling the world.

Pratima Singh

Pratima is a Security Specialist Solutions Architect with Amazon Web Services based out of Sydney, Australia. She is a security enthusiast who enjoys helping customers find innovative solutions to complex business challenges. Outside of work, Pratima enjoys going on long drives and spending time with her family at the beach.

Ciarán Carragher

Ciarán is a Security Specialist Solutions Architect, based out of Dublin, Ireland. Before becoming a Security SSA, Ciarán was an AWS Cloud Support Engineer for AWS security services. Outside of work, Ciarán is an avid computer gamer as well as a serving army reservist in the Irish Defence Forces.

Updated whitepaper available: AWS Security Incident Response Guide

Post Syndicated from Anna McAbee original https://aws.amazon.com/blogs/security/updated-whitepaper-available-aws-security-incident-response-guide/

The AWS Security Incident Response Guide focuses on the fundamentals of responding to security incidents within a customer’s Amazon Web Services (AWS) Cloud environment. You can use the guide to help build and iterate on your AWS security incident response program.

Recently, we updated the AWS Security Incident Response Guide to more clearly explain what you should do before, during, and after a security event. In this post, we will highlight some of the changes and discuss how to use the new guide.

Update highlights

Based on customer feedback, new service and feature releases, and our experience helping customers, we’ve updated the majority of the guide with new content. Some highlights of the new version include:

  • New foundational content on the differences between AWS and on-premises incident response – Because customers have frequently asked the question “What’s different about incident response on AWS?” the new introduction includes a section on the Key differences of incident response on AWS, which enumerates six core differences between AWS and on-premises incident response.
  • Alignment to incident response industry standards – The new guide was re-structured to align with the incident response standards and best practices from the National Institute of Standards and Technology (NIST) Computer Security Incident Handling Guide SP 800-61 Rev. 2. This alignment helps clarify how AWS technologies apply to these concepts.
  • New Operations section – The guide contains a new section, Operations, which explains actions to take during a security event by following NIST’s phases of incident response: detection, analysis, containment, eradication, and recovery.
  • Clearer prescriptive guidance – The updated guide also contains prescriptive guidance to clarify the actions that a customer should take before, during, and after a security incident. The Preparation section contains a table in the conclusion that summarizes the actions that you can take before a security event. Similarly, the Operations section has a summary table with techniques and methodologies for active response. Lastly, the Post-incident activity section contains a framework for learning from incidents, which includes a list of questions to address after a security incident.

Using the new guide

We encourage you to read the entire guide before taking action and building a list of changes to implement. After you read the guide, assess your current status based on the preparation items and check off action items that you have already completed in the Preparation items table. This will help you assess the current state of your AWS incident response. Then, you should plan a short-term and long-term roadmap based on your gaps, desired state, resources, and business needs. Building a cloud incident response program often involves iteration, so you should prioritize key items and regularly revisit your backlog to keep up with technology changes and your business requirements.

More information

For more information and to get started, see the updated AWS Security Incident Response Guide.

We greatly value feedback and contributions from our community. To share your thoughts and insights about the AWS Security Incident Response Guide, your experience using it, and what you want to see in future versions, complete the feedback form.

Want more AWS Security news? Follow us on Twitter.

Author

Anna McAbee

Anna is a Security Specialist Solutions Architect focused on threat detection and incident response at AWS. Before AWS, she worked as an AWS customer in financial services on both the offensive and defensive sides of security. Outside of work, Anna enjoys cheering on the Florida Gators football team, wine tasting, and traveling the world.

Introducing the AWS Best Practices for Security, Identity, & Compliance Webpage and Customer Polling Feature

Post Syndicated from Marta Taggart original https://aws.amazon.com/blogs/security/introducing-aws-best-practices-security-identity-compliance-webpage-and-customer-polling-feature/

The AWS Security team has made it easier for you to find information and guidance on best practices for your cloud architecture. We’re pleased to share the Best Practices for Security, Identity, & Compliance webpage of the new AWS Architecture Center. Here you’ll find top recommendations for security design principles, workshops, and educational materials, and you can browse our full catalog of self-service content including blogs, whitepapers, videos, trainings, reference implementations, and more.

We’re also running polls on the new AWS Architecture Center to gather your feedback. Want to learn more about how to protect account access? Or are you looking for recommendations on how to improve your incident response capabilities? Let us know by completing the poll. We will use your answers to help guide security topics for upcoming content.

Poll topics will change periodically, so bookmark the Security, Identity, & Compliance webpage for easy access to future questions, or to submit your topic ideas at any time. Our first poll, which asks what areas of the Well-Architected Security Pillar are most important for your use, is available now. We look forward to hearing from you.

If you have feedback about this post, submit comments in the Comments section below.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Author

Marta Taggart

Marta is a Seattle-native and Senior Program Manager in AWS Security, where she focuses on privacy, content development, and educational programs. Her interest in education stems from two years she spent in the education sector while serving in the Peace Corps in Romania. In her free time, she’s on a global hunt for the perfect cup of coffee.