Tag Archives: Tools

Evolving cyber threats demand new security approaches – The benefits of a unified and global IT/OT SOC

2023-10-30 Stuart Gregg

Post Syndicated from Stuart Gregg original https://aws.amazon.com/blogs/security/evolving-cyber-threats-demand-new-security-approaches-the-benefits-of-a-unified-and-global-it-ot-soc/

In this blog post, we discuss some of the benefits and considerations organizations should think through when looking at a unified and global information technology and operational technology (IT/OT) security operations center (SOC). Although this post focuses on the IT/OT convergence within the SOC, you can use the concepts and ideas discussed here when thinking about other environments such as hybrid and multi-cloud, Industrial Internet of Things (IIoT), and so on.

The scope of assets has vastly expanded as organizations transition to remote work, and from increased interconnectivity through the Internet of Things (IoT) and edge devices coming online from around the globe, such as cyber physical systems. For many organizations, the IT and OT SOCs were separate, but there is a strong argument for convergence, which provides better context for the business outcomes of being able to respond to unexpected activity. In the ten security golden rules for IIoT solutions, AWS recommends deploying security audit and monitoring mechanisms across OT and IIoT environments, collecting security logs, and analyzing them using security information and event management (SIEM) tools within a SOC. SOCs are used to monitor, detect, and respond; this has traditionally been done separately for each environment. In this blog post, we explore the benefits and potential trade-offs of the convergence of these environments for the SOC. Although organizations should carefully consider the points raised throughout this blog post, the benefits of a unified SOC outweigh the potential trade-offs—visibility into the full threat chain propagating from one environment to another is critical for organizations as daily operations become more connected across IT and OT.

Traditional IT SOC

Traditionally, the SOC was responsible for security monitoring, analysis, and incident management of the entire IT environment within an organization—whether on-premises or in a hybrid architecture. This traditional approach has worked well for many years and ensures the SOC has the visibility to effectively protect the IT environment from evolving threats.

Note: Organizations should be aware of the considerations for security operations in the cloud which are discussed in this blog post.

Traditional OT SOC

Traditionally, OT, IT, and cloud teams have worked on separate sides of the air gap as described in the Purdue model. This can result in siloed OT, IIoT, and cloud security monitoring solutions, creating potential gaps in coverage or missing context that could otherwise have improved the response capability. To realize the full benefits of IT/OT convergence, IIoT, IT and OT must collaborate effectively to provide a broad perspective and the most effective defense. The convergence trend applies to newly connected devices and to how security and operations work together.

As organizations explore how industrial digital transformation can give them a competitive advantage, they’re using IoT, cloud computing, artificial intelligence and machine learning (AI/ML), and other digital technologies. This increases the potential threat surface that organizations must protect and requires a broad, integrated, and automated defense-in-depth security approach delivered through a unified and global SOC.

Without full visibility and control of traffic entering and exiting OT networks, the operations function might not be able to get full context or information that can be used to identify unexpected events. If a control system or connected assets such as programmable logic controllers (PLCs), operator workstations, or safety systems are compromised, threat actors could damage critical infrastructure and services or compromise data in IT systems. Even in cases where the OT system isn’t directly impacted, the secondary impacts can result in OT networks being shut down due to safety concerns over the ability to operate and monitor OT networks.

The SOC helps improve security and compliance by consolidating key security personnel and event data in a centralized location. Building a SOC is significant because it requires a substantial upfront and ongoing investment in people, processes, and technology. However, the value of an improved security posture is of great consideration compared to the costs.

In many OT organizations, operators and engineering teams may not be used to focusing on security; in some cases, organizations set up an OT SOC that’s independent from their IT SOC. Many of the capabilities, strategies, and technologies developed for enterprise and IT SOCs apply directly to the OT environment, such as security operations (SecOps) and standard operating procedures (SOPs). While there are clearly OT-specific considerations, the SOC model is a good starting point for a converged IT/OT cybersecurity approach. In addition, technologies such as a SIEM can help OT organizations monitor their environment with less effort and time to deliver maximum return on investment. For example, by bringing IT and OT security data into a SIEM, IT and OT stakeholders share access to the information needed to complete security work.

Benefits of a unified SOC

A unified SOC offers numerous benefits for organizations. It provides broad visibility across the entire IT and OT environments, enabling coordinated threat detection, faster incident response, and immediate sharing of indicators of compromise (IoCs) between environments. This allows for better understanding of threat paths and origins.

Consolidating data from IT and OT environments in a unified SOC can bring economies of scale with opportunities for discounted data ingestion and retention. Furthermore, managing a unified SOC can reduce overhead by centralizing data retention requirements, access models, and technical capabilities such as automation and machine learning.

Operational key performance indicators (KPIs) developed within one environment can be used to enhance another, promoting operational efficiency such as reducing mean time to detect security events (MTTD). A unified SOC enables integrated and unified security, operations, and performance, which supports comprehensive protection and visibility across technologies, locations, and deployments. Sharing lessons learned between IT and OT environments improves overall operational efficiency and security posture. A unified SOC also helps organizations adhere to regulatory requirements in a single place, streamlining compliance efforts and operational oversight.

By using a security data lake and advanced technologies like AI/ML, organizations can build resilient business operations, enhancing their detection and response to security threats.

Creating cross-functional teams of IT and OT subject matter experts (SMEs) help bridge the cultural divide and foster collaboration, enabling the development of a unified security strategy. Implementing an integrated and unified SOC can improve the maturity of industrial control systems (ICS) for IT and OT cybersecurity programs, bridging the gap between the domains and enhancing overall security capabilities.

Considerations for a unified SOC

There are several important aspects of a unified SOC for organizations to consider.

First, the separation of duty is crucial in a unified SOC environment. It’s essential to verify that specific duties are assigned to individuals based on their expertise and job function, allowing the most appropriate specialists to work on security events for their respective environments. Additionally, the sensitivity of data must be carefully managed. Robust access and permissions management is necessary to restrict access to specific types of data, maintaining that only authorized analysts can access and handle sensitive information. You should implement a clear AWS Identity and Access Management (IAM) strategy following security best practices across your organization to verify that the separation of duties is enforced.

Another critical consideration is the potential disruption to operations during the unification of IT and OT environments. To promote a smooth transition, careful planning is required to minimize any loss of data, visibility, or disruptions to standard operations. It’s crucial to recognize the differences in IT and OT security. The unique nature of OT environments and their close ties to physical infrastructure require tailored cybersecurity strategies and tools that address the distinct missions, challenges, and threats faced by industrial organizations. A copy-and-paste approach from IT cybersecurity programs will not suffice.

Furthermore, the level of cybersecurity maturity often varies between IT and OT domains. Investment in cybersecurity measures might differ, resulting in OT cybersecurity being relatively less mature compared to IT cybersecurity. This discrepancy should be considered when designing and implementing a unified SOC. Baselining the technology stack from each environment, defining clear goals and carefully architecting the solution can help ensure this discrepancy has been accounted for. After the solution has moved into the proof-of-concept (PoC) phase, you can start to testing for readiness to move the convergence to production.

You also must address the cultural divide between IT and OT teams. Lack of alignment between an organization’s cybersecurity policies and procedures with ICS and OT security objectives can impact the ability to secure both environments effectively. Bridging this divide through collaboration and clear communication is essential. This has been discussed in more detail in the post on managing organizational transformation for successful IT/OT convergence.

Unified IT/OT SOC deployment:

Figure 1 shows the deployment that would be expected in a unified IT/OT SOC. This is a high-level view of a unified SOC. In part 2 of this post, we will provide prescriptive guidance on how to design and build a unified and global SOC on AWS using AWS services and AWS Partner Network (APN) solutions.

Figure 1: Unified IT/OT SOC architecture

The parts of the IT/OT unified SOC are the following:

Environment: There are multiple environments, including a traditional IT on-premises organization, OT environment, cloud environment, and so on. Each environment represents a collection of security events and log sources from assets.

Data lake: A centralized place for data collection, normalization, and enrichment to verify that raw data from the different environments is standardized into a common scheme. The data lake should support data retention and archiving for long term storage.

Visualize: The SOC includes multiple dashboards based on organizational and operational needs. Dashboards can cover scenarios for multiple environments including data flows between IT and OT environments. There are also specific dashboards for the individual environments to cover each stakeholder’s needs. Data should be indexed in a way that allows humans and machines to query the data to monitor for security and performance issues.

Security analytics: Security analytics are used to aggregate and analyze security signals and generate higher fidelity alerts and to contextualize OT signals against concurrent IT signals and against threat intelligence from reputable sources.

Detect, alert, and respond: Alerts can be set up for events of interest based on data across both individual and multiple environments. Machine learning should be used to help identify threat paths and events of interest across the data.

Conclusion

Throughout this blog post, we’ve talked through the convergence of IT and OT environments from the perspective of optimizing your security operations. We looked at the benefits and considerations of designing and implementing a unified SOC.

Visibility into the full threat chain propagating from one environment to another is critical for organizations as daily operations become more connected across IT and OT. A unified SOC is the nerve center for incident detection and response and can be one of the most critical components in improving your organization’s security posture and cyber resilience.

If unification is your organization’s goal, you must fully consider what this means and design a plan for what a unified SOC will look like in practice. Running a small proof of concept and migrating in steps often helps with this process.

In the next blog post, we will provide prescriptive guidance on how to design and build a unified and global SOC using AWS services and AWS Partner Network (APN) solutions.

Learn more:

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

Want more AWS Security news? Follow us on Twitter.

Considerations for the security operations center in the cloud: deployment using AWS security services

2023-03-01 Stuart Gregg

Post Syndicated from Stuart Gregg original https://aws.amazon.com/blogs/security/considerations-for-the-security-operations-center-in-the-cloud-deployment-using-aws-security-services/

Welcome back. If you’re joining this series for the first time, we recommend that you read the first blog post in this series, Considerations for security operations in the cloud, for some context on what we will discuss and deploy in this blog post. In the earlier post, we talked through the different operating models (centralized, decentralized, or hybrid) that you can deploy for a Security Operations Center (SOC) function when you operate in the cloud. We covered the advantages of each model and some of the potential drawbacks you might see when you start to scale up operations within the cloud.

This post will focus on the Amazon Web Services (AWS) native security service, AWS Security Hub, that you can use to deploy in different SOC operating models. AWS Security Hub is a cloud security posture management service that SOC teams can use to perform security best practice checks and aggregate alerts. AWS Security Hub accepts findings from multiple sources, whether native to AWS, from the pre-built integrations, or from your own sources converted into the AWS Security Finding Format (ASFF). The data collected in Security Hub facilitates response and remediation actions.

Although the models we describe here use services that are native to AWS, the reference architectures that correspond to each operating model can be applied to a variety of deployments, including multi-cloud and traditional on-premises deployments. The majority of this post will focus on the decentralized and hybrid models—the centralized model is well documented and has reference architectures already available for you today.

Each organization is different, and no one operating model will fit everyone. You should choose the model that works best for your organizational landscape, with an understanding that the landscape will change and evolve over time. Using feedback loops and being open to change is important to help you meet the continued needs of your business. Additional factors to consider include, but are not limited to: staff skills, compliance requirements, previous operating model, and budget.

The centralized model

The centralized operating model for the SOC is well documented and frequently discussed, both at AWS and in the security community. According to AWS best practices, typically you designate a central security tooling account that is dedicated to operating security services, monitoring AWS accounts, and automating security alerting and response. The security tooling account serves as the administrator account for security services that are managed in an administrator/member structure across your AWS accounts. The key objectives for establishing a security tooling account are the following:

Provide a dedicated enclave with controlled access for managing security guardrails, monitoring, and response.
Maintain the appropriate centralized security infrastructure to monitor security operations data and maintain traceability across the security lifecycle.

Figure 1 demonstrates the variety of AWS security services that you can deploy in the central security account. For example, Security Hub within the security tooling account can act as the administrator to enable Security Hub in the member accounts, as well as view findings, view insights, and set security standards across member accounts, which can help simplify security posture management across your existing and future accounts.

Figure 1: Reference architecture for the security tooling account in a centralized model

As mentioned earlier, you can enable Security Hub to administer and enable member accounts. This is achieved by using AWS Organizations and the delegated administrator functionality. In addition, you can use Security Hub cross-Region aggregation within the delegated administrator account to aggregate findings, finding updates, insights, control compliance statuses, and security scores from multiple Regions to a single aggregation Region. You can then manage this data from the aggregation Region. Figure 2 shows the reference architecture for this functionality.

Figure 2: Reference architecture for Security Hub in the delegated administrator model

The AWS Security Reference Architecture (AWS SRA) is a great starting point for establishing the centralized security operations model. The AWS SRA is a holistic set of guidelines for deploying the full complement of AWS security services in a multi-account environment. You can use it to help design, implement, and manage AWS security services so that they align with AWS best practices. The AWS SRA’s Security Hub Organization solution provides deployable templates and examples that automate the process of enabling Security Hub by delegating administration to an account and configuring Security Hub for the existing and future AWS Organizations accounts.

The decentralized and hybrid models

As mentioned in Considerations for security operations in the cloud, the decentralized and hybrid SOC models provide many benefits for organizations. The flexibility of these operating models allows organizational units (OUs) to control how they deal with security-related incidents while still having organization-wide visibility into security posture. This flexibility is important as organizations start to scale up activities within the cloud.

The reference architecture in Figure 3 shows how the benefits we discussed in our earlier blog post can be architected in the decentralized and hybrid operating models in the AWS Cloud.

Figure 3: Reference architecture for the decentralized and hybrid operating models in AWS

The key features of this architecture are as follows:

The organization root account is separate, according to AWS Organizations best practices. By using service control policies (SCPs), the root account can still achieve a level of governance across the business.
Dedicated accounts have been created for each OU for the Security Hub administration. The model we will use for this deployment is the invite model. In this reference architecture and as an example, we’re using Amazon GuardDuty to flow findings into Security Hub. When you use this model, each OU can manage findings for that OU. This gives you flexibility to work from the Security Hub admin with full visibility of the OU and accounts associated with that OU, or to work in each member account and view findings for that account only.
(Optional, for use with the hybrid model) Each OU’s Security Hub member accounts first send events to their Security Hub admin account. The Security Hub admin account will then send events for that OU to the local Amazon EventBridge bus. You can then set up rules to forward events to a central EventBridge bus in a dedicated AWS account. In the architecture in Figure 3, this account is named SecAnalytics. This step will follow a similar flow as the one described in this AWS Cloud Operations & Migrations blog post.
(Optional, for use with the hybrid model) After the OUs have sent data to the central bus, you can use a capability similar to the one in this AWS Architecture Blog post to start organizing the findings and gain organization-wide visibility. The solution in the earlier post used Amazon QuickSight to visualize the data, but you can use another tool or pre-existing data pipeline.

Items 3 and 4 labeled with (Optional) are capabilities that enable the hybrid model; these are not required if you only want to enable the decentralized model.

Considerations for all deployments

Keep the following considerations in mind for all deployments:

Steady state operations should be considered for whichever model you deploy in. For the centralized model, you can use functionality within AWS Organizations to automatically enable Security Hub for accounts within the organization. In the decentralized and hybrid models, you will need to build out this capability or use a similar capability as described in this repo.
Alert fatigue happens when humans work on the same repetitive tasks’ day in and day out. To help reduce this, within the reference architecture and solution overview, we’ve added the capability described in this Security Blog post to automatically suppress findings based on criteria set by you. For the centralized model, you can add this capability in the delegated admin account for Security Hub. For the decentralized and hybrid models, we recommend that you put the auto-suppression capability in the Security Hub admin account, and then centralize the rules for suppression for that OU at the Security Hub admin level. This will reduce the overhead for deploying suppression rules multiple times and give a single location where rules are placed for that OU.
Context is key. Within the reference architecture and solution overview for decentralized and hybrid deployments, we’ve added the capability described in this Security Blog post. This capability will add additional context, such as the account name, the OU associated with the account, security contact information, and account tags. This information is pulled from AWS Organizations to enrich Security Hub findings. This additional context can also be used in the centralized model.

Deploy the decentralized and hybrid models

In this section, we’ll walk you through the deployment that reflects the reference architecture for the decentralized and hybrid models. Figure 4 shows the solution architecture, including the solution that needs to be deployed in the Security Hub admin account and in the aggregation Region for each business unit within the organization. The solution provides the capability to suppress Security Hub findings, enrich the findings, and propagate findings to central security accounts.

Figure 4: Reference architecture for the decentralized and hybrid deployment

The solution architecture consists of the following:

An EventBridge rule to invoke a Lambda function (Suppression Lambda) as the target to suppress any findings based on specific generator IDs within specific member accounts.

Note: The Security Hub Generator IDs and AWS Account IDs in the EventBridge rule are left as placeholders so that you can fill based on your needs.
An EventBridge rule to invoke a Lambda function (Enrichment Lambda) as the target to enrich the findings with AWS account and OU related metadata, along with alternate contact information to better prioritize the findings. The API calls to AWS Organizations and AWS account management services are optimized by caching the metadata in an Amazon DynamoDB table with a time-to-live (TTL) value of 24 hours.
An EventBridge rule to post the enriched findings that were not suppressed to a custom EventBridge event bus in the organization-level Security Tooling/SecAnalytics account.

Prerequisites

The following are the prerequisites for this deployment:

AWS Organizations is utilized across the business. In this scenario, AWS Organizations will be used to group AWS accounts into OUs, as well as to provide enrichment data for Security Hub findings.
Alternative contacts for AWS accounts have been filled out with the most up-to-date information. This is a best practice recommendation. This information will be used for enrichment of the Security Hub findings.
Your organization already has a pipeline in place for indexing Security Hub findings and visualizing them.
Security Hub is set up in the invite model. OU-level Security Hub accounts have been invited and accepted to be managed by the OU-level Security Hub admin account.
The grouping of findings across multiple OU-level Security Hub admin accounts uses Amazon EventBridge to forward events to a centralized bus. You should have the event bus set up ready for this deployment.

Deploy the solution

This solution deployment consists of two parts:

Create an IAM role in your Organizations management account that allows BU-level Security Hub admin to access account metadata, as described in the Create the IAM role procedure that follows.
Deploy the Enrichment Lambda function, the Suppression Lambda function, and the associated EventBridge event rules within the BU-level Security Hub administrator account.

Create the IAM role

Follow the instructions in Creating a role to delegate permissions to an IAM user to create an IAM role by using the IAM console, AWS Command Line Interface (AWS CLI), or AWS API. Create the role in the AWS Organizations management account with the role name as account-contact-readonly, based on the following trust and permission policy templates. You will need the account ID of your BU-level Security Hub administrator account.

The IAM trust policy allows the Security Hub administrator account to assume the role in your Organizations management account.

Note: The following trust policy shows only one BU Security admin account. You will need to add all BU Security admin accounts to the trust policy.

IAM role trust policy

{
   "Version": "2012-10-17",
   "Statement": [
     {
       "Effect": "Allow",
       "Principal": {
         "AWS": "arn:aws:iam::<BU SecHubAdmin Account ID>:root"
       },
       "Action": "sts:AssumeRole",
       "Condition": {}
     }
   ]
 }

Note: Replace <BU SecHubAdmin Account ID> with the account ID of your decentralized BU-level Security Hub administrator account. After the solution is deployed, you should update the principal in the preceding trust policy to use the new IAM role created for the solution.

IAM permission policy

{
     "Version": "2012-10-17",
     "Statement": [
         {
            "Action": "Account:GetAlternateContact",
            "Resource": [
                "arn:aws:account::<Org Management Account ID>:account/o-*/*"
            ],
            "Effect": "Allow"
        },
        {
            "Action": [
                "organizations:DescribeAccount",
                "organizations:ListTagsForResource",
                "organizations:DescribeOrganizationalUnit",
                "organizations:ListParents"
            ],
            "Resource": [
                "arn:aws:organizations::<Org Management Account ID>:account/o-*/*",
                "arn:aws:organizations::<Org Management Account ID>:ou/o-*/ou-*"
            ],
            "Effect": "Allow"
        }
     ]
 }

The IAM permission policy allows the Security Hub administrator account to look up the alternate contact information for the member accounts.

Make a note of the role Amazon Resource Name (ARN) for the IAM role, which will be similar to this format:
arn:aws:iam::<Org Management Account ID>:role/account-contact-readonly.

You will need this ARN when you deploy the solution in the next procedure.

Use AWS CloudFormation to create the IAM role

Alternatively, you can use the CloudFormation template we provide in our GitHub repository to create the role in the management account. The IAM role ARN is available in the Outputs section of the created CloudFormation stack.

Deploy the solution to your BU-level Security Hub administrator account

After you have the IAM role created, you can deploy the solution either from the AWS Management Console, or from our GitHub repository by using the AWS SAM CLI.

Note: If you’ve designated an aggregation Region within the BU-level Security Hub administrator account, you can deploy this solution only in the aggregation Region. Otherwise, you need to deploy this solution separately in each Region of the BU-level Security Hub administrator account where Security Hub is enabled.

To deploy the solution by using the AWS Management Console

In your Security Hub administrator account, launch the template by choosing the following Launch Stack button, which creates the stack the in us-east-1 Region.

Note: If your Security Hub aggregation Region is different than us-east-1 or you want to deploy the solution in a different AWS Region, you can deploy the solution from the GitHub repository described in the next section.
On the Quick create stack page, for Stack name, enter a unique stack name for this account; for example, aws-security-hub-decentralized-deployment-stack.

Figure 5: Quick create CloudFormation stack for the solution
For SecurityToolingAccountEventBus, provide the EventBus ARN in the security tooling account to post the Security Hub findings from the BU-level Security Hub administrator account.
For OrgManagementAccountContactRole, enter the role ARN of the role you created previously in the Create IAM role procedure.
Choose Create stack.
After the stack is created, go to the Resources tab and take note of the name of the IAM role that was created.
Update the principal element of the IAM role trust policy that you previously created in the Organizations management account in the Create the IAM role procedure, replacing the existing value with the role name you noted down.

Figure 6: Update the principal element for the role trust policy

To deploy the solution from our GitHub repository and AWS SAM CLI

Install the AWS SAM CLI.
Download or clone the GitHub repository by using the following commands.
git clone https://github.com/aws-samples/aws-securityhub-decentralized-operations-solution.git cd aws-securityhub-decentralized-operations-solution
Update the content of the profile.txt file with the profile name you want to use for the deployment.
To create a new bucket for deployment artifacts, run create-bucket.sh by specifying the Region as argument.
$ ./create-bucket.sh us-east-1
Deploy the solution to the account by running the deploy.sh script by specifying the Region as argument.
$ ./deploy.sh us-east-1
After the stack is created, go to the Resources tab and take note of the name of the IAM role that was created.
Update the principal element of the IAM role trust policy that you previously created in the Organizations management account in the Create the IAM role procedure, replacing it with the role name you noted down.
"AWS": "arn:aws:iam::<BU SH Delegated Account ID>: role/<Role Name>"

Note: The EventBridge rule to invoke the findings suppression Lambda function uses placeholders for the generator IDs and AWS account IDs. You need to update the EventBridge rule to meet your specific organizational requirements.

Further enhancements and conclusion

Beyond what is described in the decentralized and hybrid models, you can extend the solution to include the following aspects to meet your security operational needs:

In Considerations for security operations in the cloud, we spoke about the role of ChatOps. AWS Chatbot can enable OUs to set up rules to post notifications directly into chat rooms such as Amazon Chime or Slack. You can define rules to send only certain severity notifications or findings that are important to your OU to the chat room.
SCPs give organizations a level of control and governance. See this blog post for some best practices for deploying SCPs, as well as example policies that could be beneficial for your organization in any model you operate in.
We’ve performed testing of the decentralized and hybrid models in the reference architecture within one AWS Region. Although we don’t see any reason why this solution would not work in multiple Regions, if you do operate in multiple Regions you would need to deploy the CloudFormation template in each Region that you operate in. At this stage, you can keep findings within a Region or choose to centralize across multiple Regions by sending to the single central bus in Amazon EventBridge—the flexibility is yours.
The decentralized and hybrid models can also be extended if you operate in multiple organizations in AWS Organizations or have standalone accounts outside of your organization that you want to monitor. Interesting use cases could be in mergers and acquisitions scenarios, when newly acquired accounts need to be monitored to understand their posture before bringing them fully into the organization.

Throughout this two-part blog series, we’ve explored the role of the Security Operations Center (SOC) function, both traditionally in an on-premises environment and in the cloud. We’ve explored different operating models, from the traditional centralized deployment to the decentralized and hybrid models. We’ve also demonstrated, with reference architectures and deployable solutions, how you can achieve the different operating models in the AWS Cloud by using native AWS services. In the end, you should choose the model that works best for your environment and the security landscape you work in.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

Want more AWS Security news? Follow us on Twitter.

Considerations for security operations in the cloud

2022-11-18 Stuart Gregg

Post Syndicated from Stuart Gregg original https://aws.amazon.com/blogs/security/considerations-for-security-operations-in-the-cloud/

Cybersecurity teams are often made up of different functions. Typically, these can include Governance, Risk & Compliance (GRC), Security Architecture, Assurance, and Security Operations, to name a few. Each function has its own specific tasks, but works towards a common goal—to partner with the rest of the business and help teams ship and run workloads securely.

In this blog post, I’ll focus on the role of the security operations (SecOps) function, and in particular, the considerations that you should look at when choosing the most suitable operating model for your enterprise and environment. This becomes particularly important when your organization starts to adapt and operate more workloads in the cloud.

Operational teams that manage business processes are the backbone of organizations—they pave the way for efficient running of a business and provide a solid understanding of which day-to-day processes are effective. Typically, these processes are defined within standard operating procedures (SOPs), also known as runbooks or playbooks, and business functions are centralized around them—think Human Resources, Accounting, IT, and so on. This is also true for cybersecurity and SecOps, which typically has operational oversight of security for the entire organization.

Teams adopt an operating model that inherently leans toward a delegated ownership of security when scaling and developing workloads in the cloud. The emergence of this type of delegation might cause you to re-evaluate your currently supported model, and when you do this, it’s important to understand what outcome you are trying to get to. You want to be able to quickly respond to and resolve security issues. You want to help application teams own their own security decisions. You also want to have centralized visibility of the security posture of your organization. This last objective is key to being able to identify where there are opportunities for improvement in tooling or processes that can improve the operation of multiple teams.

Three ways of designing the operating model for SecOps are as follows:

Centralized – A more traditional model where SecOps is responsible for identifying and remediating security events across the business. This can also include reviewing general security posture findings for the business, such as patching and security configuration issues.
Decentralized – Responsibility for responding to and remediating security events across the business has been delegated to the application owners and individual business units, and there is no central operations function. Typically, there will still be an overarching security governance function that takes more of a policy or principles view.
Hybrid – A mix of both approaches, where SecOps still has a level of responsibility and ownership for identifying and orchestrating the response to security events, while the responsibility for remediation is owned by the application owners and individual business units.

As you can see from these descriptions, the main distinction between the different models is in the team that is responsible for remediation and response. I’ll discuss the benefits and considerations of each model throughout this blog post.

The strategies and operating models that I talk about throughout this blog post will focus on the role of SecOps and organizations that operate in the cloud. It’s worth noting that these operating models don’t apply to any particular technology or cloud provider. Each model has its own benefits and challenges to consider; overall, you should aim to adopt an operating model that gets to the best business outcome, while managing risk and providing a path for continuous improvement.

Background: the centralized model

As you might expect, the most familiar and well-understood operating model for SecOps is a centralized one. Traditionally, SecOps has developed gradually from internal security staff who have a very good understanding of the mostly static on-premises infrastructure and corporate assets, such as employee laptops, servers, and databases.

Centralizing in this way provides organizations with a familiar operating model and structure. Over time, operating in this model across an industry has allowed teams to develop reliable SOPs for common security events. Analysts who deal with these incidents have a good understanding of the infrastructure, the environment, and the steps that are needed to resolve incidents. Every incident gives opportunities to update the SOPs and to share this knowledge and the lessons learned with the wider industry. This continuous feedback cycle has provided benefits to SecOps teams for many years.

When security issues occur, understanding the division of responsibility between the various teams in this model is extremely important for quick resolution and remediation. The Responsibility Assignment Matrix, also known as the RACI model, has defined roles—Responsible, Accountable, Consulted, and Informed. Utilizing a model like this will help align each employee, department, and business unit so that they are aware of their role and contact points when incidents do occur, and can use defined playbooks to quickly act upon incidents.

The pressure can be high during a security event, and incidents that involve production systems carry additional weight. Typically, in a centralized model, security events flow into a central queue that a security analyst will monitor. A common approach is the Security Operations Center (SOC), where events from multiple sources are displayed on screens and also trigger activity in the queue. Security incidents are acted upon by an experienced team that is well versed in SOPs and understands the importance of time sensitivity when dealing with such incidents. Additionally, a centralized SecOps team usually operates in a 24/7 model, which might be achieved by having teams in multiple time zones or with help from an MSSP (Managed Security Service Provider). Whichever strategy is followed, having experienced security analysts deal with security incidents is a great benefit, because experience helps to ensure efficient and thorough remediation of issues.

So, with context and background set—how does a centralized SOC look and feel when it operates in the cloud, and what are its challenges?

Centralized SOC in the cloud: the advantages

Cloud providers offer many solutions and capabilities for SOCs that operate in a centralized model. For example, you can monitor your organization’s cloud security posture as a whole, which allows for key performance indicator (KPI) benchmarking, both internally and industry wide. This can then help your organization target security initiatives, training, and awareness on lower-scoring areas.

Security orchestration, automation, and response (SOAR) is a phrase commonly used across the security industry, and the cloud unlocks this capability. Combining both native and third-party security services and solutions with automation facilitates quick resolution of security incidents. The use of SOAR means that only incidents that need human intervention are actually reviewed by the analysts. After investigation, if automation can be introduced on that alert, it’s quickly applied. Having a central place for automating alerts helps the organization to have a consistent and structured approach to the response for security events and gives analysts more time to focus on activities like threat hunting.

Additionally, such threat-hunting operations require a central security data lake or similar technology. As a result, the SecOps team helps to drive the centralization of data across the business, which is a traditional cybersecurity function.

Centralized SOC in the cloud: organizational considerations

Some KPIs that a traditional SOC would typically use are time to detect (TTD), time to acknowledge (TTA), and time to resolve (TTR). These have been good metrics that SecOps managers can use to understand and benchmark how well the SecOps team is performing, both internally and against industry benchmarks. As your organization starts to take advantage of the breadth and depth available within the cloud, how does this change the KPIs that you need to track? As stated earlier, the cloud makes it easier to track KPIs through increased visibility of your cloud footprint—although you should evaluate traditional KPIs to understand whether they still make sense to use. Some additional KPIs that should be considered are metrics that show increasing automation, reduction in human access, and the overall improvement in security posture.

Organizations should consider scaling factors for operational processes and capability in the centralized SOC model. Once benefits from adopting the cloud have been realized, organizations typically expand and scale up their cloud footprint aggressively. For a centralized SecOps team, this could cause a challenging battle between the wider business, which wants to expand, and the SOC, which needs the ability to fully understand and respond to issues in the environment. For example, most organizations will put together small proof of concepts (POCs) to showcase new architectures and their benefits, and these POCs may become available as blueprints for the wider organization to consume. When new blueprints are implemented, the centralized SecOps team should implement and rely on its automation capabilities to verify that the correct alerting, monitoring, and operational processes are in place.

Decentralization: all ownership with the application teams

Moving or designing workloads in the cloud provides organizations with many benefits, such as increased speed and agility, built-in native security, and the ability to launch globally in minutes. When looking at the decentralized model, business units should incorporate practices into their development pipelines to benefit from the security capabilities of the cloud. This is sometimes referred to as a shift left or DevSecOps approach—essentially building security best practices into every part of the development process, and as early as possible.

Placing the ownership of the SecOps function on the business units and application owners can provide some benefits. One immediate benefit is that the teams that create applications and architectures have first-hand knowledge and contextual awareness of their products. This knowledge is critical when security events occur, because understanding the expected behavior and information flows of workloads helps with quick remediation and resolution of issues. Having teams work on security incidents in the ways that best fit their operational processes can also increase speed of remediation.

Decentralization: organizational considerations

When considering the decentralized approach, there are some organizational considerations that you should be aware of:

Dedicated security analysts within a central SecOps function deal with security incidents day in and day out; they study the industry, have a keen eye on upcoming threats, and are also well versed in high-pressure situations. By decentralizing, you might lose the consistent, level-headed experience they offer during a security incident. Embedding security champions who have industry experience into each business unit can help ensure that security is considered throughout the development lifecycle and that incidents are resolved as quickly as possible.

Contextual information and root cause analysis from past incidents are vital data points. Having a centralized SecOps team makes it much simpler to get a broad view of the security issues affecting the whole organization, which improves the ability to take a signal from one business unit and apply that to other parts of the organization to understand if they are also vulnerable, and to help protect the organization in the future.

Decentralizing the SecOps responsibility completely can cause you to lose these benefits. As mentioned earlier, effective communication and an environment to share data is key to verifying that lessons learned are shared across business units—one way of achieving this effective knowledge sharing could be to set up a Cloud Center of Excellence (CCoE). The CCoE helps with broad information sharing, but the minimization of team hand-offs provided by a centralized SecOps function is a strong organizational mechanism to drive consistency.

Traditionally, in the centralized model, the SOC has 24/7 coverage of applications and critical business functions, which can require a large security staff. The need for 24/7 operations still exists in a decentralized model, and having to provide that capability in each application team or business unit can increase costs while making it more difficult to share information. In a decentralized model, having greater levels of automation across organizational processes can help reduce the number of humans needed for 24/7 coverage.

Blending the models: the hybrid approach

Most organizations end up using a hybrid operating model in one way or another. This model combines the benefits of the centralized and decentralized models, with clear responsibility and division of ownership between the business units and the central SecOps function.

This best-of-both-worlds scenario can be summarized by the statement “global monitoring, local response.” This means that the SecOps team and wider cybersecurity function guides the entire organization with security best practices and guardrails while also maintaining visibility for reporting, compliance, and understanding the security posture of the organization as a whole. Meanwhile, local business units have the tools, knowledge, and expertise available to confidently own remediation of security events for their applications.

In this hybrid model, you split delegation of ownership into two parts. First, the operational capability for security is centrally owned. This centrally owned capability builds upon the partnership between the application teams and the security organization, via the CCoE. This gives the benefits of consistency, tooling expertise, and lessons learned from past security incidents. Second, the resolution of day-to-day security events and security posture findings is delegated to the business units. This empowers the people closest to the business problem to own service improvement in ways that best suit that team’s way of working, whether that’s through ChatOps and automation, or through the tools available in the cloud. Examples of the types of events you might want to delegate for resolution are items such as patching, configuration issues, and workload-specific security events. It’s important to provide these teams with a well-defined escalation route to the central security organization for issues that require specialist security knowledge, such as forensics or other investigations.

A RACI is particularly important when you operate in this hybrid model. Making sure that there is a clear set of responsibilities between the business units and the SecOps team is crucial to avoid confusion when security incidents occur.

Conclusion

The cloud has the ability to unlock new capabilities for your organization. Increased security, speed, and agility and are just some of the benefits you can gain when you move workloads to the cloud. The traditional centralized SecOps model offers a consistent approach to security detection and response for your organization. Decentralization of the response provides application teams with direct exposure to the consequences of their design decisions, which can speed up improvement. The hybrid model, where application teams are responsible for the resolution of issues, can improve the time to fix issues while freeing up SecOps to continue their works. The hybrid operating model compliments the capabilities of the cloud, and enables application owners and business units to work in ways that best suit them while maintaining a high bar for security across the organization.

Whichever operating model and strategy you decide to embark on, it’s important to remember the core principles that you should aim for:

Enable effective risk management across the business
Drive security awareness and embed security champions where possible
When you scale, maintain organization-wide visibility of security events
Help application owners and business units to work in ways that work best for them
Work with application owners and business units to understand the cyber landscape

The cloud offers many benefits for your organization, and your security organization is there to help teams ship and operate securely. This confidence will lead to realized productivity and continued innovation—which is good for both internal teams and your customers.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

Want more AWS Security news? Follow us on Twitter.

Noise

Tag Archives: Tools

Evolving cyber threats demand new security approaches – The benefits of a unified and global IT/OT SOC

Traditional IT SOC

Traditional OT SOC

Benefits of a unified SOC

Considerations for a unified SOC

Unified IT/OT SOC deployment:

Conclusion

Considerations for the security operations center in the cloud: deployment using AWS security services

The centralized model

The decentralized and hybrid models

Considerations for all deployments

Deploy the decentralized and hybrid models

Prerequisites

Deploy the solution

Create the IAM role

Use AWS CloudFormation to create the IAM role

Deploy the solution to your BU-level Security Hub administrator account

Further enhancements and conclusion

Considerations for security operations in the cloud

Background: the centralized model

Centralized SOC in the cloud: the advantages

Centralized SOC in the cloud: organizational considerations

Decentralization: all ownership with the application teams

Decentralization: organizational considerations

Blending the models: the hybrid approach

Conclusion

The collective thoughts of the interwebz