All posts by Joshua Harr

Build Security Muscle Memory With Tabletop Exercises

Post Syndicated from Joshua Harr original https://blog.rapid7.com/2023/03/15/build-security-muscle-memory-with-tabletop-exercises/

When I was in grade school, I played football. I was scrawny and afraid to go up against anyone bigger than I was (essentially everyone). I always hated Oklahoma drills and scrimmages with my team. For quite some time, I avoided “the tunnel” hoping to evade facing the bigger linemen. My coach sat me down and explained “why” we did these drills.

“We are building your muscle memory to increase your awareness and reflexes in a real game, when it really matters.” I was a snotty-nosed kid back then, and I didn’t realize that this would be applicable throughout my career.

What scrimmages were to football, tabletop exercises (TTX) are to incident response, business continuity, disaster recovery, vulnerability management, and other critical components of your organization’s security program. TTXs build the essential muscle memory at all levels of the organization.

Three Levels of an Organization

Organizations can be divided into three core levels:

  • Strategic: Long-term visionaries. Executive leaders who guide the organization forward and focus on larger strategic positioning.
  • Operational: Day-to-day management. Frontline managers and the communication arm of the strategic level. This level takes strategic vision and turns it into tactical application.
  • Tactical: Where strategic planning and operational processes are put into action. These are the day-to-day tasks that get carried out, such as monitoring and detection, deployment, and onboarding.

Delineating these levels is important when building a TTX. You need to know who you are facilitating a TTX for, to ensure that you’re incorporating the relevant data for them to understand, respond to, and discuss. In other words, you need to provide the technical artifacts to the Tactical level participants, operational processes to the Operational level participants, and strategic business impact events to Strategic level participants.

There are times in which you might deliver a TTX specific to just one level within the organization. While this may be better for building consistent muscle memory with those specific teams, scrimmaging with the entire business is also essential.

The Three TTX Methodologies

There are three methodologies that I discuss with our customers. Each of these methods has benefits for all organizational levels, but each is ideally suited to a specific level, as outlined below.

  • Break-The-Glass: This TTX method is great for the strategic level because it allows the teams to work backwards and forwards with an incident at the same time. A break-the-glass TTX drops the main incident event right at the beginning of the exercise, so the incident is known right out of the gate. This is great for testing the overall response to an incident and allows you to go as technical as you would like, while still enabling the strategic level to participate and add value. The downside is that nuanced operational processes may be missed and go untested, and those processes may be very important in a real-world scenario.
  • Escalatory Method: This method takes a more granular look at each process and response by beginning with lower-level events and escalating the severity of the incident as the scenario develops. This is great for the operational level as it really focuses on the operational processes, procedures, and response plans that have been developed. A lot of findings can come out of this from a procedural perspective that will help further develop your IR plan and playbooks. The only real downside is that it plays out more slowly than a real incident given we take a close look at each step within the response.
  • Choose Your Own Adventure: This method is the most robust and difficult to build out and facilitate. Nonetheless, if done well, you can reveal significant growth opportunities in your response plans. This method starts with a single artifact and allows participants to ask questions and go through their plans. Once a participant hits on a key point or says a key word, the scenario will begin to unfold. There are ways for the incident progress to be stifled should the participants go down a different path as well. This is great for the tactical level to practice their playbooks, technical analysis, and critical thinking skills. The downside is that this type of exercise requires a significant amount of time to fully develop while attempting to predict where things may go. You may also mistakenly create a “one right answer” issue which is a common mistake in TTXs.
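The keyword-triggered flow of the Choose Your Own Adventure method can be pictured as a small lookup of scenario injects. The Python sketch below is only an illustration of that idea: the `Inject` and `Scenario` classes, the trigger words, and the scenario text are all hypothetical, and a real facilitator would rely on judgment rather than exact word matches.

```python
from dataclasses import dataclass, field


@dataclass
class Inject:
    """One scenario development the facilitator can reveal (hypothetical)."""
    name: str
    triggers: set[str]      # key words that unlock this inject
    reveal: str             # what the facilitator tells participants
    revealed: bool = False


@dataclass
class Scenario:
    opening_artifact: str   # the single artifact the exercise starts with
    injects: list[Inject] = field(default_factory=list)

    def hear(self, statement: str) -> list[str]:
        """Return the reveals unlocked by what a participant just said."""
        words = set(statement.lower().split())
        unlocked = []
        for inject in self.injects:
            if not inject.revealed and words & inject.triggers:
                inject.revealed = True
                unlocked.append(inject.reveal)
        # An empty list means the discussion went down a different path,
        # so the incident does not progress yet.
        return unlocked


scenario = Scenario(
    opening_artifact="EDR alert: unsigned binary on a finance workstation",
    injects=[
        Inject("lateral", {"logs", "siem"},
               "The SIEM shows logins from that host to a file server."),
        Inject("exfil", {"firewall", "egress"},
               "The firewall shows a large outbound transfer overnight."),
    ],
)
print(scenario.hear("Can we pull the SIEM logs for that workstation"))
```

Keeping the triggers broad, with several words per inject, is one way to reduce the risk of building a scenario that only rewards a single expected answer.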

The “One Right Answer” Issue

When I discuss a TTX with customers, there are times when they want to practice one specific thing to prove that there is an issue in the program or to point out problems in other teams. This is never a good idea. If you build a scenario around one specific risk, you might miss other risks and blind spots not currently known to the business. Additionally, it can create animosity and deepen the divide and mistrust toward the sponsoring team. One way to avoid this is to allow freeform discussion and let the issue come up naturally. The scenario story should be broad enough to allow the discussion to go where it needs to go without feeling forced or coerced.

The Goal

There are many goals that you may want to achieve when delivering your exercises. One of those goals should be bringing the organization together and practicing the plans and processes to ensure that the muscle memory is there when you need it most—gametime. Don’t be afraid to go for the big risks and talk about them. It’s better to find out you don’t have a developed process for something in the exercise than to find that out in the middle of a real-world incident that impacts the business. If your resources allow, you should run scenarios more than once per year. A good routine is as follows:

  • Tactical level: Once per quarter
  • Operational level: Twice a year
  • Strategic level: Once a year
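To see what that routine adds up to over a year, here is a small Python sketch. The even spacing and the January start are assumptions made for illustration; the cadence itself is the one listed above.

```python
# Exercises per year for each organizational level, per the routine above.
CADENCE = {"Tactical": 4, "Operational": 2, "Strategic": 1}


def exercise_months(per_year: int, first_month: int = 1) -> list[int]:
    """Spread `per_year` exercises evenly over 12 months (1 = January)."""
    step = 12 // per_year
    return [((first_month - 1 + i * step) % 12) + 1 for i in range(per_year)]


schedule = {level: exercise_months(count) for level, count in CADENCE.items()}
total_exercises = sum(CADENCE.values())

for level, months in schedule.items():
    print(f"{level}: months {months}")
print(f"Total exercises per year: {total_exercises}")
```

With these assumptions, the tactical exercises land in months 1, 4, 7, and 10, and the full routine comes to seven exercises per year before any levels are combined.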

You can combine levels so there aren’t as many exercises running per year, or run additional exercises just to stay sharp. Additionally, your scenarios don’t all need to be hours long. Building muscle memory is all about repetition. Keep your teams practicing, iterating, and building your organization’s muscle memory so that you can respond effectively and with precision to any incident that occurs.

If you need help with TTX scenario development and facilitation, Rapid7 Advisory Services provides a compact offering to develop and deliver a TTX as well as provide a detailed report of findings and recommendations to increase the effectiveness of your incident response program.


Grey Time: The Hidden Cost of Incident Response

Post Syndicated from Joshua Harr original https://blog.rapid7.com/2022/09/13/grey-time-the-hidden-cost-of-incident-response/

The time cost of incident response for security teams may be greater – and more complex – than we’ve been assuming. To see that in action, let’s look at a hypothetical scenario that should feel familiar to most cybersecurity analysts.

An everyday story

A security engineer, Casey, is tuning a SIEM to detect a specific threat that poses an increased risk to their organization. This project has been allotted a set amount of time for completion. The research and testing that Casey must do to make the query and tuning correct, accurate, and effective are essential to the business. This is one of many projects this engineer has on their plate. They are getting into the research and starting to understand the attack well enough to begin writing some preliminary factors of the alert, and then…

An employee forwards an email that they believe to be phishy. Casey looks at the email and confirms it requires further investigation. However, the engineer must first respond to the user and explain how to send the email as an attachment, so that the headers and other details that could help identify the artifacts of a malicious email can be examined. After that, the engineer will do their assessment and respond appropriately to the event.

Now, 25 minutes have passed. Casey returns to focus on tuning the alert but needs to go back over the research a bit more to confirm where they left off. Another 10 minutes have passed, and they are back where they were when the phishing alert came in. Now they are gathering the right information for the project and trying to get the right people involved, then…

An EDR alert comes in. It is from a director’s laptop. This begins to take priority, as the director needs this laptop for their presentation to a customer, and they leave for the airport in 3 hours. Casey steps away to analyze the alert, eradicate the malware, and begin a scan across the organization to determine if the malware hash value is seen elsewhere. 30 minutes go by, because an incident report needs to be added to the ticket. Casey sits back down and, for another 20 minutes, must recalibrate their thoughts to focus on the task at hand.

Grey time

Scenarios like this are happening in almost every organization today. High-risk security projects are delayed because fires pop up and need to be responded to. In the scenario we’ve just laid out, this engineer has lost one hour and 25 minutes from their project work due to incidents. These incidents may have a risk to them if not dealt with promptly, but the project that this engineer is working on carries a high risk of impact if not completed.
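The one hour and 25 minutes can be tallied directly from the story. In this Python sketch, the split into "response" and "refocus" time is my own labeling of the four time spans mentioned above.

```python
# (what happened, minutes, kind) for each span of time lost in the story.
lost_time = [
    ("phishing email triage", 25, "response"),
    ("going back over the research", 10, "refocus"),
    ("EDR alert and incident report", 30, "response"),
    ("recalibrating on the project", 20, "refocus"),
]

total = sum(minutes for _, minutes, _ in lost_time)
refocus = sum(minutes for _, minutes, kind in lost_time if kind == "refocus")
hours, remainder = divmod(total, 60)

print(f"Total lost: {hours} h {remainder} min")   # 1 h 25 min
print(f"Spent refocusing: {refocus} min")         # 30 min
```

More than a third of the lost time in this scenario went to getting back into the project rather than to the incidents themselves.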

Cal Newport, a computer science professor at Georgetown University, famously explained in his seminal book “Deep Work” that it takes each person a different amount of time to pivot from one task to another. It’s how our brains work. I’m calling that amount of time that it takes to pivot “grey time.” Grey time is not normally added into the time it takes to respond to incidents, but we should change that.

Whether it takes 30 seconds, 5 minutes, or 15 minutes to respond to an incident, you have to add 5 to 25 minutes of grey time to the process to pivot back to the work previously being performed. The longer the break from the task, the longer it may take to get back into the project fully. Grey time is just as detrimental to an organization as not responding to the incidents. There are quite a few statistics out there that help us quantify distractions and interruptions.

Incidents can be distractions or interruptions. The fact is that some events that security professionals respond to are benign and do not lead to actioning an incident response plan or prevent prioritized work from being completed.

Here is where Security Orchestration, Automation, and Response (SOAR) comes into play. Those manual tasks security professionals are doing that take time away from risk-informed projects to secure the business can be automated. If tasks cannot be automated fully, we can at least automate the process of pivoting from tool to tool. SOAR can eliminate the manual notation in a ticketing system and the documentation of an incident report. It can also reduce time to respond and help eliminate grey time.

Grey time reduction through SOAR

In an industry where alert fatigue and employee attrition are pervasive issues, the need is high for SOAR’s extensive automation capabilities. Think about the tasks in your organization that you would automate if you could, because they are taking up more time than necessary. We can do some quick math to find your organization’s annual cost of manual response for each of those tasks, including grey time.

  1. First, think of an action your team performs repeatedly.
  2. Assign a “task minutes” (tm) value, which is approximately how long it takes to do that task.
  3. Then, estimate the “task instances per week” (ti) value.
  4. Multiply tm by ti to get your task minutes per week, then multiply by 52 to find your “task minutes per year.”
  5. Divide by 60 to find your “task hours per year.”
  6. Multiply by your average hourly employee rate for the team that works on that task to find your annual cost of manual response.

I encourage you to do this for each playbook or process you have.

  • Task minutes (tm) x task instances per week (ti) = total task minutes per week (ttw)
  • ttw x 52 = total task minutes per year (tty)
  • tty / 60 = total hours per year (ty)
  • ty x hourly employee rate (hr) = cost of manual response
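The four lines above translate directly into a short function. This is a minimal Python sketch; the function name and the sample figures in the usage lines are made up for illustration.

```python
def annual_manual_response_cost(
    task_minutes: float,        # tm: minutes one instance of the task takes
    instances_per_week: float,  # ti: how often the task occurs each week
    hourly_rate: float,         # hr: average hourly rate of the team
) -> float:
    """Annual cost of handling one task manually."""
    minutes_per_week = task_minutes * instances_per_week  # ttw
    minutes_per_year = minutes_per_week * 52               # tty
    hours_per_year = minutes_per_year / 60                 # ty
    return hours_per_year * hourly_rate


# Hypothetical task: 10 minutes, 30 times a week, at $50 per hour.
cost = annual_manual_response_cost(10, 30, 50)
print(f"${cost:,.2f} per year")
```

Running it once per playbook or process gives you a list of costs you can sort to decide what to automate first.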

What we haven’t done here is add in the grey time. On average, it takes about 23 minutes and 15 seconds to regain focus on a task after a distraction. So, with that in mind, let’s round out this post by quantifying our story from earlier.

Let’s say that Casey, our engineer, takes 30 minutes for each phishing email, and malware compromises take 15 minutes to contain and eradicate. Both incident reports take about 20 minutes. Let’s also say that the organization sees about 16 phishing instances per week (ti) and phishing with the reporting takes 50 minutes. Let’s add in the grey time at 20 minutes to make it 70 minutes (tm).

  • 70 x 16 = 1,120 minutes (ttw)
  • 1,120 x 52 = 58,240 minutes (tty)
  • 58,240 / 60 = 970.7 hours (ty)

Using the national average salary of an entry-level incident and intrusion analyst at $88,226, we can break that down to an hourly rate of $42.41. From there, 970.7 (ty) x 42.41 (hr) = $41,167.39.

That’s just over $41K spent on manual responses to phishing each year. What about the malware? I’ll shorthand it because I believe you get the picture. Let’s say malware incidents happen about 10 times a week.

  • 25 min + 20 min = 45 min (tm)
  • 45 x 10 = 450 minutes (ttw)
  • 450 x 52 = 23,400 minutes (tty)
  • 23,400 / 60 = 390 hours (ty)
  • 390 x $42.41 = $16,539.90
  • $16,539.90 + $41,167.39 = $57,707.29

That’s nearly a full-time employee salary for just two manual processes!
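If you want to check these figures or swap in your own, the Python sketch below reproduces them. Two things are assumptions on my part: the hourly rate is derived from 2,080 working hours per year (52 weeks x 40 hours) and truncated to whole cents, which matches the $42.41 quoted above, and hours are rounded to one decimal place before pricing, as in the worked numbers.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP

CENT = Decimal("0.01")
TENTH = Decimal("0.1")

# Assumption: 2,080 working hours per year, truncated to whole cents.
hourly_rate = (Decimal(88226) / Decimal(2080)).quantize(CENT, rounding=ROUND_DOWN)


def annual_cost(task_minutes: int, per_week: int) -> Decimal:
    """Yearly cost of a manual task, rounding hours to one decimal place."""
    hours = (Decimal(task_minutes * per_week * 52) / 60).quantize(
        TENTH, rounding=ROUND_HALF_UP
    )
    return (hours * hourly_rate).quantize(CENT, rounding=ROUND_HALF_UP)


phishing = annual_cost(70, 16)  # 50 min response and report + 20 min grey time
malware = annual_cost(45, 10)   # 45 task minutes, as in the shorthand above
total = phishing + malware

print(f"Hourly rate: ${hourly_rate}")
print(f"Phishing:    ${phishing:,}")
print(f"Malware:     ${malware:,}")
print(f"Total:       ${total:,}")
```

`Decimal` is used instead of floats so that the cents come out the same way they would on paper.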

SOAR past grey time

SOAR is becoming increasingly needed within our information security programs. Not only are we wasting time on manual processes that could be automated, but we are adding grey time to our workday and decreasing the time we have to work on high-priority projects that are informed by business risk and necessary to protect revenue and business operations. With SOAR, you can refocus your efforts on risk-relevant tasks and limit manual task interruptions. You can also reduce grey time and increase the effectiveness of your security program. With SOAR, it’s all blue skies – and no grey time.


Log4Shell Strategic Response: 5 Practices for Vulnerability Management at Scale

Post Syndicated from Joshua Harr original https://blog.rapid7.com/2022/01/07/log4shell-strategic-response-5-practices-for-vulnerability-management-at-scale/

In today’s cybersecurity world, risks evolve faster than we can remediate them. To meet our goals and become resilient to these fast changes, we need the right balance of automation and human interaction. Enabling rapid response for protecting information systems is paramount, but how does a business reach this level of reaction?

How can organizations maintain a standard of excellence to their responses in high-risk situations?

Where do you even begin to respond to a critical vulnerability like the one in Apache’s Log4j Java library (a.k.a. Log4Shell)?

Most importantly, how do we transform the tactical actions that need to take place into an effective strategy to scale?

1. Empower personnel

The personnel with the knowledge about your various solutions must be empowered to make the decisions necessary to address your company’s information technology needs. If those team members don’t feel they can make those decisions, then they will defer to management — but managers may not know the intricacies of the solutions and could create a natural bottleneck, since there are going to be more decision points than managers to make decisions. Providing personnel with policy documents with uniform criteria for evaluating the risk these new vulnerabilities present, the ways to respond, and the time expectations is paramount for a timely resolution.

In a typical risk resolution process, there are many gates to safeguard our systems. This helps ensure that whatever change happens increases the solution’s confidentiality, integrity, or availability rather than diminishing it. However, a situation like Log4Shell needs to be treated like an incident response activity to quickly address the risk. Create a task force to effectively answer the important questions like:

  • How do we find vulnerable systems?
  • Which systems are vulnerable?
  • What options are there for a fix? One size may not fit all.
  • Who is going to track changes?
  • Who is going to validate the fix is in place?

Utilizing a strong incident response procedure to answer all these questions will assist with prioritization and remediation to an acceptable level of risk.

2. Promote visibility

Any standard vulnerability management lifecycle process begins with identifying affected systems to assess and evaluate the scope of a vulnerability’s presence on the network. The approach should utilize both proactive and reactive efforts through a combination of tools and well-documented processes to streamline and scale the response effectively.

A proactive process starts with an internal inventory that documents which versions of such libraries are in use, so that discovery and tracing become much more narrowly focused efforts. Running authenticated vulnerability scans continuously at pre-scheduled frequencies will also help identify third-party software that uses this library over time. Classifying system criticality within the vulnerability management tool will help you more effectively scale future remediation processes.

These proactive processes help jumpstart an initial response, but you’ll still need reactive efforts to help ensure effective and timely remediation. Vulnerability scanning tools will receive signature updates regarding this newly discovered vulnerability, which will require updating your vulnerability management tool and initiating one-off alternative scans that may deviate from pre-scheduled rotations. These alternative scans should include tiered phases, so the most critical systems receive scan priority, and then remaining systems are scanned in order of criticality. Leveraging the pre-existing system criticality classification will significantly expedite this process.
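The tiered phases can be derived straight from the criticality classification. Below is a minimal Python sketch; the `Asset` class, the numeric tier scale (1 = most critical), and the hostnames are hypothetical and not tied to any particular vulnerability management tool.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Asset:
    hostname: str
    criticality: int  # 1 = most critical; the scale is an assumption


def scan_phases(assets: list[Asset]) -> list[list[str]]:
    """Group hostnames into scan phases, most critical tier first."""
    ordered = sorted(assets, key=lambda a: (a.criticality, a.hostname))
    return [
        [asset.hostname for asset in tier]
        for _, tier in groupby(ordered, key=lambda a: a.criticality)
    ]


inventory = [
    Asset("hr-portal", 2),
    Asset("payments-api", 1),
    Asset("build-agent-07", 3),
    Asset("customer-db", 1),
]

for number, hosts in enumerate(scan_phases(inventory), start=1):
    print(f"Phase {number}: {', '.join(hosts)}")
```

Because the phases come from the existing classification, nobody has to decide scan order by hand while the response is underway.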

A security information and event management (SIEM) tool can also assist with identifying, tracking, and alerting for any suspicious activity that may be tied to exploitation of this vulnerability. Host agents and network detection systems that report back to the SIEM should be closely monitored, and any activity or traffic that deviates from baselines should receive an active response. You may need to adjust logging and alerting rules and thresholds to ensure your efforts are strategically focused.

Tactical processes help you achieve this continuous identification, but you still need to orchestrate and execute them through strategic planning to remain timely, efficient, and effective. Well-documented asset inventories and appropriate system criticality classifications help you prioritize your efforts, while continuous vulnerability scans and leveraging vulnerability management and SIEM tools help to identify, track, and manage vulnerability exposure. Leadership should provide the direction to guide these activities from inception to implementation through effective communication and allocation of resources. Lay out a short-term roadmap for tracking objectives and quick wins as part of the remediation process, so you can quickly and concisely show how you’re tracking toward goals.

3. Implement prioritization and mitigation

Now that your team has successfully identified all affected systems, you’ll need to roll out patches to those systems on a continuous basis during the next phase to mitigate risk. Current enterprise-wide patching timelines may require adjustment due to the urgency associated with such critical vulnerabilities. Patch testing and rollout phases must be expedited to support a more timely and effective response.

Much like our vulnerability scans, our patch management response should be prioritized by system criticality, with one caveat: a pilot group or pilot system deemed non-critical should be patched first to confirm there are no adverse effects before patches roll out in order of system criticality. If you’ve configured a full test environment, you can test patches on critical systems first within that environment and then roll them out in production according to criticality. The testing timeline itself should be reduced throughout all standard phases of a testing cycle, and you may even need to eliminate certain testing phases altogether. The rollout timelines for patches across all systems will need to be expedited as well to ensure coverage is as timely as possible. If your environment has widespread use of the vulnerable library, you may require reductions in timelines of anywhere from 25% to 50%.

Emergency patching procedures should provide for timely testing and production rollouts within roughly half the time of a normal patching cycle, or 5 to 10 days at a maximum for critical systems to minimize breach potential as quickly as possible. Also keep in mind that some vulnerabilities may involve more than just application of a simple patch — configuration changes may also be necessary to further mitigate potential exploitation by an adversary.
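To make the reductions concrete, here is a small Python sketch that compresses a normal patching cycle. The phase names and the 6-day and 14-day durations are hypothetical; only the 25% to 50% range and the 10-day ceiling for critical systems come from the guidance above.

```python
def compressed_timeline(normal_days: float, reduction: float) -> float:
    """Shorten a patching phase by a fractional reduction (0.25 to 0.50)."""
    if not 0.25 <= reduction <= 0.50:
        raise ValueError("reduction should be between 0.25 and 0.50")
    return normal_days * (1 - reduction)


# Hypothetical normal cycle: 6 days of testing and 14 days of rollout.
normal = {"testing": 6, "rollout": 14}
emergency = {phase: compressed_timeline(days, 0.50) for phase, days in normal.items()}
total_days = sum(emergency.values())

print(emergency)
print(f"Emergency cycle: {total_days} days (ceiling for critical systems: 10)")
```

If the compressed total still exceeds the ceiling, that is a signal to drop or merge testing phases rather than simply shorten them further.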

4. Validate remediation

Now, you’ve deployed patches to all affected systems, so the mitigation efforts are complete, right? While you may want to shift your focus back to other tasks, it’s essential to maintain continuous identification processes to ensure that no stone remains unturned.

The vulnerability management validation phase leverages those reactive identification processes, in addition to patch management processes, to assist in efficient and effective vulnerability remediation for affected systems. This stage involves re-scanning initially identified vulnerable systems to assess successful patch application and performing additional open scans of the network to ensure that there are no lingering systems that may still be affected by the vulnerability but weren’t originally identified — or perhaps weren’t successfully patched as part of the patch management process. This cycle of continuous validation will remain in effect until “clean” scans are reported across the enterprise regarding this vulnerability.
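One way to picture a single pass of this validation cycle is as a comparison between the systems identified originally and the systems a follow-up scan still reports. The Python sketch below uses hypothetical hostnames and function names.

```python
def validate_remediation(
    initially_identified: set[str],
    rescan_vulnerable: set[str],
) -> dict[str, set[str]]:
    """Compare a follow-up scan against the original findings."""
    return {
        # Identified before and still vulnerable: the patch did not take.
        "patch_failed": initially_identified & rescan_vulnerable,
        # Vulnerable now but missed in the original identification.
        "newly_discovered": rescan_vulnerable - initially_identified,
        # Identified before and no longer reported by the scan.
        "remediated": initially_identified - rescan_vulnerable,
    }


def is_clean(rescan_vulnerable: set[str]) -> bool:
    """The cycle ends only when a scan reports no vulnerable systems."""
    return not rescan_vulnerable


identified = {"app-01", "app-02", "db-01"}
rescan = {"app-02", "legacy-ftp"}

result = validate_remediation(identified, rescan)
print(result)
print("Clean:", is_clean(rescan))
```

Anything in the first two groups goes back into the patch management process, and the scan repeats until `is_clean` returns `True`.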

Since the Log4j logging library is widely used throughout many enterprise applications and even unknowingly embedded in so many others, continuous validation will become crucial in ensuring your organization remains vigilant and can mitigate the vulnerability quickly and effectively as you continue to discover affected systems.

5. Regularly review risks

A vulnerability management lifecycle rarely comes to a true end. As adversaries and security evangelists further evaluate a specific vulnerability over time, new methods of exploitation are identified, affected versions increase in scope and scale, and recent patches and fixes are found to be ineffective. This leaves organizations potentially open to exposure and at a loss for the best path forward. Continuous review of the trends surrounding an ongoing critical vulnerability will help organizations remain aware of both the impact and the mitigating measures that have been most successful. Additionally, leveraging other solutions can help further identify and launch a coordinated defense-in-depth response to any potential malicious activity that may be associated with such vulnerabilities.

Continuously identifying, mitigating, validating, and reviewing vulnerabilities throughout their inevitable course requires commitment and fortitude. Once the tides have subsided with Log4Shell and you’ve successfully and securely endured one of the worst security vulnerability exposures in a decade by following these processes, you can rest assured that your incident response processes were well-tested during this endeavor, and your IT security budget should be more than solidified for the next few years to come.

Check out our additional resources for further insight into this vulnerability, mitigating measures, and tools available to assist.
