Tag Archives: Splunk

Three recurring Security Hub usage patterns and how to deploy them

Post Syndicated from Tim Holm original https://aws.amazon.com/blogs/security/three-recurring-security-hub-usage-patterns-and-how-to-deploy-them/

As Amazon Web Services (AWS) Security Solutions Architects, we get to talk to customers of all sizes and industries about how they want to improve their security posture and get visibility into their AWS resources. This blog post identifies the three most common Security Hub usage patterns and describes how you can use them to improve your strategy for identifying and managing findings.

Customers have told us they want to provide security and compliance visibility to the application owners in an AWS account or to the teams that use the account; others want a single-pane-of-glass view for their security teams; and other customers want to centralize everything into a security information and event management (SIEM) system, most often due to being in a hybrid scenario.

Security Hub was launched as a posture management service that performs security checks, aggregates alerts, and enables automated remediation. Security Hub ingests findings from multiple AWS services, including Amazon GuardDuty, Amazon Inspector, AWS Firewall Manager, and AWS Health, and also from third-party services. It can be integrated with AWS Organizations to provide a single dashboard where you can view findings across your organization.

Security Hub findings are normalized into the AWS Security Findings Format (ASFF) so that users can review them in a standardized format. This reduces the need for time-consuming data conversion efforts and allows for flexible and consistent filtering of findings based on the attributes provided in the finding, as well as the use of customizable responsive actions. Partners who have integrations with Security Hub also send their findings to AWS using the ASFF to allow for consistent attribute definition and enforced criticality ratings, meaning that findings in Security Hub have a measurable rating. This helps to simplify the complexity of managing multiple findings from different providers.
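To illustrate why a shared format simplifies filtering, the following minimal Python sketch applies one severity filter to findings from two different providers. The IDs, ARNs, account number, and titles are made-up illustration values, and the findings are trimmed to a few fields; only the field names follow the ASFF.

```python
# Minimal sketch: one filter works for findings from any provider because
# every finding carries the same ASFF field names.
# All IDs, ARNs, and titles below are made-up illustration values.

SEVERITY_RANK = {"INFORMATIONAL": 0, "LOW": 1, "MEDIUM": 2, "HIGH": 3, "CRITICAL": 4}

findings = [
    {
        "SchemaVersion": "2018-10-08",
        "Id": "example-guardduty-finding",
        "ProductArn": "arn:aws:securityhub:us-east-1::product/aws/guardduty",
        "AwsAccountId": "111122223333",
        "Title": "Example finding from an AWS service",
        "Severity": {"Label": "HIGH"},
        "Resources": [{"Type": "AwsEc2Instance", "Id": "i-0example"}],
    },
    {
        "SchemaVersion": "2018-10-08",
        "Id": "example-partner-finding",
        "ProductArn": "arn:aws:securityhub:us-east-1:111122223333:product/example-partner/example-product",
        "AwsAccountId": "111122223333",
        "Title": "Example finding from a partner product",
        "Severity": {"Label": "LOW"},
        "Resources": [{"Type": "AwsS3Bucket", "Id": "arn:aws:s3:::example-bucket"}],
    },
]


def at_least(findings, minimum_label):
    """Return findings whose ASFF Severity.Label is at or above minimum_label."""
    threshold = SEVERITY_RANK[minimum_label]
    return [f for f in findings if SEVERITY_RANK[f["Severity"]["Label"]] >= threshold]


for finding in at_least(findings, "HIGH"):
    print(finding["Id"], finding["Severity"]["Label"])
```

Because both findings expose `Severity.Label` in the same place, the filter needs no provider-specific branches.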

Overview of the usage patterns

In this section, we outline the objectives for each usage pattern, list the typical stakeholders we have seen these patterns support, and discuss the value of deploying each one.

Usage pattern 1: Dashboard for application owners

Use Security Hub to provide visibility to application workload owners regarding the security and compliance posture of their AWS resources.

The application owner is often responsible for the security and compliance posture of the resources they have deployed in AWS. In our experience, however, it is common for large enterprises to have a separate team responsible for defining security-related privileges and to not grant application owners the ability to modify configuration settings on the AWS account that is designated as the centralized Security Hub administration account. We’ll walk through how you can enable read-only access for application owners to use Security Hub to see the overall security posture of their AWS resources.

Stakeholders: Developers and cloud teams that are responsible for the security posture of their AWS resources. These individuals are often required to resolve security events and non-compliance findings that are captured with Security Hub.

Value adds for customers: Some organizations we have worked with put the onus on workload owners to own their security findings, because they have a better understanding of the nuances of the engineering, the business needs, and the overall risk that the security findings represent. This usage pattern gives the application owners clear visibility into the security and compliance status of their workloads in the AWS accounts so that they can define appropriate mitigation actions with consideration to their business needs and risk.

Usage pattern 2: A single pane of glass for security professionals

Use Security Hub as a single pane of glass to view, triage, and take action on AWS security and compliance findings across accounts and AWS Regions.

Security Hub generates findings by running continuous, automated security checks based on supported industry standards. Additionally, Security Hub integrates with other AWS services to collect and correlate findings and uses over 60 partner integrations to simplify and prioritize findings. With these features, security professionals can use Security Hub to manage findings across their AWS landscape.

Stakeholders: Security operations, incident responders, and threat hunters who are responsible for monitoring compliance, as well as security events.

Value adds for customers: This pattern benefits customers who don’t have a SIEM but who are looking for a centralized model of security operations. By using Security Hub and aggregating findings across Regions into a single Security Hub dashboard, they get oversight of their AWS resources without the cost and complexity of managing a SIEM.

Usage pattern 3: Centralized routing to a SIEM solution

Use AWS Security Hub as a single aggregation point for security and compliance findings across AWS accounts and Regions, and route those findings in a normalized format to a centralized SIEM or log management tool.

Customers who have an existing SIEM capability and complex environments typically deploy this usage pattern. By using Security Hub, these customers gather security and compliance-related findings across the workloads in all their accounts, ingest those into their SIEM, and investigate findings or take response and remediation actions directly within their SIEM console. This mechanism also enables customers to define use cases for threat detection and analysis in a single environment, providing a holistic view of their risk.

Stakeholders: Security operations teams, incident responders, and threat hunters. This pattern supports a centralized model of security operations, where responsibility for monitoring and identifying both non-compliance with defined practice and security events falls to a single team within the organization.

Value adds for customers: When Security Hub aggregates the findings from workloads across accounts and Regions in a single place, those findings are normalized by using the ASFF. This means that findings are already normalized under a single format when they are sent to the SIEM. This enables faster analytics, use case definition, and dashboarding because analysts don’t have to create multi-tiered use cases for different finding structures across vendors and services.

The ASFF also enables streamlined response through security orchestration, automation, and response (SOAR) tools or AWS native orchestration tools such as Amazon EventBridge. With the ASFF, you can parse and filter events based on an attribute and customize automation.

Overall, this usage pattern helps to improve the typical key performance indicators (KPIs) the SecOps function is measured against, such as Mean Time to Detect (MTTD) or Mean Time to Respond (MTTR) in the AWS environment.

Setting up each usage pattern

In this section, we’ll go over the steps for setting up each usage pattern.

Usage pattern 1: Dashboard for application owners

Use the following steps to set up a Security Hub dashboard for an account owner, where the owner can view and take action on security findings.

Prerequisites for pattern 1

This solution has the following prerequisites:

  1. Enable AWS Security Hub to check your environment against security industry standards and best practices.
  2. Next, enable the AWS service integrations for all accounts and Regions as desired. For more information, refer to Enabling all features in your organization.

Set up read-only permissions for the AWS application owner

The following steps are commonly performed by the security team or those responsible for creating AWS Identity and Access Management (IAM) policies.

  • Assign the AWS managed permission policy AWSSecurityHubReadOnlyAccess to the principal who will be assuming the role. Figure 1 shows an image of the permission statement.
    Figure 1: Assign permissions


  • (Optional) Create custom insights in Security Hub. Using custom insights can provide a view of areas of interest for an application owner; however, creating a new insights view is not allowed unless the following additional set of permissions is granted to the application owner role or user.
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": [
            "securityhub:CreateInsight",
            "securityhub:UpdateInsight",
            "securityhub:DeleteInsight"
          ],
          "Resource": "*"
        }
      ]
    }
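The same permissions can also be granted programmatically. The following is a minimal sketch using the boto3 IAM client, not a prescribed procedure: the role name `AppOwnerReadOnly` and the inline policy name `SecurityHubCustomInsights` are hypothetical placeholders, and the function that calls AWS is defined but not invoked here.

```python
import json

# ARN of the AWS managed policy named in the step above.
READ_ONLY_POLICY_ARN = "arn:aws:iam::aws:policy/AWSSecurityHubReadOnlyAccess"

# Optional permissions that allow an application owner to manage custom insights.
INSIGHT_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "securityhub:CreateInsight",
                "securityhub:UpdateInsight",
                "securityhub:DeleteInsight",
            ],
            "Resource": "*",
        }
    ],
}


def attach_request(role_name):
    """Build the arguments for the IAM AttachRolePolicy call."""
    return {"RoleName": role_name, "PolicyArn": READ_ONLY_POLICY_ARN}


def inline_policy_request(role_name, policy_name="SecurityHubCustomInsights"):
    """Build the arguments for the IAM PutRolePolicy call (hypothetical policy name)."""
    return {
        "RoleName": role_name,
        "PolicyName": policy_name,
        "PolicyDocument": json.dumps(INSIGHT_POLICY),
    }


def grant_application_owner_access(role_name, allow_custom_insights=False):
    """Apply the policies to an existing role. Requires boto3 and IAM permissions."""
    import boto3  # imported lazily so the request builders work without boto3

    iam = boto3.client("iam")
    iam.attach_role_policy(**attach_request(role_name))
    if allow_custom_insights:
        iam.put_role_policy(**inline_policy_request(role_name))


# Show what would be sent for a hypothetical role, without calling AWS.
print(attach_request("AppOwnerReadOnly"))
```

Calling `grant_application_owner_access("AppOwnerReadOnly", allow_custom_insights=True)` from an account with IAM permissions would apply both policies to that role.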

Pattern 1 walkthrough: View the application owner’s security findings

After the read-only IAM policy has been created and applied, the application owner can access Security Hub to view the dashboard, which provides the application owner with a view of the overall security posture of their AWS resources. In this section, we’ll walk through the steps that the application owner can take to quickly view and assess the compliance and security of their AWS resources.

To view the application owner’s dashboard in Security Hub

  1. Sign in to the AWS Management Console and navigate to the AWS Security Hub service page. You will be presented with a summary of the findings. Then, depending on the security standards that are enabled, you will be presented with a view similar to the one shown in Figure 2.
    Figure 2: Summary of aggregated Security Hub standard score


    Security Hub generates its own findings by running automated and continuous checks against the rules in a set of supported security standards. On the Summary page, the Security standards card displays the security scores for each enabled standard. It also displays a consolidated security score that represents the proportion of passed controls to enabled controls across the enabled standards.

  2. Choose the hyperlink of a security standard to get an additional summarized view, as shown in Figure 3.
    Figure 3: Security Hub standards summarized view

  3. As you choose the hyperlinks for the specific findings, you will get additional details, along with recommended remediation instructions to take.
    Figure 4: Example of finding details view


  4. In the left menu of the Security Hub console, choose Findings to see the findings ranked according to severity. Choose the link text of the finding title to drill into the details and view additional information on possible remediation actions.
    Figure 5: Findings example


  5. In the left menu of the Security Hub console, choose Insights. You will be presented with a collection of related findings. Security Hub provides several managed insights to get you started with assessing your security posture. As shown in Figure 6, you can quickly see if your Amazon Simple Storage Service (Amazon S3) buckets have public write or read permissions. This is just one example of managed insights that help you quickly identify risks.
    Figure 6: Insights view


  6. You can create custom insights to track issues and resources that are specific to your environment. Note that creating custom insights requires additional IAM permissions, as described earlier in the Set up read-only permissions for the AWS application owner section. Use the following steps to create a custom insight for compliance status.

    To create a custom insight, use the Group By filter and select how you want your insights to be grouped together:

    1. In the left menu of the Security Hub console, choose Insights, and then choose Create insight in the upper right corner.
    2. By default, there will be filters included in the filter bar. Put the cursor in the filter bar, choose Group By, choose Compliance Status, and then choose Apply.
      Figure 7: Creating a custom insight


    3. For Insight name, enter a relevant name for your insight, and then choose Create insight. Your custom insight will be created.
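The console steps for creating a custom insight have an API equivalent. The sketch below builds a request for the Security Hub `CreateInsight` operation that groups findings by compliance status. The filter values are an assumption meant to resemble the console defaults (active records with a workflow status of NEW or NOTIFIED), and the insight name is a hypothetical example.

```python
def insight_request(name):
    """Build the arguments for the Security Hub CreateInsight call."""
    return {
        "Name": name,
        # Assumed filters: active findings that have not yet been handled.
        "Filters": {
            "RecordState": [{"Value": "ACTIVE", "Comparison": "EQUALS"}],
            "WorkflowStatus": [
                {"Value": "NEW", "Comparison": "EQUALS"},
                {"Value": "NOTIFIED", "Comparison": "EQUALS"},
            ],
        },
        # Same grouping as choosing Group By > Compliance Status in the console.
        "GroupByAttribute": "ComplianceStatus",
    }


def create_insight(name):
    """Create the insight and return its ARN. Requires boto3 and the
    securityhub:CreateInsight permission described earlier."""
    import boto3  # imported lazily so the request builder works without boto3

    response = boto3.client("securityhub").create_insight(**insight_request(name))
    return response["InsightArn"]


# Inspect the request for a hypothetical insight name, without calling AWS.
print(insight_request("Findings by compliance status"))
```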

In this scenario, you learned how application owners can quickly assess the resources in an AWS account and get details about security risks and recommended remediation steps. For a more hands-on walkthrough that covers how to use Security Hub, consider spending 2–3 hours going through this AWS Security Hub workshop.

Usage pattern 2: A single pane of glass for security professionals

To use Security Hub as a centralized source of security insight, we recommend that you choose to accept security data from the available integrated AWS services and third-party products that generate findings. Check the lists of available integrations often, because AWS continues to release new services that integrate with Security Hub. Figure 8 shows the Integrations page in Security Hub, where you can find information on how to accept findings from the many integrations that are available.

Figure 8: Security Hub integrations page


Solution architecture and workflow for pattern 2

As Figure 9 shows, you can visualize Security Hub as the centralized security dashboard. Here, Security Hub can act as both the consumer and issuer of findings. Additionally, if you have security findings you want sent to Security Hub that aren’t provided by an AWS Partner or AWS service, you can create a custom provider to provide the central visibility you need.

Figure 9: Security Hub findings flow


Because Security Hub is integrated with many AWS services and partner solutions, customers get improved security visibility across their AWS landscape. With the integration of Amazon Detective, it’s convenient for security analysts to use Security Hub as the centralized incident triage starting point. Amazon Detective is a security incident response service that can be used to analyze, investigate, and quickly identify the root cause of potential security issues or suspicious activities by collecting log data from AWS resources. To learn how to get started with Amazon Detective, we recommend watching this video.

Programmatically remediate high-volume workflows

Security teams increasingly rely on monitoring and automation to scale and keep up with the demands of their business. Using Security Hub, customers can configure automatic responses to findings based on preconfigured rules. Security Hub gives you the option to create your own automated response and remediation solution or use the AWS-provided solution, Security Hub Automated Response and Remediation (SHARR). SHARR is an extensible solution that provides predefined response and remediation actions (playbooks) based on industry compliance standards and best practices for security threats. For step-by-step instructions for setting up SHARR, refer to this blog post.
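If you build your own automated response, a common shape is an Amazon EventBridge rule that matches Security Hub findings and invokes an AWS Lambda function. The following is a minimal sketch of that shape, not the SHARR implementation: the event pattern selects imported findings with a HIGH or CRITICAL severity label, and the handler only collects what a remediation step would act on. The sample event is made up for illustration.

```python
# Event pattern for an EventBridge rule (shown as a Python dict).
EVENT_PATTERN = {
    "source": ["aws.securityhub"],
    "detail-type": ["Security Hub Findings - Imported"],
    "detail": {"findings": {"Severity": {"Label": ["HIGH", "CRITICAL"]}}},
}


def lambda_handler(event, context):
    """Collect the finding and resource IDs a remediation step would act on.

    A real handler would call a remediation routine here; this sketch only
    returns the list so the parsing logic is easy to follow.
    """
    actions = []
    for finding in event["detail"]["findings"]:
        for resource in finding.get("Resources", []):
            actions.append(
                {
                    "finding_id": finding["Id"],
                    "resource_id": resource["Id"],
                    "title": finding["Title"],
                }
            )
    return actions


# Made-up sample event in the shape EventBridge delivers for Security Hub.
sample_event = {
    "source": "aws.securityhub",
    "detail-type": "Security Hub Findings - Imported",
    "detail": {
        "findings": [
            {
                "Id": "finding-1",
                "Title": "Example bucket allows public write access",
                "Severity": {"Label": "HIGH"},
                "Resources": [{"Type": "AwsS3Bucket", "Id": "arn:aws:s3:::example-bucket"}],
            }
        ]
    },
}

print(lambda_handler(sample_event, None))
```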

Routing to alerting and ticketing systems

For incidents you cannot or do not want to automatically remediate, either because the incident happened in an account with a production workload or some change control process must be followed, routing to an incident management environment may be necessary. The primary goal of incident response is reducing the time to resolution for critical incidents. Customers who use alerting or incident management systems can integrate Security Hub to streamline the time it takes to resolve incidents. ServiceNow ITSM, Slack and PagerDuty are examples of products that integrate with Security Hub. This allows for workflow processing, escalations, and notifications as required.

Additionally, Incident Manager, a capability of AWS Systems Manager, provides response plans, an escalation path, runbook automation, and active collaboration to recover from incidents. By using runbooks, customers can set up and run automation to recover from incidents. This blog post walks through setting up runbooks.

Usage pattern 3: Centralized routing to a SIEM solution

Here, we will describe how to use Splunk as an AWS Partner SIEM solution. However, note that there are many other SIEM partners available in the marketplace; the instructions to route findings to those partners’ platforms will be available in their documentation.

Solution architecture and workflow for pattern 3

Figure 10: Security Hub findings ingestion to Splunk


Figure 10 shows the use of a Security Hub delegated administrator that aggregates findings across multiple accounts and Regions, as well as other AWS services such as GuardDuty, Amazon Macie, and Amazon Inspector. These findings are then sent to Splunk through a combination of Amazon EventBridge, AWS Lambda, and Amazon Kinesis Data Firehose.
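To make the data flow more concrete, the following sketch shows one way a Lambda function could wrap each finding in the Splunk HTTP Event Collector (HEC) event envelope before handing it to a delivery stream. This is an illustration under stated assumptions, not the code that the Project Trumpet template deploys: the `aws:securityhub:finding` sourcetype is an assumed value, and the sample finding is made up.

```python
import json
from datetime import datetime, timezone


def to_hec_event(finding, sourcetype="aws:securityhub:finding"):
    """Wrap an ASFF finding in the HEC envelope (time, sourcetype, event)."""
    # ASFF timestamps are ISO 8601 strings; HEC expects epoch seconds.
    updated = datetime.fromisoformat(finding["UpdatedAt"].replace("Z", "+00:00"))
    return {"time": updated.timestamp(), "sourcetype": sourcetype, "event": finding}


def to_firehose_records(findings):
    """Serialize findings as newline-terminated JSON records, the shape the
    Firehose PutRecordBatch operation accepts in its Records parameter."""
    return [
        {"Data": (json.dumps(to_hec_event(f)) + "\n").encode("utf-8")}
        for f in findings
    ]


# Made-up finding, trimmed to the fields this sketch uses.
sample_finding = {
    "Id": "finding-1",
    "Title": "Example finding",
    "UpdatedAt": "2023-01-15T10:30:00Z",
    "Severity": {"Label": "MEDIUM"},
}

records = to_firehose_records([sample_finding])
print(records[0]["Data"].decode("utf-8"))
```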

Prerequisites for pattern 3

This solution has the following prerequisites:

  • Enable Security Hub in your accounts, with one account defined as the delegated admin for other accounts within AWS Organizations, and enable cross-Region aggregation.
  • Set up a third-party SIEM solution; you can visit AWS Marketplace for a list of our SIEM partners. For this walkthrough, we will be using Splunk, with the Security Hub app in Splunk and an HTTP Event Collector (HEC) with indexer acknowledgment configured.
  • Generate and deploy a CloudFormation template from Splunk’s automation, provided by Project Trumpet.
  • Note that cross-Region aggregation can only be configured from within the delegated administrator account, or from within a standalone account that is not controlled by a delegated administrator. The aggregation Region must be a Region that is enabled by default.

Pattern 3 walkthrough: Set up centralized routing to a SIEM

To get started, first designate a Security Hub delegated administrator and configure cross-Region aggregation. Then you can configure integration with Splunk.

To designate a delegated administrator and configure cross-Region aggregation

  1. Follow the steps in Designating a Security Hub administrator account to configure the delegated administrator for Security Hub.
  2. Perform these steps to configure cross-Region aggregation:
    1. Sign in to the account to which you delegated Security Hub administration, and in the console, navigate to the Security Hub dashboard in your desired aggregation Region. You must have the correct permissions to access Security Hub and make this change.
    2. Choose Settings, choose Regions, and then choose Configure finding aggregation.
    3. Select the radio button that displays the Region you are currently in, and then choose Save.
    4. You will then be presented with all available Regions in which you can aggregate findings. Select the Regions you want to be part of the aggregation. You also have the option to automatically link future Regions that Security Hub becomes enabled in.
    5. Choose Save.

You have now enabled multi-Region aggregation. Navigate back to the dashboard, where findings will start to be replicated into a single view. The time it takes to replicate the findings from the Regions will vary. We recommend waiting 24 hours for the findings to be replicated into your aggregation Region.
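The aggregation settings chosen in the console can also be applied through the Security Hub `CreateFindingAggregator` operation. The sketch below is a minimal illustration: the Region names are placeholder examples, and the function that calls AWS is defined but not invoked here.

```python
def aggregator_request(linked_regions=None):
    """Build the arguments for the Security Hub CreateFindingAggregator call.

    With no Regions given, all Regions are linked, including Regions that
    Security Hub is enabled in later. Otherwise only the listed Regions
    are linked.
    """
    if linked_regions is None:
        return {"RegionLinkingMode": "ALL_REGIONS"}
    return {"RegionLinkingMode": "SPECIFIED_REGIONS", "Regions": sorted(linked_regions)}


def create_aggregator(aggregation_region, linked_regions=None):
    """Create the aggregator from the delegated administrator account.
    The client Region becomes the aggregation Region. Requires boto3."""
    import boto3  # imported lazily so the request builder works without boto3

    client = boto3.client("securityhub", region_name=aggregation_region)
    response = client.create_finding_aggregator(**aggregator_request(linked_regions))
    return response["FindingAggregatorArn"]


# Inspect both variants without calling AWS (example Region names).
print(aggregator_request())
print(aggregator_request(["us-west-2", "eu-west-1"]))
```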

To configure integration with Splunk

Note: These actions require that you have appropriate permissions to deploy a CloudFormation template.

  1. Navigate to https://splunktrumpet.github.io/ and enter your HEC details: the endpoint URL and HEC token. Leave Automatically generate the required HTTP Event Collector tokens on your Splunk environment unselected.
  2. Under AWS data source configuration, select only AWS CloudWatch Events, with the Security Hub findings – Imported filter applied.
  3. Download the CloudFormation template to your local machine.
  4. Sign in to the AWS Management Console in the account and Region where your Security Hub delegated administrator and Region aggregation are configured.
  5. Navigate to the CloudFormation console and choose Create stack.
  6. Choose Template is ready, and then choose Upload a template file. Upload the CloudFormation template you previously downloaded from the Splunk Trumpet page.
  7. In the CloudFormation console, on the Specify Details page, enter a name for the stack. Keep all the default settings, and then choose Next.
  8. Keep all the default settings for the stack options, and then choose Next to review.
  9. On the review page, scroll to the bottom of the page. Select the check box under the Capabilities section, next to the acknowledgment that AWS CloudFormation might create IAM resources with custom names.

    The CloudFormation template will take approximately 15–20 minutes to complete.
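As an alternative to the console, the downloaded template can be deployed with the CloudFormation `CreateStack` operation. The following is a sketch under assumptions: the stack name and file path are hypothetical, and the `CAPABILITY_NAMED_IAM` capability corresponds to the acknowledgment about IAM resources with custom names. A template passed inline as `TemplateBody` is limited to 51,200 bytes; a larger template would need to be uploaded to Amazon S3 and passed as `TemplateURL` instead.

```python
def stack_request(stack_name, template_body):
    """Build the arguments for the CloudFormation CreateStack call."""
    return {
        "StackName": stack_name,
        "TemplateBody": template_body,
        # Acknowledges that the template may create IAM resources with custom names.
        "Capabilities": ["CAPABILITY_NAMED_IAM"],
    }


def deploy(stack_name, template_path):
    """Create the stack and block until creation finishes. Requires boto3."""
    import boto3  # imported lazily so the request builder works without boto3

    with open(template_path, encoding="utf-8") as handle:
        template_body = handle.read()

    cloudformation = boto3.client("cloudformation")
    cloudformation.create_stack(**stack_request(stack_name, template_body))
    cloudformation.get_waiter("stack_create_complete").wait(StackName=stack_name)


# Inspect the request for a hypothetical stack name and a stub template.
print(stack_request("securityhub-to-splunk", "{}")["Capabilities"])
```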

Test the solution for pattern 3

If you have GuardDuty enabled in your account, you can generate sample findings. Security Hub will ingest these findings and invoke the EventBridge rule to push them into Splunk. Alternatively, you can wait for findings to be generated from the periodic checks that are performed by Security Hub. Figure 11 shows an example of findings displayed in the Security Hub dashboard in Splunk.
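Sample findings can also be generated through the GuardDuty `CreateSampleFindings` operation. The sketch below assumes GuardDuty is already enabled in the current Region; the finding type shown is just one example, and the function that calls AWS is defined but not invoked here.

```python
def sample_findings_request(detector_id, finding_types=None):
    """Build the arguments for the GuardDuty CreateSampleFindings call.

    When finding_types is omitted, GuardDuty generates samples for all types.
    """
    request = {"DetectorId": detector_id}
    if finding_types:
        request["FindingTypes"] = list(finding_types)
    return request


def generate_sample_findings(finding_types=None):
    """Generate sample findings on the first detector in the current Region.
    Requires boto3 and an enabled GuardDuty detector."""
    import boto3  # imported lazily so the request builder works without boto3

    guardduty = boto3.client("guardduty")
    detector_ids = guardduty.list_detectors()["DetectorIds"]
    if not detector_ids:
        raise RuntimeError("GuardDuty is not enabled in this Region")
    guardduty.create_sample_findings(
        **sample_findings_request(detector_ids[0], finding_types)
    )


# Inspect the request for a placeholder detector ID, without calling AWS.
print(sample_findings_request("example-detector-id", ["Recon:EC2/PortProbeUnprotectedPort"]))
```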

Figure 11: Example of the Security Hub dashboard in Splunk


Conclusion

AWS Security Hub provides multiple ways to quickly assess and prioritize your security alerts and security posture. In this post, you learned about three different usage patterns that we have seen our customers implement to take advantage of the benefits and integrations offered by Security Hub. Note that these usage patterns are not mutually exclusive, but can be used together as needed.

To extend these solutions further, you can enrich Security Hub metadata with additional context by using tags, as described in this post. Configure Security Hub to ingest findings from a variety of AWS Partners to provide additional visibility and context to the overall status of your security posture. To start your 30-day free trial of Security Hub, visit AWS Security Hub.

If you have feedback about this blog post, submit comments in the Comments section below. If you have questions about this blog post, please start a new thread on the Security Hub forum or contact AWS Support.

Want more AWS Security news? Follow us on Twitter.

Tim Holm


Tim is a Principal Solutions Architect at AWS. He joined AWS in 2018 and is deeply passionate about security, cloud services, and building innovative solutions that solve complex business problems.

Danny Cortegaca


Danny is a Senior Security Specialist at AWS. He joined AWS in 2021 and specializes in security architecture, threat modelling, and driving risk-focused conversations.

Simplify setup of Amazon Detective with AWS Organizations

Post Syndicated from Karthik Ram original https://aws.amazon.com/blogs/security/simplify-setup-of-amazon-detective-with-aws-organizations/

Amazon Detective makes it easy to analyze, investigate, and quickly identify the root cause of potential security issues or suspicious activities by collecting log data from your AWS resources. Amazon Detective simplifies the process of a deep dive into a security finding from other AWS security services, such as Amazon GuardDuty and AWS Security Hub. Detective uses machine learning, statistical analysis, and graph theory to build a linked set of data that enables customers to easily conduct faster and more efficient security investigations.

In this post, you will learn about the new AWS Organizations integration with Amazon Detective, where new and existing Detective customers can designate any account in their organization as the delegated Detective administrator account, and can centrally manage the Detective behavior graph database for an organization with up to 1,200 accounts.

Customers tell us that they want to manage security findings and investigations across multiple AWS accounts. Depending on the customer, this can be hundreds or thousands of accounts. AWS Organizations integration with security services, including GuardDuty, Security Hub, and AWS IAM Access Analyzer, helps customers centralize management and governance of their environments as they scale and grow their AWS accounts and resources. Adding to the list, Detective is now integrated with AWS Organizations to simplify security posture management across all existing and future AWS accounts in an organization. Organizations integration is available in all AWS Regions that Detective supports.

Detective is aware of your existing delegated administrator accounts for other AWS security services such as GuardDuty or Security Hub. Using this awareness, Detective recommends that you choose the same account as the administrator account for Detective, as shown in Figure 1. For a more complete walkthrough of how to enable your accounts, visit the Amazon Detective documentation.

Figure 1. Setting delegated administrator


You can then use the same account to manage all of your security services. AWS recommends that you align your Detective administrator account with your GuardDuty and Security Hub administrator accounts, to enable seamless integration between Detective and those services.

  • In GuardDuty or Security Hub, when viewing details for a GuardDuty finding, you can pivot from the finding details to the Detective finding profile.
  • In Detective, when investigating a GuardDuty finding, you can choose the option to archive that finding.

Once designated, the chosen account becomes the administrator account for the organization behavior graph. It can enable any organization account as a member account in the organization behavior graph, and can configure Detective to automatically enable organization accounts when they join the organization.

Figure 2. Auto-enabling Organization accounts


The Detective administrator account can also manually invite other accounts to join the organization behavior graph.

Figure 3. Inviting accounts to join the Organization behavior graph


From Detective, the administrator account can centrally conduct security investigations across the organization.

Considerations for AWS Organizations support

Some considerations and recommendations around Organizations support for Detective:

  1. Detective allows up to 1,200 member accounts in each behavior graph.
  2. The Detective administrator account becomes the administrator account for the organization’s behavior graph.
  3. An account can be a member account of multiple behavior graphs in the same Region. An account can accept multiple invitations. An organization account can be enabled as a member account in the organization behavior graph, and can then also accept invitations to other behavior graphs.
  4. An account can only be the administrator account of one behavior graph per Region, but can be an administrator account in different Regions.
  5. Changes to an organization are not immediately reflected in Detective. For most changes, such as new and removed organization accounts, it can take up to an hour for Detective to be notified.
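The member limit in the considerations above is worth checking before you enable accounts in bulk. The following sketch splits a list of candidate account IDs into those that fit under the 1,200-member limit and those that do not, and shows how auto-enablement could be switched on through the Detective `UpdateOrganizationConfiguration` operation. The account IDs are generated placeholders, and the function that calls AWS is defined but not invoked here.

```python
MAX_MEMBER_ACCOUNTS = 1200  # per behavior graph, as stated in the considerations


def split_by_limit(account_ids, already_enabled=0):
    """Return (accounts that fit, accounts left over) under the member limit.

    Duplicate account IDs are dropped while preserving their original order.
    """
    room = max(MAX_MEMBER_ACCOUNTS - already_enabled, 0)
    unique = list(dict.fromkeys(account_ids))
    return unique[:room], unique[room:]


def auto_enable_new_accounts(graph_arn):
    """Have Detective enable accounts automatically when they join the
    organization. Requires boto3 and the Detective administrator account."""
    import boto3  # imported lazily so split_by_limit works without boto3

    boto3.client("detective").update_organization_configuration(
        GraphArn=graph_arn, AutoEnable=True
    )


# 1,300 placeholder account IDs with 200 members already in the graph.
candidates = [f"{number:012d}" for number in range(1300)]
fits, left_over = split_by_limit(candidates, already_enabled=200)
print(len(fits), len(left_over))
```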

Other recent updates from Amazon Detective

Additional support for all GuardDuty findings

With the recent expansion of security investigation support for Amazon Simple Storage Service (Amazon S3) and DNS-related findings on Amazon GuardDuty, Amazon Detective now provides full coverage of all detections from GuardDuty. Security analysts can now investigate and analyze the root cause of all GuardDuty findings in Detective by using the Investigate in Detective option in GuardDuty and Security Hub.

New resource focused view

In addition to these integrations with AWS Organizations and GuardDuty, Detective now makes it even easier for a security analyst to investigate entities and behaviors using a revamped user experience, as seen in Figure 4. Amazon Detective presents a unified view of user and resource interactions over time, with all the context and details in one place, to help you quickly analyze the root cause of a security finding.

Figure 4. New resource focused view


New finding overview

The new finding overview provides an expanded set of details for each finding, and provides links to the profiles for each involved entity as seen in the right panel in Figure 4. With this unified view, you can visualize all of the details and context in one place, while identifying the underlying reasons for the findings. This resource-focused view helps you understand the connections between resources affected by a security finding, and further helps you drill down into relevant historical activity to quickly determine the root cause.

Integration with Splunk

Amazon Detective, in coordination with the Splunk Trumpet project, has released the ability to pivot from an Amazon GuardDuty finding in Splunk directly to an Amazon Detective entity profile. Customers can now quickly identify the root cause of potential security issues or suspicious activities. This setting can be enabled on the Splunk Trumpet project installation page by selecting Detective GuardDuty URLs from the AWS CloudWatch Events dropdown.

Amazon Detective’s interactive visualizations make it easy to investigate and analyze issues more thoroughly and effectively, with minimal effort. Using these visualizations, you can filter large sets of event data into specific timelines, with all the details, context, and guidance you need to investigate quickly. For example, Amazon Detective enables you to view login attempts on a geolocation map, drill down into relevant historical activity, quickly determine a root cause, and, if necessary, take action to resolve the issue.

Amazon Detective makes it easy to analyze, investigate, and quickly identify the root cause of potential security issues. To get started, enable a 30-day free trial of Amazon Detective with just a few clicks in the AWS Management console. See the AWS Regions page for a list of all Regions where Detective is available. To learn more, visit the Amazon Detective product page.

 
If you have feedback about this post, submit comments in the Comments section below.


Karthik Ram


Karthik is a Senior Solutions Architect with Amazon Web Services based in Columbus, Ohio. He has a background in Networking and Infrastructure architecture. At AWS, Karthik helps customers build secure and innovative cloud solutions, solving their business problems using data driven approaches. Karthik’s Area of Depth is Cloud Security with a focus on Threat Detection and Incident Response (TDIR).

Author

Ross Warren

Ross Warren is a Senior Solution Architect at AWS based in Northern Virginia. Prior to his work at AWS, Ross’ areas of focus included cyber threat hunting and security operations. Ross has worked at a handful of startups and has enjoyed the transition to AWS because he can continue to build solutions for customers on today’s most innovative platform.

Managing complexity in Zabbix installations with Splunk

Post Syndicated from Christian Anton original https://blog.zabbix.com/managing-complexity-in-zabbix-installations-with-splunk/13053/

A big data analytics engine can be used to optimize large and complex Zabbix installations: keeping track of the amount and kind of problems over time, top alert producers, and much more. You can employ Splunk to optimize and analyze vital Zabbix runtime parameters, such as ‘unsupported items,’ recurring host availability issues, misconfigured agents, and Zabbix Queue entries.

Contents

I. Complexity (1:15)
II. Zabbix entity inventory (8:28)
III. Use cases (15:16)
IV. Conclusion (20:09)
V. Questions & Answers (21:41)

secadm GmbH is a service provider located in the south of Germany. With a strong background in monitoring and automation, network infrastructure, and security software development, the company helps customers of all sizes manage their IT infrastructures. secadm GmbH is a Zabbix partner and also a Zabbix training partner.

Complexity

Operating a Zabbix deployment of a specific size comes with some challenges:

  • A huge number of hosts, templates, items, host groups, macros, and configuration elements inside your Zabbix instance.
  • LLD rules/unsupported items: items that are unable to fetch information, for example, because of a wrong password or a wrong path in an external check. It is often hard to find out how many of those you have and which error state each one is in. Therefore, it’s also difficult to fix them.
  • Host availability/network issues: errors that you see only in the logs, such as hosts going up and down or losing connectivity but recovering before an alert is issued.
  • Queue entries. In larger Zabbix installations, you might have ten thousand or even more items in the queue. Zabbix tells you that some items do not receive their data in time, so something is clearly wrong, but it doesn’t give a hint about what is wrong.
  • Zabbix as a monitoring tool is there to detect problems and generate alerts from them. Too many problems often cause ‘alert fatigue’: people start ignoring monitoring results because there are too many alerts.

Therefore, we receive a lot of questions from our customers, such as:

  • Where do all these problems come from?
  • What are the hosts generating most of the problems, at what times, and generated by what templates?
  • Did the latest change/upgrade have any negative impact on our monitoring?
  • Can you get rid of unsupported items?
  • How many hosts have specific problems (for instance, caused by a known bug in an old version of an agent that behaves strangely with a specific version of the Zabbix server), and what would be the effect if we fixed those problems?
  • Where do all these queue entries come from?

Zabbix is a transparent and predictable monitoring tool that offers great ways to organize the monitored elements with templates and macros. Zabbix also offers excellent visualization capabilities. However, Zabbix is not an analytical utility offering a flexible query language to gather the required information in the required format, having on-demand statistical functions, and allowing you to enrich and correlate data with the data from arbitrary sources. So, extra tools will have to do the extra work.

secadm GmbH, being a partner of both Zabbix and Splunk, found Splunk to be the obvious choice for this extra work. Splunk offers many ways to onboard data into the platform that go far beyond simple indexing of log data: lookups against the Key-Value store, scripts and programs inside the Splunk platform that fetch data from other systems in real time and on demand without having to store or index it, and custom search commands.

Zabbix entity inventory

The most important Zabbix data used for analysis is the inventory of all elements inside Zabbix that do not change often, such as:

  • Hosts,
  • Items,
  • Proxies,
  • Templates,
  • Triggers,
  • Discovery rules (LLD),
  • Item Prototypes, and
  • Trigger Prototypes.

As this data does not change constantly, we fetch it into Splunk with a scheduled search and a custom search command that reads directly from the Zabbix API endpoints. We then store this information in the Splunk KV Store, which is backed by MongoDB. This lets us run searches in milliseconds and get results quickly without having to index any data.
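As a rough illustration of the inventory fetch described above, the sketch below builds a Zabbix JSON-RPC `host.get` request and reshapes the response into records keyed by host ID, the kind of record a KV Store collection could hold. The function names, the selected output fields, and the choice of `hostid` as `_key` are assumptions made for illustration, not the implementation used by secadm. The `auth` field in the request body follows the Zabbix 5.0-era API.

```python
def build_host_get_request(auth_token, request_id=1):
    """Build a JSON-RPC 2.0 payload for the Zabbix API method host.get."""
    return {
        "jsonrpc": "2.0",
        "method": "host.get",
        # Only the fields needed for the inventory (an assumed selection).
        "params": {"output": ["hostid", "host", "name", "status"]},
        "auth": auth_token,
        "id": request_id,
    }


def to_kv_records(api_response):
    """Turn a host.get response into KV Store style records.

    Using hostid as the record key (_key) is an assumption; it makes
    later lookups from host ID to hostname a direct key access.
    """
    if "error" in api_response:
        raise RuntimeError(f"Zabbix API error: {api_response['error']}")
    return [
        {
            "_key": host["hostid"],
            "host": host["host"],
            "name": host.get("name", host["host"]),
            "status": int(host["status"]),  # the API returns numbers as strings
        }
        for host in api_response["result"]
    ]
```

The payload would be sent as JSON to the `api_jsonrpc.php` endpoint of the Zabbix frontend; sending it and writing the records to a collection are left out to keep the sketch self-contained.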

Zabbix entity inventory

So, starting from a list of all items, you can get statistics on status and state and drill down into the unsupported items. You can then correlate the results with hostnames instead of host IDs, which are not human-readable. The hostnames are available in the KV Store, which stores the hosts with their metadata. You can also identify how many unsupported items there are on each host.

You can also get information on hostnames, hosts, item names, item types, and errors. You can categorize the problems, for example as SNMP problems or shell problems such as wrong paths, and see how often certain problems happen, which hosts are assigned to which templates and host groups, and so on. This information may also be aggregated or correlated with information from UCMDB.
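The drill-down described above can be sketched in a few lines. The example below counts unsupported items per host, resolves host IDs to hostnames, and assigns each error text to a coarse category. It assumes item records shaped like Zabbix `item.get` output (where `state` 1 means "not supported"); the function names and the keyword rules in `categorize_error` are hypothetical and would need tuning against real error messages.

```python
from collections import Counter


def unsupported_per_host(items, hosts_by_id):
    """Count unsupported items per host, most affected host first.

    hosts_by_id maps host ID to hostname; unknown IDs are kept as-is.
    """
    counts = Counter(
        hosts_by_id.get(item["hostid"], item["hostid"])
        for item in items
        if str(item.get("state")) == "1"  # 1 = not supported
    )
    return counts.most_common()


def categorize_error(error_text):
    """Assign an item error message to a coarse category (illustrative rules)."""
    text = error_text.lower()
    if "snmp" in text:
        return "snmp"
    if "no such file" in text or "not found" in text:
        return "path"
    if "timeout" in text or "timed out" in text:
        return "timeout"
    return "other"
```

For example, `unsupported_per_host(items, {"1": "web01"})` returns `(hostname, count)` pairs that can be charted directly or joined with template and host group data.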

More data

The only thing more fun than having data in a data analytics platform is having more data.

  • Indexing the Zabbix server and proxy logs, categorizing events to identify availability issues, item problems, preprocessing problems, housekeeper statistics, etc.

  • A module to fetch information from Zabbix (item, host, trigger) in real-time.

  • Gathering metrics (History / Trends data) directly from Zabbix in real-time without the need to store these metrics in any place other than the Zabbix database. We can still use the data for graphing, correlations, calculations, etc.

  • Onboarding Zabbix problems into Splunk by using the new Webhook custom media type.

Custom Media type

  • Correlation of the alert logs, which are new and available through the API since Zabbix 5.0.
  • Analyzing the queue items to answer the questions listed above.
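To make the webhook onboarding mentioned in the list above more concrete, here is a minimal sketch of the event a webhook could send to Splunk's HTTP Event Collector (HEC). Zabbix webhook media types are written in JavaScript, so this Python version only illustrates the payload shape and request headers. The input field names, the `zabbix:problem` sourcetype, and the function names are assumptions; the `Authorization: Splunk <token>` header and the `time`/`host`/`sourcetype`/`event` envelope are the standard HEC format.

```python
import json
import urllib.request


def build_hec_event(problem, sourcetype="zabbix:problem"):
    """Wrap a Zabbix problem in the HEC event envelope.

    `problem` is a dict with hypothetical keys filled in by the webhook.
    """
    return {
        "time": int(problem["clock"]),  # epoch seconds of the problem
        "host": problem["host"],
        "sourcetype": sourcetype,  # assumed sourcetype name
        "event": {
            "eventid": problem["eventid"],
            "name": problem["name"],
            "severity": problem["severity"],
        },
    }


def build_hec_request(hec_url, hec_token, event):
    """Build (but do not send) the HTTP POST for one HEC event."""
    return urllib.request.Request(
        hec_url,
        data=json.dumps(event).encode("utf-8"),
        headers={
            "Authorization": f"Splunk {hec_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the resulting request to `urllib.request.urlopen` would deliver the event; the URL would typically end in `/services/collector/event` on the HEC port of the Splunk instance.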

Use cases

Zabbix queue

The Zabbix queue can be a real headache: in an installation with 20,000 to 50,000 items in the queue, items may wait for 5 or 10 minutes or even longer.

This dashboard displays the same view as Zabbix: items are categorized by delay, item type, proxy, etc. What Splunk adds, and Zabbix lacks, is the history, so you can see spikes at the moments when things changed dramatically. For instance, when significant network changes happen, the network slows down, and the queues grow dramatically. You can see whether these queues have gone back down or stayed high. This information is complicated to analyze in Zabbix alone.

You can also drill down to see the items correlated with their actual status and the host’s status inside Zabbix. So, you can clearly see, for instance, that an item is in the queue because its host is down, or because the item is not supported and doesn’t get any data.

There is also an Ignore list, so you can get statistics for the remaining items and group them, for instance, by item type. From there, you can analyze and fix the problems.
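The grouping described above can be approximated as follows. This sketch takes a snapshot of queued items, drops the ones on an ignore list, and counts the rest by item type and delay bucket, using the same delay columns as the Zabbix queue overview. The record fields (`key`, `type`, `delay` in seconds) and the function names are assumptions for illustration; storing one such summary per snapshot is what provides the history.

```python
from collections import Counter

# Lower bound in seconds and label, mirroring the Zabbix queue overview columns.
BUCKETS = [(600, "10m+"), (300, "5m"), (60, "1m"), (30, "30s"), (10, "10s"), (5, "5s")]


def delay_bucket(delay_seconds):
    """Return the bucket label for a delay, or None if below 5 seconds."""
    for lower_bound, label in BUCKETS:
        if delay_seconds >= lower_bound:
            return label
    return None


def summarize_queue(items, ignore_keys=frozenset()):
    """Count queued items by (item type, delay bucket), skipping ignored keys."""
    summary = Counter()
    for item in items:
        if item["key"] in ignore_keys:
            continue
        bucket = delay_bucket(item["delay"])
        if bucket is not None:
            summary[(item["type"], bucket)] += 1
    return summary
```

Comparing consecutive summaries shows whether a spike in, say, SNMP items delayed by more than ten minutes has subsided or persisted.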

Zabbix problems analytics

Zabbix problems dashboard

In this dashboard, Zabbix problems are displayed by system categories. For instance, we can see that over the last 24 hours, Windows caused most of the problems.

Here, we can also drill down to see, for instance, if there are many similar problems. You can go further to identify a single issue that has caused many alerts or problems. You can see that one host is creating almost all of the problems. So, if you switch this one host off, you would have fewer problems.
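The "top alert producers" view above boils down to a simple aggregation. The sketch below counts problems per host and reports each host's share of the total, which makes a single dominant host easy to spot. The `host` field on each problem record and the function name are assumptions; in practice the hostname would come from joining problem events with the host inventory.

```python
from collections import Counter


def top_problem_hosts(problems, limit=5):
    """Return (host, problem count, share of all problems) for the top hosts."""
    counts = Counter(problem["host"] for problem in problems)
    total = sum(counts.values())
    return [
        (host, count, round(count / total, 2))
        for host, count in counts.most_common(limit)
    ]
```

A result in which the first host holds a share close to 1.0 corresponds to the case described above, where one host creates almost all of the problems.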

Zabbix data for management visibility

We can use Zabbix data for greater management visibility, such as:

  • Correlation of data to generate meaningful dashboards:

— Zabbix (metrics, status, problems, etc.),
— application logs,
— other data sources,
— inventory (CMDB, …)

  • Business-level visualization.

Conclusion

Splunk is commercial software, though a free license with a limited daily indexing volume is available. We are currently in the process of integrating Splunk with Zabbix.

If you are interested in Splunk, you can send a request to [email protected] or look for Christian Anton on LinkedIn or Instagram.

Questions & Answers

Question. If we use this kind of integration, are there any performance issues caused by Splunk or some misconfiguration?

Answer. We have been using Splunk with installations that have several tens of thousands of monitored hosts and anywhere from hundreds of thousands to millions of items, and we have not seen any performance implications.

Question. How does this connector work under the hood? Does it use the API or direct queries to the database?

Answer. We rely on the API. In addition, we can fetch the data directly from the database.