Tag Archives: Technical

Aruba Central API Monitoring with Zabbix

Post Syndicated from Tibor Volanszki original https://blog.zabbix.com/aruba-central-api-monitoring-with-zabbix/31370/

Aruba Central is a SaaS solution that allows you to manage your Enterprise Aruba network environment. Due to the increasing number of cloud migrations, we can expect that more and more Aruba customers will move their on-premise environment to it, which will also mean a change in their monitoring environment. In this article, I will show you how to switch to API- based monitoring using Aruba Central and Zabbix. All custom resources mentioned can be found in my repository.

Aruba Central’s API

Oauth 2.0 is used, so you can forget the simple token management. At the end it is great, but for monitoring purposes it is overkill. There is pretty good documentation (referred to later) regarding how you can generate your access token, but after two hours it expires so you need to continually refresh it. To do this, you must use a refresh token, which can help you to get a new access token AND a new refresh token.

Within two hours, use the latest refresh token to repeat this action again. At this point you can imagine that this is not something you can implement easily by using the Zabbix GUI only. Well, maybe with some javascript magic, but otherwise there is no native support for this logic at this point of time. So how can we do this? In short:

  1. Generate your client credentials
  2. Generate your first token
  3. Schedule the token refresh for every two hours
  4. Update your host macro via Zabbix API
  5. Use the token in Zabbix HTTP agent checks
  6. Monitor your environment based on JSONPath pre-processing

Initial steps within Aruba Central

To manage your API access, you need to launch your “HPE Aruba Networking Central” application, so do NOT look into your workspace modules – the “Personal API clients” menu is NOT what we are looking for. Turn off the “New Central” view – at this point the early access version is not so useful (hopefully it will change soon).

The first time you get there, you will not see any items, but under the “My Apps & Tokens” tab you can click the “Add Apps & Tokens” button and generate it. Technically, this is already enough to start to monitoring your network infrastructure, but within two hours it would stop. So the relevant data for us are the “Client ID” and “Client Secret.” Feel free to revoke the recently created token at the bottom area as we do not need it.

Record your credentials

For this article, I am using a simple file to store all the credentials, which will be sourced into a bash script. Please keep in mind that storing your sensitive credentials in a single file is a BAD practice! Your SECO/CISO would probably have a few words with you about it, so please consider a better approach. A more secure way would be to use some Key Vault solution (like Azure, AWS, Google, or Hashicorp). Anyway, let’s continue with this unsecure example:

#!/bin/bash

### ZABBIX VARS ###

# URL of your zabbix instance (assuming you do not use the "/zabbix" ending, if yes, then add it to the end)
zabbix_url="https://your.zabbix.instance.net"
# Your Zabbix API token. If you do not know how to get it, check the documentation.
zabbix_api_token="1234_your_zabbix_api_key_5678"
# Create a host with a macro, remain at the "Macros" tab, turn on debug mode, look for "[hostmacroid] =>"
zabbix_macro_id="12345"

### ARUBA VARS ###
# To find yours, go here and check "Table: Domain URLs for API Gateway Access"
base_url="YOUR_ARUBA_CENTRAL_BASE_URL"
# Click on your profile in the Central app and you will find it there: 32 char long hexa string
client_id="YOUR_CLIENT_ID"
# provided in the previous step
client_secret="YOUR_CLIENT_ID"
# provided in the previous step
customer_id="YOUR_CUSTOMER_ID"
# your login credential
account_username="YOUR_CENTRAL_LOGIN_USERNAME"
# your login credential
account_password="YOUR_CENTRAL_LOGIN_PASSWORD"
# to be populated later
csrftoken=""
session=""
auth_code=""

Get or refresh your token and update the Zabbix host macro

The next steps are based on the official Aruba documentation, which you can find here. Please remember that there are many ways to achieve our target – this is just one example and probably not the most optimal one. Feel free to change / improve it with your code in your preferred scripting language.

The below script assumes that the file containing the credentials (previous step) is named as “variables” and located in the folder named “central.

Filename: aruba_central_token_new.sh

Purpose: To be used for first time token generation. Later, you only have to refresh your token with the script after this one.

Remarks: Aruba is limiting this API query set, so you can run it only ONCE every 30 minutes! If you made a typo somewhere, wait 30 minutes before your next attempt or tweak the result files.

#!/bin/bash

basedir=central
source $basedir/variables

curl -s --noproxy '*' -v --cookie-jar $basedir/cookie --location --request POST "$base_url/oauth2/authorize/central/api/login?client_id=$client_id" \
--header "Content-Type: application/json" \
--data-raw "{
    \"username\": \"$account_username\",
    \"password\": \"$account_password\"
}" > $basedir/result1.raw 2>&1

grep 'Added cookie' $basedir/result1.raw > $basedir/result1.filtered

csrftoken=$(grep csrftoken $basedir/result1.filtered | awk -F '"' '{print $2}')
session=$(grep session $basedir/result1.filtered | awk -F '"' '{print $2}')

curl -s --noproxy '*' --request POST "$base_url/oauth2/authorize/central/api?client_id=$client_id&response_type=code&scope=all" \
--header "Content-Type: application/json" \
--header "Cookie: session=$session" \
--header "X-CSRF-Token: $csrftoken" \
--data-raw "{
\"customer_id\": \"$customer_id\"
}" > $basedir/result2.raw

auth_code=$(cat $basedir/result2.raw | jq -r .auth_code)

curl -s --noproxy '*' --request POST "$base_url/oauth2/token" \
--header "Content-Type: application/json" \
--data "{
    \"client_id\": \"${client_id}\",
    \"client_secret\": \"${client_secret}\",
    \"grant_type\": \"authorization_code\",
    \"code\": \"${auth_code}\"         
}" > $basedir/result3.raw

refresh_token=$(cat $basedir/result3.raw | jq -r .refresh_token)
access_token=$(cat $basedir/result3.raw | jq -r .access_token)

if [ "$refresh_token" == "null" ]; then
    echo "something went wrong... exiting now"
    exit 1
fi

echo $access_token > $basedir/token_access.latest
echo $refresh_token > $basedir/token_refresh.latest

echo "access_token: $access_token"
echo "refresh_token: $refresh_token"

curl -s --request POST \
--url "$zabbix_url/api_jsonrpc.php" \
--header "Authorization: Bearer $zabbix_api_token" \
--header "Content-Type: application/json-rpc" \
--data "{\"jsonrpc\": \"2.0\",\"method\": \"usermacro.update\",\"params\": {\"hostmacroid\": \"${zabbix_macro_id}\",\"value\": \"${access_token_new}\"},\"id\": 1}"

rm -f $basedir/cookie

Filename: aruba_central_token_refresh.sh

Purpose: To refresh your existing token. It is expecting an existing refresh token in the “token_refresh.latest” file, so better to run the previous script one time before this.

Remarks: You can run this script as many times you want, but it will result in new tokens only once per every two hours (when the current one expires). Therefore, refreshing too frequently is pointless.

#!/bin/bash

basedir=central
source $basedir/variables

refresh_token_current=$(cat $basedir/token_refresh.latest | tr -d '\n')
refresh_token_new=""

curl -s --noproxy '*' --request POST "$base_url/oauth2/token?client_id=$client_id&client_secret=$client_secret&grant_type=refresh_token&refresh_token=$refresh_token_current" > $basedir/result4.raw

refresh_token_new=$(cat $basedir/result4.raw | jq -r .refresh_token)
access_token_new=$(cat $basedir/result4.raw | jq -r .access_token)
expires_in=$(cat $basedir/result4.raw | jq -r .expires_in)

if [ "$refresh_token_new" == "null" ]; then
    echo "something went wrong... exiting now"
    exit 1
fi

echo $access_token_new > $basedir/token_access.latest
echo $refresh_token_new > $basedir/token_refresh.latest

echo "access_token: $access_token_new"
echo "refresh_token: $refresh_token_new"
echo "expires_in: $expires_in"

curl -s --request POST \
--url "$zabbix_url/api_jsonrpc.php" \
--header "Authorization: Bearer $zabbix_api_token" \
--header "Content-Type: application/json-rpc" \
--data "{\"jsonrpc\": \"2.0\",\"method\": \"usermacro.update\",\"params\": {\"hostmacroid\": \"${zabbix_macro_id}\",\"value\": \"${access_token_new}\"},\"id\": 1}"

In my case, both the scripts and variables files are in the same “central” folder, which is in a git repository. Each time I call one of the scripts, it will record the new tokens in files, which are committed and pushed to the repo. In my own implementation, this is how I call the refresh script and sync the result with my repo:

git checkout master

basedir=central
source $basedir/variables
bash $basedir/aruba_central_token_refresh.sh

git add .
git commit -m "save the new tokens"
git push origin master

Schedule your token management

You must run your refresh script at least once per every two hours. To make this happen you have many options, including:

  • cron (old-school, outdated way)
  • systemctl timer (a better way, but only if it is monitored)
  • Jenkins / Github Actions/etc.
  • Zabbix itself, by calling your bash script

In my case, Jenkins does the scheduling and execution and the job is monitored via Zabbix.

Monitor your network infrastructure

When everything is in place, then the monitoring part is pretty simple. The usual JSONPath based logic can be used. API call documentation can be found here. The template contains only the wireless components, since I do not have my switches in Central. Implementing the switching part should not be difficult – just have a look at the “Switch” section, then clone and adjust one of your “get” items.

Screenshots

Latest data – tag based filtering:

Latest data – Site health

Latest data – Gateway info

Latest data – AP info

Triggers:

Some triggers are intentionally disabled, because they are a bit redundant. However, I wanted to cover all options. Sometimes less alerting is better if you have a ticketing system integration, otherwise your monitoring system will turn into a ticket factory.

Known issues and limitations

Since we are not querying the devices directly, some delay can be expected. Based on my recent testing, the delay compared to real time is between 3-10 minutes. In my test I disconnected my test environment and then started to do manual updates frequently. Some items got the real state earlier, some only later.

If your refresh script will malfunction for whatever reason (normally it should not), then you may have to run the other script once to generate a new token, or you can go to the GUI and check the last refresh token, with which you can override the content of the “token_refresh.latest” file.

Aruba is limiting the number of API queries to 5,000 per day. This could seem annoying, but it is way more than what you need (you should expect less than 1,000 in normal conditions, depending on your update frequency).

Zabbix API will not authorize your call unless you insert a line into your apache vhost configuration. This is a more generic Zabbix API issue that is not related to Aruba Central.

SetEnvIf Authorization "(.*)" HTTP_AUTHORIZATION=$1

If Aruba Central has a maintenance activity, then the token refreshing way could break. Running the token request script once should address the issue.

Summary

Aruba Central’s API is pretty decent, but if you start from zero it could take a while to get to the end of it. With this guide, my intention was to speed you up, but please do not consider my scripts and the shown example as the only or best possible way – I’m just hoping it can give you a good base for your own solution. Have fun!

The post Aruba Central API Monitoring with Zabbix appeared first on Zabbix Blog.

Monitoring Website Changes with Zabbix Browser Item

Post Syndicated from Adi Rusmanto original https://blog.zabbix.com/monitoring-website-changes-with-zabbix-browser-item/31684/

In today’s digital era, information is an asset and most of it is obtained from websites. The ability to automatically monitor website content changes has become a crucial competitive advantage, as even small changes on a website can affect business strategies, security postures, and data-driven decision-making. Accordingly, Zabbix 7.0 saw the introduction of a new feature called Browser Item, which allowed users to perform advanced website monitoring using a browser.

The Browser Item feature includes the ability to:

● Capture screenshots of the current website state
● Measure website performance and availability metrics
● Extract and analyze data from web pages
● Generate automatic alerts based on detected changes or errors

This means Zabbix is no longer limited to traditional IT infrastructure monitoring. It can now also serve as a tool for monitoring strategic external information.

Key use cases for website change monitoring with Zabbix

The Zabbix Browser Item opens up many valuable use cases for organizations that want to proactively track website changes. Below are some key examples:

Monitoring release notes

Tracking vendor release notes is essential for IT teams. With Zabbix, we can automatically detect new releases, extract relevant information, and notify the appropriate team members so they can respond faster.

Tracking security advisories

Security advisories are critical for maintaining a strong security posture. By monitoring websites that publish vulnerability information using Zabbix, security teams can be promptly alerted about new threats and take timely actions to reduce risks.

Monitoring competitor websites

In a competitive market, staying informed about competitor activities is vital. Zabbix allows users to monitor competitor websites for pricing updates, new product offerings, marketing campaigns, or news announcements, while providing valuable business intelligence to support strategic decisions.

Monitoring tender announcements

Zabbix can also monitor websites for new tender announcements from government portals or business partners, ensuring our organization stays aware of the latest business opportunities.

Ensuring internal website integrity

Beyond external sites, we can also use the Browser Item to ensure the integrity and availability of our own websites. It helps detect unexpected content changes, broken links, or performance degradation that may affect the user experience or signal potential issues. Proactive monitoring helps maintain a high-quality user experience and protect our brand reputation.

Getting started with website change monitoring in Zabbix

Solution overview architecture

This diagram shows how Zabbix uses a WebDriver to capture and analyze website content.
The collected data is stored in Zabbix for visualization and alerts when changes are
detected.

Step-by-step configuration

In this example, we’ll monitor changes on the Nginx Security Advisories webpage.

Step 1: Prepare the Web Driver

Zabbix requires a Web Driver to perform browser-based monitoring. One commonly used option is Selenium, which can be deployed using the following Docker image:

https://hub.docker.com/r/selenium/standalone-chrome

Step 2: Configure WebDriverURL on Zabbix server or proxy

Update the WebDriverURL parameter in your Zabbix Server or Zabbix Proxy configuration to point to the Selenium service you deployed.

Step 3: Create a Browser Item in Zabbix

1. Create a host if it doesn’t already exist.

2. Add a new item with the following settings:

  • Type: Browser
  • Type of information: Text

The key part is the script section. Below is the example script.

The script uses two methods:

  • browser.navigate method defines the URL to be monitored
  • browser.findElements method specifies the page section where changes should be detected

Note: The StartBrowserPollers parameter must be enabled on the Zabbix server or proxy configuration for browser items to work. It is enabled by default with the value StartBrowserPollers=1.

Step 4: Create dependent items

The Browser Item produces a JSON result containing website data. This item serves as the master item for dependent items such as:

  • Extracting the latest security advisories
  • Capturing a website screenshot

Step 5: Create a trigger for change alerts

Create a trigger that compares the current and previous values of the “latest security advisories” item. If any change is detected, Zabbix will automatically send an alert notifying your team of the update.

Step 6: Display data on the dashboard

To visualize the monitored data, we can use the Item History widget on a Zabbix dashboard to show both the latest security advisories and the corresponding screenshot, for example.

Conclusion

The Browser Item feature in Zabbix 7.0 elevates website monitoring beyond simple availability checks. It enables comprehensive monitoring of website changes, unlocking a variety of use cases such as tracking release notes, security advisories, competitor activity, and more.

If you’re interested in implementing this capability, feel free to contact us. Bangunindo is a Zabbix Premium Partner in Indonesia, ready to help you design, implement, and optimize your Zabbix monitoring solution to fit your specific needs.

The post Monitoring Website Changes with Zabbix Browser Item appeared first on Zabbix Blog.

Monitoring a Starlink Dish with Zabbix

Post Syndicated from Alexander Petrov-Gavrilov original https://blog.zabbix.com/monitoring-a-starlink-dish-with-zabbix/31543/

Did you realize that you can monitor a Starlink dish using just Zabbix? The idea (or rather the need) to use Starlink came to me almost as soon as I moved to a fairly rural area. Local internet providers have not yet “provided” fiberoptic or stable mobile connectivity to places like this, and while searching for a solution I accidentally discovered that Starlink was already providing service to some local companies. As I later found out, they also offered service in my area for residential customers.

To make a long story short, since internet access is crucial in the IT field, I decided to acquire and then monitor my very own Starlink dish. At first, this proved challenging because regular user data access is quite limited. However, thanks to Zabbix browser monitoring, I managed to solve it fairly easily. In this post I will share my solution with you, including the template.

Monitoring configuration

First, you need to make sure you have Zabbix installed (either a Zabbix proxy or server) on the same network that the Starlink dish and router are on. The next step is to configure Zabbix for browser monitoring.

WebDriver installation
# podman run --name webdriver -d \
-p 4444:4444 \ 
-p 7900:7900 \
--shm-size="2g" \
--restart=always -d docker.io/selenium/standalone-chrome:latest

Port 4444 will be the port on which the WebDriver will be listening, and port 7900 will be used by NoVNC, which allows us to observe browser behavior in case a browser with a GUI is used.

Zabbix server/proxy configuration

After WebDriver is installed, we need to set up the communication between Zabbix and the driver. This can be done by editing the Zabbix server/proxy configuration file and updating the following parameters:

### Option: WebDriverURL 
# WebDriver interface HTTP[S] URL. For example http://localhost:4444 used with 
# Selenium WebDriver standalone server. 
# 
# WebDriverURL= 
WebDriverURL=http://localhost:4444 
### Option: StartBrowserPollers 
# Number of pre-forked instances of browser item pollers. 
# 
# Range: 0-1000 
# StartBrowserPollers=1 
StartBrowserPollers=5

With the configuration parameters in place, restart the Zabbix server/proxy to apply the changes:

systemctl restart zabbix-server
Creating a host

First, we need to navigate to the “Data collection” > “Hosts” section and create a host that represents our Starlink dish. The host in my example will look like this:

Starlink dish host
Starlink dish host

The host also has a user macro:

{$LINK} with value: http://webapp.starlink.com to point to the correct Starlink dish web app:

Link macro
Link macro
Creating a browser item

We will now configure our browser item to collect and monitor the list of metrics exposed in the Starlink browser app:

Starlink browser item
Starlink browser item

We are using the bare minimum here, so make sure the update intervals are as frequent as you need. However, I would not recommend updating it more frequently than every 5 minutes. It’s also not a good idea to store the history, since it is already stored trough dependent items.

The most important part of the item is the script itself:

var browser, result;
var opts = Browser.chromeOptions();

opts.capabilities.alwaysMatch['goog:chromeOptions'].args = [];
browser = new Browser(opts);
browser.setScreenSize(Number(1980), Number(1020));

try {
    var params = JSON.parse(value);
    browser.navigate(params.url);

 // Wait for the dish to report status
    Zabbix.sleep(2000);

    // Find the JSON text element(s)
    var jsonElements = browser.findElements("xpath", "//div[@id='root']/div[@class='App']/div[@class='Main']/div[2]/div[@class='Section'][2]/pre[@class='Json-Format']/div[@class='Json-Text']");
    var extractedData = [];

    for (var i = 0; i < jsonElements.length; i++) {
        var text = jsonElements[i].getText();

        // Try parsing JSON
        try {
            extractedData.push(JSON.parse(text));
        } catch (e) {
            // If not valid JSON, include raw text instead
            extractedData.push({ raw: text, error: "Invalid JSON format" });
        }
    }

    // Collect result 
    result = browser.getResult();

    // Replace with parsed JSON data
    result.extractedJsonData = extractedData.length === 1 ? extractedData[0] : extractedData;

}
catch (err) {
    if (!(err instanceof BrowserError)) {
        browser.setError(err.message);
    }
    result = browser.getResult();
}
finally {
    // Return a clean JSON object
    return JSON.stringify(result.extractedJsonData);
}

So what does this script do? It opens the Starlink web app, waits for the Starlink dish to output all the status data, and, after a bit of parsing, returns the data highlighted in the screenshot:

Starlink dish diagnostic data
Starlink dish diagnostic data

Now we can click on the three dots on the left of our newly created item in the items page and proceed to create dependent items for each value we are interested in!

Creating dependent items

Now we just click here:

As an example, to create an item that monitors the hardware version we can create an item like this:

Hardware version dependent item
Hardware version dependent item

With JSONPath preprocessing:

Hardware version item preprocessing
Hardware version item preprocessing

In the end we get the data in Zabbix:

Starlink dish hardware version
Starlink dish hardware version

All other items (except alerts) will follow the same logic – just update the item name, key, and JSONPath in preprocessing to extract the required values.

Creating dependent LLD item prototypes

To automate the alerts items creation, we can create a dependent discovery rule. In the “Discovery” section, create a new discovery rule:

Starlink dish alerts discovery
Starlink dish alerts discovery

With preprocessing using Java Script:

var data = JSON.parse(value);
var alerts = data.alerts;
var lld = [];

for (var key in alerts) {
    if (alerts.hasOwnProperty(key)) {
        lld.push({
            "{#ALERT}": key
        });
    }
}

return JSON.stringify({ data: lld });

This will provide us with following JSON data:

{
  "data": [
    {
      "{#ALERT}": "dishIsHeating"
    },
    {
      "{#ALERT}": "dishThermalThrottle"
    },
    {
      "{#ALERT}": "dishThermalShutdown"
    },
    {
      "{#ALERT}": "powerSupplyThermalThrottle"
    },
    {
      "{#ALERT}": "motorsStuck"
    },
    {
      "{#ALERT}": "mastNotNearVertical"
    },
    {
      "{#ALERT}": "slowEthernetSpeeds"
    },
    {
      "{#ALERT}": "softwareInstallPending"
    },
    {
      "{#ALERT}": "movingTooFastForPolicy"
    },
    {
      "{#ALERT}": "obstructed"
    }
  ]
}

All that’s left ‘to do is to create a dependent item prototype:

Starlink dish alert prototype
Starlink dish alert prototype

With preprocessing, of course:

JSONPath will transform to extract each specific alert and “Boolean to Decimal” will save us some space in the database by tranforming true/false booleans to digits.

Result

In the end, we can monitor all the data:

Starlink dish latest data
Starlink dish latest data

Even more data can be collected using exporters – if you are willing to do a bit of extra configuration, of course! Let me know if you are interested, and I will show you a completely different approach with a template.

Before I forget, the template used in this tutorial can be found  here.

The post Monitoring a Starlink Dish with Zabbix appeared first on Zabbix Blog.

Proxy Group Load Balancing with SNMP Traps

Post Syndicated from Nathan Liefting original https://blog.zabbix.com/proxy-group-load-balancing-with-snmp-traps/31042/

The new Zabbix proxy groups provide us with a method to provide both redundancy and load balancing in our Zabbix proxy setups. However, one major limitation arises when we want to use SNMP traps with these new proxy groups – it isn’t natively supported at the moment. One of our customers asked me to find a solution to that problem, so here’s how I went about it.

Getting to grips with the problem

As mentioned, many of us are now facing a problem. Either we use proxy groups and we don’t use SNMP traps, or we use proxy groups and move SNMP traps to a single proxy. Unfortunately, this is unacceptable for many environments where SNMP traps might be an essential part of monitoring. The problem, however, stems from how snmptrapd works in combination with Zabbix reading the trapper file. Improvements have already been made to provide for more room when creating our own solutions like this.

Other Zabbix users have also been proposing solutions and I’m sure Zabbix is looking into improvements. Here’s an example case to vote on.

However, that doesn’t solve many of our issues now. The problem starts when we are sending SNMP traps to a single proxy (Proxy 1 for example) and a Zabbix host (let’s say Zabbix host 2) is assigned to another proxy in the proxy group (Proxy 2 for example). In this situation, the trap is coming in on an incorrect monitoring proxy and Zabbix won’t be able to read the trap. It will simply not add it to the Zabbix database and ignore it.

The solution here is simple – we can configure our monitoring target like a switch or a router to send the SNMP trap to multiple sources. However, this will cause our trap to be sent over the network multiple times, increasing the load on our network. This is acceptable for smaller setups, but we were dealing with a setup that is sending hundreds of traps every second.

Finding a solution

With the problem laid out for us, we came up with a simple duplication setup that included these requirements:

  1. Simple and easy to maintain/troubleshoot
  2. Traps could only be sent over the network once
  3. Works fast between failovers
  4. Works with both redundancy and load balancing
  5. Minimal extra packages
  6. No easily corruptible shared file systems

What we came up with in the end is visible in the image below:

 

It’s a simple setup that requires us to install 2 extra packages and a container.

First, we added a VIP to our proxy setup using keepalived, to provide our monitoring targets with a single SNMP trap destination. The VIP will be available on one proxy at the time, regardless of whether there are 2, 10 or more proxies in the proxy group. Our switches, routers, or any other SNMP trap host can now be configured to send traps to this VIP.

Second, we needed a way to duplicate our traps. Since only one proxy is going to be receiving traps, the other proxies still need to be able to receive the traps. Without the duplication and the VIP being present on Proxy 1, Zabbix host 2 still would not receive its trap. We installed Docker and created a tiny, lightweight container on our hosts to duplicate the SNMP trap from one proxy to all other proxies in the group. Admittedly this does slightly go against requirement number 2, as we are now sending the trap over the network between proxies. This is, however, all within our own more localized infrastructure instead of over a longer network.

That’s it! Whenever Proxy 1 receives a trap, it will now duplicate it to Proxy 2. The proxy with the host being monitoring will parse the trap correctly to Zabbix and the other proxies will ignore the trap. Even if the proxy restarts, fails over, or suddenly goes down, it will not read the trap twice.

The only thing to keep in mind is that it can take some time for keepalived to fail over the VIP. With SNMP traps being UDP-based, this means that any traps sent to the VIP while snmptrapd is down won’t be parsed. However, it’s definitely better to lose some in case of failover, than to lose all upon outage!

The post Proxy Group Load Balancing with SNMP Traps appeared first on Zabbix Blog.

Understanding HTTP Template Authorization in AWS

Post Syndicated from evgenii.gordymov original https://blog.zabbix.com/understanding-http-template-authorization-in-aws/30856/

Authorization in Amazon Web Services (AWS) determines what actions a user, service, or system can perform on resources. It answers the question: “Does this identity have permission to do this action on that resource?”

In AWS, authorization is primarily handled through:

  • IAM (Identity and Access Management) policies
  • Resource-based policies (like S3 bucket policies)
  • Session-based permissions (like STS AssumeRole)

What authorization types are available in Zabbix AWS templates?

  • Access key authorization
  • Role-based authorization
  • Assume role authorization

Let’s look briefly at each of them.

Before using the template, you need to create an IAM policy that grants the necessary permissions for the AWS services the template will interact with.

This policy defines what actions are allowedon which resources, and optionally, under which conditions. Once created, the policy should be attached to the IAM role or user that will run the template.

IAM policy for Zabbix

Add the following required permissions to your Zabbix IAM policy in order to collect metrics. The policy can change when new metrics and services are added in Zabbix templates.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "cloudwatch:DescribeAlarms",
                "cloudwatch:GetMetricData",
                "ec2:DescribeInstances",
                "ec2:DescribeVolumes",
                "ec2:DescribeRegions",
                "rds:DescribeEvents",
                "rds:DescribeDBInstances",
                "ecs:DescribeClusters",
                "ecs:ListServices",
                "ecs:ListTasks",
                "ecs:ListClusters",
                "s3:ListAllMyBuckets",
                "s3:GetBucketLocation",
                "s3:GetMetricsConfiguration",
                "elasticloadbalancing:DescribeLoadBalancers",
                "elasticloadbalancing:DescribeTargetGroups",
                "ec2:DescribeSecurityGroups",
                "lambda:ListFunctions"
            ],
            "Effect": "Allow",
            "Resource": "*"
        }
    ]
}

To create and attach the policy:

  • Go to IAM → Policies → Create policy
  • Choose JSON and paste your policy
  • Review and create the policy

Access key authorization

1. Attach the required policy to the IAM user

  • Go to IAM → Users → Select a user → Permissions tab
  • Click Attach policies
  • Select the policy you created before (IAM Policy for Zabbix)
  • Click Attach policy

2. Get your access key and secret access key

In the AWS console:

  • Go to IAM → Users → Select a user → Security credentials tab

  • Click Create access key

  • Copy:

    • Access key ID
    • Secret access key

⚠ Never expose your keys publicly!

3. Configure AWS CLI

Open your terminal and run:

configure aws cli

aws configure --profile zabbix_user

You’ll be prompted to enter:

AWS Access Key ID [None]: AKIAXXXXXXXXXXXEXAMPLE
AWS Secret Access Key [None]: asdkjhUSADWDskhjdasd/EXAMPLEKEY
Default region name [None]: eu-central-1
Default output format [None]: json

4. Test it

List all S3 buckets:

aws s3 ls --profile zabbix_user

Get EC2 tags:

Use region where you create instance

aws ec2 describe-instances --region eu-central-1 --query 'Reservations[*].Instances[*].Tags' --profile zabbix_user

If you get this error…

An error occurred (AccessDenied) when calling the DescribeInstances operation: User: arn:aws:iam::123456789010:user/zabbix_user is not authorized to perform: ec2:DescribeInstances on resource: arn:aws:ec2:eu-central-1:123456789010:instance/*

…you need to check the following permission to the role you are using (IAM Policy for Zabbix).

5. Set the following macros in Zabbix:

  • {$AWS.AUTH_TYPE} – set to access_key
  • {$AWS.ACCESS.KEY.ID} – set to your access key ID
  • {$AWS.SECRET.ACCESS.KEY} – set to your secret access key

Security tips

  • Never hardcode access keys in scripts or code.
  • Store them in ~/.aws/credentials, which is protected by file system permissions.
  • Apply least privilege with IAM policies.

Role-based authorization

1. Add the appropriate permission to the role you are using:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "iam:PassRole",
            "Resource": "arn:aws:iam::{Account}:role/{RoleNameWithPath}"
        },
        {
            "Effect": "Allow",
            "Action": [
                "theSameAsIAMPolicyForZabbix",
            ],
            "Resource": "*"
        }
    ]
}

2. Add a principal to the trust relationships of the role you are using:

  • Go to IAM → Roles → Select a role → Trust relationships tab
  • Click Edit trust relationship
  • Add a principal to the trust relationships of the role you are using:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": [
                    "ec2.amazonaws.com"
                ]
            },
            "Action": [
                "sts:AssumeRole"
            ]
        }
    ]
}

⚠ Using role-based authorization is only possible when you use a Zabbix server or proxy inside AWS.

3. Attach the role to the instance

  • Go to EC2 → Instances → Select an instance → Actions → Security → Modify IAM role
  • Select the role you created before which has the policy attached (IAM Policy for Zabbix)
  • Click Apply

4. Test it

Connect to ES2 ssh terminal in instance and run:

  • Go to EC2 → Instances → Select an instance → Connect → SSH client

Example:

ssh -i "zabbix_user.pem" [email protected]

Get caller identity:

aws sts get-caller-identity

Get token for metadata service:

export TOKEN=$(curl -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")

Get IAM role from metadata service:

curl -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/meta-data/iam/security-credentials

Get IAM role credentials from metadata service using role name from instance metadata service (see Get IAM role from metadata service):

curl -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/meta-data/iam/security-credentials/<<--role_name-->>

6. Set the following macros in Zabbix:

  • {$AWS.AUTH_TYPE} – set to role_base
  • {$AWS.ASSUME.ROLE.ARN} – set to your role ARN

Assume role authorization

This method has two options:

  • Using access key authorization for getting creds for assume role
  • Using role-based authorization for getting creds for assume role

Lets look first at using access key authorization for getting creds for assume role.

Using access key authorization for getting creds for assume role

1. Create access key for user (see Access Key Authorization)

2. Add the appropriate permission in role you are using:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "sts:AssumeRole",
            "Resource": "arn:aws:iam::{Account}:user/{UserName}"
        },
        {
            "Effect": "Allow",
            "Action": [
                "theSameAsIAMPolicyForZabbix",
            ],
            "Resource": "*"
        }
    ]
}

3. Add principal to the trust relationships of the role you are using:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::{Account}:user/{UserName}"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}


4. Test It

Get assume role credentials using access key authorization

aws sts assume-role --role-arn arn:aws:iam::123456789010:role/Zabbix_Role --role-session-name test-session --profile zabbix_user

An example of response:

{
    "Credentials": {
        "AccessKeyId": "ASDFGHJKLEXAMPLE",
        "SecretAccessKey": "QowihdwoieuoinflksnliooEXAMPLE",
        "Expiration": "2029-09-09T22:22:22+00:00"
    },
    "AssumedRoleUser": {
        "AssumedRoleId": "ASDFGHJKLEXAMPLE:test-session",
        "Arn": "arn:aws:sts::123456789010:assumed-role/Zabbix_Role/test-session"
    }
}

5. Set the following macros in Zabbix:

  • {$AWS.AUTH_TYPE} – set to assume_role
  • {$AWS.ACCESS.KEY.ID} – set to your access key ID
  • {$AWS.SECRET.ACCESS.KEY} – set to your secret access key
  • {$AWS.ASSUME.ROLE.ARN} – set to your role ARN
  • {$AWS.ASSUME.ROLE.AUTH.METADATA} – set to false

Getting credentials for assume role using cross-account role (best practice)

1. Create role (see Role-Based Authorization)

2. Add the appropriate permission to the role you are using:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "sts:AssumeRole",
            "Resource": "arn:aws:iam::{Account}:role/{RoleNameWithPath}"
        },
        {
            "Effect": "Allow",
            "Action": [
                "theSameAsIAMPolicyForZabbix",
            ],
            "Resource": "*"
        }
    ]
}

3. Add the principal to the trust relationships of the role you are using:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::{Account}:role/{RoleNameWithPath}"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

⚠ Using assume role with role-based authorization is only possible when you use a Zabbix server or proxy inside AWS.

4. Test it

Connect to ES2 ssh terminal in the instance and run:

  • Go to EC2 → Instances → Select an instance → Connect → SSH client

Get assume role credentials using role name from instance metadata service:

aws sts assume-role --role-arn arn:aws:iam::123456789010:role/NewRole --role-session-name test-session

An example of response:

{
    "Credentials": {
        "AccessKeyId": "ACCESS_KEY_ID",
        "SecretAccessKey": "SECRET_ACCESS_KEY",
        "SessionToken": "SESSION_TOKEN",
        "Expiration": "EXPIRATION_DATE"
    },
    "AssumedRoleUser": {
        "AssumedRoleId": "ASSUMED_ROLE_ID",
        "Arn": "arn:aws:sts::ACCOUNT_ID:assumed-role/ROLE_NAME/SESSION_NAME"
    }
}

5. Set the following macros in Zabbix:

  • {$AWS.AUTH_TYPE} – set to assume_role
  • {$AWS.ASSUME.ROLE.ARN} – set to your role ARN
  • {$AWS.ASSUME.ROLE.AUTH.METADATA} – set to true

Well done! You have successfully configured AWS authorization in Zabbix AWS templates.

Now you can use the template to collect metrics from AWS.

The post Understanding HTTP Template Authorization in AWS appeared first on Zabbix Blog.

What’s new in Zabbix 7.4

Post Syndicated from Arturs Lontons original https://blog.zabbix.com/whats-new-in-zabbix-7-4/30597/

With the release of Zabbix 7.4, Zabbix users will be able to further extend their existing resource discovery workflows and enjoy a wastly improved user experience when it comes to configuring Zabbix entities. In addition, the latest release introduces multiple dashboard and network map improvements which will further enhance the visualization of infrastructure and resources.

Host Wizard

Host creation can be somewhat confusing for Zabbix beginners. Creating a host and applying a template involves numerous steps – from creating a host and assigning it to a host group, to configuring appropriate host interfaces, applying a template, and editing template-level macros to adjust the default problem thresholds and filters.

The Host Wizard aims to simplify the host onboarding process by providing a step-by-step guide for creating and configuring a host.

The Host Wizard can be opened from the Data Collection – Hosts section

A new Host Wizard button has been added to the Data Collection – Hosts section. Once you click on it, you will first have to select the template you wish to apply on the new host. Only one template can be applied at a time, so if you wish to apply multiple templates on a single host via Host Wizard, you will have to do so via one template and one Host Wizard session at a time.

Under the hood, if we look at the template files, the templates have also received 2 new parameters: wizard_ready and readme. Only templates marked with wizard_ready: ‘YES’ can be selected in the Host Wizard.

Filter for and select the required template

After you have selected the template, you will be prompted to enter a host name and select host groups. You can create a new host or apply the template on an existing host.

Provide a host name and select host groups

The next steps include the deployment instructions. Depending on the selected template type, the Host Wizard will provide all of the required instructions to start monitoring the host with the chosen template.

The Host Wizard will provide the required host configuration steps

In the final Host Wizard steps, you will be prompted to add the required host interface, read the template notes, and customize the template-level macros.

Customize template-level macros to modify the default filters, problem thresholds, and other parameters

Nested low-level discovery rules and host prototypes

Low-level discovery rules have received major improvements in Zabbix 7.4. It is now possible to create nested low-level discovery rules, while host prototypes are now capable of discovering hosts of their own with low-level discovery.

A new type of prototype has been added to low-level discovery rules – discovery prototype. These prototypes are used together with low-level discovery macros to automatically create low-level discovery rules for resource discovery.

Discovery prototypes can now be created in low-level discovery rules

A new item type has been added for discovery rule prototypes – Nested. This type of discovery rule iterates through the JSON file received by the parent low-level discovery rule to discover child entities. For example:

[   {     "database""db1",     "created_at""2024-02-01T12:30:00Z",     "encoding""UTF8",     "tablespaces": [       "name""ts1""max_size""10GB" },       "name""ts2""max_size""20GB" },       "name""ts3""max_size""15GB" }     ]   },   {     "database""db2",     "created_at""2023-11-15T08:45:00Z",     "encoding""UTF16",     "tablespaces": [       "name""ts1""max_size""5GB" },       "name""ts2""max_size""25GB" },       "name""ts3""max_size""30GB" }     ]   },   {     "database""db3",     "created_at""2024-01-05T15:10:00Z",     "encoding""UTF8",     "tablespaces": [       "name""ts1""max_size""12GB" },       "name""ts2""max_size""18GB" },       "name""ts3""max_size""22GB" }     ]   } ]

If we set the jsonpath preprocessing in the discovery rule prototype to JSONPath=$.tablespaces and set the low-level discovery macro to {#TSNAME}=$.name, the nested low-level discovery rule will create discovery rules to discover tablespaces for each database.

Low-level discovery rules are created from the discovery prototype

Inline form validation

Inline validation has been introduced with the goal of improving the overall user experience when configuring a variety of Zabbix entities. As of Zabbix 7.4, inline form validation is supported in:

  • Host configuration
  • Template configuration
  • Item configuration
  • Trigger configuration
Inline validation detects any configuration errors on the fly and displays a corresponding error message

With inline validation in place, users will now receive immediate feedback regarding any configuration mistakes they have made in the sections above. Configuring new entities, especially items and triggers with complex keys and expressions, is now faster than ever.

Frontend-to-server communication encryption

To further strengthen Zabbix communication flow security, Zabbix 7.4 introduces the ability to secure frontend to server communication with certificate encryption. The encryption must be configured from two sides, and the frontend setup now includes the options to enable and configure encrypted connections to the server.

 

Zabbix 7.4 introduces the ability to encrypt frontend-to-server connections

On the Zabbix server side, multiple new configuration parameters have been added:

  • TLSFrontendAccept – which incoming connections to accept from frontend
  • TLSFrontendCertIssuer – allowed frontend certificate issuer
  • TLSFrontendCertSubject – allowed frontend certificate subject
  • FrontendAllowedIP – frontend connections will be accepted only from addresses listed here if the parameter is set

New widgets and visualization improvements

Zabbix 7.4 introduces a new widget (Item card) and multiple visualization improvements for dashboards and network maps.

Item card widget

The new Item card widget behaves similarly to the existing Host card widget introduced in Zabbix 7.2. The Item card widget provides a customizable view of an item and its attributes, such as latest data together with a sparkling chart, error messages, interfaces, tags, triggers, and more. The attributes for display can be selected and ordered in the widget configuration.

Various item attributes can be displayed in the item card widget

Network map improvements

Network maps have also received multiple improvements, enabling new use cases and simplifying existing network map scenarios.

  • Map background images can now be scaled proportionally to the map dimensions
  • Map links now support link indicators based on item value thresholds
  • Map element icons can now be ordered when placed on top of one another
Item value thresholds can be defined for link indicators
  • Map element icons can now be ordered when placed on top of one another
  • Host group map elements will now take into account nested host groups when displaying host group-related information
  • Map link and element labels can now be hidden and only displayed on mouse hover
Map elements can be ordered on top of each other

Dashboard improvements

Zabbix 7.4 introduces multiple dashboard improvements to facilitate faster and smoother dashboard configuration.

The color picker in graph and pie chart widgets has been extended with the new palette color scheme in addition to the existing solid color scheme. Users can choose from the available palette color schemes. The new palette color schemes display the values within a data set in a more distinguishable way, while the existing solid color scheme displays the data set values in shades of the selected color.

The new palette color scheme is available in graph and pie chart widgets

Widget configuration changes are also displayed instantly in Zabbix 7.4 – there’s  no need anymore to apply the changes to see them reflected in the widget.

In addition, the default Global view dashboard has received an overhaul and now utilizes the latest Zabbix widgets to provide additional insights about the Zabbix instance.

The default Global view dashboard has received an overhaul

Other changes in Zabbix 7.4

Multiple smaller fixes have been introduced in Zabbix 7.4, such as new history functions, new macros, security fixes, and more:

  • Preprocessing results can now be copied directly to clipboard by using the “Copy to clipboard” button
  • All users are now allowed to manage their own media by default. These permissions can now be revoked in user role settings
  • A new Notifications section for customizing notification settings has been added under “User Settings”
  • Vault secret macros can now be resolved by either the Zabbix server or Zabbix proxy
  • A new icmppingretry simple check has been added to monitor host responses to ICMP ping with the ability to modify retries
  • New timestamp tracking history functions have been added
  • Multiple new macros added for item-value time tracking
  • Zabbix server/proxy automatically logs history cache diagnostic information when the history cache is full
  • Disabled items are now immediately removed from the history cache
  • It is now possible to manually clear the history cache for a specific item by its id with the history_cache_clear=target runtime command
  • Added support of Gmail OAuth authentication

New templates and integrations in Zabbix 7.4

Many of the existing webhook integrations have been refactored in Zabbix 7.4. The webhooks have been optimized for the best possible performance and include a variety of fixes:

  • Discord
  • GitHub
  • GLPi
  • Jira
  • Jira Service management
  • MS Teams
  • MS Teams Workflows
  • OTRS CE
  • PagerDuty
  • Slack
  • Telegram
  • Zammad
Many of the existing webhook integrations have been refactored in Zabbix 7.4

Multiple new templates have also been introduced:

  • Pure Storage FlashArray
  • Azure SQL Managed Instance
  • Azure MSSQL DTU database by HTTP
  • Azure Backup Jobs by HTTP
  • Palo Alto PA-440
  • Juniper MX
  • Improvements for Dell by HTTP and SNMP templates
Zabbix 7.4 introduces multiple new templates

The post What’s new in Zabbix 7.4 appeared first on Zabbix Blog.

Database Monitoring using Zabbix agent 2 – Part 1, SQL

Post Syndicated from Alexander Petrov-Gavrilov original https://blog.zabbix.com/database-monitoring-using-zabbix-agent-2-part-1-sql/30381/

If you find yourself needing additional flexibility when it comes to database monitoring, Zabbix agent 2 may be exactly what you need. Keep reading to see which features make it ideal for database monitoring and find out how to best use them for your own purposes. 

What is a database?

If you’ve been using Zabbix for a while, you know that a database is an organized collection of data that is stored and accessed electronically.
That data can be historical, configuration, business, social media-related, etc. A database, or rather a database management system (DBMS) allows you to store, manage, and retrieve information efficiently.

Types of DBMS

We can separate DBMS into multiple types. Depending on how data is stored, retrieved, managed, there can be quite a few, but we will try to limit ourselves to the most common four:

  • Relational databases (or RDBMS) see tables and SQL.
    • MySQL
    • MariaDB
    • PostgreSQL
    • Oracle
  • NoSQL databases store data in formats like JSON, key-value pairs, or graphs.
    • MongoDB
    • Redis
    • InfluxDB
    • ElasticSearch
  • Cloud databases use cloud platforms for scalability.
    • Amazon RDS
    • Azure SQL
  • Time-series databases (or TSDB databases) are optimized for time-stamped data.
    • TimescaleDB
    • InfluxDB

But what unites all those database engines? They can all be monitored by Zabbix!

Database monitoring

Database monitoring is important for a variety of reasons, the most common of which are to get a precise overview of database and application performance. Since databases can be a vital part of multiple departments and applications,  poor performance may impact an entire company and its users, leading to unsatisfactory results on all sides.

To avoid such situations, the set of metrics we should monitor for database engines can include:

  • Database environment metrics
    • CPU performance
    • Memory usage
    • Drive capacity
    • Disk latency
  • Database performance metrics
    • Query performance
    • Transaction/operations/indexing
    • Connections
  • Application and/or business related data
    • Amount of users
    • Transactions
    • Inventory
    • Configuration      

Why Zabbix agent 2?

Zabbix Agent 2 includes multiple features that enhance its flexibility:

  • Task queue management with respect to both schedule and task concurrency.
  • Concurrent active checks with threads.
  • Multiple agent 2 unique metrics
  • Easier to extend using GO plugins.

Plugins in Zabbix Agent 2 are written in the Go programming language and provide a flexible, native way to extend the agent’s functionality. These plugins communicate directly with databases using their native APIs or libraries, which allows for correct and efficient performance monitoring.

But agent2 provides even more flexibility when focusing on database monitoring, allowing us to:

  • Limit query execution
  • Control the session time
  • Configure encryption between Zabbix agent and database
  • Control cache mode

All database data is collected using the best approach for the monitored database.

  • MySQL, monitoring relies on the Go-MySQL-Driver
  • PostgreSQL integration is managed through the pgx driver

The list goes on for supported database engines:

  • MySQL / MariaDB
  • PostgreSQL
  • ORACLE
  • MSSQL
  • MongoDB
  • Redis
  • Memcached

Monitoring SQL databases

Database environment

In this part we will focus on how to monitor and retrieve data from SQL databases and SQL database-related parameters. Monitoring SQL database environment metrics with Zabbix agent 2 is as straightforward as monitoring any virtual or physical machine with an OS. All we need to do is add the repo:

# dnf install https://repo.zabbix.com/zabbix/7.0/centos/9/x86_64/zabbix-release-latest-7.0.el9.noarch.rpm

Install the agent:

# dnf install zabbix-agent2

Then, make sure that connections from Zabbix server to Zabbix agent 2 are allowed using Server parameter:

### Option: Server
#       List of comma delimited IP addresses, optionally in CIDR notation, or DNS names of Zabbix servers and Zabbix proxies.
#       Incoming connections will be accepted only from the hosts listed here....
# Mandatory: no
# Default:
# Server=
Server=127.0.0.1,server-dns.example.com

Finally, link one of the many templates available out of the box:

List of templates for OS monitoring
List of templates for OS monitoring

SQL database performance metrics

What about the actual DB performance metrics? There are plenty of approaches we can take using Zabbix agent 2.

Out-of-the-box templates are available for multiple databases that can be monitored by Zabbix agent 2:

SQL database template list
SQL database template list

Each of the templates uses a database native way to get precise performance data, such as SHOW GLOBAL STATUS for MySQL or dbStats for MongoDB. Also, template provides instructions on how to prepare the database for monitoring. Let’s take MySQL/MariaDB for example:

Create a MySQL user for monitoring (<password> at your discretion) and give this user enough permissions for monitoring:

mysql> CREATE USER 'zbx_monitor'@'%' IDENTIFIED BY '<password>';
mysql> GRANT REPLICATION CLIENT,PROCESS,SHOW DATABASES,SHOW VIEW ON *.* TO 'zbx_monitor'@'%';

In order to collect replication metrics, MariaDB Enterprise Server 10.5.8-5 and above and MariaDB Community Server 10.5.9 and above require the SLAVE MONITOR privilege to be set for the monitoring user. The command then looks like this:

mysql> GRANT REPLICATION CLIENT,PROCESS,SHOW DATABASES,SHOW VIEW,SLAVE MONITOR ON *.* TO 'zbx_monitor'@'%';

Then create a host to represent your MySQL/MariaDB and link the “MySQL by Zabbix agent 2” template:

MySQL database host
MySQL database host

Configure the Macros on the same host:

MySQL database host macros
MySQL database host macros

And the data will start pouring in!

You can find instruction for other databases here.

SQL database internal data monitoring

A default template will tell us a lot about performance, but what if we also need application data? Something that is stored in the database, i.e.

  • Number of orders
  • Logged in users
  • Host count
  • List of failed transactions
  • Amount of media uploaded

Zabbix agent 2 lets users collect custom SQL query results with the help of configuration files and a specific item key:

<dbtype>.custom.query[connString,<user>,<password>,queryName,<args...>]:
• Dbtype – mysql, postgresql, oracle, mssql
• connString - URI or session name;
• user, password - Database login credentials;
• queryName - name of a custom query, matches SQL file name without .sql extension;
• args - one or several comma-separated arguments to pass to a query.

The main idea of this key is to construct efficient queries that can return multiple values. The values returned will be automatically transformed to JSON, which is both easier to preprocess and use for LLD creation.

I will add a simple query to find all hosts and their main interface availability in Zabbix:

SELECT hosts.host,interface.available FROM zabbix.hosts JOIN zabbix.interface ON hosts.hostid=interface.hostid WHERE hosts.status IN (0,1) AND hosts.flags IN (0,4) AND interface.main=1;

First I need to create a directory for custom queries:

# mkdir /etc/zabbix/zabbix_agent2.d/plugins.d/custom_queries

Now I will create an .sql file with a query and paste the mentioned query into the file:

# nano /etc/zabbix/zabbix_agent2.d/plugins.d/custom_queries/interfaces.sql    

Now I will edit the MySQL plugin .conf file and set a custom queries path:

# nano /etc/zabbix/zabbix_agent2.d/plugins.d/mysql.conf
### Option: Plugins.Mysql.CustomQueriesPath
#       Full pathname of a directory containing *.sql* files with custom queries.
#
# Mandatory: no
# Default:
# Plugins.Mysql.CustomQueriesPath=
Plugins.Mysql.CustomQueriesPath=/etc/zabbix/zabbix_agent2.d/plugins.d/custom_queries/

Save the changes and restart Zabbix agent 2 to apply them:

# systemctl restart zabbix-agent2

Before adding the item using the web interface, it is always a good idea to test it:

zabbix_agent2 -t mysql.custom.query["tcp://localhost:3306","zbx_monitor","<password>","interfaces"]

The output will is now a easy to work with JSON pattern (beautified here):

[
  {
    "available": "1",
    "host": "Zabbix server"
  },
  {
    "available": "1",
    "host": "Test environment"
  },
  {
    "available": "1",
    "host": "MySQL database"
  },
  {
    "available": "1",
    "host": "MongoDB database"
  },
  {
    "available": "1",
    "host": "PostgreSQL database"
  },
  {
    "available": "1",
    "host": "Customer portal"
  }
]

Now, I’m sure the data is collected and can be used for LLD. I can create a new item on the MySQL database host to collect this data:

Interface monitoring item
Interface monitoring item

Since I know what kind of data will be returned, I can create a dependent Discovery rule on the same host:

Interface LLD item
Interface LLD item

The LLD macros tab will help to transform the current JSON to the LLD-suitable JSON, replacing “host” with {#HOST}.

Interface LLD item mecros
Interface LLD item macros

After adding the discovery itself, we can create the dependent item prototype, which will allow us to discover all hosts and their status:

Interface status item prototype
Interface status item prototype

Preprocessing here is a must, and it needs to be flexible enough to extract each individual host interface status:

Interface status item prototype preprocessing
Interface status item prototype preprocessing

Now after adding the item prototype, we can check the results:

Interface status item data
Interface status item data

An item cam be further enhanced using value mapping, to specify that 1 means available and 0 means not available.

With this approach, any internal database data can be extracted and monitored. In part 2 we will see how NoSQL databases can be monitored for both performance and internal data using Zabbix agent 2.

If you’d like more information on database monitoring, please don’t hesitate to sign up for our training course in Advanced Zabbix Database Monitoring, which covers multiple approaches to collecting database-related performance metrics and data using Zabbix Agent 2, ODBC, and API requests, as well as optimizing data collection by introducing dependent low-level discovery for minimal performance impact.

The post Database Monitoring using Zabbix agent 2 – Part 1, SQL appeared first on Zabbix Blog.

Next-Level Alert Analysis with DeepSeek and Zabbix

Post Syndicated from Zhe Cheng original https://blog.zabbix.com/next-level-alert-analysis-with-deepseek-and-zabbix/30424/

As IT infrastructures grow increasingly complex, efficiently analyzing monitoring data and accelerating incident response have become critical challenges for operations teams. This post explores a few innovative applications of DeepSeek when integrated with Zabbix.

Requirements:

– Zabbix server 7.0 or higher
– DeepSeek API (Alternatively, other AI APIs can be used if needed)

1. Scenario One: One-Click Intelligent Alert Analysis

By integrating DeepSeek Analytics into the Zabbix frontend, users can conduct intelligent alert analysis with just one click. This integration facilitates the swift generation of comprehensive fault analyses and solution suggestions, markedly decreasing the MTTR (Mean Time to Resolution). Consequently, it streamlines the troubleshooting process, alleviates the workload on IT personnel, ensures system stability, and conserves both time and resources.

1.1  On the Zabbix home page, navigate to “Alerts” > “Scripts”, and click on the “Create script” button.

1.2  Configuration script:

  • Name: Can be customized
  • Scope: Select “Manual event action”
  • Menu path: Customize menu paths for quick access
  • Type: Select “Script”
  • Execute on: Select “Zabbix proxy or server”

1.3 Enter the following command in the command bar:

/etc/zabbix/scripts/send_alert_to_ai.sh "{TRIGGER.NAME}" "{TRIGGER.SUBJECT}"  "{HOST.NAME}" "{HOST.IP}" "{EVENT.TIME}" "{TRIGGER.SEVERITY}"

1.4  Create an API call script on zabbix-server.

1.4.1 Modify the Zabbix Server Configuration File and Enable Global Scripts:

Open the Zabbix server configuration file for editing:

vi /etc/zabbix/zabbix_server.conf

Set the EnableGlobalScripts option to 1:

EnableGlobalScripts=1

Save the changes and exit the editor. Then, restart the Zabbix server service to apply the changes:

systemctl restart zabbix-server

1.4.2 Create an API Call Script.

Create a directory for custom scripts if it does not already exist:

mkdir -p /etc/zabbix/scripts && cd /etc/zabbix/scripts

Note: If the frontend prompts that the script file cannot be found, try moving the script to the directory used by the Nginx agent. Create a new script file named send_alert_to_ai.sh:

vi send_alert_to_ai.sh

Add the following content to the script, replacing DeepSeek KEY with your actual API key. Make sure you adjust the API call method if using a different AI service:

#!/bin/bash

# DeepSeek API configuration
API_URL="https://api.deepseek.com/chat/completions"
API_KEY="xxxxxxxxxxxxxxxxxxxx"

# Obtain the parameters to be passed as alarm information
TRIGGER_NAME="$1"
ALERT_SUBJECT="$2"
HOSTNAME="$3"
HOST_IP="$4"
EVENT_TIME="$5"
TRIGGER_SEVERITY="$6"

# Build a more concise JSON format for alarm information
alert_info=$(cat <<EOF
{
"model": "deepseek-chat",
"messages": [
{"role": "system", "content": "You are an assistant focused on responding quickly to system alarms。"},
{"role": "user", "content": "The following alarm information is received:\n\n: $TRIGGER_NAME\n: $ALERT_SUBJECT\n: $HOSTNAME\n: $HOST_IP\n: $EVENT_TIME\n: $TRIGGER_SEVERITY\n\nPlease tell me the cause of the alarm and the handling measures in a short and professional language with a word limit of 300 words。"}
],
"stream": false
}
EOF
)

# Send the POST request and capture the response and HTTP status code
response=$(curl -s -w "\n%{http_code}" -X POST "$API_URL" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $API_KEY" \
-d "$alert_info")

# Separate HTTP status codes from response bodies
http_code=$(echo "$response" | tail -n1)
response_body=$(echo "$response" | sed '$d')

# Parse and extract the content field
if [ "$http_code" -eq 200 ]; then
# Parse JSON using the jq tool
if ! command -v jq &> /dev/null; then
echo "jq could not be found, please install it first."
exit 1
fi

# Extract the content field and format the output
content=$(echo "$response_body" | jq -r '.choices[0].message.content')
echo -e "Analysis result:\n$content"
else
echo "failure: HTTP status code $http_code, respond: $response_body"
fi

Make the script executable:

chmod +x send_alert_to_ai.sh

Note: The script provided invokes the official DeepSeek API. Replace DeepSeek KEY with your actual API_KEY. If you are using another AI service, please confirm the appropriate API invocation method.

Important Notes:

Note: The script relies on jq to process and parse JSON data for tasks such as filtering, mapping, aggregating, and formatting. If jq is not installed on your system, follow these instructions to install it.

For Debian/Ubuntu Systems:

apt-get update
apt-get install jq

For CentOS/RHEL Systems:

yum install epel-release
yum install jq

1.5 Actual Effect Display:

1.6  Optional Optimization Items.

1.6.1 Adjust Output Box Size for Better Browsing.

After executing the script, you may find that the output box is too small and inconvenient to browse. To optimize this, you can modify the front-end CSS file as follows.

Back up the existing CSS File:

cd /usr/share/nginx/html/assets/styles/
cp blue-theme.css blue-theme.css.bak

Edit the CSS File:

vi /usr/share/nginx/html/assets/styles/blue-theme.css

Add Custom Styles at the End of the File.

Add the following CSS rules to adjust the size and behavior of the output box:

#execution-output {
height: 500px; /* Adjust to your desired height */
width: 540px; /* Optional: Adjust the width as required */
overflow-y: auto; /* Displays scrollbar when content exceeds the set height */
}

Save and exit the editor. At this point, clear the browser cache and reload the page to see the changes take effect.

1.6.2 How to Optimize Slow Output Response after Executing the One-Click Analysis Script.

During actual testing, it was estimated that returning a 300-word result takes approximately 20 to 30 seconds. While you can improve the response speed by adjusting the preset prompt words in the script, this approach may reduce the richness of the analysis content. Therefore, it is recommended to balance speed and content depth by adjusting the number of replies in the script’s prompt words according to your actual needs.

Actual effect display:

2. Scenario Two: Zabbix Documentation Knowledge Base Assistant

In today’s fast-paced IT environment, managing and retrieving information efficiently is crucial. To address this need, we’ve developed the Zabbix KB Assistant, an intelligent knowledge base solution built on MaxKB—an open-source Q&A system leveraging large language models.

This assistant streamlines access to Zabbix’s extensive documentation, making it easier than ever for users to find the information they require.
MaxKB stands out for its seamless integration capabilities, allowing for quick uploads of documents and automatic crawling of online content.

Its flexibility means it can be effortlessly embedded into third-party systems, including our very own Zabbix platform. The project is available at the GitHub repository.

The development process of Zabbix KB Assistant involved configuring MaxKB to recognize and parse the official Zabbix documentation. By utilizing this URL, we ensured that the latest updates and comprehensive guides are always accessible within our assistant. After setting up the core model configurations, we created a dedicated knowledge base tailored to Zabbix’s rich content.

With the knowledge base in place, we proceeded to integrate Zabbix KB Assistant into the Zabbix frontend. This step was essential for providing instant access to users navigating the Zabbix interface. By embedding a floating window mode, users can interact with the assistant without leaving their current page—a feature that significantly enhances user experience.

Actual effect display:

3. Scenario Three: DingTalk Alert Enhancement

By integrating DeepSeek’s deep analysis capabilities, DingTalk can automatically analyze alarm information upon receiving alerts. This integration provides precise fault diagnosis and solutions, aiding IT operations and maintenance personnel in quickly identifying and resolving issues. Consequently, this improves the efficiency of system maintenance and reduces downtime.

3.1 Create a Bot and Configure Security Settings.

First, create a new bot within the DingTalk group and ensure that the keyword “Alarm” is properly configured in the security settings. Next, retrieve the webhook URL for this bot and keep it safe for later use.

3.2 Install Python3 and Necessary Libraries.

Ensure that Python3 along with the required libraries are installed on your system. Depending on your operating system, follow these instructions.

For Ubuntu/Debian systems:

sudo apt update
sudo apt install python3 python3-pip
pip3 install requests

For CentOS/RHEL systems:

sudo dnf install python3
pip3 install requests

3.3 Below is an example script (deepseekdingding.py) located at /usr/lib/zabbix/alertscripts/.

Replace the placeholder webhook URL and DeepSeek API key in the script with your actual values:

#!/usr/bin/env python3
#coding:utf-8
import requests
import sys
import json

class DingTalkBot(object):
    # Send an alarm
    def send_news_message(self, webhook_url, subject, content, ai_response):
        url = webhook_url
        data = {
            "msgtype": "markdown",
            "markdown": {
                "title": subject,
                "text": f"{subject}\n{content}\n\n【DeepSeek analysis】:\n\n{ai_response}"  
            }
        }
        headers = {'Content-Type': 'application/json'}
        response = requests.post(url, headers=headers, data=json.dumps(data))
        return response

if __name__ == '__main__':
    WEBHOOK_URL = 'https://oapi.dingtalk.com/robot/send?access_token=224c1ff0c6df60a809b3c5b69b8448486b780d292e9d395ac8fbf84980214e30'  # Webhook
    API_URL = 'https://api.deepseek.com/chat/completions'
    API_KEY = "xxxxxxxxxxxxxxxxxxxx"  # DeepSeek API

    if len(sys.argv) < 3:
        print("Error: Not enough arguments provided.")
        sys.exit(1)

    subject = str(sys.argv[1])  
    content = str(sys.argv[2])  

    print(f"Received subject: {subject}")
    print(f"Received content: {content}")

    try:
        headers = {
            'Authorization': f'Bearer {API_KEY}',
            'Content-Type': 'application/json',
        }
        payload = {
            "model": "deepseek-chat",  # DeepSeek
            "messages": [
                {"role": "user", "content": f"If you are a professional IT operation and maintenance expert, please tell me the cause of these alarms and handling suggestions in a concise and professional language with a word limit of 100 words{content}"}
            ]
        }
        ai_response = requests.post(API_URL, headers=headers, json=payload)
        ai_response.raise_for_status()  
        ai_response_content = ai_response.json().get('choices', [{}])[0].get('message', {}).get('content', '')
    except Exception as e:
        ai_response_content = "\nThe interface call timed out or an error occurred. Please check the configuration and try again"

    bot = DingTalkBot()
    response = bot.send_news_message(WEBHOOK_URL, subject, content, ai_response_content)

    if response.status_code == 200:
        print("successfully")
    else:
        print(f"failed: {response.text}")

3.5 On the Zabbix home page, go to Alerts – Media types – Create Media type and then enter the following information:

  • Name: aiAlarm-Dingtalk
  • Type: script
  • Script name: deepseekdingding.py
  • Script parameter: {ALERT.MESSAGE} {ALERT.SUBJECT}

3.6 Create an alarm action.

Go to Alarm – Action – Trigger actions – Create action and set the name to Alarm -deepseek. Select this parameter as required:

Edit the action options as follows:

Send to media type aiAlarm-Dingtalk
Topic fault alarm: {EVENT.NAME}
message
【Zabbix Alarm Notification 】

Alarm group: {TRIGGER.HOSTGROUP.NAME}

Alarm host: {HOSTNAME1}

Alarm time: {EVENT.DATE} {EVENT.TIME}

Alert level: {TRIGGER.SEVERITY}

Problem information: {TRIGGER.NAME}

Confirm the update.

3.7 Configure notification rights for users.

The following item is added to the “User-User-Alarm” media dialog box. Once added, click Update.

Actual effect display:

4. Scenario Four: One-Click System Service Deep Analysis

Our solution integrates DeepSeek analysis to offer a one-click intelligent inspection tool that automates the collection of service configurations, logs, and status from within your system. This information is then sent via API to DeepSeek for comprehensive analysis.

Our approach begins by extracting relevant configuration data, recent log entries, and current service statuses. These pieces of information are combined with predefined prompts and submitted to DeepSeek through its API. For instance, a prompt might look like this:

“Here are the current logs for XXX service:\n\n${recent_logs}\n\nService. Status is as follows:\n${service_status}\n. Please analyze the following four aspects based on this information and provide a concise report within 500 words: service status analysis, configuration review, historical issue examination, and troubleshooting recommendations.”

DeepSeek processes this input to perform a detailed breakdown across these four areas, delivering structured feedback and actionable insights.
This integration offers deep system analysis and precise optimization suggestions, enabling swift responses to system changes or anomalies. It aids administrators in promptly identifying and addressing issues.

In addition, it’s easily integrated into existing monitoring systems, allowing adjustments to the depth and scope of analysis as needed. The solution boasts high scalability and flexibility, catering to evolving business requirements.

Actual effect display :

The post Next-Level Alert Analysis with DeepSeek and Zabbix appeared first on Zabbix Blog.

Interactive Dashboard Creation for Large Organizations and MSPs

Post Syndicated from Arturs Lontons original https://blog.zabbix.com/interactive-dashboard-creation-for-large-organizations-and-msps/30132/

Dashboard widgets have received substantial improvements in the latest Zabbix releases – everything from brand-new widgets to greatly expanding upon existing widget features. The post will cover some of the new improvements as well as lesser-known dashboard and widget features, while discussing multiple dashboard use cases targeted at large organizations and MSPs.

Broadcast and listen capabilities

Zabbix widgets can be used to not only display static data, but they can also be linked together by using widget broadcast and listen capabilities. Depending on the built-in capabilities, widgets can either broadcast data (such as the item, host, event, or time interval selected in the widget) or listen and display the selected data points – multiple widgets support both broadcast and listen capabilities.

Widgets can broadcast and listen for the following entities:

  • Hosts
  • Host groups
  • Time periods
  • Items
  • Events
  • Maps

Zabbix documentation contains the full list of widget broadcast and listen capabilities.

Widget broadcast and listen capabilities

Host and Item navigator widgets serve as simple examples of broadcast widgets. The sole purpose of these widgets is to display an organized, interactive list of hosts or items. The selected hosts and items can be broadcast to other widgets such as graphs, gauges, problem widgets, an item value widget, and many others.

In addition to regular widget filters based on hosts, host groups, and tags, navigator widgets can be configured to group hosts or items based on tags, host groups, and existing problem severities. This can be used to provide an organized overview of hosts or items based on MSP clients, organization departments, and any other grouping.

Hosts grouped by MSP clients based on host group names

Any combination of widgets from the above table can be used to create interactive dashboards. For example, you could combine the Item value widget listen capabilities with the Geomap widget broadcast capabilities to display item values for hosts selected on the Geomap.

Broadcast hosts from the Geomap widget to Item value widgets

Dashboard-level host broadcast

Host overrides can also be performed on a dashboard level. Once you have set the Override host setting in your widgets to Dashboard, you can select the host in the top right corner of the dashboard. After the host is selected, the widgets will start displaying information related to the selected host.

Host information can also be broadcast on Dashboard level

Selecting non-existing items

One final thing you should consider when implementing widgets with host/item sources from broadcast widgets is what happens if the selected item does not exist on the selected host. In that case, your widget will display a message “No permission to referred object or it does not exist!” – the same error message the users will see if they lack the read permissions on an item. Ideally, you’d want to define widget filters and broadcast/listen configuration in a way where such errors can be avoided – especially if Zabbix is used as a central monitoring hub for users from multiple departments or organizations.

The item value widget displays an error message since the selected item does not exist on the selected host

 

Advanced graph widget use cases

The Zabbix graph widget has a variety of advanced features that can enable many new use cases and provide new insights based on the collected item values.

Data sets

The graph widget utilizes data sets to select, match, and group items that would be displayed in the graph. There are two types of data sets – item pattern and item list. When using item list data sets, you have to individually select each item that you wish to display on the graph. On the other hand,  item pattern data sets provide more flexibility. Here we can utilize wildcards in host and item names to match items and hosts by name. This is especially useful for items discovered by low-level discovery in dynamic environments. With item pattern data sets, the addition or removal of items matching the pattern will be automatically reflected in the graph.

Item list and item pattern data sets in the graph widget

Trigger and problem display

Detected problems and trigger thresholds can also be displayed in dashboard graph widgets. The time periods during which a trigger related to the displayed items has been in a problem state will be highlighted in red. The graphs also provide an option to display a trigger line for triggers utilizing last, min, max, and avg functions.

Graph widget can display a trigger line and highlight periods during which a problem was active

Aggregation

The ability to aggregate data directly within the widget can be an extremely useful tool for gaining new insights from existing data. With graph widget aggregations, it is possible to aggregate each individual item (for example, displaying hourly averages for network traffic on each interface) or the whole data set (total hourly traffic from all interfaces).

Aggregations can be performed on each item or the whole data set

Time shift

The time shift feature is useful for visually comparing current values with values collected some time in the past. For example, we could compare the current CPU load on our application server with the CPU load for the same time period yesterday. This could allow us  to detect unexpected deviations just by glancing over the graph.

With the graph widget time shift feature, you can compare current values with values collected in the past

Missing data

Finally, the graph widget enables Zabbix users to choose how they wish to display missing values. Values for items could be missing for a variety of reasons – anything from data collection errors to various preprocessing workflows that could discard item values by design. Accordingly, it makes sense to design your graphs with the correct representation of missing data in mind.

Missing values in graphs can be displayed in the following formats:

  • Treat missing values as 0
  • Do not display missing values
  • Connect the last known value with the current value
  • Treat missing values as the last known value
Missing values are treated as 0
Missing values are selected to not be displayed

Defining widget value thresholds

Threshold values can be defined for multiple widgets to make the visualization of data more dynamic. This way, Zabbix dashboards can instantly highlight resources exceeding warning/critical thresholds, services in unexpected states, unreachable endpoints, and a variety of other issues. As of Zabbix 7.2, widget thresholds are available only for numeric item values.

Widgets with threshold support

Multiple widgets provide the ability to define value thresholds:

  • Item value
  • Gauge
  • Top hosts
  • Top items
  • Honeycomb

Thresholds can be defined in widget configuration. By defining one or multiple thresholds, we specify that whenever values for the selected item reach or exceed the threshold, they will be highlighted in the selected color.

Item value widget can be used to highlight problematic resources or services

Thresholds are useful for not only highlighting the problematic items in Item value or Gauge widgets, but can also be used to provide a broader view of overall resource utilization with Top hosts and Top items widgets. Since we aren’t limited to a single item, Top hosts and Top items widgets enable us to do a surface-level correlation by looking at the utilization of various resources and highlighting the resources nearing critical utilization thresholds.

Top hosts and Top items widgets can display a comprehensive overview of host resource usage

Another way to display and highlight our infrastructure state on a larger scale is by using the Honeycomb widget. The Honeycomb widget utilizes item patterns to display the matching item values. Here, thresholds can be combined with color interpolation to provide a more dynamic view of our environment. The Honeycomb widget is also capable of broadcasting the selected item and host, which enables us to quickly gain more information about the problematic host by clicking the corresponding cell in the widget.

Honeycomb widgets provide a dynamic overview of enterprise resource usage by supporting color interpolation features

Dashboards for MSPs

The previous sections have already highlighted a variety of features, useful widgets, and widget features for large organizations and MSPs. But let’s not forget that MSPs require granular access permission and control features. MSPs must also ensure that each client’s information (Hosts, items, dashboards) is fully isolated and secure from outside access.

Dashboard visibility

Each dashboard can be deployed either as a public or a private dashboard. Public dashboards are available to every user in read-only mode, while private dashboards require explicit read and write permissions for users who need access to them. MSPs can utilize private client organization dashboards to allow each client to view information about their environment in multiple views while completely hiding the dashboards assigned to other organizations.

Private dashboards require explicit read/write permissions

Host permissions

Dashboard visibility is only the first access control layer. Even when a Zabbix user has access to a dashboard, we must ensure that the user also belongs to a user group that has at least read permissions on the hosts displayed on a dashboard. Without at least read permissions, the hosts will not be displayed in dashboard widgets. This way, MSPs can utilize a single dashboard where each organization’s users can only see the information related to their environments, as opposed to having many duplicate dashboards, where each has a custom host filter that matches just the particular organization’s hosts.

User group-to-host group permissions have a direct impact on host visibility in dashboards

Restricting access to widgets

Access to each widget can also be restricted in Zabbix. This can be done globally by disabling widget modules under Administration—General—Modules or by disabling access to modules on an individual user role level. This can come in handy if the Zabbix environment in question enables users from various departments or organizations to create their own dashboards or edit existing ones. In addition, we may also have some custom community or in-house widgets which are utilized only by Zabbix administrators. which we may want to restrict access to.

If a Zabbix user opens a dashboard containing the restricted widget, the widget will be replaced with the message “No permissions to referenced object or it does not exist!” Ideally, it is recommended to avoid situations where users encounter such widgets, since such a message can be confusing to a user not familiar with various Zabbix permission and access error messages.

Access to modules can be restricted per each user role

Dashboard ownership

Dashboard ownership can also play a role in our user onboarding and offboarding process. Dashboard owners can edit permissions on the dashboards they own, but this can add an extra step in our user offboarding process since dashboards cannot remain without an owner! Therefore, before deleting a Zabbix user, we need to ensure that either their dashboards have also been removed or have their owners be changed. If we attempt to delete a user who is also a dashboard owner, Zabbix will display an error message.

Users who are owners of an existing dashboard cannot be deleted

This article touches upon only a few of the latest and lesser-known features useful to MSPs and large organizations. There are many more advanced ways of utilizing Zabbix widgets, permissions, tags, low-level discovery rules, and many other features that come in handy to organizations of various sizes, utilizing Zabbix for a variety of use cases. Follow our blog, watch the latest Zabbix videos on our YouTube channel, and check out our on-premise and online events to learn more about the flexibility of Zabbix data collection, alerting, and visualization features.

The post Interactive Dashboard Creation for Large Organizations and MSPs appeared first on Zabbix Blog.

Getting Started with Zabbix – Hosts, Items, and Triggers

Post Syndicated from Arturs Lontons original https://blog.zabbix.com/getting-started-with-zabbix-hosts-items-and-triggers/30190/

Hosts, items, and triggers are some of the most basic concepts in Zabbix. To successfully configure their monitoring workflows, Zabbix users need to have a clear understanding of how these entities are used. This article is aimed at Zabbix beginners and should help anyone better understand the basics of Zabbix while providing guidance on how to start monitoring your initial set of hosts.

Table of Contents

Hosts

Hosts are top-level entities in Zabbix and represent your monitored endpoints. Whenever we need to monitor a device, web application, service, or anything else – we start by creating a host.

The host acts as a container for our items (representing the metrics we wish to collect) and triggers (problem threshold definitions). These entities can be created directly on the host or inherited from predefined templates.

Every host has 2 mandatory parameters – its unique name and at least a single host group. Host groups are used for grouping, filtering, and assigning read/write permissions to hosts. Hosts are not limited when it comes to the number of host groups they are assigned to.

A simple Linux server host with an agent interface and a Linux template

An interface might also be required, depending on the type of items we will create on the host. Interfaces define host addresses and, in case of SNMP interfaces, some additional authentication and security parameters.

There are 4 types of interfaces in total, representing 4 different data collection methods:

  • Agent
  • SNMP
  • JMX
  • IPMI
An SNMP device host with SNMP interface

Zabbix supports other types of data collection methods, but for these 4 methods in particular an interface is required on the host. Other data collection methods define endpoint addresses directly in the item configuration or use push data collection (trapping) where Zabbix is not required to know the endpoint address.

Templates

Templates contain a set of predefined items and triggers and can be linked to hosts. This enables the standardization of monitoring workflows in your environment. Changes made on the template will be immediately applied on the hosts to which the template is linked. Zabbix comes prepackaged with over 300 templates for a variety of vendors and endpoint types.

Zabbix comes pre-packaged with over 300 official templates

Zabbix users aren’t limited to just the official templates – anyone can create their own templates with items and triggers tailored to the requirements of a particular environment. We also recommend adjusting the official templates – disable the unnecessary items and adjust the triggers so they don’t generate any unnecessary noise.

Items

Items are used to define the metrics that we wish to collect, and are configured on hosts or templates. Items can be of various types. The type of the item usually defines the protocol and the methods used to collect metrics via this item. Some examples of item types:

  • Zabbix agent
  • SNMP agent
  • SNMP trap
  • Simple check
  • HTTP agent
  • IPMI agent
  • JMX agent
  • SSH agent
  • …and many others.

The key of the item is used to specify what particular metric should be collected. There are some exceptions to this – for example, for SNMP agent items it’s the OID field, while the key can be written arbitrarily. The key should be unique per host.

Available memory Zabbix agent item

The key uses a <key>[<parameters>] format. For example, if we wish to collect available memory by utilizing Zabbix agent, we will use the vm.memory.size[available] item key. If we wish to collect available memory in percent, we would use the vm.memory.size[pavailable] item key. A quick item key reference is available by pressing select next to the Key field. You can find more about the available item keys and other configuration details in our documentation.

The update interval specifies how often metrics should be collected for this key, and the history/trend storage periods define for how long the collected data should be retained.

Triggers

Once we have configured our items, we should create triggers to react to item values reaching problem thresholds. First, let’s define a simple trigger name. The name should be simple enough for our Zabbix administrators to understand the goal of the trigger simply by glancing at it.

Trigger reaction to low available memory over the last 10 minutes

The event name field is used to define the name with which our problems will be displayed. Since the problem event name is often used not just in Zabbix but also in the alerts that your administrators will receive in their mailboxes or via messaging and ITSM systems, the event name should be more descriptive, giving general details about the problematic situation.

Operational data fields are used to display information about the current state of items analyzed by the trigger. By default, the field will display the current value of our item (available memory, for example). This allows users to compare the current item values with item values at the time of problem creation and decide if any additional interference is necessary to resolve the problem.

The expression field defines the logic behind detecting a problem. Here, we can either type in the expression manually or press the add button and build the expression by selecting the item that we wish to analyze – plus one of the various functions used for analysis. For example, the last function is used to analyze only the last received value and can generate a lot of noise when used for resource monitoring. Meanwhile, average, minimum, and maximum functions can be used to analyze values less sensitively over time. There are many more functions available for a variety of more advanced use cases – from string analysis functions to predictive functions and many others.

A large selection of functions can be used in trigger expressions to detect problems

Once the trigger is created, it will be recalculated every time any of the related items receive a new value.

This article covered only the basics of Host, item and trigger configuration. There are many more options for more advanced use cases. If you’re interested or need help with more advanced Zabbix features, please check out a variety of tutorials, how-tos and case studies in our blog and YouTube channel.

The post Getting Started with Zabbix – Hosts, Items, and Triggers appeared first on Zabbix Blog.

Creating a Personal Assistant in Zabbix with Artificial Intelligence

Post Syndicated from Cesar Caceres original https://blog.zabbix.com/creating-a-personal-assistant-in-zabbix-with-artificial-intelligence/29596/

Zabbix is dedicated to monitoring IT infrastructures based on predetermined thresholds, such as servers, networks, and applications. Incorporating artificial intelligence (AI) into Zabbix as a complement allows a user to mitigate alerts based on these predetermined thresholds, offering possible causes and solutions to problems. This can help a user resolve incidents more efficiently.

In this article, we will explain how to integrate Zabbix and Google’s AI tool Gemini by using the API provided as well as a custom widget alternative.

First steps towards integration

You can find the repository in GitHub based on the Google Gemini model. You’ll need to create an account in Google AI Studio to obtain the required API.

Script configuration in Zabbix

From Zabbix version 7.0, access:

“Alerts” > “Scripts” > “Create Script.”

For this functionality, we designated the name as “Possible cause and solution.” Next, we can configure the parameters with the trigger event and the API generated in AI Studio. We then copy and get the script from the repository mentioned in the «Script» field, as in the following image:

Application in the problem panel

After configuration, we access the alerts panel and select a specific alert. We click on “AI Assistant” and access the functionality that was previously named as “Possible cause and solution.”

The following images present an example of an agent installed on a notebook.

Possible cause:

Possible solution:

The AI ​​will be able to provide a precise solution for each problem presented, allowing us to progressively optimize the predetermined thresholds. 

Using the custom widget “What are you working on?”

Creating accurate personalized dashboards for the user is essential. With this in mind, we propose the creation of an AI-based widget called “What are you working on?” (¿Qué harías tu? in Spanish), which analyzes the current state of the problem presented in Zabbix.

This concept integrates all the functionalities present in the widget (including Summary, Perspectives, Diagnosis, Comparison, and Forecast), since the used prompt can indicate whether it is necessary to make adjustments to the strategic plan or predict future trends based on the panel data built.

To exemplify how the “What are you working?” widget works, let’s consider the analysis of disk usage on our Zabbix Server.

The creation of personalized widgets from the official Zabbix page.

Once we have knowledge for the project, on the backend of our Zabbix Server we locate the route:

/usr/share/zabbix/widgets/

Then, we create a carpet called “insights” and copy the following repository. It is necessary to place the Gemini API in the file «assets/js/class.widget.php.js» in the field “YOUR_API_KEY.”

On the frontend, we go to “Administration” > “General” > “Modules.”

In the upper right corner,  we click on “Scan Directory.” We have our widget to use:

After performing the scan, it is necessary to enable the widget, as it is disabled by default.

The importance of using AI in Zabbix

Let’s imagine a scenario with 100 monitored servers. Performance thresholds, Windows services, or other specific services can generate up to 50 weekly alerts. With the help of AI, it’s possible to reduce this number to a bare minimum, thanks to the weekly collection of possible causes and solutions.

This ground-level approach allows users to solve problems faster, but also improves overall health by minimizing necessary adjustments to the Zabbix server.

Implementing AI locally

Using a dedicated server with open source AI models like HuggingFace, it’s possible to implement the AI ​​locally and create a database collecting the possible causes and solutions of the events.

The AI ​​will learn from repetitive events, offering more accurate answers in the future. The analysis of possible trends can be based on the generated alerts. In this way, we can optimize our alerts and put artificial intelligence to work understanding and solving our problems.

Conclusion

The model we use is project-oriented. We are constantly evolving artificial intelligence, and we must use the model we know best. language is distinct due to the orientation of the prompts used for the answers and the learning we can provide, either by making requests to specific artificial intelligence platforms or by using it locally.

The post Creating a Personal Assistant in Zabbix with Artificial Intelligence appeared first on Zabbix Blog.

File Integrity Monitoring with Zabbix

Post Syndicated from Paulo R. Deolindo Jr. original https://blog.zabbix.com/file-integrity-monitoring-with-zabbix/29460/

We have often seen Zabbix used as a simple tool for monitoring network assets as well as Information and Communication Technology (ICT) infrastructure. While this concept is not incorrect, it is equally important to understand that with the advancement of Zabbix versions, more and more functionalities have been made available for other types of monitoring, enabling advanced data analysis and stunning visualizations through new and modern widgets in the frontend layer.

In this short blog post, we will explore some of the existing yet under-discussed features of Zabbix that contribute to the maturity of the cybersecurity discipline within organizations — a topic that is becoming increasingly critical in the corporate environment.

FIM – File Integrity Monitoring

FIM is a very common concept among information security tools, specifically in tools like SIEM/XDR (Security Information Event Management/Extended Detection and Response). The name is quite suggestive of its usability, but while some tools highlight this feature as one of their main functionalities, it is also available for those who use Zabbix – just not explicitly labeled under this name.
Here, we will approach FIM as a concept rather than just a functionality. This is because we aim to achieve a result, not merely have a menu with a name to claim compliance while using our tool. In fact, the outcome needs to be more important than mere “marketing.”

What should we expect from FIM?

Imagine that your servers have certain directories and/or files so critical that you cannot afford to neglect monitoring them for changes, insertions, or deletions. Additionally, these files may have owners and properties that must not be altered – otherwise, the systems that depend on them might lose the ability to read or execute their functions. This, at a minimum, is what we expect from FIM as a functionality.
To illustrate this a bit further, consider a database service like MariaDB:

# ls -lR /etc/mysql/
/etc/mysql/:
total 24
drwxr-xr-x 2 root root 4096 Jun 25 18:40 conf.d
-rwxr-xr-x 1 root root 1740 Nov 30 2023 debian-start
-rw------- 1 root root 544 Jun 25 18:43 debian.cnf
-rw-r--r-- 1 root root 1126 Nov 30 2023 mariadb.cnf
drwxr-xr-x 2 root root 4096 Sep 30 16:36 mariadb.conf.d
lrwxrwxrwx 1 root root 24 Oct 20 2020 my.cnf -> /etc/alternatives/my.cnf
-rw-r--r-- 1 root root 839 Oct 20 2020 my.cnf.fallback

/etc/mysql/conf.d:
total 8
-rw-r--r-- 1 root root 8 Oct 20 2020 mysql.cnf
-rw-r--r-- 1 root root 55 Oct 20 2020 mysqldump.cnf

/etc/mysql/mariadb.conf.d:
total 40
-rw-r--r-- 1 root root 575 Nov 30 2023 50-client.cnf
-rw-r--r-- 1 root root 231 Nov 30 2023 50-mysql-clients.cnf
-rw-r--r-- 1 root root 927 Nov 30 2023 50-mysqld_safe.cnf
-rw-r--r-- 1 root root 3795 Sep 30 16:36 50-server.cnf
-rw-r--r-- 1 root root 570 Nov 30 2023 60-galera.cnf
-rw-r--r-- 1 root root 76 Nov 8 2023 provider_bzip2.cnf
-rw-r--r-- 1 root root 72 Nov 8 2023 provider_lz4.cnf
-rw-r--r-- 1 root root 74 Nov 8 2023 provider_lzma.cnf
-rw-r--r-- 1 root root 72 Nov 8 2023 provider_lzo.cnf
-rw-r--r-- 1 root root 78 Nov 8 2023 provider_snappy.cnf

All the files, directories, and subdirectories listed above have already been configured, and the system (whatever it may be) is functioning perfectly. However, if someone suddenly decides to alter a configuration in the file /etc/mysql/mariadb.conf.d/50-server.cnf, this could be disastrous for the service. Regardless, the important thing to do is to monitor this scope and notify the relevant stakeholders so that an appropriate analysis can be conducted.

Zabbix can help with that. Let’s see how.

Zabbix and File Integrity Monitoring functions

Consider that the Zabbix agent is installed on the server to be monitored:

vfs.dir.count[/etc/mysql]

With this key, we can count the objects present within the /etc/mysql directory. Subsequently, we can create a trigger to be activated if there is any change related to the initial collection count, such as someone deleting or adding a file or directory in this location.

 

 

 

 

 

vfs.dir.size[/etc/mysql]

With this key, we can determine the total size in bytes used by the directories and configuration files. In the future, we can create a trigger that activates when this size changes, indicating the deletion or addition of a file.

vfs.file.exists[/etc/mysql/mariadb.conf.d/50-server.cnf]

Among several important files, we may have a greater interest in some configuration files, and we can validate their existence by creating a trigger that activates when such a file ceases to exist. This will clearly indicate that something important has disappeared.

In this case, the value “1” represents “OK” for the existence of the file.

vfs.file.cksum[/etc/mysql/mariadb.conf.d/50-server.cnf,sha256]

In addition to verifying the existence of the configuration file we consider important, we need to be informed if anything in it changes. This key handles that by generating a hash in a variety of possible formats, allowing a trigger to be activated in case of a hash change, which would reflect a file modification (unfortunately, we won’t know what exactly was altered).

vfs.file.regmatch[/etc/mysql/mariadb.conf.d/50-server.cnf,^max_connections\s+=\s+(\d+)]

We might have a specific parameter of interest – for example, the maximum number of connections allowed to the database. Monitoring this is important because if the configuration is set to the default value, it means that no “tuning” has been applied to the database. Alternatively, it could mean that someone simply deleted or commented out this line, causing it to be ignored by the system. Therefore, verifying whether the parameter exists and is properly configured is crucial.

In this case, the value “1” indicates that the regular expression was successfully found, meaning that the configuration or parameter we need to exist is indeed present.

vfs.file.regexp[/etc/mysql/mariadb.conf.d/50-server.cnf,^max_connections\s+=\s+(\d+),,,,\1]

Beyond verifying the existence and integrity of the file, it is also possible to determine what was changed within it. However, we would need to specify the configuration of interest using a regular expression. For example, considering that the maximum number of connections allowed by the database system is “x,” we can be alerted by a trigger if it changes to “y,” “z,” or any other value different from “x.” This setup allows us to monitor the parameter of interest with precision. This logic can be applied to any other parameter you consider important. Of course, there is another way to automate this process, but we will not cover that automation here.

In this case, the parameter defining the maximum number of connections is not only present, but we also know the exact number of connections. This way, we will have a history of the applied parameterization in case it is changed at any point.

vfs.file.owner[/etc/mysql/mariadb.conf.d/50-server.cnf]

vfs.file.owner[/etc/mysql/mariadb.conf.d/50-server.cnf,group]

The two keys above allow us to determine the owner of a file and (in the case of a Linux system) the owning group. We can also choose to monitor the user’s name or their UID in the system. Naturally, a trigger can be activated to alert us in case of an ownership change, indicating that someone might be “taking over” an important file in the system.

vfs.file.permissions[/etc/mysql/mariadb.conf.d/50-server.cnf]

The key above allows us to determine a file’s permissions—read, write, read and write, execution, or a special permission bit. Naturally, a trigger can be activated to alert us if there is any permission change in the file.

vfs.file.attr[/etc/mysql/mariadb.conf.d/50-server.cnf]

The key above does not exist by default. It was created with a UserParameter, which is a customization for verifying a command that, in this case, checks the attributes of a specific file. Consider the following command executed directly in your system’s terminal:

# lsattr /etc/mysql/mariadb.conf.d/50-server.cnf
--------------e------- /etc/mysql/mariadb.conf.d/50-server.cnf

What interests us are the attributes:

--------------e-------

If someone who invades the system modifies the attribute of a file (for example) using this command…

# chattr +A /etc/mysql/mariadb.conf.d/50-server.cnf
# lsattr /etc/mysql/mariadb.conf.d/50-server.cnf
-------A------e------- /etc/mysql/mariadb.conf.d/50-server.cnf

…it could mean that someone does not want the system to log when this file was accessed (refer to the chattr command manual). Additionally, any other attribute can be added or removed, which poses a risk to the system because these attributes can alter how files are accessed, stored on disk, and later read. Therefore, we can create a UserParameter as follows:

# cd /etc/zabbix/zabbix_agent2.d/
# echo "UserParameter=vfs.file.attr[*],lsattr \$1 | cut -d\" \" -f1" > attr.conf
# zabbix_agent2 -R userparameter_reload

Finally, we can test the reading of attributes directly from the terminal:

# zabbix_agent2 -t vfs.file.attr[/etc/mysql/mariadb.conf.d/50-server.cnf]
vfs.file.attr[/etc/mysql/mariadb.conf.d/50-server.cnf][s|-------A------e-------]

You can also try this now through the frontend.

When creating the item, don’t forget to create the trigger that should be activated in case there is a change in the attribute of a file, whatever it may be.

Paying attention to file access and modification times

To delve a bit deeper into the concept of FIM, we should ask ourselves if we are monitoring file access and modifications concerning their timestamps. In a way, if we have implemented everything proposed above, the answer is yes.

That said, there is an easier way to keep track of all the things we’ve discussed. It involves using this key:

vfs.dir.get[/etc/mysql]

When creating an item with this key, we will recursively obtain all its objects, such as subdirectories and files. The output format will be a JSON, which allows us to create LLD (Low-level Discovery) rules to automate FIM. Below is a small snippet of the monitoring output:

{
"basename": "mariadb.cnf",
"pathname": "/etc/mysql/mariadb.cnf",
"dirname": "/etc/mysql",
"type": "file",
"user": "root",
"group": "root",
"permissions": "0644",
"uid": 0,
"gid": 0,
"size": 1126,
"time": {
"access": "2024-11-30T23:01:01-0300",
"modify": "2023-11-30T01:42:37-0300",
"change": "2024-06-25T18:41:01-0300"
},
"timestamp": {
"access": 1733018461,
"modify": 1701319357,
"change": 1719351661
}
...

Considering that the output includes all objects from the main directory, this would be the most sensible approach to configure our FIM. However, it is necessary to create the LLD and prototypes. We will not cover this in detail in this article, but this is the path I recommend you follow.

Below is a “blueprint” for an LLD to create automated File Integrity Monitoring:

The “Master item”:

The “Dependent rule”:

The LLD Macro:

The item prototypes:

Below are the components of a trigger prototype (I created just one to symbolize a type of alert for file modification):

Name: Object: {#BASENAME} just changed

Event name: Object: {#BASENAME} just changed. Last hash: {ITEM.VALUE} The previous one: {?last(/MySQLDB/vfs.file.cksum["{#PATHNAME}",sha256],#2)} Object: {#BASENAME} just changed. Last hash: {ITEM.VALUE} The previous one: {?last(/MySQLDB/vfs.file.cksum["{#PATHNAME}",sha256],#2)}

Severity: Warning

Expression: last(/MySQLDB/vfs.file.cksum["{#PATHNAME}",sha256],#1)<>last(/MySQLDB/vfs.file.cksum["{#PATHNAME}",sha256],#2)

And then, some results:

Conclusion

The implementation of a robust File Integrity Monitoring system helps to ensure the security of IT infrastructure. Detecting unauthorized changes in critical files helps prevent attacks, identify security breaches, and ensure the integrity and availability of systems. With Zabbix, we have an effective solution to implement FIM, enabling process automation and the real-time visualization of changes. This monitoring not only reinforces protection against intrusions but also facilitates auditing and compliance with regulatory standards.

The main benefits of integrating File Integrity Monitoring with Zabbix include:

1. Early detection of changes in critical files, enabling quick responses.
2. Enhanced compliance with security regulations and internal policies.
3. Protection against malware and ransomware by identifying changes in essential files.
4. Ease of auditing with automated reports and modification histories.
5. Greater visibility and control over the integrity of data and systems in real time.
6. Operational efficiency through the automation of alerts and reports.
7. Improved proactive security, helping prevent attacks before they become critical.

By using Zabbix, organizations can strengthen their security posture and optimize risk management, ensuring that any unauthorized changes are detected and promptly corrected.

 

 

The post File Integrity Monitoring with Zabbix appeared first on Zabbix Blog.

NetBox as Home CMDB and Integrated with Zabbix

Post Syndicated from Janne Pikkarainen original https://blog.zabbix.com/netbox-as-home-cmdb-and-integrated-with-zabbix/29324/

Welcome to another episode of What’s up, home? weirdness! Who wouldn’t have their own NetBox at home – and who wouldn’t think of it as a  home CMDB? I’ve just started experimenting with it. For those who do not know, a Configuration Management Database (CMDB) is the source of truth for your inventory of stuff. In data centers, it keeps track of your servers, their cables, and everything else, telling you in which data center and which rack they are.

For me… well, take a look at for yourself. One picture says more than a thousand words of my storytelling.

What is it good for?

Well… in the real business world, it’s good for many things – from knowing about your assets, their serial numbers, purchase dates, hardware configuration, and so much else. I could go as deep as that, but there’s a limit how far even I want to go with these little experiments. Today’s case is merely to demonstrate the flexibility of Zabbix, yet again.

How did I do this?

I quickly threw the data in to NetBox by hand — it looks by a lot of work to do, but in fact, it wasn’t too bad – took me about 45 minutes to do the following:

  • Create a Site called “”What’s up, home?”
  • Create the rooms by adding new locations and making the previously created site as their parent
  • Add some manufacturers
  • Add some device roles
  • Add some device types

After that, adding the devices themselves is a breeze. If you have not used NetBox, this is what adding a new device looks like. Yes yes, in the real business world there would have been many more items for me to fill in, but for this case I only added the mandatory items and even those I could do just by choosing from the drop-down menus. Not a big deal.

…and the Zabbix integration?

Actually, this is something I created many years ago for other purposes, but still seems to work with today’s versions of NetBox. My little template queries NetBox over its API and asks if it has anything that matches with the host name that’s in Zabbix. If it has, then it gets the rack location and other stuff.

How this then works is pretty standard stuff. Retrieve a master item…

…and the dependent items then gather the data, parse some JSONPaths with Zabbix item preprocessing, and at least some of the items also populate bits and pieces in the Zabbix inventory. This is handy in real world, as your alerts can then contain the exact rack location and so forth about your failing devices. Add them as tags or add them as part of the alert text, your imagination is your limit.

Does it work?

Of course it does! Here’s the inventory grouped by manufacturer:

If I click on any of them, I get this:

Of course I can also browse the data through the latest data, for example…

…or I could just create some dashboards for visualizing all this. I have not done that yet, as this is what I did tonight so far and now I’m going to bed. To be continued – maybe! For now, the template only pulls data from NetBox, but I’d like to push data towards it as well, to also tell if a light bulb is powered on or not, for example. Stay tuned!

 

 

The post NetBox as Home CMDB and Integrated with Zabbix appeared first on Zabbix Blog.

An Introduction to Browser Monitoring

Post Syndicated from Alexander Petrov-Gavrilov original https://blog.zabbix.com/an-introduction-to-browser-monitoring/29245/

Website and web application monitoring can vary from simple use cases to complex multi-step scenarios. To fully cover the scope of modern website monitoring requirements, Zabbix has introduced Browser item, a new item type that brings with it multiple accompanying improvements for simulating browser behavior and collecting website metrics.

What is browser monitoring?

Browser monitoring allows users to monitor complex websites and web applications using an actual browser. It involves the constant tracking and analysis of the performance, reliability, and functionality of a website or web application from a real user perspective. This process ensures that key pages, features, and user navigation work as expected. By monitoring critical pages and flows specific to different businesses, companies can ensure optimal user experience, resolve potential or ongoing issues, and proactively address any potential problems.

Browser monitoring can be split into two main approaches:

  • Browser real user monitoring – Monitors how your web page or web application is performing, using real user data to analyze overall performance and user experience.
  • Browser synthetic monitoring – Analyzes application availability and performance, using scheduled testing to analyze website availability and emulate real user experience.

Since Zabbix is not a real person (yet) but is fully capable of emulating real user behavior on a website very precisely, we will focus on browser synthetic monitoring.

What business goals can we achieve with browser monitoring?

There are a multitude of goals that can be achieved, depending on what business we are running or expect to monitor, but some examples include:

Improving user experience

Browser monitoring helps ensure that users have a fast, smooth, and reliable experience on a website or web application. A positive user experience leads to higher user satisfaction and a greater likelihood of repeated visits or purchases.

Ensuring cross-browser and cross-device compatibility

Users access websites from a host of browsers and devices. Browser monitoring helps to detect compatibility issues that could affect certain users (e.g., JavaScript errors on specific browsers or layout shifts on mobile). By monitoring these scenarios, we can deliver a consistent experience across platforms, which is essential as multi-device usage continues to grow.

E-commerce checkout monitoring

Retailers can ensure a smooth checkout process by monitoring page load times, form interactions, and payment processing to confirm that users can easily complete purchases.

Form performance

Browser monitoring makes it easy to detect any issues preventing form completion, such as slow response times or broken validation. It also ensures a smooth, error-free experience to improve lead capture and gain more conversions.

Subscription renewal page monitoring

Subscription-based businesses rely on customers regularly renewing or upgrading their plans. Monitoring the subscription renewal page for load speed, usability, and any payment processing issues is essential, as issues on this page can directly the amount of renewals and lead to customer loss. 

Supporting portal uptime

Many businesses provide a customer support portal where users can submit requests or use a knowledge database. Downtime or slow response times can lead to frustrated customers and an increased number of complaints.

How to set up browser monitoring

There are a lot of goals we can reach, but the question remains – how can we reach them with Zabbix?  The answer is that we can use the already mentioned and newly introduced browser item.

Browser item configuration window

Browser items gather information by running custom JavaScript code and fetching data via HTTP or HTTPS protocols. These items can mimic browser activities like clicking buttons, typing text, navigating across webpages, and performing other user interactions within websites or web applications.

Along with the script, users can specify optional parameters (name-value pairs) and set a timeout limit for the actions. But before we can actually use the item, we will need to configure Zabbix server or Zabbix proxy with a WebDriver, so that Zabbix can actually control browser trough scripts.

What is a WebDriver? A WebDriver controls a browser directly, mimicking user interactions through a local machine or on a remote server, enabling full browser automation. The term WebDriver includes both the language-specific bindings and the individual browser control implementations, often simply called WebDriver. WebDriver is designed to offer a straightforward and streamlined programming interface trough an object-oriented API which efficiently manages and drives browser actions.

In this guide, for instance, we’ll use a WebDriver with Chrome within a Docker container and make a script that includes actions like button clicks and text entry.

WebDriver installation

One of the simplest ways to install a WebDriver is to use containers. To install a chrome WebDriver on a local or remote machine, you can use Docker or any other preferred container engines:

# podman run --name webdriver -d \
-p 4444:4444 \
-p 7900:7900 \
--shm-size="2g" \
--restart=always -d docker.io/selenium/standalone-chrome:latest

Port 4444 will be the port on which the WebDriver will be listening and port 7900 will be used by NoVNC, which allows us to observe browser behavior in case a browser with a GUI is used.

Zabbix server/proxy configuration

After WebDriver is installed, we need to set up the communication between Zabbix and the driver. This can be done by editing the Zabbix server/proxy configuration file and updating the following parameters:

### Option: WebDriverURL
#       WebDriver interface HTTP[S] URL. For example http://localhost:4444 used with 
#       Selenium WebDriver standalone server.
#
# WebDriverURL=
WebDriverURL=http://localhost:4444

### Option: StartBrowserPollers
#       Number of pre-forked instances of browser item pollers.
#
# Range: 0-1000
# StartBrowserPollers=1
StartBrowserPollers=5

With the configuration parameters in place, we will now configure our Browser item to collect and monitor the list of upcoming Zabbix trainings from the training schedule page.

Creating a host

First, we need to navigate to the “Data collection” > “Hosts” section and create a host that represents our web page. This is more than anything – a logical representation. This means we don’t need any specific interfaces or additional configuration. The host in our example will look like this:

Training page monitoring host
Creating a browser item

Since the data collection is done by items, we need to navigate to the “Items” section on the “Zabbix training schedule” host and create an item with the type “Browser.” It should look something like this:

Training schedule browser item

Now comes the most important part – creating the script to monitor the schedule. Click on the “Script” field.

First, we will need to define what browser we will use, and any extra options we might want to specify, like screen resolution or whether the browser should run in headless mode or not. This can be done using the Browser object. The Browser object manages WebDriver sessions and initializes a session upon creation, then terminates it upon destruction. A single script can support up to four Browser objects.

var browser, result;
var  opts = Browser.chromeOptions();
opts.capabilities.alwaysMatch['goog:chromeOptions'].args = []
browser = new Browser(opts);
browser.setScreenSize(Number(1980), Number(1020));

In this snippet, we defined that we will use the Chrome browser with a GUI. As you can see, the screen size is set to the pretty common 1980x1020p.

Now we will need to define what the browser will be doing. This can be done by using such Browser object methods as navigate – to point to the correct URL of the web page or application and (for example) findElement/findElements to return some element of the web page.

findElement/findElements methods allow us to define strategies to locate an element and selectors to provide what to look for. Strategies and selectors can be of multiple kinds:
strategy – (string, CSS selector/link text/partial link text/tag name/Xpath)
selector – (string) Element selector using the specified location strategy

Let’s take a look at the next snippet:

try {
    browser.navigate("https://www.zabbix.com/");
    browser.collectPerfEntries("open page");

    el = browser.findElement("xpath", "//span[text()='Training']");
    if (el === null) {
     throw Error("cannot find training");
    }
    el.click();

    el = browser.findElement("link text", "Schedule");
    if (el === null) {
        throw Error("cannot find application form");
    }
    el.click();

In this snippet,

  1. I am using a browser to navigate to the Zabbix page.
  2. I collect a range of performance entries related to opening the page (download speed, response time, etc.).
  3. I look for an element with the text “Training” using the XPath strategy, and the selector “Training.”
  4. I click on it, which is a method to interact with elements.
  5. In the next part, I use the strategy “link text” to find a link with the text selector “Schedule.”
  6. I click on it

A visual description would look like this:

Browser interaction with the zabbix.com website

Now, let’s do some more clicking to filter out all other trainings and leave only trainings in Korean and Dutch:

    el = browser.findElement("link text", "English");
    if (el === null) {
        throw Error("cannot find application form");
    }
    el.click();

    el = browser.findElement("xpath", "//span[text()='English']");
    if (el === null) {
        throw Error("cannot find application form");
    }
    el.click();

    el = browser.findElement("xpath", "//span[text()='Korean']");
    if (el === null) {
        throw Error("cannot find application form");
    }
    el.click();

    el = browser.findElement("xpath", "//span[text()='Dutch']");
    if (el === null) {
        throw Error("cannot find password input field");
    }
    el.click();

    Zabbix.sleep(2000);

English is selected by default, so the script “unclicks” it. Then it selects Korean and Dutch and uses the sleep function to have some extra time for the page to load and make a screenshot of the currently opened page:

List of trainings with language filters applied on it

Now let’s get the list of dates so we can monitor which trainings we have left in 2024:

el = browser.findElements("xpath", "//*[contains(text(), ' 20')]");
var dates = [];
for (var n = 0; n < el.length; n++) { 
    dates.push(el[n].getText('2024')); 
}

// Remove entries that do not contain "2024"
dates = dates.filter(function(date) {
    return date.includes('2024');
});

dates = uniq(dates);

In this case we do a bit of a jump, and now search for all elements that contain text 20 (to include all years), but filter them out by year 2024 specifically, which later can be easily replaced with 2025. The end result contains all the upcoming training dates:

Items containing the upcoming training dates

The full host export with the script snippet can be found by following this link.

An additional example

But what if I want to fill in a form? Maybe to make a purchase, create an order, or just test a contact form? Good news – that’s an even simpler operation! Let’s take a look at this snippet:

// enter name
var el = browser.findElement("xpath", "//label[text()='First Name']/following::input");
if (el === null) {throw Error("cannot find name input field");}
el.sendKeys("Aleksandrs");

// enter last name
var el = browser.findElement("xpath", "//label[text()='Last name']/following::input");
if (el === null) {throw Error("cannot find name input field");}
el.sendKeys("Petrovs-Gavrilovs");

// enter cert number
var el = browser.findElement("xpath", "//label[text()='Certificate number']/following::input");
if (el === null) {throw Error("cannot find name input field");}
el.sendKeys("CT-2404-003");

// select version
var el = browser.findElement("css selector", "form#certificate_validation>fieldset>div:nth-of-type(5)>select");
if (el === null) {throw Error("cannot find name input field");}
el.sendKeys("7.0");

// check certificate
var el = browser.findElement("xpath", "//button[text()='Check Certificate']");
if (el === null) {throw Error("cannot find name input field");}
el.click();

This way, I can validate that my certificate is still valid! 

As you can see, there are multiple ways to make a browser emulate user behavior and allow us to validate whether our pages and businesses are performing the way we expect them to! You can find even more examples in Zabbix documentation and Zabbix Certified Training, which I welcome you to attend!

The post An Introduction to Browser Monitoring appeared first on Zabbix Blog.

Using the zabbix_utils Library for Tool Development

Post Syndicated from Aleksandr Iantsen original https://blog.zabbix.com/python-zabbix-utils-alert-tracker-tool/29010/

In this article, we will explore a practical example of using the zabbix_utils library to solve a non-trivial task – obtaining a list of alert recipients for triggers associated with a specific Zabbix host. You will learn how to easily automate the process of collecting this information, and see examples of real code that can be adapted to your needs.

Over the last year, the zabbix_utils library has become one of the most popular tools for working with the Zabbix API. It is a convenient tool that simplifies interacting with the Zabbix server, proxy, or agent, especially for those who automate monitoring and management tasks.

Due to its ease of use and extensive functionality, zabbix_utils has found a following among system administrators, monitoring, and DevOps engineers. According to data from PyPI, the library has already been downloaded over 140,000 times since its release, confirming its demand within the community. It’s all thanks to you and your attention to zabbix_utils!

Task Description

Administrators often need to check which Zabbix users receive alerts for specific triggers in the Zabbix monitoring system. This can be useful for auditing, configuring new notifications, or simply for a quick diagnosis of issues. The task becomes especially relevant when you have plenty of hosts containing numerous triggers, and manually checking the recipients for each trigger through the Zabbix interface becomes very time-consuming. 

In such cases, it is advisable to use a custom solution based on the Zabbix API. You can directly access all the required data using the API, and then use additional logic to determine the final alert recipients. The zabbix_utils library makes working with the Zabbix API more convenient and allows you to automate this process. In this project, we use the zabbix_utils library to write a Python script that collects a list of alert recipients for the triggers of the selected Zabbix host. This will allow you to obtain the necessary information faster and with minimal effort.

Environment Setup and Installation

To get started with zabbix_utils, you need to install the library and configure the connection to the Zabbix API. This article provides more details and examples on getting started with the library. However, it would be better if I describe the basic steps to prepare the environment here. 

The library supports several installation methods described in the official README, making it convenient for use in different environments.

1. Installation via pip

The simplest and most common installation method is using the pip package manager. To do this, execute the command:

~$ pip install zabbix_utils

To install all necessary dependencies for asynchronous work, you can use the command:

~$ pip install zabbix_utils[async]

This method is suitable for most users, as pip automatically installs all required dependencies.

2. Installation from Zabbix Repository

Since writing the previous articles, we have added one more installation method – from the official Zabbix repository. First and foremost, you need to add the repository to your system if it has not been installed yet. Official Zabbix packages for Red Hat Enterprise Linux and Debian-based distributions are available on the Zabbix website.

For Red Hat Enterprise Linux and derivatives:

~# dnf install python3-zabbix-utils

For Debian / Ubuntu and derivatives:

~# apt install python3-zabbix-utils

3. Installation from Source Code

If you require the latest version of the library that has not yet been published on PyPI, or you want to customize the code, you can install the library directly from GitHub:

1. Clone the repository from GitHub:

~$ git clone https://github.com/zabbix/python-zabbix-utils

2. Navigate to the project folder:

~$ cd python-zabbix-utils/

3. Install the library by executing the command:

~$ python3 setup.py install

4. Testing the Connection to Zabbix API

After installing zabbix_utils, it is a good idea to check the connection to your Zabbix server via the API. To do this, use the URL to the Zabbix server, the token, or the username and password of the user who has permission to access the Zabbix API.

Example code for checking the connection:

from zabbix_utils import ZabbixAPI ZABBIX_AUTH = {     "url": "your_zabbix_server",     "user": "your_username",     "password": "your_password" } api = ZabbixAPI(**ZABBIX_AUTH) hosts = api.host.get(     output=['hostid', 'name'] ) print(hosts) api.logout()

Main Steps of the Task Solution

Now that the environment is set up, let’s look at the main steps for solving the task of retrieving the list of alert recipients for triggers associated with a specific Zabbix host in Zabbix.

In zabbix_utils, asynchronous API interaction support is built in through the AsyncZabbixAPI class. This allows multiple requests to be sent simultaneously and their results to be handled as they become ready, significantly reducing latencies when making multiple API calls. Therefore, we will use the AsyncZabbixAPI class and the asynchronous approach in this project.

Below are the main steps for solving the task, and code examples for each step. Please note that the code in this project is for demonstration purposes, may not be optimal, or could contain errors. Use it as an example or a base for your project, but not as a complete tool.

Step 1. Obtain Host ID

The first step is to identify the host for which we will retrieve information about triggers and alerts. We need to find the hostid using its name/host to do this. The Zabbix API provides a method to obtain this information, and using zabbix_utils makes this process much simpler.

Example of obtaining the host ID by its name:

host = api.host.get(     output=["hostid"],     filter={"name": "your_host_name"} )

This method returns a unique identifier for the host, which can be used further. However, for our test project, we will use a manually specified host identifier.

Step 2. Retrieve Host Triggers

With the hostid in hand, the next step is to retrieve all triggers associated with this host. Triggers contain the conditions that trigger the alerts. We need to collect information about all triggers so that we can then use it to select actions that match all the conditions.

Example of retrieving node triggers:

triggers = api.trigger.get(     hostids=[hostid],     selectTags="extend",     selectHosts=["hostid"],     selectHostGroups=["groupid"],     selectDiscoveryRule=["templateid"],     output="extend", )

This request returns complete information about the triggers for the host. We get not only the triggers but also their tags, associated host and host groups, and discovery rule information. All this information will be necessary to check the conditions of the actions.

Step 3. Initialize Trigger Metadata

At this stage, objects for each trigger are created to store their metadata. This is done using the Trigger class, which includes information about the trigger such as its name, ID, associated host groups, hosts, tags, templates, and operations.

Here’s the code defining the Trigger class:

class Trigger:     def __init__(self, trigger):         self.name = trigger["description"]         self.triggerid = trigger["triggerid"]         self.hostgroups = [g["groupid"] for g in trigger["hostgroups"]]         self.hosts = [h["hostid"] for h in trigger["hosts"]]         self.tags = {t["tag"]: t["value"] for t in trigger["tags"]}         self.tmpl_triggerid = self.triggerid         self.lld_rule = trigger["discoveryRule"] or {}         if trigger["templateid"] != "0":             self.tmpl_triggerid = trigger["templateid"]         self.templates = []         self.messages = []         self._conditions = {             "0": self.hostgroups,             "1": self.hosts,             "2": [self.triggerid],             "3": trigger["event_name"] or trigger["description"],             "4": trigger["priority"],             "13": self.templates,             "25": self.tags.keys(),             "26": self.tags,         }     def eval_condition(self, operator, value, trigger_data):         # equals or does not equal         if operator in ["0", "1"]:             equals = operator == "0"             if isinstance(value, dict) and isinstance(                 trigger_data, dict):                 if value["tag"] in trigger_data:                     if value["value"] == trigger_data[                         value["tag"]]:                         return equals             elif value in trigger_data and isinstance(                 trigger_data, list):                 return equals             elif value == trigger_data:                 return equals             return not equals         # contains or does not contain         if operator in ["2", "3"]:             contains = operator == "2"             if isinstance(value, dict) and isinstance(                 trigger_data, dict):                 if value["tag"] in trigger_data:                     if value["value"] in trigger_data[                         value["tag"]]:                         return contains             elif value in trigger_data:                 return contains             return not contains           # is greater/less than or equals         if operator in ["5", "6"]:             greater = operator != "5"             try:                 if int(value) < int(trigger_data):                     return not greater                 if int(value) == int(trigger_data):                     return True                 if int(value) > int(trigger_data):                     return greater             except:                 raise ValueError(                     "Values must be numbers to compare them"                 )       def select_templates(self, templates):         for template in templates:             if self.tmpl_triggerid in [                 t["triggerid"] for t in template["triggers"]]:                 self.templates.append(template["templateid"])             if self.lld_rule.get("templateid") in [                 d["itemid"] for d in template["discoveries"]             ]:                 self.templates.append(template["templateid"])     def select_actions(self, actions):         selected_actions = []         for action in actions:             conditions = []             if "filter" in action:                 conditions = action["filter"]["conditions"]                 eval_formula = action["filter"]["eval_formula"]             # Add actions without conditions directly             if not conditions:                 selected_actions.append(action)                 continue             condition_check = {}             for condition in conditions:                 if (                     condition["conditiontype"] != "6"                     and condition["conditiontype"] != "16"                 ):                     if (                         condition["conditiontype"] == "26"                         and isinstance(condition["value"], str)                     ):                         condition["value"] = {                             "tag": condition["value2"],                             "value": condition["value"],                         }                     if condition["conditiontype"] in self._conditions:                         condition_check[                             condition["formulaid"]                         ] = self.eval_condition(                             condition["operator"],                             condition["value"],                             self._conditions[                                 condition["conditiontype"]                             ],                         )                 else:                     condition_check[                         condition["formulaid"]                     ] = True             for formulaid, bool_result in condition_check.items():                 eval_formula = eval_formula.replace(                     formulaid, str(bool_result))
            # Evaluate the final condition formula             if eval(eval_formula):                 selected_actions.append(action)         return selected_actions       def select_operations(self, actions, mediatypes):         messages_metadata = []         for action in self.select_actions(actions):             messages_metadata += self.check_operations(                 "operations", action, mediatypes             )             messages_metadata += self.check_operations(                 "update_operations", action, mediatypes             )             messages_metadata += self.check_operations(                 "recovery_operations", action, mediatypes             )         return messages_metadata
    def check_operations(self, optype, action, mediatypes):         messages_metadata = []         optype_mapping = {             "operations": "0",  # Problem event             "recovery_operations": "1",  # Recovery event             "update_operations": "2",  # Update event         }         operations = copy.deepcopy(action[optype])         # Processing "notify all involved" scenarios         for idx, _ in enumerate(operations):             if operations[idx]["operationtype"] not in ["11", "12"]:                 continue             # Copy operation as a template for reuse             op_template = copy.deepcopy(operations[idx])             del operations[idx]             # Checking for message sending operations             for key in [                 k for k in ["operations", "update_operations"] if k != optype             ]:                 if not action[key]:                     continue                 # Checking for message sending type operations                 for op in [                     o for o in action[key] if o["operationtype"] == "0"                 ]:                     # Copy template for the current operation                     operation = copy.deepcopy(op_template)                     operation.update(                         {                             "operationtype": "0",                             "opmessage_usr": op["opmessage_usr"],                             "opmessage_grp": op["opmessage_grp"],                         }                     )                     operation["opmessage"]["mediatypeid"] = op[                         "opmessage"                     ]["mediatypeid"]                     operations.append(operation)         for operation in operations:             if operation["operationtype"] != "0":                 continue             # Processing "all mediatypes" scenario             if operation["opmessage"]["mediatypeid"] == "0":                 for mediatype in mediatypes:                     operation["opmessage"]["mediatypeid"] = mediatype[                         "mediatypeid"                     ]                     messages_metadata.append(                         self.create_messages(                             optype_mapping[optype], action, operation, [                                 mediatype                             ]                         )                     )             else:                 messages_metadata.append(                     self.create_messages(                         optype_mapping[optype],                         action,                         operation,                         mediatypes                     )                 )         return messages_metadata       def create_messages(self, optype, action, operation, mediatypes):         message = Message(optype, action, operation)         message.select_mediatypes(mediatypes)         self.messages.append(message)         return message

The code for creating Trigger class objects for each of the retrieved triggers:

for trigger in triggers:     triggers_metadata[trigger["triggerid"]] = Trigger(trigger)

This loop iterates through all triggers and saves them in a dictionary called triggers_metadata, where the key is the triggerid and the value is the trigger object.

Step 4. Retrieve Template Information

The next step is to obtain data about the templates associated with all the triggers:

templates = api.template.get(     triggerids=list(set([t.tmpl_triggerid for t in triggers_metadata.values()])),     selectTriggers=["triggerid"],     selectDiscoveries=["itemid"],     output=["templateid"], )

This request returns information about all templates linked to the host’s triggers being examined. Executing a single query for all triggers is a more optimal solution than making individual requests for each trigger. This information will be needed for evaluating the “Template” condition in actions.

Step 5. Get Actions and Media Types

Next, we obtain the list of actions and media types configured in the system:

actions = api.action.get(     selectFilter="extend",     selectOperations="extend",     selectRecoveryOperations="extend",     selectUpdateOperations="extend",     filter={"eventsource": 0, "status": 0},     output=["actionid", "esc_period", "eval_formula", "name"], )
mediatypes = api.mediatype.get(     selectUsers="extend",     selectActions="extend",     selectMessageTemplates="extend",     filter={"status": 0},     output=["mediatypeid", "name"], )

Here we retrieve actions that define how and to whom alerts are sent, and mediatypes through which users can receive notifications (for example, email or SMS).

Step 6. Match Triggers with Templates and Actions

At this stage, each trigger is associated with the corresponding templates and actions:

for trigger in triggers_metadata.values():     trigger.select_templates(templates)     messages += trigger.select_operations(actions, mediatypes)

Here, for each trigger, we update information about its templates and configured actions for sending notifications. The list of associated actions is determined by checking the conditions specified in them against the accumulated data for each trigger.

For each operation of the corresponding trigger action, a Message class object is created:

class Message:     def __init__(self, optype, action, operation):         self.optype = optype         self.mediatypename = ""         self.actionid = action["actionid"]         self.actionname = action["name"]         self.operationid = operation["operationid"]         self.mediatypeid = operation["opmessage"]["mediatypeid"]         self.subject = operation["opmessage"]["subject"]         self.message = operation["opmessage"]["message"]         self.default_msg = operation["opmessage"]["default_msg"]         self.users = [u["userid"] for u in operation["opmessage_usr"]]         self.groups = [g["usrgrpid"] for g in operation["opmessage_grp"]]         self.recipients = []         # Escalation period set to action's period if not specified         self.esc_period = operation.get("esc_period", "0")         if self.esc_period == "0":             self.esc_period = action["esc_period"]         # Use action's escalation period if unset         self.esc_step_from = self.multiply_time(             self.esc_period, int(operation.get("esc_step_from", "1")) - 1         )         if operation.get("esc_step_to", "0") != "0":             self.repeat_count = str(                 int(operation["esc_step_to"]) - int(operation["esc_step_from"]) + 1             )         # If not a problem event, set repeat count to 1         elif self.optype != "0":             self.repeat_count = "1"         # Infinite repeat count if esc_step_to is 0         else:             self.repeat_count = “&infin;”       def multiply_time(self, time_str, multiplier):         # Multiply numbers within the time string         result = re.sub(             r"(\d+)",             lambda m: str(int(m.group(1)) * multiplier),             time_str         )         if result[0] == "0":             return "0"         return result       def select_mediatypes(self, mediatypes):         for mediatype in mediatypes:             if mediatype["mediatypeid"] == self.mediatypeid:                 self.mediatypename = mediatype["name"]                 # Select message templates related to operation type                 msg_template = [                     m                     for m in mediatype["message_templates"]                     if (                         m["recovery"] == self.optype                         and m["eventsource"] == "0"                     )                 ]                 # Use default message if applicable                 if msg_template and self.default_msg == "1":                     self.subject = msg_template[0]["subject"]                     self.message = msg_template[0]["message"]       def select_recipients(self, user_groups, recipients):         for groupid in self.groups:             if groupid in user_groups:                 self.users += user_groups[groupid]         for userid in self.users:             if userid in recipients:                 recipient = copy.deepcopy(recipients[userid])                 if self.mediatypeid in recipient.sendto:                     recipient.mediatype = True                 self.recipients.append(recipient)

Each such object represents a separate message sent to users (recipients) and will contain all message information – its subject, text, recipients, and escalation parameters.

Step 7. Collect User and Group Identifiers

After matching the triggers with actions, the process of collecting unique identifiers for users and groups starts:

userids = set() groupids = set() for message in messages:     userids.update(message.users)     groupids.update(message.groups)

This code snippet collects the IDs of all users and groups involved in the operations for each trigger. This is necessary to perform only one request to the Zabbix API for all involved users and their groups, rather than making separate requests for each trigger.

Step 8. Obtain User and Group Information

The next step is to collect detailed information about users and user groups:

usergroups = {     group["usrgrpid"]: group     for group in api.usergroup.get(         selectUsers=["userid"],         selectHostGroupRights="extend",         output=["usrgrpid", "role"],     ) }   users = {     user["userid"]: user     for user in api.user.get(         selectUsrgrps=["usrgrpid"],         selectMedias=["mediatypeid", "active", "sendto"],         selectRole=["roleid", "type"],         filter={"status": 0},         output=["userid", "username", "name", "surname"],     ) }

Here we gather data about users, including their role and media types through which they receive notifications, as well as data about user groups, including access rights to host groups and the list of users in each group. All this information will be needed to check access to the host with the triggers we are working with.

Step 9. Match Users and Groups with Triggers

After obtaining user information, we match users and groups with their respective rights to receive notifications. Here we also link users with groups, updating the information regarding rights and groups for each user.

for userid in userids:     if userid in users:         user = users[userid]         recipients[userid] = Recipient(user)         for group in user["usrgrps"]:             if group["usrgrpid"] in usergroups:                 recipients[userid].permissions.update([                     h["id"]                     for h in usergroups[group["usrgrpid"]]["hostgroup_rights"]                     if int(h["permission"]) > 1                 ])   for groupid in groupids:     if groupid in usergroups:         group = usergroups[groupid]         user_groups[group["usrgrpid"]] = []         for user in group["users"]:             user_groups[group["usrgrpid"]].append(user["userid"])             if user["userid"] in recipients:                 recipients[user["userid"]].groups.update(group["usrgrpid"])             elif user["userid"] in users:                 recipients[user["userid"]] = Recipient(users[user["userid"]])             recipients[user["userid"]].permissions.update([                 h["id"]                 for h in group["hostgroup_rights"]                 if int(h["permission"]) > 1             ])

This code fragment connects each user with their groups and vice versa, creating a complete list of users with their access rights to the host, and thus their eligibility to receive notifications about events for this host.

For each recipient, a Recipient class object is created containing data about the recipient, such as the notification address, access rights to hosts, configured mediatypes, etc.

Here’s the code that describes the Recipient class:

class Recipient:     def __init__(self, user):         self.userid = user["userid"]         self.username = user["username"]         self.fullname = "{name} {surname}".format(**user).strip()         self.type = user["role"]["type"]         self.groups = set([g["usrgrpid"] for g in user["usrgrps"]])         self.has_right = False         self.permissions = set()         self.sendto = {             m["mediatypeid"]: m["sendto"] for m in user["medias"] if m["active"] == "0"         }         # Check if the user is a super admin (type 3)         if self.type == "3":             self.has_right = True

Step 10. Match Messages with Recipients

Finally, we match recipients with specific messages from Step 6:

for message in messages:     message.select_recipients(user_groups, recipients)

This step completes the main process – each message is assigned to the relevant recipients.

Step 11. Check Recipient Access Rights and Output the Result

Before the actual output of the result with the list of recipients, we can perform a check of the recipients’ message rights and filter only those who have the corresponding rights to receive notifications for the events related to the trigger, or those who have all configured media types specified and active. After these actions, the information can be output in any convenient way – whether it be exporting to a file or displaying it on the screen:

for trigger in triggers_metadata.values():     for message in trigger.messages:         for recipient in message.recipients:             recipient.show = True             if not recipient.has_right:                 recipient.has_right = (len([gid                     for gid in trigger.hostgroups                     if gid in recipient.permissions                 ]) > 0)             if not recipient.has_right and not show_unavail:                 recipient.show = False

Example Implementation

All the examples and code snippets described above have been compiled to create a solution demonstrating the algorithm for obtaining notification recipients for triggers associated with the selected host. We have implemented this algorithm as a simple web interface to make the result more illustrative and convenient for familiarization.

This interface allows users to enter the host’s ID. The script then processes the data and provides a list of notification recipients associated with the triggers on that host. The web interface uses asynchronous requests to the Zabbix API and the zabbix_utils library to ensure fast data processing and ease of use with many triggers and users.

This lets you familiarize yourself with the theoretical steps and code examples and also try to put this solution into action.

Please note once again that the code in this project is for demonstration purposes, may not be optimal, or could contain errors. Use it as an example or a base for your project, but not as a complete tool.

The web interface’s complete source code and installation instructions can be found on GitHub.

Conclusion

In this article, we explored a practical example of using the zabbix_utils library to solve the task of obtaining alert recipients for triggers associated with a selected Zabbix host using the Zabbix API. We detailed the key steps, from setting up the environment and initializing trigger metadata to working with notification recipients and optimizing performance with asynchronous requests.

Using zabbix_utils allowed us to optimize and accelerate interaction with the Zabbix API, expanding the capabilities of the  Zabbix web interface and increasing efficiency when working with large volumes of data. Thanks to support for asynchronous processing and selective API requests, it is possible to significantly reduce the load on the server and improve system performance when working with Zabbix, which is especially important in large infrastructures.

We hope this example will assist you in implementing your own solutions based on the Zabbix API and zabbix_utils, and demonstrate the possibilities for optimizing your interaction with the Zabbix API.

The post Using the zabbix_utils Library for Tool Development appeared first on Zabbix Blog.

Introducing Zabbix Cloud

Post Syndicated from Arturs Lontons original https://blog.zabbix.com/introducing-zabbix-cloud/24837/

With Zabbix Cloud you can near-instantly deploy a feature-complete Zabbix environment fine-tuned for best possible performance.

Have you ever wanted to deploy a Zabbix instance in just a few clicks without having to worry about creating orchestrations or writing scripts to achieve this task? What about always having the latest stable Zabbix version without having to go through the upgrade procedure yourself? And then there’s scalability – what if we told you that we can offer you a Zabbix instance that can be rescaled at a moment’s notice with the most optimal performance configuration for your monitoring workloads?

No, you’re not dreaming (try pinching yourself) – you’re simply looking at our latest offering – Zabbix Cloud.

 

Introducing – Zabbix Cloud

With Zabbix Cloud, anyone can deploy their own Zabbix instance with a push of a button. Forget about allocating dedicated hardware, installing packages or configuring docker containers. All you have to do is name your instance, select a compute offering and a region and you’re good to go!

With Zabbix Cloud you will get:

  • 7 Compute tiers
  • 5 Datacenter regions
  • Initial 10GB of storage free of charge
  • Full Zabbix feature set
  • A Zabbix instance optimized for best possible performance
  • Customizable history retention periods
  • Ability to define access filters and encrypt Zabbix connections
  • Automatic version updates and node backups

Zabbix Cloud does not limit your monitoring in any way – Zabbix Cloud nodes are fully capable of leveraging the same data collection, processing and analysis method as an on-prem Zabbix instance.

But how about instead of us talking about Zabbix Cloud, we give you a chance to deploy a trial Zabbix Cloud node – you know what they say – “Show, don’t tell”. Anyone can sign up for a free 5 day trial and check out Zabbix Cloud for themselves – no billing information required!

Note that Zabbix Cloud is currently in Early access. During the early access period sign-ups for Zabbix Cloud will processed and accepted in batches, which means it may take some time for us to process your Zabbix Cloud sign-up request.

Deploying a Zabbix Cloud node

Deploying a Zabbix Cloud node can be done with just a few clicks. Simply press the “Create new node” button, give your node a name and select your Region, Compute tier and Disk size. Once the node deployment is complete, you can access your Cloud Zabbix instance via its DNS name and the provided login credentials for the Admin account.

In case you forgot to copy or lost the initial password – the initial password can be copied to clipboard, cleared and reset by accessing the Overview tab of the Node configuration section.

Configure a new Zabbix Cloud node

Upgrading Zabbix Cloud node compute tier and storage space

There are only two limiting factors for Zabbix Cloud nodes – the available storage space and the number of maximum supported new values per second for a particular compute tier.

Once a maximum supported number of new values per second (NVPS) for the current compute tier is reached, the values exceeding the limit will be discarded at random. You can either reduce the number of collected metrics to match the compute tier NVPS limitations or upgrade your node to a higher compute tier to support the required number of NVPS. Upgrading to a higher compute tier can be done from the Upgrade tab of the Node Configuration section.

Disk utilization can also be increased from the Upgrade tab of the Node configuration section. Note that once you have increased your Zabbix Cloud node storage you will not be able to decrease it!

In addition, storage space usage can be adjusted by modifying the history retention periods. This can be done in the History tab of the Node configuration section. Here you can adjust storage periods for various types of history, trend and auditlog data. Reducing the storage periods will also reduce the total space usage on the current node.

Zabbix Cloud users can upgrade the Zabbix Cloud node Compute tier and Disk size

Automatic upgrades to the latest version

With Zabbix Cloud you don’t have to worry about manually upgrading your Zabbix Cloud nodes to the latest minor or major releases. Once the latest Zabbix release passes the internal testing and QA for Zabbix Cloud deployment, your Zabbix Cloud instance will be automatically upgraded to the latest version during the weekly maintenance periods chosen by you under the Node configuration section. 

The upgrade to the latest Zabbix major release can be postponed but it cannot be cancelled. Eventually, your Zabbix cloud node will be automatically upgraded to the latest major version.   

This ensures that your Zabbix Cloud nodes will always have all of the latest features, performance improvements and security fixes applied on them after passing a thorough internal QA process.

Backing up your Zabbix cloud node

Zabbix Cloud provides two options for performing Zabbix Cloud node backups:

  • Free automatic weekly backups
  • Manual backups with an attached monthly cost

Manual backups can be performed from the Backups tab of the Node configuration section. Here you can also see the next scheduled automatic backup and restore your Zabbix Cloud node from an existing backup. Currently Zabbix Cloud node backups cannot be exported from Zabbix Cloud and existing Zabbix instances cannot be migrated to Zabbix Cloud.

Zabbix Cloud nodes are automatically backed up on a weekly basis

Maintenance windows

Each of your Zabbix Cloud nodes is required to have at least 1 hour long weekly maintenance window defined in the Maintenance tab of the Node configuration section.

During the maintenance the following upgrades might be performed on your Zabbix Cloud node: 

  • Be upgraded to the latest Zabbix release 
  • Have the latest security fixes applied on the nodes 
  • Have the latest platform level upgrades applied on the nodes 
  • Be optimized for best possible performance 

Zabbix nodes may be taken offline during the maintenance period if the maintenance tasks require it. 

Zabbix Cloud nodes are required to have weekly maintenance window of at least 1 hour assigned to them

Conclusion

With Zabbix Cloud, deploying a fully optimized Zabbix instance is easier than ever. Automated upgrades and backups ensure that Zabbix Cloud will position itself as a streamlined, secure and up-to-date product with 24/7 availability and the latest Zabbix features available to everyone, no matter their technical proficiency in managing, tuning and upgrading Zabbix environments.

On top of that, Zabbix provides a large selection of professional services with the goal of helping our users to configure and troubleshoot their monitoring workflows, deliver custom templates, integrate Zabbix nodes with 3rd party solutions and more! Check out the full list of Zabbix professional services and see how we can further enhance your Zabbix experience.

 

 

The post Introducing Zabbix Cloud appeared first on Zabbix Blog.

Monitoring Zabbix Security Advisories

Post Syndicated from Brian van Baekel original https://blog.zabbix.com/monitoring-zabbix-security-advisories/28672/

Zabbix plays a crucial role in monitoring all kinds of “things” – IoT devices, domains, cloud infrastructures and more. It can also be integrated with third-party solutions – for example, with Oxidized for configuration backup monitoring. Given the nature of Zabbix, it usually contains a lot of confidential information as well as (more importantly) some kind of elevated access to network elements while being used by operators, engineers, and customers. This requires that Zabbix as a product should be as secure as possible.

Zabbix has upped their security game and is actively working with HackerOne to take full advantage of the reach of their global community by providing a bug bounty program. And though it doesn’t happen too often, from time to time a security issue arises in Zabbix or one of its dependencies, warranting the release of a Security Advisory.

The issue

Zabbix typically releases a Security Advisory and might even assign a CVE to the issue. Cool, that is what we expect from reputable software developers. They even inform their customers with support contracts before publishing the advisory, in order to allow them to patch installations beforehand.

Unfortunately, if you don’t have a support contract you’re expected to find out about these security advisories on your own, either by monitoring the Security Advisory page or by monitoring the published CVEs for Zabbix. NIST has a public API that can be used and that works well, but the issue with CVE’s is that they are often incomplete and thus useless. For example, CVE-2024-22119 contains far less information than the advisory.

Currently, Zabbix does not publish an API for their Security Advisories. There is the public tracker which contains all entries and can be queried via API, but because it is unstructured text, it is really hard to parse.

The solution

We want to automatically be notified of new security advisories, and the only data source that contains all data in a structured way is the Zabbix Security Advisory page. However, structured doesn’t mean easily parseable – in fact, it is just raw HTML. We could try to solve this issue in Zabbix, but the easier solution in this case is to scrape the page and generate a JSON file which then can be parsed by Zabbix to achieve our goal, which is automated notifications of new advisories.

Webscraping

We’ve chosen to scrape the Zabbix site using Rust, utilizing the Scraper crate to parse the HTML and flesh out the relevant parts we want. Without going into too much detail, the interesting information is stored in 2 tables, one with the table-simple class applied and one with the table-vertical class applied. Using CSS selectors (which is what the Scraper crate requires), we can retrieve the information we want.

This information is then stored in a struct, which gets added to a hashmap. The result is stored in a vector, which is added to a struct, which eventually is used to generate the JSON we require. Phew.

The resulting JSON is easily parseable by Zabbix:

{
  "last_updated": {
    "secs": ,
    "nanos": ,
  },
  "reports": [
    _list of reports_
  ]
}

The ‘reports’ array contains one entry per advisory, and each entry has the following layout. Unsurprisingly, this closely matches the information that is available on the Zabbix Security Advisory page:

    {
      "_zbxref_": {
        "zbxref": "_zbxref_",
        "cveref": "CVE-XXXX-XXXX",
        "score": X.X,
        "synopsis": "_synopsis_",
        "description": "_description_",
        "vectors": "_vectors_",
        "resolution": "_resolution_",
        "workaround": "_workaround_",
        "acknowledgement": "_acknowledgement_",
        "components": [
          _list of components_,
          _list of components_
        ],
        "affected_version": [
          {
            "affected": "_version_",
            "fixed": "_version_"
          }
        ]
      }
    }

Now, we could provide you with the code of the scraping tool and wish you good luck with making sure the tool runs every X hours and somehow, somewhere stores the resulting JSON for Zabbix to parse. That would be the easy way out, right?

Instead, we’ve chosen to host the Rust program as an AWS Lambda function, triggered every 2 hours by the AWS EventBridge Scheduler and with some code added to the Rust program (function?) to upload the resulting JSON to an AWS S3 bucket. This chain of AWS products not only makes sure that our cloud bill increases, but also guarantees we don’t have to host (and maintain!) anything ourselves.

The result? Just one HTTP GET away…

Template

TL;DR: Download the template here.

Now that the data is available in JSON, it’s fairly easy to parse it using Zabbix. Using the HTTP Agent data collection, we download the JSON from AWS. The URI is stored in the {$ZBX_ADVISORY_URI} macro, which allows for easy modification. By default, it points to the JSON file hosted on AWS S3. This retrieval is done by the Retrieve the Zabbix Security Advisories item, which acts as the source for every other operation. It retrieves the JSON every hour, and with the JSON being generated every 2 hours, the maximum delay between Zabbix publishing a new advisory and you getting it into Zabbix is 3 hours.

The retrieve the Zabbix Security Advisories item acts as a master item for the Last Updated item. This item uses a JSONPath preprocessing step to flesh out the information we want: $.last_updated.secs. The resulting data is stored as unixtime so that we mere mortals can easily read when the last update of the JSON file was performed.

A trigger is configured for this item to ensure that the JSON file isn’t too old. The trigger JSON Feed is out of date has the following expression:
last(/Zabbix Security Advisories/zbx_sec.last_updated)>{$ZBX_ADVISORY_UPDATE_INTERVAL}*{$ZBX_ADVISORY_UPDATE_THRESHOLD}

By default, {$ZBX_ADVISORY_UPDATE_INTERVAL} is set to 2 hours (which is the interval the file gets updated by our tool) and {$ZBX_ADVISORY_UPDATE_THRESHOLD} is set to 3. So, when the JSON file hasn’t been updated within the last 6 hours, this trigger will trigger.

The item Number of advisories uses the same principle, where a JSONPath preprocessing step is used to flesh out the information we want: $.reports. However, as $.reports is an array, we can use functions on it. In this case .length(), which returns an integer. This number is used in the associated trigger A new Zabbix Security Advisory has been published, which simply triggers when the value changes.

This is all very cool, but the JSON has a lot more information, including details about each report. In order to get these details into Zabbix, we use a discovery rule to ‘loop’ through the JSON and create items based on what we’ve discovered: Discover Advisories. This rule uses (again) a JSONPath preprocessing step to get the details we want: $.reports[*][*]. Based on the resulting data (which is a single report in this case), 2 LLD Macros are assigned: {#ZBXREF} – based on the JSONpath $.zbxref and {{#CVEREF} – based on the JSONpath $.cveref.

For each discovered report, 8 items are created. They all work using the same principle, so I will only describe one: Advisory {#ZBXREF} / {#CVEREF} – Acknowledgement. This item uses the master item Zabbix Security Advisories, just like all other items described so far. JSONPath is once again used to get the information we want. The expression $.reports[*][“{#ZBXREF}”].acknowledgement.first() provides exactly what we need, where we combine a LLD macro ({#ZBXREF}) and a JSONpath function (.first()) to first ‘select’ the correct advisory in the JSON and then retrieve the value.

All other 7 items work like this, and there is only one exception: Advisory {#ZBXREF} / {#CVEREF} – Components. The ‘components’ value in the JSON file is actually an array with 1 or more items, describing which components might be affected. But we cannot store arrays in Zabbix, so we use another preprocessing step to convert the array into a string. A few lines of Javascript is all we need:

components = JSON.parse(value);
return components.toString();

First, we parse the JSON input (‘value’) into an array, only to apply the javascript .toString() function on it. The toString method of arrays calls join() internally, which joins the array and returns one string containing each array element separated by commas, which is exactly what we want: a string, separated by commas.

To make working with these advisories easier, each item has the component tag applied, with the value zabbix_security. If the item belongs to an advisory, the advisory tag is added with the value of {#ZBXREF} (which is the advisory number/name). That way, we can easily filter on all Zabbix Security items, filter on all items for a single advisory, and (to make things even better) the type tag is also applied, with the actual type being ‘workaround’ or ‘description.’ This allows for filtering on all Zabbix Security items, of the type ‘score’ (et cetera) to easily gain insight into the different advisories and their score, synopsis, description, components, et cetera.

Dashboard

The tags on the items allow for filtering, but with Zabbix 7.0 we can use all great new nifty features, such as the Item Navigator widget combined with the Item Value widget. Let’s take a look at what configuring such a dashboard might look like if you set up the Item Navigator widget as follows:

Item Navigator configuration

And then ‘link’ the Item Value widget to it:

You should get a somewhat decent dashboard. It isn’t perfect (given that the Item Value widget only seems to be able to display a single line of text) but it’s something.

Disclaimer

Though we use this functionality ourselves, this all comes without any guarantee. The technology used to retrieve data (screen scraping) is mediocre at best and could break at any moment if and when Zabbix changes the layout of their page.

The post Monitoring Zabbix Security Advisories appeared first on Zabbix Blog.

Zabbix Audit Log: Underlying Design and Considerations

Post Syndicated from Artjoms Rimdjonoks original https://blog.zabbix.com/zabbix-audit-log-underlying-design-and-considerations/28328/

An audit improves the security of a product, specifically the “non-repudiation” aspect in threat-models (risks are reduced when threat-agents cannot deny they did malicious activity). Zabbix 5.0 already had audit functionality, which received a major rewrite in 6.0 and several updates since then. In this blog post, we will go through them and get an overall picture of what has changed (and why).

The server side work on 6.0 was mostly done and further improved in 7.0. Front-end work is still ongoing (due to a larger scope). The main goal of a Zabbix audit is to track all configuration and settings changes – who, when, and what. This is an enterprise-level requirement, but non-enterprise users can also benefit.

The situation before Zabbix 6.0

When a host or template is added, only its name is recorded, without info about items, triggers, tags, etc. The linking of the template on the host is not audited. Everything on the screen is an audit done by the front-end, except the script execution. Zabbix Server itself actually does a lot of configuration changes, including adding and updating hosts and updating items (during LLD or when linking templates during auto-registration or network discovery), but there is no audit for that at all. There are also non-configuration changes (events) we want to audit, including:

  • Script execution (already audited in 5.0):

  • Reloading passive proxy config data (ZBXNEXT-1580), added in 6.2:

  • HA node status change (ZBXNEXT-6923), added in 6.0, history push API requests, and sending data to Zabbix server via API (ZBXNEXT-8541), added in 7.0:

Audit overview

Most Zabbix server audit logic is in:

a) Linking of templates (as a result of auto-registration or network discovery) with updates to:

  • Hosts
  • Items
  • Triggers
  • Graphs
  • Discovery Rules (and prototypes of everything above)
  • Web Scenarios

b) LLD, with the following entities created from prototypes:

  • Hosts
  • Items
  • Triggers
  • Graphs

New audit goals

In addition to the main functional requirement to “track all configuration and settings changes,” there are additional requirements aimed at making all audits faster and easier to manage:

  • All audits are now stored in a single table (Simpler and faster SQL queries)
  • Bulk SQL inserts and efficient ids generation
  • The audit of a particular entity stays longer than this entity. If an entity – (host or user) is deleted – the audit for it stays
  • The audit has an independent housekeeping schedule
  • It is still possible to disable the audit

CUID

Zabbix uses an ids table to generate ids:

When something (items, triggers etc.) needs to be generated, the related row in the ids table gets locked. This represents a problem for generating audit rows, because an audit can be generated independently by the server and front-end:

So, we could end up in a situation where a user cannot create an item because the server is holding a lock on the ids table while generating thousands of new LLD items. That is why a new method for generating ID was used for audits:

Thanks to it, the front-end and server can independently generate ids for audit entries without locks. The chance of collision is astronomically low.

System user

When it is not clear under which user an audit entry needs to be generated, it is recorded under “System user.” Most of the audits done by Server are done under “System user.” One exception is “script execution,”” since it is clear which user clicked on the script execution button. However, under which user should the server record audit entries when new items are generated during LLD? We could track down which user created the LLD rule, but what if the LLD rule was then modified by another user? For such cases, “System user” is used.

RecordSetID

From the spec: “To have the ability to recognize that some set of audit log records was created during the processing of a separate operation, a new column “Recordset ID” for audit log records will be provided. Each audit log record of the separate operation will have the same recordset ID. The recordset ID will be generated using the CUID algorithm.”

We can see that 2 graphs were created in a single operation (e.g. during the linking of one template with 2 graphs).

Audit details

A new audit contains much more information on what was changed with new details:

Upgrade patch

The warning, old auditlog, and auditlog_details tables are removed during the upgrade patch to 6.0. A new auditlog table is created, and the schema is updated.

  • auditid is now CUID
  • userid can be NULL (no more foreign reference on users table)
  • username is added
  • resource_cuid is added(alternative to resource, only for HA)
  • recordsetid is added
  • note and other auditlog_details table data now is in details (JSON)

BulkSQL

It is much more efficient to execute SQL queries in bulk. Zabbix already relies on bulk SQL queries:

Inserting and/or updating thousands of new items in one query is much faster than running thousands of individual queries. There are many reasons why this is the case, but the most basic answer is that DBs are designed this way. Another reason is that a large single query in PostgreSQL needs to start the planner/optimizer once, and then it would be able to properly analyze this large query and create an efficient execution plan.

When running thousands of separate queries, the planner/optimizer needs to be started for each query, and every time it would analyze the small query and decide there is not much it can do. When a server is doing some configuration changes, like LLD or templates linking during auto-registration or network discovery, it will insert/update/delete items/triggers and also auditlog entries in one large query.

Performance impact

Quick performance tests showed that the audit slows the server at most by 4-5%. The larger the setup, the smaller the impact will be.

Storage impact and administration

Zabbix audits can generate a lot of data. If your setup generates a lot of configuration, audits can eventually overrun the storage space. In this case, there are several audit configurations that could be helpful.

First of all, an audit can be disabled for all Zabbix, including the front-end:

Disabling audit is not advised, however – this option exists mostly as a possible workaround. Audit is enabled by default and Zabbix is developed and tested with audit enabled.

Log system actions button:

A disabled audit done by Zabbix server during auto-registration, network discover, and LLD. On some systems, these can generate a lot of configurations and audits, for example when LLD discovers hundreds of new devices every minute.

This could help reduce the storage impact while preserving all other audit functionality.

Housekeeping schedule:

If a host, trigger, or graph is deleted (by housekeeper or manually), the audit generated for it stays (as it exists in a separate table).

A Zabbix audit has its own independent housekeeping schedule, and it can be adjusted to suit your environment.

The post Zabbix Audit Log: Underlying Design and Considerations appeared first on Zabbix Blog.

Zabbix 7.0 Proxy Load Balancing

Post Syndicated from Markku Leiniö original https://blog.zabbix.com/zabbix-7-0-proxy-load-balancing/28173/

One of the new features in Zabbix 7.0 LTS is proxy load balancing. As the documentation says:

Proxy load balancing allows monitoring hosts by a proxy group with automated distribution of hosts between proxies and high proxy availability.

If one proxy from the proxy group goes offline, its hosts will be immediately distributed among other proxies having the least assigned hosts in the group.

Proxy group is the new construct that enables Zabbix server to make dynamic decisions about the monitoring responsibilities within the group(s) of proxies. As you can see in the documentation, the proxy group has only a minimal set of configurable settings.

One important background information to understand is that Zabbix server always knows (within reasonable timeframe) which proxies in the proxy groups are online and which are not. That’s because all active proxies connect to the Zabbix server every 1 second by default (DataSenderFrequency setting in the proxy), and Zabbix server connects to the passive proxies also every 1 second by default (ProxyDataFrequency setting in the server), so if those connections are not happening anymore, then something is wrong with using the proxy.

Initially Zabbix server will balance the hosts between the proxies in the proxy group. It can also rebalance the hosts later if needed, the algorithm is described in the documentation. That’s something we don’t need to configure (that’s the “automated distribution of hosts” mentioned above). The idea is that, at any given time, any host configured to be monitored by the proxy group is monitored by one proxy only.

Now let’s see how the actual connections work with active and passive Zabbix agents. The active/passive modes of the proxies (with the Zabbix server connectivity) don’t matter in this context, but I’m using active proxies in my tests for simplicity.

Disclaimer: These are my own observations from my own Zabbix setup using 7.0.0, and they are not necessarily based on any official Zabbix documentation. I’m open for any comments or corrections in any case.

At the very end of this post I have included samples of captured agent traffic for each of the cases mentioned below.

Passive agents monitored by a proxy group

For passive agents the proxy load balancing really is this simple: Whenever a proxy goes down in a proxy group, all the hosts that were previously monitored by that proxy will then be monitored by the other available proxies in the same proxy group.

There is nothing new to configure in the passive agents, only the usual Server directive to allow specific proxies (IP addresses, DNS names, subnets) to communicate with the agent.

As a reminder, a passive agent means that it listens to incoming requests from Zabbix proxies (or the Zabbix server), and then collects and returns the requested data. All relevant firewalls also need to be configured to allow the connections from the Zabbix proxies to the agent TCP port 10050.

As yet another reminder, each agent (or monitored host) can have both passive and active items configured, which means that it will both listen to incoming Zabbix requests but also actively request any active tasks from Zabbix proxies or servers. But again, this is long-existing functionality, nothing new in Zabbix 7.0.

Active agents monitored by a proxy group

For active agents the proxy load balancing needs a bit new tweaking in the agent side.

By definition, an active agent is the party that initiates the connection to the Zabbix proxy (or server), to TCP port 10051 by default. The configuration happens with the ServerActive directive in the agent configuration. According to the official documentation, providing multiple comma-separated addresses in the ServerActive directive has been possible for ages, but it is for the purpose of providing data to multiple independent Zabbix installations at the same time. (Think about a Zabbix agent on a monitored host, being monitored by both a service provider and the inhouse IT department.)

Using semicolon-separated server addresses in ServerActive directive has been possible since Zabbix 6.0 when Zabbix servers are configured in high-availability cluster. That requires specific Zabbix server database implementation so that all the cluster nodes use the same database, and some other shared configurations.

Now in Zabbix 7.0 this same configuration style can be used for the agent to connect to all proxies in the proxy group, by entering all the proxy addresses in the ServerActive configuration, semicolon-separated. However, to be exact, this is not described in the ServerActive documentation as of this writing. Rather, it specifically says “More than one Zabbix proxy should not be specified from each Zabbix server/cluster.” But it works, let’s see how.

Using multiple semicolon-separated proxy addresses works because of the new redirection functionality in the proxy-agent communication: Whenever an active agent sends a message to a proxy, the proxy tells the agent to connect to another proxy, if the agent is currently assigned to some other proxy. The agent then ceases connecting to that previous proxy, and starts using the proxy address provided in the redirection instead. Thus the agent converges to using only that one designated proxy address.

In this simple example the Zabbix server determined that the agent should be monitored by Proxy 1, so when the agent initially contacted Proxy 1 (because its IP address is first in the ServerActive list), the proxy responded normally and agent was happy with that.

In case the Zabbix server had for any reason determined that the agent should be monitored by Proxy 2, then Proxy 1 would have responded with a redirection, and agent would have followed that. (There will be examples of redirections in the capture files below.)

To be clear, this agent redirection from the proxy group works only with Zabbix 7.0 agents as of this writing.

Note: In the initial version of this post I used comma-separated proxy addresses in ServerActive (instead of semicolon-separated), and that caused duplicate connections from the agent to the designated proxy (because the agent is not equipped to recognize that it connects to the same proxy twice), eventually causing data duplication in Zabbix database. Using comma-separated proxy addresses is thus not a working solution for proxy load balancing usage.

If the host-proxy assignments are changed by the Zabbix server for balancing the load between the proxies, the previously designated proxy will redirect the agent to the correct proxy address, and the situation is optimized again.

Side note: When configuring the proxies in Zabbix UI, there is a new Address for active agents field. That is the address value that is used by the proxies when responding with redirection messages to agents.

Proxy group failure scenarios with active agents

Proxy goes down

If the designated proxy of an active agent goes offline so that it doesn’t respond to the agent anymore, agent realizes the situation, discards the redirection information it had, and reverts to using the proxy addresses from ServerActive directive again.

Now, this is an interesting case because of some timing dependencies. In the proxy group configuration there is the Failover period configuration that controls the Zabbix server’s sensitivity to proxy availability in regards to agent rebalancing within the proxy group. Thus, if the agent reverts to using the other proxies faster than Zabbix server recognizes the situation and notifies the other proxies in the proxy group, the agent will get redirection responses from the other proxies, telling it to use the currently offline proxy. And the same happens again: agent fails to connect to the redirected proxy, and reverts to using the other locally configured proxies, and so on.

In my tests this looping was not very intense, only two rounds every second, so it was not very significant network-wise, and the situation will converge automatically when the Zabbix server has notified the proxies about the host rebalancing.

So this temporary looping is not a big deal. The takeaway is that the whole system converges automatically from a failed proxy.

After the failed proxy has recovered to online mode, the agents stay with their designated proxies in the proxy group.

As mentioned in the beginning, Zabbix server will automatically rebalance the hosts again after some time if needed.

Proxy is online but unreachable from the active agent

Another interesting case is one where the proxy itself is running and communicating with Zabbix server, thus being in online mode in the proxy group, but the active agent is not able to reach it, while still being able to connect to the other proxies in the group. This can happen due to various Internet-related routing issues for example, if the proxies are geographically distributed and far away from the agent.

Let’s start with the situation where the agent is currently monitored by Proxy 2 (as per the last picture above). When the failure starts and agent realizes that the connections to Proxy 2 are not succeeding anymore, the agent reverts to using the configured proxies in ServerActive, connecting to Proxy 1.

But, Proxy 1 knows (by the information given by Zabbix server) that Proxy 2 is still online and that the agent should be monitored by Proxy 2, so Proxy 1 responds to the agent with a redirection.

Obviously that won’t work for the agent as it doesn’t have connectivity to Proxy 2 anymore.

This is a non-recoverable situation (at least with the current Zabbix 7.0.0) while the reachability issue persists: The agent keeps on contacting Proxy 1, keeps receiving the redirection, and the same repeats over and over again.

Note that it does not matter if the agent is now locally reconfigured to only use Proxy 1 in this situation, because the load balancing of the hosts in the proxy group is not controlled by any of the agent-local configuration. The proxy group (led by Zabbix server) has the only authority to assign the hosts to the proxies.

One way to escape from this situation is to stop the unreachable Proxy 2. That way the Zabbix server will eventually notice that Proxy 2 is offline, and the hosts will be automatically rebalanced to other proxies in the group, thus removing the agent-side redirection to the unreachable proxy.

Keep this potential scenario in mind when planning proxy groups with proxy location diversity.

This is also something to think about if your Zabbix proxies have multiple network interfaces, where Zabbix server connectivity is using different interface from the agent connectivity. In that case the same problem can occur due to your own configurations.

Closing words

All in all, proxy load balancing looks very promising feature as it does not require any network-level tricks to achieve load balancing and high availability. In Zabbix 7.0 this is a new feature, so we can expect some further development for the details and behavior in the upcoming releases.


Appendix: Sample capture files

Ideally these capture files should be viewed with Wireshark version 4.3.0rc1 or newer because only the latest Wireshark builds include support for latest Zabbix protocol features. Wireshark 4.2.x should also show most of the Zabbix packet fields. Use display filter “zabbix” to see only the Zabbix protocol packets, but when examining cases more carefully you should also check the plain TCP packets (without any display filter) to get more understanding about the cases.

These samples are taken with Zabbix components version 7.0.0, using default timers in the Zabbix process configurations, and 20 seconds as the proxy group failover period.

Passive agent, with proxy failover

    • After frame #50 Proxy 1 was stopped and Proxy 2 eventually took over the monitoring

Active agent, with proxy failover

    • The agent initially communicates with Proxy 1
    • Proxy 1 was stopped before frame #425
    • Agent connected to Proxy 2, but Proxy 2 keeps sending redirects
    • Proxy 2 was assigned the agent before frame #1074, so it took over the monitoring and accepted the agent connections
    • Proxy 1 was later restarted (but agent didn’t try to connect to it yet)
    • The agent was manually restarted before frame #1498 and it connected to Proxy 1 again, was given a redirection to Proxy 2, and continued with Proxy 2 again

Active agent, with proxy unreachable

    • Started with Proxy 2 monitoring the agent normally
    • Network started dropping all packets from the agent to Proxy 2 before frame #179, agent started connecting to Proxy 1 almost immediately
    • From frame #181 on Proxy 1 responds with redirection to Proxy 2 (which is not working)
    • Proxy 2 was eventually stopped manually
    • Redirections continue until frame #781 when Proxy 1 is assigned the monitoring of the agent, and Proxy 1 starts accepting the agent requests

This post was originally published on the author’s blog.

The post Zabbix 7.0 Proxy Load Balancing appeared first on Zabbix Blog.

Make your interaction with Zabbix API faster: Async zabbix_utils.

Post Syndicated from Aleksandr Iantsen original https://blog.zabbix.com/make-your-interaction-with-zabbix-api-faster-async-zabbix_utils/27837/

In this article, we will explore the capabilities of the new asynchronous modules of the zabbix_utils library. Thanks to asynchronous execution, users can expect improved efficiency, reduced latency, and increased flexibility in interacting with Zabbix components, ultimately enabling them to create efficient and reliable monitoring solutions that meet their specific requirements.

There is a high demand for the Python library zabbix_utils. Since its release and up to the moment of writing this article, zabbix_utils has been downloaded from PyPI more than 15,000 times. Over the past week, the library has been downloaded more than 2,700 times. The first article about the zabbix_utils library has already gathered around 3,000 views. Among the array of tools available, the library has emerged as a popular choice, offering developers and administrators a comprehensive set of functions for interacting with Zabbix components such as Zabbix server, proxy, and agents.

Considering the demand from users, as well as the potential of asynchronous programming to optimize interaction with Zabbix, we are pleased to present a new version of the library with new asynchronous modules in addition to the existing synchronous ones. The new zabbix_utils modules are designed to provide a significant performance boost by taking advantage of the inherent benefits of asynchronous programming to speed up communication between Zabbix and your service or script.

You can read the introductory article about zabbix_utils for a more comprehensive understanding of working with the library.

Benefits and Usage Scenarios

From expedited data retrieval and real-time event monitoring to enhanced scalability, asynchronous programming empowers you to build highly efficient, flexible, and reliable monitoring solutions adapted to meet your specific needs and challenges.

The new version of zabbix_utils and its asynchronous components may be useful in the following scenarios:

  • Mass data gathering from multiple hosts: When it’s necessary to retrieve data from a large number of hosts simultaneously, asynchronous programming allows requests to be executed in parallel, significantly speeding up the data collection process;
  • Mass resource exporting: When templates, hosts or problems need to be exported in parallel. This parallel execution reduces the overall export time, especially when dealing with a large number of resources;
  • Sending alerts from or to your system: When certain actions need to be performed based on monitoring conditions, such as sending alerts or running scripts, asynchronous programming provides rapid condition processing and execution of corresponding actions;
  • Scaling the monitoring system: With an increase in the number of monitored resources or the volume of collected data, asynchronous programming provides better scalability and efficiency for the monitoring system.

Installation and Configuration

If you already use the zabbix_utils library, simply updating the library to the latest version and installing all necessary dependencies for asynchronous operation is sufficient. Otherwise, you can install the library with asynchronous support using the following methods:

  • By using pip:
~$ pip install zabbix_utils[async]

Using [async] allows you to install additional dependencies (extras) needed for the operation of asynchronous modules.

  • By cloning from GitHub:
~$ git clone https://github.com/zabbix/python-zabbix-utils
~$ cd python-zabbix-utils/
~$ pip install -r requirements.txt
~$ python setup.py install

The process of working with the asynchronous version of the zabbix_utils library is similar to the synchronous one, except for some syntactic differences of asynchronous code in Python.

Working with Zabbix API

To work with the Zabbix API in asynchronous mode, you need to import the AsyncZabbixAPI class from the zabbix_utils library:

from zabbix_utils import AsyncZabbixAPI

Similar to the synchronous ZabbixAPI, the new AsyncZabbixAPI can use the following environment variables: ZABBIX_URL, ZABBIX_TOKEN, ZABBIX_USER, ZABBIX_PASSWORD. However, when creating an instance of the AsyncZabbixAPI class you cannot specify a token or a username and password, unlike the synchronous version. They can only be passed when calling the login() method. The following usage scenarios are available here:

  • Use preset values of environment variables, i.e., not pass any parameters to AsyncZabbixAPI:
~$ export ZABBIX_URL="https://zabbix.example.local"
api = AsyncZabbixAPI()
  • Pass only the Zabbix API address as input, which can be specified as either the server IP/FQDN address or DNS name (in this case, the HTTP protocol will be used) or as an URL of Zabbix API:
api = AsyncZabbixAPI(url="127.0.0.1")

After declaring an instance of the AsyncZabbixAPI class, you need to call the login() method to authenticate with the Zabbix API. There are two ways to do this:

  • Using environment variable values:
~$ export ZABBIX_USER="Admin"
~$ export ZABBIX_PASSWORD="zabbix"

or

~$ export ZABBIX_TOKEN="xxxxxxxx"

and then:

await api.login()
  • Passing the authentication data when calling login():
await api.login(user="Admin", password="zabbix")

Like ZabbixAPI, the new AsyncZabbixAPI class supports version getting and comparison:

# ZabbixAPI version field
ver = api.version
print(type(ver).__name__, ver) # APIVersion 6.0.29

# Method to get ZabbixAPI version
ver = api.api_version()
print(type(ver).__name__, ver) # APIVersion 6.0.29

# Additional methods
print(ver.major)     # 6.0
print(ver.minor)     # 29
print(ver.is_lts())  # True

# Version comparison
print(ver < 6.4)        # True
print(ver != 6.0)       # False
print(ver != "6.0.24")  # True

After authentication, you can make any API requests described for all supported versions in the Zabbix documentation.

The format for calling API methods looks like this:

await api_instance.zabbix_object.method(parameters)

For example:

await api.host.get()

After completing all needed API requests, it is necessary to call logout() to close the API session if authentication was done using username and password, and also close the asynchronous sessions:

await api.logout()

More examples of usage can be found here.

Sending Values to Zabbix Server/Proxy

The asynchronous class AsyncSender has been added, which also helps to send values to the Zabbix server or proxy for items of the Zabbix Trapper data type.

AsyncSender can be imported as follows:

from zabbix_utils import AsyncSender

Values ​​can be sent in a group, for this it is necessary to import ItemValue:

import asyncio
from zabbix_utils import ItemValue, AsyncSender


items = [
    ItemValue('host1', 'item.key1', 10),
    ItemValue('host1', 'item.key2', 'Test value'),
    ItemValue('host2', 'item.key1', -1, 1702511920),
    ItemValue('host3', 'item.key1', '{"msg":"Test value"}'),
    ItemValue('host2', 'item.key1', 0, 1702511920, 100)
]

async def main():
    sender = AsyncSender('127.0.0.1', 10051)
    response = await sender.send(items)
    # processing the received response

asyncio.run(main())

As in the synchronous version, it is possible to specify the size of chunks when sending values in a group using the parameter chunk_size:

sender = AsyncSender('127.0.0.1', 10051, chunk_size=2)
response = await sender.send(items)

In the example, the chunk size is set to 2. So, 5 values passed in the code above will be sent in three requests of two, two, and one value, respectively.

Also it is possible to send a single value:

sender = AsyncSender(server='127.0.0.1', port=10051)
resp = await sender.send_value('example_host', 'example.key', 50, 1702511920))

If your server has multiple network interfaces, and values need to be sent from a specific one, the AsyncSender provides the option to specify a source_ip for sent values:

sender = AsyncSender(
    server='zabbix.example.local',
    port=10051,
    source_ip='10.10.7.1'
)
resp = await sender.send_value('example_host', 'example.key', 50, 1702511920)

AsyncSender also supports reading connection parameters from the Zabbix agent/agent2 configuration file. To do this, you need to set the use_config flag and specify the path to the configuration file if it differs from the default /etc/zabbix/zabbix_agentd.conf:

sender = AsyncSender(
    use_config=True,
    config_path='/etc/zabbix/zabbix_agent2.conf'
)

More usage examples can be found here.

Getting values from Zabbix Agent/Agent2 by item key.

In cases where you need the functionality of our standart zabbix_get utility but native to your Python project and working asynchronously, consider using the AsyncGetter class. A simple example of its usage looks like this:

import asyncio
from zabbix_utils import AsyncGetter

async def main():
    agent = AsyncGetter('10.8.54.32', 10050)
    resp = await agent.get('system.uname')
    print(resp.value) # Linux zabbix_server 5.15.0-3.60.5.1.el9uek.x86_64

asyncio.run(main())

Like AsyncSender, the AsyncGetter class supports specifying the source_ip address:

agent = AsyncGetter(
    host='zabbix.example.local',
    port=10050,
    source_ip='10.10.7.1'
)

More usage examples can be found here.

Conclusions

The new version of the zabbix_utils library provides users with the ability to implement efficient and scalable monitoring solutions, ensuring fast and reliable communication with the Zabbix components. Asynchronous way of interaction gives a lot of room for performance improvement and flexible task management when handling a large volume of requests to Zabbix components such as Zabbix API and others.

We have no doubt that the new version of zabbix_utils will become an indispensable tool for developers and administrators, helping them create more efficient, flexible, and reliable monitoring solutions that best meet their requirements and expectations.

The post Make your interaction with Zabbix API faster: Async zabbix_utils. appeared first on Zabbix Blog.