
Scheduled report generation in Zabbix 5.4

Post Syndicated from Arturs Lontons original https://blog.zabbix.com/scheduled-report-generation-in-zabbix-5-4/14776/

The release of version 5.4 gives Zabbix users the ability to receive scheduled PDF reports in their mailbox, a much sought-after feature. This post and the accompanying video cover all the new report-related configuration parameters and walk you through setting up scheduled report generation.

Contents

I. Reporting in Zabbix 5.4 (0:45)
II. Scheduled reports (2:26)

III. Questions & Answers (13:28)

Reporting in Zabbix 5.4

Zabbix 5.4 is our first big step in bringing out-of-the-box reporting for our end users. With this feature, we now have a foundation to build upon in the future and make reporting more robust and versatile over time. Since reports are 100% based on dashboard widgets, it’s only a matter of time until more report-focused widgets get released, thus enabling not only better dashboards, but also improving the reporting functionality.

  • We have implemented a new web service component responsible for generating reports. You can install it quickly and easily from the provided packages.
  • Reporting works out of the box without the need to deploy or develop any custom scripts.
  • The initial configuration is easy to understand and implement.
  • Reporting will use the existing Email media types to send out these reports.
  • The reports do respect your permissions, as well as roles introduced in Zabbix 5.2.
  • You can test a report before it runs on schedule just by clicking the Test button.

Scheduled reports

We have added a new Scheduled reports section, which lists the reports and displays each report's Name, Owner, Repeats (daily, weekly, etc.), the Period for which the report is generated, and the Last sent date.

Scheduled reports

NOTE. For a newly configured report that has not been sent out yet, the Last sent date is shown as ‘Never’.

Creating a report

When you create a report, you have to fill in the following fields:

  • Owner,
  • Name of the report,
  • Dashboard the report will be based on,
  • Period: whether the report covers the Previous day, Previous week, Previous month, or Previous year,
  • Cycle: how often the report is sent (Daily, Weekly, Monthly, or Yearly),
  • Start time (Zabbix server time is used here),
  • Start date and end date.
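The following minimal Python sketch shows which time window a report sent at a given moment would cover for each Period value. The helper `previous_period` is hypothetical. Its rules are assumptions, not Zabbix's documented behavior: a period is the whole calendar unit before the current one, weeks start on Monday, and the end bound is exclusive.

```python
from datetime import datetime, timedelta


def previous_period(sent_at: datetime, period: str) -> tuple:
    """Return the (start, end) window covered by a report sent at sent_at.

    Assumptions: a period is the whole calendar unit before the current one,
    weeks start on Monday, and `end` is exclusive.
    """
    day = sent_at.replace(hour=0, minute=0, second=0, microsecond=0)
    if period == "Previous day":
        return day - timedelta(days=1), day
    if period == "Previous week":
        week_start = day - timedelta(days=day.weekday())  # Monday of this week
        return week_start - timedelta(days=7), week_start
    if period == "Previous month":
        month_start = day.replace(day=1)
        last_day_prev = month_start - timedelta(days=1)  # last day of previous month
        return last_day_prev.replace(day=1), month_start
    if period == "Previous year":
        year_start = day.replace(month=1, day=1)
        return year_start.replace(year=year_start.year - 1), year_start
    raise ValueError(f"unknown period: {period!r}")


# A report sent on Thursday, 2021-05-20, at 09:00 server time.
sent = datetime(2021, 5, 20, 9, 0)
for name in ("Previous day", "Previous week", "Previous month", "Previous year"):
    start, end = previous_period(sent, name)
    print(f"{name}: {start:%Y-%m-%d} to {end:%Y-%m-%d} (end exclusive)")
```

The Cycle field only decides how often this calculation runs. It does not change the window itself.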

Creating reports

Receiving a report

When the PDF report arrives in your mailbox, you can use the {TIME} macro in both the subject and the body of the message to display the server time.

Receiving a report

The PDF report can display any information from the included dashboard: Graphs, Problems, Latest data, and much more. Thanks to the wide range of available widgets, reports can be customized in a very granular fashion.

Receiving a report example

The report respects user permissions. In the example above, it shows only the data that the user generating it (either the recipient or the report creator) has access to.

Permissions

After upgrading to Zabbix 5.4, you will see two new options in the User roles section:

  • Scheduled reports UI element. Under UI elements, you can grant or deny access to the Scheduled reports section. This option is available only for the Admin and Super admin user types.

Permissions

If the Scheduled reports UI element is unchecked for the role, the user won’t be able to access the Scheduled reports section and will see an error message. The same happens if the user tries to open Scheduled reports directly via its URL.

Access to scheduled reports denied message for the users of a user role

You can also manage scheduled report permissions in the Access to actions section by checking the Manage scheduled reports box. This action permission grants or denies the ability to create or edit scheduled reports. It is likewise available only for the Admin and Super admin user types.

Manage scheduled reports

If this checkbox is unchecked, users won’t be able to create new reports or edit existing ones. They can still open the UI section and see the list of reports and how they are configured.
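The following hypothetical Python model shows how the two role settings combine. It is not Zabbix code. `UserRole`, `can_view_reports`, and `can_edit_reports` are illustrative names. The rule that editing also requires access to the UI section is an assumption.

```python
from dataclasses import dataclass

ADMIN_TYPES = {"Admin", "Super admin"}


@dataclass
class UserRole:
    """Hypothetical model of the two role options described above."""
    user_type: str                  # "User", "Admin" or "Super admin"
    ui_scheduled_reports: bool      # UI elements: Scheduled reports
    manage_scheduled_reports: bool  # Access to actions: Manage scheduled reports


def can_view_reports(role: UserRole) -> bool:
    # Both options exist only for the Admin and Super admin user types.
    return role.user_type in ADMIN_TYPES and role.ui_scheduled_reports


def can_edit_reports(role: UserRole) -> bool:
    # Assumption: creating or editing also requires access to the UI section.
    return can_view_reports(role) and role.manage_scheduled_reports


# An admin who may browse reports but not change them.
viewer = UserRole("Admin", ui_scheduled_reports=True, manage_scheduled_reports=False)
print(can_view_reports(viewer), can_edit_reports(viewer))  # True False
```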

Access to Manage scheduled reports restricted

Recipients of scheduled reports

When defining a new report, you select its recipients. A report subscription can contain users or user groups.

  • When selecting a user, you can specify to include or exclude the user from the subscription.
  • User group to host group permissions still apply.
  • You can specify on whose behalf the report is generated: the recipient or the creator of the report.

Report recipients

For example, suppose you need to send your NOC team some extra information that is not directly available to them. Select Current user, and the report will be generated with the permissions of the report creator. Since an admin creates the report, it can include information that is not visible to the NOC team or other regular users. They still won’t be able to access that information in Zabbix, but they will receive it in their mailbox if the report is configured for them.
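The following minimal sketch shows how subscriptions might resolve into recipients, together with the permissions each copy is rendered with. `Subscription` and `resolve_recipients` are hypothetical. So are two details: an excluded user is removed even from group subscriptions, and a later entry overrides an earlier one for the same user.

```python
from dataclasses import dataclass


@dataclass
class Subscription:
    """One entry in a report's subscription list (hypothetical model)."""
    name: str                       # user name or user group name
    is_group: bool = False
    exclude: bool = False           # for users: explicitly excluded
    generate_as: str = "Recipient"  # "Recipient" or "Current user" (the creator)


def resolve_recipients(subs: list, groups: dict, creator: str) -> dict:
    """Map each recipient to the user whose permissions render their copy."""
    excluded = {s.name for s in subs if not s.is_group and s.exclude}
    result = {}
    for s in subs:
        if s.exclude:
            continue
        members = groups.get(s.name, []) if s.is_group else [s.name]
        for user in members:
            if user in excluded:
                continue  # exclusion wins over group membership (assumption)
            result[user] = creator if s.generate_as == "Current user" else user
    return result


groups = {"NOC": ["alice", "bob", "carol"]}
subs = [
    Subscription("NOC", is_group=True, generate_as="Current user"),
    Subscription("carol", exclude=True),
]
print(resolve_recipients(subs, groups, creator="admin"))
# {'alice': 'admin', 'bob': 'admin'}
```

In this example, the NOC members receive a report rendered with the creator's permissions, which matches the Current user scenario described above.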

Report prerequisites

Diving a bit deeper into the technical side of things, we need to set up two additional packages to enable the reports:

  • zabbix-web-service: the additional reporting service, which listens on port 10053 by default. The service must be reachable from the Zabbix server. It can be deployed on the same machine as the frontend or the server, or on a completely separate machine. The zabbix-web-service package is available once you have added the Zabbix repository.
# yum install zabbix-web-service
  • Google Chrome is required. On some distributions, Chromium is reported to work as well, though this has not been fully tested. Google Chrome packages are not included with Zabbix. Download them from the Google Chrome website and install them on the zabbix-web-service host.
# wget https://dl.google.com/linux/direct/google-chrome-stable_current_x86_64.rpm

# yum install google-chrome-stable_current_x86_64.rpm
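The web service fails when Chrome cannot be found, so a quick preflight check on the zabbix-web-service host can help. This Python sketch uses the standard `shutil.which` to search `$PATH`. The list of executable names is an assumption: Google Chrome packages usually install `google-chrome`, while Chromium binary names vary by distribution.

```python
import shutil

# Executable names to try; these are assumptions, not a list from Zabbix.
CANDIDATES = ("google-chrome", "google-chrome-stable", "chromium", "chromium-browser")


def find_browser(path=None):
    """Return the full path of the first candidate found, or None.

    By default shutil.which searches $PATH. Note that a service started by
    systemd may see a different $PATH than your interactive shell.
    """
    for name in CANDIDATES:
        found = shutil.which(name, path=path)
        if found:
            return found
    return None


print(find_browser() or "No Chrome/Chromium executable found on $PATH")
```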

Configuring reports — Web service

The web service comes with its own, entirely new configuration file, which supports a number of options:

  • Logging: similar to the server and proxy. You can set the debug level, log type, log rotation, and so on.
### Option: LogType - system (syslog), file, or console (standard output)
### Option: LogFile - log file location
### Option: LogFileSize - size in MB before rotation
### Option: DebugLevel - 0-5
  • List of allowed server addresses that can access this web service.
### Option: AllowedIP - comma-delimited list of IP addresses, optionally in CIDR notation, or DNS names of Zabbix servers
  • Timeout settings
### Option: Timeout - spend no more than Timeout seconds on processing (default: 3)
  • Listen port
### Option: ListenPort - the service listens on this port for connections from the server (default: ListenPort=10053)
  • Encryption settings using certificates, so that communication with the web service can be secured.
### Option: TLSAccept - unencrypted or cert
### Option: TLSCAFile - pathname of a file containing the top-level CA certificate(s)
### Option: TLSCertFile - pathname of a file containing the service certificate
### Option: TLSKeyFile - pathname of a file containing the service private key
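The options above are plain `Key=Value` lines. As a rough sketch, the Python below parses such a file and flags a few obvious mistakes. The defaults for `ListenPort` and `Timeout` come from the list above. Two rules are assumptions: `TLSAccept=unencrypted` is treated as the default, and `TLSAccept=cert` is treated as requiring all three TLS files. The checks are far from exhaustive.

```python
# Defaults for ListenPort and Timeout are taken from the option list above;
# the TLSAccept default is an assumption.
DEFAULTS = {"ListenPort": "10053", "Timeout": "3", "TLSAccept": "unencrypted"}


def parse_conf(text: str) -> dict:
    """Parse Key=Value lines, skipping blank lines and # comments."""
    conf = dict(DEFAULTS)
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a Key=Value line: {raw!r}")
        conf[key.strip()] = value.strip()
    return conf


def validate(conf: dict) -> list:
    """Return a list of problems found (empty if none)."""
    problems = []
    if not conf.get("AllowedIP"):
        problems.append("AllowedIP is not set")
    port = conf["ListenPort"]
    if not port.isdigit() or not 1 <= int(port) <= 65535:
        problems.append(f"invalid ListenPort: {port}")
    if not conf["Timeout"].isdigit() or int(conf["Timeout"]) < 1:
        problems.append(f"invalid Timeout: {conf['Timeout']}")
    if conf.get("DebugLevel", "0") not in {"0", "1", "2", "3", "4", "5"}:
        problems.append(f"DebugLevel must be 0-5, got {conf['DebugLevel']}")
    if conf["TLSAccept"] not in {"unencrypted", "cert"}:
        problems.append(f"TLSAccept must be unencrypted or cert, got {conf['TLSAccept']}")
    elif conf["TLSAccept"] == "cert":
        for key in ("TLSCAFile", "TLSCertFile", "TLSKeyFile"):
            if not conf.get(key):
                problems.append(f"TLSAccept=cert but {key} is not set")
    return problems


sample = """
# Excerpt of a hypothetical web service configuration
AllowedIP=192.168.1.10,10.0.0.0/24
TLSAccept=cert
TLSCertFile=/etc/zabbix/web_service.crt
"""
conf = parse_conf(sample)
print(conf["ListenPort"])  # 10053 (default applied)
for problem in validate(conf):
    print(problem)
```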

Configuring reports — Server

In addition, the server settings now contain report-related parameters:

  • The number of report writer instances.
### Option: StartReportWriters - number of pre-forked report writer instances (default: 0)

NOTE. You need at least one report writer, so StartReportWriters must be set to 1 or more.

NOTE. The number of report writers you need depends on the number of reports and how often you generate them.

  • Zabbix web service URL, which the server uses to reach the web service
### Option: WebServiceURL - URL of the Zabbix web service, used to perform web-related tasks (no default value)
# Example: http://192.168.1.156:10053/report

Make sure that the Zabbix server can communicate with the web service URL and that incoming traffic to the web service port is permitted.
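You can run a minimal reachability check along these lines from the Zabbix server host. The Python sketch below parses the `WebServiceURL` value and flags a path other than `/report`, which is an assumption based on the example above. It then tries a TCP connection to the host and port. It does not speak the web service protocol, so a successful connection only shows that the port is open.

```python
import socket
from urllib.parse import urlparse


def check_web_service_url(url: str, timeout: float = 3.0) -> list:
    """Return problems found with a WebServiceURL value (empty list if none)."""
    problems = []
    parts = urlparse(url)
    if parts.scheme not in ("http", "https"):
        problems.append(f"unexpected scheme: {parts.scheme!r}")
    # Assumption: the path should be /report, as in the example above.
    if parts.path != "/report":
        problems.append(f"unexpected path {parts.path!r}, expected '/report'")
    if not parts.hostname:
        problems.append("no host in URL")
        return problems
    port = parts.port or (443 if parts.scheme == "https" else 80)
    try:
        # A plain TCP connect: proves the port is reachable, nothing more.
        with socket.create_connection((parts.hostname, port), timeout=timeout):
            pass
    except OSError as exc:
        problems.append(f"cannot connect to {parts.hostname}:{port}: {exc}")
    return problems
```

For example, `check_web_service_url("http://192.168.1.156:10053/reportwrong")` would report the unexpected path in addition to any connection problem.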

Configuring reports — Frontend

As the last step, you need to enable communication between the frontend and the web service.

In Administration > General > Other, there is a new configuration parameter where you specify the frontend URL. The web service must be able to reach this URL.

Frontend URL

Once this is done, we can create a report.

Reports — testing

After you have created the report, you can test it: click the Test button to send out a test report and see whether it works. The users receiving the report need to have an Email media assigned to them in their user settings.

NOTE. Currently, {TIME} macros are resolved only with the scheduled generation and are not available in test reports, though this might change in the future.

Testing reports

Common issues

Some parameters can certainly be misconfigured, so let’s look at the most common issues:

  • Make sure that you have a properly configured Email media assigned to the user that should be receiving the report. Otherwise, they will fail to receive it.

— Make sure that the Email media type settings are properly configured.
— Once you define the media type, if you’re creating it from scratch, make sure that you test the media type and generate a test report.

Media configuration failed

NOTE. In this example, sending out the report failed because no media is configured for the report recipients.

  • Make sure that the correct Web service address is configured on the Zabbix server in the WebServiceURL parameter.

— Confirm that the Zabbix server can connect to the Zabbix web service on the specified IP address and port.
— Check your firewall settings if the web service is running on a dedicated machine.
— Make sure that security software, such as SELinux or firewalls, doesn’t block the communication.

Wrong WebServiceURL parameter

Otherwise, you will receive an error message in the frontend. The error messages should be enough to point you in the right direction.

  • Make sure that the web service URL is configured without any typos. Otherwise, you will reach the web service, but the report page will return a ‘404 page not found’ error, as with this URL:
WebServiceURL=http://192.168.1.156:10053/reportwrong

Typos in configuration error message

NOTE. If you see this error message, check for typos in the Zabbix server configuration file for WebServiceURL.

  • Don’t forget to assign the Frontend URL in Administration > General > Other.

— If a URL is misconfigured, you might start receiving empty reports.
— If the URL syntax is wrong, you will receive an error message about the malformed URL.

Malformed URL error message

Frontend URL configuration parameter

  • Google Chrome is not packaged with Zabbix.

— You need to install the Google Chrome package separately. You can download Google Chrome from the official Google Chrome website, for instance, by using wget.

— Make sure that the Google Chrome executable is available via the $PATH environment variable. Otherwise, you will receive an error message. In that case, modify $PATH so that the executable can be found.

$PATH environmental variable error

Questions & Answers

Question. Is it possible to customize the page size, for example, A4 or A3?

Answer. The page layout is based on how you configure your dashboard. Currently, you cannot customize the page, for instance, to select portrait or landscape orientation.

 

Scalability improvements

Post Syndicated from Sergey Simonenko original https://blog.zabbix.com/scalability-improvements/14832/

These improvements might go unnoticed by many Zabbix users, since they concern scalability rather than new features or the user interface. However, they can be very beneficial for Zabbix users who run really large instances.

Contents

I. More efficient database use (1:15)

1. New worker processes (3:03)

2. In-memory trend cache (4:49)
3. More server resiliency (7:35)

II. Questions & Answers (10:54)

For large instances, the main performance bottleneck is usually the database. Zabbix doesn’t establish ad hoc connections and uses only persistent connections to the database. In Zabbix 5.4, the use of database connections has been drastically optimized further.

More efficient database use

  • In earlier versions, not only database syncers but also pollers and some other processes had a dedicated persistent connection to the database. These connections were necessary for calculated items and aggregate checks. These are not real data collection items: they are based on queries to the database, particularly to the history tables.

Connections were also required to update host availability status: pollers, unreachable pollers, JMX pollers, and the IPMI manager updated it directly in the database.

  • In addition, when proxies were used (typical for large instances), host availability was also updated by the trapper and, in the case of a passive proxy, by the proxy poller.

Why was it decided to avoid these connections in Zabbix 5.4?

  • First, they don’t really work smoothly with the default database configuration (PostgreSQL, Oracle). For instance, in PostgreSQL, max_connections is by default set to 100.
  • They can cause locking on the database side.
  • They also result in inefficient memory and CPU utilization.
  • Finally, in earlier versions, it was impossible to fine-tune the number of database connections precisely.

New worker processes

In Zabbix 5.4, two new processes were introduced: history pollers and the availability manager. If you have already upgraded your Zabbix instance, log on to your server and run ps aux | grep zabbix_server. You will notice these new processes:

/usr/sbin/zabbix_server: history poller #1 [got 0 values in 0.000008 sec, idle 1 sec] 
/usr/sbin/zabbix_server: history poller #2 [got 2 values in 0.000186 sec, idle 1 sec] 
/usr/sbin/zabbix_server: history poller #3 [got 0 values in 0.000050 sec, idle 1 sec] 
/usr/sbin/zabbix_server: history poller #4 [got 0 values in 0.000010 sec, idle 1 sec] 
/usr/sbin/zabbix_server: history poller #5 [got 0 values in 0.000012 sec, idle 1 sec] 
/usr/sbin/zabbix_server: availability manager #1 [queued 0, processed 0 values, idle 5.016162 sec during 5.016415 sec]

History pollers

Since calculated items and aggregate checks are a different type of item, they now have their own poller, the history poller. History pollers also process several internal items (zabbix[*] item keys).

New configuration parameters

The history poller comes with a new configuration parameter, StartHistoryPollers, which sets how many history pollers are pre-forked. Keep in mind that more is not always better. Increase this value only if internal self-monitoring shows that the history pollers are too busy. Otherwise, keep it as low as possible to avoid unnecessary connections to the database.

### Option: StartHistoryPollers
#     Number of pre-forked instances of history pollers.
#     Only required for calculated, aggregated and internal checks.
#     A database connection is required for each history poller instance.
#
# Mandatory: no
# Range: 0-1000
# Default:
# StartHistoryPollers=5

Availability manager

In earlier versions, pollers, unreachable pollers, JMX pollers, and the IPMI manager updated host availability directly in the database, with a separate transaction for each host. Now, there is a dedicated availability manager. All processes (pollers, trappers, etc.) send availability updates to it, and the availability manager flushes the queued updates to the database every 5 seconds.
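The batching idea can be illustrated with a toy Python model. `AvailabilityQueue` is hypothetical and is not the availability manager's actual implementation. Two simplifications are assumptions: only the latest status per host is kept, and all pending updates are flushed as one batch. The 5-second interval comes from the text above.

```python
import time


class AvailabilityQueue:
    """Toy model of batched availability updates (not Zabbix's implementation)."""

    def __init__(self, flush_interval: float = 5.0, clock=time.monotonic):
        self.flush_interval = flush_interval
        self.clock = clock
        self.pending = {}       # host -> latest reported availability
        self.batches = []       # each batch stands in for one database transaction
        self.last_flush = clock()

    def report(self, host: str, available: bool) -> None:
        # Called by pollers/trappers instead of writing to the database.
        # Assumption: a newer update for the same host replaces an older one.
        self.pending[host] = available

    def tick(self) -> None:
        # Called periodically; flushes at most once per flush_interval.
        now = self.clock()
        if now - self.last_flush < self.flush_interval:
            return
        self.last_flush = now
        if self.pending:
            self.batches.append(dict(self.pending))
            self.pending.clear()


# Drive the queue with a fake clock to make the timing visible.
fake_now = [0.0]
queue = AvailabilityQueue(clock=lambda: fake_now[0])
queue.report("web01", True)
queue.report("db01", False)
queue.report("web01", False)
queue.tick()          # 0 s elapsed: nothing written yet
fake_now[0] = 5.0
queue.tick()          # 5 s elapsed: one batch with both hosts
print(queue.batches)  # [{'web01': False, 'db01': False}]
```

Three updates become a single write, which is the effect the availability manager aims for at a much larger scale.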

In-memory trend cache

Zabbix 5.2 introduced new trigger functions such as trendavg and trendmax, which operate on trend data over long periods. Like calculated items, these functions used database queries to obtain the necessary data.

Zabbix 5.4 finally implements the trend cache, which stores the results of calculated trend functions. If a value is not in the cache yet, Zabbix queries the database and updates the cache.

As with the newly introduced processes, the effectiveness of this cache can be monitored with the internal check zabbix[tcache,cache,], which helps you set an appropriate TrendFunctionCacheSize value.

### Option: TrendFunctionCacheSize
#           Size of trend function cache, in bytes.
#           Shared memory size for caching calculated trend function data.
#
# Mandatory: no
# Range: 128K-2G
# Default:
# TrendFunctionCacheSize=4M
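Conceptually, the trend cache is a read-through cache: look up the result, and on a miss query the database and store the result. The Python sketch below is a simplified illustration, not Zabbix's implementation. It bounds the cache by entry count with least-recently-used eviction, which is an assumption, since `TrendFunctionCacheSize` is specified in bytes. It also tracks a hit ratio as a rough stand-in for the cache effectiveness that the internal check monitors.

```python
from collections import OrderedDict


class TrendFunctionCache:
    """Read-through cache for trend function results (simplified illustration)."""

    def __init__(self, query_db, max_entries: int = 1000):
        if max_entries < 1:
            raise ValueError("max_entries must be at least 1")
        self.query_db = query_db      # callable(itemid, func, start, end) -> value
        self.max_entries = max_entries
        self.results = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, itemid, func, start, end):
        key = (itemid, func, start, end)
        if key in self.results:
            self.hits += 1
            self.results.move_to_end(key)          # mark as recently used
        else:
            self.misses += 1                       # not cached: ask the database
            self.results[key] = self.query_db(itemid, func, start, end)
            if len(self.results) > self.max_entries:
                self.results.popitem(last=False)   # evict least recently used
        return self.results[key]

    def hit_ratio(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


calls = []


def fake_db(itemid, func, start, end):
    calls.append((itemid, func, start, end))
    return 42.0


cache = TrendFunctionCache(fake_db, max_entries=2)
cache.get(1, "trendmax", 0, 86400)    # miss: queries the "database"
cache.get(1, "trendmax", 0, 86400)    # hit: served from memory
print(len(calls), cache.hit_ratio())  # 1 0.5
```

A persistently low hit ratio in a setup like this would suggest the cache is too small for the working set, which is the kind of signal used to tune the cache size.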

To sum it up, with all these database-related optimizations:

  • It is now possible to have only as many database connections as you really need. Consider a very large instance that needs a hundred or more pollers but doesn’t rely much on calculated items or aggregate checks. Before Zabbix 5.4, such an instance would have ended up with hundreds of database connections that it didn’t need.

Moreover, with PostgreSQL in its default configuration, increasing the number of pollers could bring down your database server and, with it, your Zabbix instance. With so many database connections, you would also have had to limit work_mem for each PostgreSQL worker process, sacrificing overall database performance. That is not the case anymore.

  • In addition, if you use trend functions in triggers over long periods of time, you might have noticed slow queries in the past. These changes help drastically decrease the database load.
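The connection arithmetic is easy to check. This Python sketch counts persistent database connections for a hypothetical instance with 100 pollers. It models only the process types named in this post. Other Zabbix processes that also connect are left out, so real totals will be higher.

```python
POSTGRESQL_DEFAULT_MAX_CONNECTIONS = 100  # PostgreSQL default, as noted above


def db_connections(process_counts: dict, connected_types: set) -> int:
    """Count processes whose type holds a persistent database connection."""
    return sum(n for ptype, n in process_counts.items() if ptype in connected_types)


# Hypothetical instance; only process types named in this post are modeled.
before_54 = {"db syncer": 4, "poller": 100, "unreachable poller": 10}
with_54 = {**before_54, "history poller": 5}

old_total = db_connections(before_54, {"db syncer", "poller", "unreachable poller"})
new_total = db_connections(with_54, {"db syncer", "history poller"})
print(old_total, old_total > POSTGRESQL_DEFAULT_MAX_CONNECTIONS)  # 114 True
print(new_total, new_total > POSTGRESQL_DEFAULT_MAX_CONNECTIONS)  # 9 False
```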

More server resiliency

  • Another important feature is graceful start. Active proxies can keep a backlog, which is useful if communication between the server and the proxy breaks for any reason, for instance:

— server maintenance during upgrade to the next minor release;
— loss of Internet access at a remote site due to fiber cut, etc.

When communication is restored after a long downtime, the proxies can easily overload the server, especially in large instances.

  • Since Zabbix 5.4, the server lets the proxies know if it’s busy, so the proxies throttle data sending.

Earlier, data uploads from the proxies were throttled when history cache usage was 80% or greater. However, since the server was responsible for this, in some situations all proxies were effectively disabled. Not only history data uploads but also other tasks, such as processing regular data and tasks, were suspended until history cache usage dropped below 80%.

This method was ineffective and unacceptable in large environments. Now, the proxies are responsible for checking whether the server can handle the data. When history cache usage reaches 80%, the following scenario is used:

  • the proxies send the data to the server and the data is accepted;
  • if the server is busy, it responds with the special JSON tag upload set to ‘disable’;
  • the proxies will stop uploading history data, but will keep polling the server for tasks and uploading other data;
  • in a while, the proxies will upload data again;
  • if the server is not too busy, it will respond with the JSON tag upload set to ‘enable’.
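The proxy side of this exchange can be sketched as a small state machine. `ProxyUploader` and its `retry_after` delay are hypothetical. The `upload` tag and its ‘disable’/‘enable’ values follow the description above. Check the exact wire format against the Zabbix protocol documentation.

```python
import json


class ProxyUploader:
    """Proxy-side reaction to the server's upload tag (hypothetical sketch)."""

    def __init__(self, retry_after: float = 10.0):
        self.retry_after = retry_after  # hypothetical back-off before retrying
        self.paused_until = None        # None means history upload is enabled

    def should_send_history(self, now: float) -> bool:
        # Tasks and other data keep flowing; only history upload is paused.
        return self.paused_until is None or now >= self.paused_until

    def on_server_response(self, response: dict, now: float) -> None:
        upload = response.get("upload")
        if upload == "disable":
            self.paused_until = now + self.retry_after  # server is busy
        elif upload == "enable":
            self.paused_until = None                    # server can take data


proxy = ProxyUploader(retry_after=10.0)
proxy.on_server_response(json.loads('{"upload": "disable"}'), now=0.0)
print(proxy.should_send_history(5.0))   # False: still backing off
print(proxy.should_send_history(10.0))  # True: time to try uploading again
proxy.on_server_response(json.loads('{"upload": "enable"}'), now=10.0)
print(proxy.should_send_history(11.0))  # True
```

Keeping the decision on the proxy side means a busy server only has to answer. Each proxy paces itself, rather than the server suspending every proxy at once.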

Unlike the two previous scalability improvements, which are based on serious architectural changes, this change was backported to earlier Zabbix versions 5.0 and 5.2.

Questions & Answers

Question. Would you recommend using proxies even on the local site to allow for the server to be upgraded without losing data or for performance improvements?

Answer. Yes, such setups exist. The main idea is to have a unified configuration, not only to improve performance. And if you use a lot of proxies, you might want to monitor all items through proxies only. Many Zabbix customers use such scenarios.

Question. So, throttling can give you some noticeable performance benefits. Which version is required on the server and on the proxy for throttling?

Answer. The throttling changes have been backported to earlier versions. You can use either the recently released Zabbix 5.4.0 or the latest releases of Zabbix 5.0 or Zabbix 5.2.

Question. Is it possible to have two databases in a cluster and point the select queries to one database and, for instance, execution queries to another database? How would database clustering generally work? Is it of benefit to Zabbix? Can Zabbix utilize it?

Answer. In general, our HA setups rely on basic features built into database servers, such as replication, with a virtual IP provided for the cluster. This is completely transparent to Zabbix.

However, splitting different queries across different nodes is not recommended; all queries should still hit a single specific node. So, this is more of an HA approach than a horizontal scalability approach.

Question. Would you elaborate on what a large, or medium, or small instance means? What new values per second should we be looking at?

Answer. We can only judge by our customers’ large instances. There may be even larger instances that customers manage themselves. Large instances can have, for instance, 100,000 NVPS and more. Sometimes, we upgrade really large instances with databases of dozens of terabytes, since some users like keeping really long records.

In my experience, large instances of 20,000 to 40,000 NVPS are quite common, and they can benefit a lot from these changes.

Triggers, calculated and aggregated items in Zabbix 5.4

Post Syndicated from Aigars Kadiķis original https://blog.zabbix.com/triggers-calculated-and-aggregated-items-in-zabbix-5-4/14880/

In earlier Zabbix versions, there were three kinds of calculations we could configure inside the instance: triggers, calculated items, and aggregated items. Each of them had its own syntax, so it was not always clear which syntax to use in a given case. That’s why we introduced a unified syntax for all of them, which simplifies both the documentation and the configuration process. To appreciate this change, let’s recap these three categories in Zabbix.

Contents

I. What’s different in Zabbix 5.4? (1:52)
II. Examples (9:20)
III. Aggregated checks (14:09)
IV. Questions & Answers (19:08)

A trigger is a logical expression that evaluates a formula. If the expression is true, an event, a so-called problem, is generated. Calculated items perform calculations as well, but they store the result in the database as either an integer or a floating-point value. So, both of them do calculations, but before Zabbix 5.4 their syntax was different.

What’s different in Zabbix 5.4?

Each item has a tag:tagvalue

It all starts with the item, since that is how we collect metrics. Previously, items were always grouped into applications. Whenever we designed an item, it belonged to an application such as memory, network, or CPU. Now, we have a Tags column instead.

TAG:TAGVALUE

As you can see in this example, the former application is now stored as a tag named ‘Application’. This allows a smooth upgrade: the ‘Application’ field is completely gone, and items now have only tags.

Period and time shift is one argument now

Previously, for most trigger functions, the first argument was the interval used as input. There were two arguments. <period> analyzed metrics over the specified period, and <time shift>, after a comma, went back in time: <period>,<time shift>.

Now, these are combined into a single argument: <period:time shift>.

If you want to analyze data for yesterday, or last week, or last year, you can now use one argument:

  • <1h> — during the last one hour,
  • <1h:now-1d> — during the same hour one day ago,
  • <1d:now/d> — yesterday during the day.
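The following minimal Python resolver shows how the combined <period:time shift> argument maps onto actual time ranges. `resolve` is a hypothetical helper that supports only the forms shown above: `now`, `/unit` rounding, and `+N`/`-N` shifts with the units s, m, h, d, w, M, and y. Weeks starting on Monday and the calendar-month arithmetic are assumptions. Zabbix's own parser supports more.

```python
import calendar
import re
from datetime import datetime, timedelta

SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}  # fixed-length units
SHIFT_TOKEN = re.compile(r"([/+-])(\d*)([smhdwMy])")              # e.g. "/d", "-1d", "+1d"
PERIOD = re.compile(r"(\d+)([smhdwMy])")                           # e.g. "1h", "1M"


def add_units(t: datetime, n: int, unit: str) -> datetime:
    """Move t by n units (n may be negative); M and y are calendar units."""
    if unit in SECONDS:
        return t + timedelta(seconds=n * SECONDS[unit])
    months = n * 12 if unit == "y" else n
    index = t.month - 1 + months
    year, month = t.year + index // 12, index % 12 + 1
    # Clamp the day, e.g. Mar 31 minus 1M gives Feb 28 in a non-leap year.
    day = min(t.day, calendar.monthrange(year, month)[1])
    return t.replace(year=year, month=month, day=day)


def truncate(t: datetime, unit: str) -> datetime:
    """Round t down to the start of the unit (the '/unit' operator)."""
    t = t.replace(microsecond=0)
    if unit == "s":
        return t
    t = t.replace(second=0)
    if unit == "m":
        return t
    t = t.replace(minute=0)
    if unit == "h":
        return t
    t = t.replace(hour=0)
    if unit == "d":
        return t
    if unit == "w":
        return t - timedelta(days=t.weekday())  # assumption: weeks start on Monday
    if unit == "M":
        return t.replace(day=1)
    return t.replace(month=1, day=1)  # "y"


def resolve(argument: str, now: datetime):
    """Turn '<period>[:<time shift>]' into a (start, end) pair of datetimes."""
    period, _, shift = argument.partition(":")
    end = now
    if shift:
        if not shift.startswith("now"):
            raise ValueError(f"time shift must start with 'now': {shift!r}")
        pos = len("now")
        while pos < len(shift):
            token = SHIFT_TOKEN.match(shift, pos)
            if token is None:
                raise ValueError(f"cannot parse {shift[pos:]!r}")
            op, num, unit = token.groups()
            if op == "/":
                if num:
                    raise ValueError("rounding takes a unit only, e.g. '/d'")
                end = truncate(end, unit)
            else:
                n = int(num) if num else 1
                end = add_units(end, n if op == "+" else -n, unit)
            pos = token.end()
    match = PERIOD.fullmatch(period)
    if match is None:
        raise ValueError(f"unsupported period: {period!r}")
    start = add_units(end, -int(match.group(1)), match.group(2))
    return start, end


now = datetime(2021, 5, 20, 10, 30)  # a Thursday
for arg in ("1h", "1h:now-1d", "1d:now/d", "1d:now/d+1d", "1M:now/M"):
    start, end = resolve(arg, now)
    print(f"{arg:<12} {start:%Y-%m-%d %H:%M} .. {end:%Y-%m-%d %H:%M}")
```

The time shift always fixes the end of the window, and the period then counts backwards from it. This is why `1d:now/d` lands on the whole of yesterday.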

STR(), REGEX(), IREGEX() => FIND()

Another big change is that the functions searching for data containing a string have been merged into a single function, find().

Previously, the string function STR(), which was less CPU-intensive than regular expressions, searched for a plain string. For more complex patterns, we used REGEX() or, for case-insensitive matching, IREGEX(). Now, a single function, find(), covers all of these needs, including case-insensitive search.

It takes additional arguments that specify how to search. You can use a plain string search, which consumes less CPU power, or a regular expression when you really need one.
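The three search modes can be modeled with one Python function. This sketch assumes the operator names `like`, `regexp`, and `iregexp`. It works on a single string rather than on item history. Python's `re` module stands in for the regular expression library Zabbix uses, so some patterns may behave differently.

```python
import re


def find(value: str, operator: str = "like", pattern: str = "") -> int:
    """Return 1 if value matches pattern under operator, else 0 (sketch)."""
    if operator == "like":       # plain substring search, like the old STR()
        return int(pattern in value)
    if operator == "regexp":     # case-sensitive regex, like the old REGEX()
        return int(re.search(pattern, value) is not None)
    if operator == "iregexp":    # case-insensitive regex, like the old IREGEX()
        return int(re.search(pattern, value, re.IGNORECASE) is not None)
    raise ValueError(f"unsupported operator: {operator!r}")


log_line = "ERROR: connection refused"
print(find(log_line, "like", "refused"))     # 1
print(find(log_line, "regexp", "^error"))    # 0
print(find(log_line, "iregexp", "^error"))   # 1
```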

New syntax

In earlier Zabbix versions, a trigger function reference was enclosed in curly braces. The host and item key came first, followed by a dot and the trigger function: {host:key.max(15m)}.

The unified syntax always starts with the function: max(/host/key,15m). This enables nesting. Since some functions accept multiple arguments, we can put one function inside another, eliminating the previous limitations.

The item reference resembles a Linux file system path. It starts with a forward slash, followed by the host name, another forward slash, and the item key. The period (including any time shift) comes after the first comma.

Now, the same syntax is used for triggers and calculated items. Previously, we also had a separate aggregated item type. Now, if you want to do some aggregation, you use calculated items.
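For simple cases, the mapping from the old syntax to the new one is mechanical. The Python sketch below rewrites `{host:key.function(args)}` into `function(/host/key,args)`. It is an illustration only, not the upgrade procedure. Arguments are copied verbatim, so an old `<period>,<time shift>` pair or `last(0)` is not converted.

```python
import re

# {host:key.function(args)}: the key may contain dots and brackets, so the
# function name is the last ".name(" before the closing brace.
OLD_REFERENCE = re.compile(r"\{([^:{}]+):([^{}]+)\.(\w+)\(([^(){}]*)\)\}")


def to_new_syntax(expression: str) -> str:
    """Rewrite simple pre-5.4 references as function(/host/key[,args])."""
    def rewrite(match):
        host, key, function, args = match.groups()
        reference = f"/{host}/{key}"
        return f"{function}({reference},{args})" if args else f"{function}({reference})"

    return OLD_REFERENCE.sub(rewrite, expression)


print(to_new_syntax("{host:key.max(15m)}"))                 # max(/host/key,15m)
print(to_new_syntax("{srv:vfs.fs.size[/,used].last()}>0"))  # last(/srv/vfs.fs.size[/,used])>0
```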

Examples

Now, there are many things you can accomplish, which were not possible before.

Absolute time periods

For instance, we might need to know whether a workstation has been turned on by 10 a.m. today. We can check this by summing up today’s agent.ping values. If the sum is zero, the workstation is still offline and no one has turned it on; otherwise, agent.ping will have returned 1. In the expression below, time() returns the current time in HHMMSS format, so time() > 100000 means ‘after 10:00:00’.

sum(/host/key, 1d:now/d+1d) = 0 and time() > 100000
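The same logic can be expressed in plain Python to show what the trigger checks. `not_turned_on_yet` is a hypothetical helper operating on a list of (timestamp, value) samples. It mirrors the expression above rather than Zabbix's trigger evaluation.

```python
from datetime import datetime


def not_turned_on_yet(pings: list, now: datetime) -> bool:
    """Mirror of: sum(/host/agent.ping,1d:now/d+1d) = 0 and time() > 100000.

    pings holds (timestamp, value) samples; this is an illustration, not how
    Zabbix evaluates triggers.
    """
    # 1d:now/d+1d covers today, from midnight to the next midnight.
    today_sum = sum(value for ts, value in pings if ts.date() == now.date())
    # time() returns the current time as HHMMSS, e.g. 10:30:00 -> 103000.
    hhmmss = int(now.strftime("%H%M%S"))
    return today_sum == 0 and hhmmss > 100000


now = datetime(2021, 5, 20, 10, 30)
print(not_turned_on_yet([], now))                                  # True: problem
print(not_turned_on_yet([(datetime(2021, 5, 20, 8, 5), 1)], now))  # False
```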

  • We might need to find out the maximum workstation uptime yesterday. If the uptime is longer than 10 hours, a person might have been working at the computer for too long and might need a reminder that there’s life outside.

trendmax(/host/key, 1d:now/d)>10h

  • When analyzing user sessions, we might compare yesterday’s peak with last week’s peak. If yesterday’s peak is more than double last week’s, it’s a problem and we get an event.

trendmax(/host/key, 1d:now/d) > trendmax(/host/key, 1w:now/w) * 2

  • We can also analyze, for instance, memory utilization per Windows machine over the previous month using the stock template. Here, we look at the maximum memory used per server (not per workstation) during the previous month and compare it with the total memory available.

trendmax(/host/key1, 1M:now/M) < last(/host/key2) / 2

Here, we get notified if the server’s peak usage during the previous month stayed below half of the available memory.

The beauty of this trigger is that the stock template already contains both items: one collecting used memory and one collecting total memory. All we need to do is go to the Triggers section and add this trigger. As soon as a single new value comes in, the trigger can generate an event about the last month, since all the data is already stored in the database.

Compare string between two hosts

Now, we can compare text strings, including values from different hosts. For instance, we can compare the agent versions on two servers (say, node1 and node2). If the versions differ, the trigger fires an event, and the event name can indicate what the difference between those versions is.

last(/node1/agent.version) <> last(/node2/agent.version)

Aggregated checks

Aggregation on current host (foreach)

You might need to calculate something and store the data in the database.

In this example, we have a website and a template containing a web scenario, which consists of multiple steps, such as checking individual pages. The Latest data page shows the response time for each page.

In the template, we can create a calculated item that aggregates, for instance, all of those built-in response time metrics.

Average web page response time: avg(last_foreach(//web.test.time[check performance,*,resp]))

The ‘*’ wildcard tells Zabbix to aggregate all items reflecting the response time, and the double forward slash means aggregation on the current host. The foreach functions are used whenever we do aggregation. The name echoes the foreach command in Windows PowerShell, which iterates over a set of elements; here, either over hosts or over items.

We can use the same approach for other types of aggregation. For instance, when monitoring disk space, we capture the space used on a Linux or Windows machine, which can have several drives or mount points. A single calculated item can aggregate the total used space across all mount points or drives on the server. This might be useful, for instance, to estimate how much disk space you need when it’s time to purchase an SSD drive.

Total used space on all drives: sum(last_foreach(//vfs.fs.size[*,used]))

Aggregation by host group

Another type of functionality — aggregation by the host group.

You might have a host group, for instance, ‘MySQL servers’, monitored with the official MySQL template, which collects queries per second. A calculated item can aggregate all of these values into one single item. With a pool of MySQL servers, you will see the total number of queries across the pool and can then define a trigger on this item.

MySQL queries per second: sum(last_foreach(/*/mysql.queries.rate?[group="MySQL servers"]))

Aggregation by tag:value

You can aggregate not only by host group but also by tags (tag name and tag value). The tag does not need to be on the item level: you can, for example, mark your host with its role using a host-level tag. You create the calculated item, and it will find all matching items belonging to hosts with this tag name and value and aggregate them. You end up with an item holding the used space metrics (of domain controllers in this example).

Used space on domain controllers: sum(last_foreach(/*/vfs.fs.size[*,used]?[tag="Role:Domain Controller"]))
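The aggregation examples above all follow the same pattern. Items are selected by host and key pattern, optionally filtered by host group or tag, and their last values are then aggregated. The Python sketch below models that selection. `Item` and `last_foreach` are hypothetical stand-ins for Zabbix's data model. The sketch assumes that `*` matches any run of characters and simplifies tags to one value per tag name. Passing the current host's name corresponds to the // form.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Item:
    """Hypothetical, simplified stand-in for a monitored item."""
    host: str
    key: str
    last: float                                # most recent value
    groups: set = field(default_factory=set)   # host groups of the host
    tags: dict = field(default_factory=dict)   # tag name -> value (host or item)


def wildcard(pattern: str):
    # Assumption: '*' matches any run of characters; everything else is literal.
    return re.compile("^" + ".*".join(re.escape(p) for p in pattern.split("*")) + "$")


def last_foreach(items, host, key, group=None, tag=None):
    """Last values of items matching host/key patterns and optional filters."""
    host_re, key_re = wildcard(host), wildcard(key)
    tag_name, _, tag_value = (tag or "").partition(":")
    values = []
    for item in items:
        if not (host_re.match(item.host) and key_re.match(item.key)):
            continue
        if group is not None and group not in item.groups:
            continue
        if tag is not None and item.tags.get(tag_name) != tag_value:
            continue
        values.append(item.last)
    return values


items = [
    Item("shop", "web.test.time[check performance,home,resp]", 0.2),
    Item("shop", "web.test.time[check performance,cart,resp]", 0.4),
    Item("db1", "mysql.queries.rate", 120.0, groups={"MySQL servers"}),
    Item("db2", "mysql.queries.rate", 80.0, groups={"MySQL servers"}),
    Item("dc1", "vfs.fs.size[C:,used]", 50.0, tags={"Role": "Domain Controller"}),
    Item("dc2", "vfs.fs.size[C:,used]", 70.0, tags={"Role": "Domain Controller"}),
    Item("ws1", "vfs.fs.size[C:,used]", 30.0, tags={"Role": "Workstation"}),
]

# avg(last_foreach(//web.test.time[check performance,*,resp])) on host "shop"
resp = last_foreach(items, "shop", "web.test.time[check performance,*,resp]")
print(sum(resp) / len(resp))
# sum(last_foreach(/*/mysql.queries.rate?[group="MySQL servers"]))
print(sum(last_foreach(items, "*", "mysql.queries.rate", group="MySQL servers")))
# sum(last_foreach(/*/vfs.fs.size[*,used]?[tag="Role:Domain Controller"]))
print(sum(last_foreach(items, "*", "vfs.fs.size[*,used]", tag="Role:Domain Controller")))
```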

Questions & Answers

Question. After the upgrade, what will happen with the trigger item syntax? Do we have to rebuild everything?

Answer. It gets upgraded automatically. However, based on my testing in a test environment, some items have to be checked. I would highly suggest exporting the configuration and testing the upgrade process beforehand to learn how those items will be affected (just to be on the safe side).

Question. Are there any changes or improvements in the predictive trigger syntax?

Answer. You should definitely take a look at the roadmap for the exact information.

Question. When we’re doing an aggregation, is it always going to be on a single host or can we specify a host group or a tag?

Answer. Aggregation by host group is still possible, and it is more flexible now. We specify a host (or a wildcard for the host) and the item. We can then narrow down the selection by host group or by the tag the item should have. These filters can also be combined, that is, we can filter by host group and by tag name and tag value.

In addition, during the upgrade, item and trigger prototypes of low-level discovery rules are also converted automatically.


What’s new in Zabbix 5.4

Post Syndicated from Alexei Vladishev original https://blog.zabbix.com/whats-new-in-zabbix-5-4/14603/

Zabbix 5.4.0, released on May 17, 2021, is a non-LTS release that will be supported for only 7–8 months. It has already received a lot of attention from our users, our community, and our customers due to a number of very significant and long-anticipated improvements. Zabbix 5.4.0 comes with scheduled PDF report generation, robust problem detection, advanced data aggregation, and other significant improvements.

Contents

I. Reporting and visualization (1:22)
II. More powerful and simple (9:04)
III. Breaking changes (37:59)
IV. Upgrade notes (40:12)
V. Questions & Answers (44:01)

Reporting and visualization

Unification of screens

In Zabbix 5.2, we introduced pre-defined views for Problems. By accessing Monitoring > Problems, you may create different views based on different filtering options, allowing you to filter problems by certain criteria, and then save this filter as a separate view. You can easily switch between views with one click, for instance, between ‘All problems‘, ‘Services‘ (i.e. service-related problems), or ‘High severity problems‘ in this example.

Pre-defined views for Monitoring > Problems

In Zabbix 5.4, we unified screens and dashboards, which means that screens are no longer supported. In Zabbix 5.2, the menu had both Dashboard and Screens. In Zabbix 5.4, the Screens functionality has moved to Dashboards, where all screens and all dashboards are available, which makes the workflow much simpler and more user-friendly.

New Dashboard menu item

This change affected global screens as well as template-level screens, which we have always had. Now, templates have their own dashboards, available in Configuration > Templates.

For instance, we have a template for Nginx performance monitoring, and in Configuration > Templates this template has a dashboard ‘Nginx performance’. In Monitoring > Hosts, the host (cdn.example.com in this example) has two dashboards: ‘Nginx performance’ and a second one that comes from another template linked to the host.

Templates for Dashboards

By clicking Dashboards for this specific host, you can go to the Dashboards view of this host.

Dashboards view for a specific host

Here, you can quickly switch between the template dashboards currently available for the host, such as ‘Nginx performance’ and ‘System performance’ in this example, to see Linux OS-specific metrics, such as CPU load, disk I/O, and so on.

Multi-page dashboards

Previously, Zabbix had a very nice feature: slideshows, or screen slideshows. Since we have moved everything to Dashboards, we spent a lot of time thinking about how to fit slideshows into the Dashboard functionality and found a very good solution. We introduced Multi-page Dashboards, that is, dashboards containing several pages.

Multi-page Dashboards

In this example, the Zabbix ‘Server performance‘ dashboard has pages with CPU and memory metrics, with metrics or graphs related to network performance, or any other page that you create by clicking Add page and defining the Dashboard page properties. All available pages are shown at the top of the dashboard, and you can easily switch between them. In full-screen mode, you can run the pages as a slideshow using the slideshow controls.

Slideshow mode

Scheduled PDF reports

Scheduled reporting was highly anticipated by our community members, our customers, and our users. In Zabbix 5.4, we introduced scheduled PDF reporting allowing you to define, generate, and send PDF reports straight to your inbox.

Scheduled PDF reporting

This new functionality provides some nice features, for instance:

  • Centralized management of reports, so that super admins can see what reports are generated by Zabbix and sent to different users.
  • Any dashboard can be converted to a PDF report and sent to your email box.
  • This functionality is accessible to all users though can be restricted by a new user role.

In addition, you can now specify that you need a report, for instance, with the previous week’s data every Monday at 7:00 am. All you need is to select the reporting period, and Zabbix will generate the report and email it to you or any other user. You can also choose to receive reports daily, monthly, or yearly.

PDF reports can be scheduled daily, weekly, monthly, or yearly
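Scheduled reports can also be managed through the report.create API method added in 5.4. Below is a minimal sketch of the parameters for the "previous week, every Monday at 7:00 am" case. The field names and numeric codes (period, cycle, start_time in seconds since midnight, the weekdays bitmask) are my reading of the API reference and should be verified; the dashboard and user IDs are hypothetical.

```python
WEEKDAY_BITS = {"Monday": 1, "Tuesday": 2, "Wednesday": 4, "Thursday": 8,
                "Friday": 16, "Saturday": 32, "Sunday": 64}
PERIOD_PREVIOUS_WEEK = 1  # report covers the previous week (verify code in the API docs)
CYCLE_WEEKLY = 1          # report is generated every week (verify code in the API docs)


def seconds_since_midnight(hour: int, minute: int = 0) -> int:
    return hour * 3600 + minute * 60


def weekday_mask(*days: str) -> int:
    """Combine weekday names into a bitmask (Monday = 1, Tuesday = 2, ...)."""
    mask = 0
    for day in days:
        mask |= WEEKDAY_BITS[day]
    return mask


def weekly_report_params(name, dashboard_id, recipient_ids, hour=7, minute=0, days=("Monday",)):
    """Parameters for a report.create call: previous week's data, sent on the given days."""
    return {
        "name": name,
        "dashboardid": dashboard_id,
        "period": PERIOD_PREVIOUS_WEEK,
        "cycle": CYCLE_WEEKLY,
        "start_time": seconds_since_midnight(hour, minute),
        "weekdays": weekday_mask(*days),
        "users": [{"userid": user_id} for user_id in recipient_ids],
    }


params = weekly_report_params("Weekly server performance", "1", ["3", "7"])  # hypothetical IDs
print(params["start_time"], params["weekdays"])  # 25200 1
```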

There are many other configuration parameters for PDF reports. Still, the most important advantage of PDF reporting is that it is accessible from Dashboards: you just open a dashboard in Monitoring > Dashboards and select the period for the PDF report.

PDF reporting accessible from Dashboards


More powerful and simple

When we think about which features to implement in Zabbix and in which direction we would like Zabbix to go, we always look for improvements that make Zabbix simpler to use and, at the same time, more powerful and much more flexible. Zabbix 5.4 is no exception: we have introduced a number of very significant improvements that simplify monitoring and make Zabbix even more flexible.

Tags for items

Zabbix already supports tags for almost all essential objects, such as triggers, hosts, host prototypes, and templates. Tags are everywhere. In Zabbix 5.4, we introduced tags for items (metrics).

Item-level applications have now been replaced with tags, so applications are no longer supported. Instead, we use the much more flexible concept of named tags: each tag has a name and a value, and you can add as many tags as you want, which is much more flexible than applications.

You will notice that in the configuration view of your item you now have an additional tab — Tags where you can see the defined tags, as well as their values.

Tags instead of applications

You don’t have to worry about your applications, though. They will be automatically converted to the tag “Application: <app name>” during the upgrade. For instance, the application ‘CPU’ will be converted to the tag “Application: CPU”. All information will be preserved during the upgrade.

Syntax

Another very interesting functionality, about which we have been thinking for several years, is a new syntax for trigger expressions. We now have unified syntax for everything in Zabbix. Let’s talk about why this is an important step forward.

In previous versions we used a special syntax for triggers, some functional syntax for calculated items, and a different syntax for aggregated items:

  • TRIGGERS: {host:key.func(params)}>0
  • CALCULATED: 100*last("vfs.fs.size[/,free]")/last("vfs.fs.size[/,total]")
  • AGGREGATE: grpsum["MySQL Servers","vfs.fs.size[/,total]",last]

This was quite confusing, and users had to remember the syntax or consult the documentation. In Zabbix 5.4, we introduced a new and, most importantly, unified syntax. Now we use exactly the same syntax for triggers, calculated items, and aggregated items. In addition, this new syntax is more functional, while the old one was more object-oriented.

  • TRIGGERS: func(/host/key, params)>0
  • CALCULATED: sum(/host/vfs.fs.size[*,free], 10m)
  • AGGREGATE: min(avg_foreach(/*/qps?[group="PostgreSQL" and tag="Env:Production"], 5m))
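To see how the old and new trigger syntaxes relate, here is a small regex-based rewriter that turns simple {host:key.func(params)} references into func(/host/key,params). It is an illustration only, not the converter Zabbix runs during the upgrade: it does not translate parameter semantics (for example, the old last(0)), and keys containing parentheses or braces are not handled.

```python
import re

# Matches simple old-style function calls: {host:key.func(params)}
OLD_CALL = re.compile(
    r"\{(?P<host>[^:{}]+):(?P<key>[^{}]+?)\.(?P<func>\w+)\((?P<params>[^()]*)\)\}"
)


def to_new_syntax(expression: str) -> str:
    """Rewrite {host:key.func(params)} into func(/host/key,params) for simple cases."""
    def rewrite(match: re.Match) -> str:
        item = f"/{match['host']}/{match['key']}"
        params = match["params"].strip()
        args = f"{item},{params}" if params else item
        return f"{match['func']}({args})"
    return OLD_CALL.sub(rewrite, expression)


print(to_new_syntax("{web01:system.cpu.load[percpu,avg1].avg(5m)}>2"))
# avg(/web01/system.cpu.load[percpu,avg1],5m)>2
```

The non-greedy key pattern stops at the last ".func(" before the closing brace, so dots inside keys such as system.cpu.load are kept as part of the key.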

The new unified syntax has a number of advantages. It:

  • is much simpler and unified for everything: triggers, calculated and aggregated items;
  • supports absolute time periods, such as last day, previous hour, etc. So, now, it’s easy to calculate, for instance, the minimum for the previous hour or an average value for the previous or the current day;
  • is free from any limitations of the old syntax;
  • allows for powerful aggregations and selection of items by tags utilizing wildcards, etc.;
  • allows for a function to be applied to results of other functions: func1(func2(item));
  • allows for multiple items as function arguments: min(item1, item2);
  • makes it possible to build calculated metrics for almost anything, working around the old limitations.

New set of functions

We have also introduced new sets of functions:

https://www.zabbix.com/documentation/current/manual/appendix/functions

NOTE. If you think that something is missing or you’re not able to represent a problem condition using the new syntax or a new function, just let us know. 

API tokens

Finally, we have added support for named, per-user API tokens with expiry dates. We have also introduced a new user role permission for access to API tokens, so any user with this permission can generate a private API token in Zabbix for a specific use. Super admins with appropriate permissions can manage all tokens globally. Now it is very clear which users are allowed to work with API tokens and which are not.

So, if you are an ordinary user, you can go to User settings, click API tokens, and see your tokens with their name, optional expiry date, the time they were last accessed, and their status (Enabled or Disabled).

API token properties

API tokens can be created by any user with appropriate permissions. Super admins can also select the user for whom the token is created.

API tokens created by any user with permissions

NOTE. You can use the tokens for any integration with the Zabbix API. You are not forced to use a username and password to start using the API: you just need to generate a token, copy it to the clipboard (it will not be visible later), store it in a secure location, and then reuse it.

All tokens generated by different users can be managed by super admins with sufficient permissions to review the tokens.

API tokens managed globally by super admins
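With a token, an API client no longer needs to call user.login first. In Zabbix 5.4, as far as I know, the token is passed in the auth field of the JSON-RPC request, where a session ID would otherwise go. The sketch below builds such a request with the Python standard library; the frontend URL is hypothetical, and build_api_request does not contact any server.

```python
import json
import urllib.request

# Hypothetical frontend URL; replace with the address of your Zabbix frontend.
API_URL = "https://zabbix.example.com/api_jsonrpc.php"


def build_api_request(method: str, params: dict, token: str, request_id: int = 1) -> urllib.request.Request:
    """Build (but do not send) a JSON-RPC request authenticated with an API token."""
    payload = {
        "jsonrpc": "2.0",
        "method": method,
        "params": params,
        "auth": token,  # the API token takes the place of a user.login session ID
        "id": request_id,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json-rpc"},
        method="POST",
    )


def call_api(method: str, params: dict, token: str):
    """Send the request and return the "result" field, raising on a JSON-RPC error."""
    with urllib.request.urlopen(build_api_request(method, params, token), timeout=10) as response:
        reply = json.load(response)
    if "error" in reply:
        raise RuntimeError(reply["error"])
    return reply["result"]
```

With the token kept, for instance, in an environment variable (a hypothetical ZABBIX_API_TOKEN), a call would look like call_api("host.get", {"output": ["host"]}, os.environ["ZABBIX_API_TOKEN"]).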

Easy-to-manage templates

Previously, updating templates in Zabbix was not that simple. When you made some adjustments, for instance, created new items or triggers or added any other entity, it was not easy to apply the update, as you couldn’t quite see the final impact of the changes.

In Zabbix 5.4, we introduced unique universal IDs for each template element, such as items, triggers, graphs, and so on, which help to perform template updates in a safe way.

Universal template IDs

In the example above, the universal IDs are contained in the YAML export of the template for monitoring Memcached. The IDs are unique and are used to match an item, a trigger, etc., so they serve as the uniqueness criteria. Zabbix can easily understand which item we are trying to update, which items no longer exist, and whether an item is new or an adjustment to an existing one.
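Matching by UUID is what lets an import tell a modified element apart from a deleted one and a new one. The sketch below shows that classification with plain dictionaries keyed by UUID; the short keys stand in for real 32-character UUIDs, and the whole thing is a hypothetical illustration of the idea, not the logic Zabbix itself runs.

```python
import uuid


def new_template_uuid() -> str:
    """Random UUID as 32 lowercase hex characters without dashes."""
    return uuid.uuid4().hex


def diff_by_uuid(existing: dict, incoming: dict) -> dict:
    """Classify template elements by UUID: added, removed, or changed."""
    added = sorted(incoming.keys() - existing.keys())
    removed = sorted(existing.keys() - incoming.keys())
    changed = sorted(u for u in existing.keys() & incoming.keys() if existing[u] != incoming[u])
    return {"added": added, "removed": removed, "changed": changed}


existing = {
    "a1": {"key": "memcached.ping", "name": "Ping"},
    "b2": {"key": "memcached.cpu.util", "name": "CPU utilization"},
}
incoming = {
    "a1": {"key": "memcached.ping", "name": "Memcached: Ping"},  # renamed, same UUID
    "c3": {"key": "memcached.connections", "name": "Connections"},
}
print(diff_by_uuid(existing, incoming))
# {'added': ['c3'], 'removed': ['b2'], 'changed': ['a1']}
```

Because the renamed item keeps its UUID, it shows up as changed rather than as a removal plus an addition.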

The IDs also simplify working with templates. Now you can keep all your templates in a Git repository, for instance, in JSON or YAML format, and push them to Zabbix with a CI/CD pipeline using the Zabbix API. As soon as you have made some adjustments in a template, your CI/CD system takes its latest version and applies it in Zabbix via the Zabbix API. Such an approach really helps you treat your infrastructure as code.
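A CI/CD job could push a template kept in Git through the configuration.import API method. This sketch only builds the request; the rule names (createMissing, updateExisting, deleteMissing) come from the configuration.import reference as I remember it, while deleting items and triggers that disappeared from the file is an assumption you may not want in your own pipeline.

```python
def template_import_request(yaml_source: str, auth: str, request_id: int = 1) -> dict:
    """Build a configuration.import request for a template stored as YAML."""
    create_update = {"createMissing": True, "updateExisting": True}
    return {
        "jsonrpc": "2.0",
        "method": "configuration.import",
        "params": {
            "format": "yaml",
            "source": yaml_source,
            "rules": {
                "templates": create_update,
                # Also remove items/triggers that were deleted from the template in Git.
                "items": {**create_update, "deleteMissing": True},
                "triggers": {**create_update, "deleteMissing": True},
            },
        },
        "auth": auth,  # API token or session ID
        "id": request_id,
    }
```

In a pipeline, yaml_source would typically be read from the repository, for instance with Path("templates/memcached.yaml").read_text() (a hypothetical path), and the request sent with an API token.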

Better import

When importing a new template or a new version of the template, Zabbix now will show you any differences when comparing it to an existing template, as well as the changes that are going to be made in Zabbix.

Comparing existing and new templates

In this example, I have a new version of the Memcached template. I removed one tag, Application, and replaced it with two tags: Service (Memcached) and Class (Infrastructure). During import, I can see the difference between my existing template in Zabbix and the new template, so that I can easily review all of the changes. After pressing Import, these changes are applied to the configuration.

Scalability improvements

We have also introduced a few scalability improvements:

  • Zabbix Server and Proxy poller processes no longer require a database connection.
  • An in-memory cache for trend data significantly speeds up trend-related functions.
  • Better parallel data processing on the Zabbix Server side under heavy load. For instance, environments with 10,000 to 50,000 or more new values per second will benefit from the improved performance.
  • Graceful startup of the Zabbix Server, which is especially useful for instances with thousands or tens of thousands of proxies.

NOTE. When the server goes down, for instance, for maintenance or for an upgrade from one minor version to another, there is downtime ranging from 30 seconds to a few minutes. When the Zabbix Server is back, all proxies start pushing large amounts of data at the same time, so it is really important to keep the Zabbix Server stable and healthy at this moment. That’s why we have implemented a graceful startup for the Zabbix Server.

Universal global scripts

Another nice improvement simplifying the Zabbix setup — universal global scripts.

First, we introduced JavaScript Webhooks for global scripts for easy integration with third-party alerting and ticketing systems. Zabbix uses Webhooks for many different purposes: preprocessing, integration, data collection, etc. Now you can use them for global scripts as well.

Global scripts now can be used for everything — auto-remediation, alerting, integrations, and the manual execution on hosts from the Zabbix UI. Now, when you define a global script, you can also define that the script should be used, for instance, only for Action operations.

Universal global scripts’ parameters

In this example, the script is based on a Webhook and accepts parameters such as the event ID, the event severity, and the tags in JSON format. This global script opens a ticket in ServiceNow. After the script has been defined, you need to navigate to Actions and define which operations should be executed: you can send a message to admins or just open a ticket in the service desk.

Defining Actions

It is a very simple, easy-to-use, and easy-to-understand feature simplifying configuration of actions.
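A webhook-based global script like the one above can also be defined through the script.create API method. In the sketch below, the type and scope codes (5 for webhook, 1 for action operations) are my understanding of the 5.4 script API and should be checked; the parameter names and the JavaScript body are hypothetical placeholders. {EVENT.TAGSJSON} is the macro that expands to the event tags as JSON.

```python
SCRIPT_TYPE_WEBHOOK = 5            # script "type" code for webhooks (verify in the API docs)
SCRIPT_SCOPE_ACTION_OPERATION = 1  # script may only be used in action operations

# Hypothetical placeholder: a real script would call the ticketing system's REST API here.
WEBHOOK_JS = """\
var params = JSON.parse(value);
// create the ticket using params.event_id, params.event_severity and params.event_tags
return 'OK';
"""


def webhook_script_params(name: str) -> dict:
    """Parameters for a script.create call defining a webhook-based global script."""
    return {
        "name": name,
        "type": SCRIPT_TYPE_WEBHOOK,
        "scope": SCRIPT_SCOPE_ACTION_OPERATION,
        "command": WEBHOOK_JS,
        "timeout": "30s",
        "parameters": [
            {"name": "event_id", "value": "{EVENT.ID}"},
            {"name": "event_severity", "value": "{EVENT.SEVERITY}"},
            {"name": "event_tags", "value": "{EVENT.TAGSJSON}"},
        ],
    }
```

Inside the webhook, the parameters arrive as a JSON string in value, which is why the script starts with JSON.parse(value).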

Powerful value maps

In Zabbix 5.4, we have introduced some changes related to value maps. The value map is a simple way to convert, for instance, numeric values collected by Zabbix, into human-readable values.

A common example is monitoring the state of a service that returns zero or one, where zero means ‘Down’ and one means ‘Up’. In this case, we can define a value map based on an exact value match, which was already supported before 5.4.

In Zabbix 5.4, we extended it to support matching by ranges. So, you can define that if you receive the status between 0 and 127, you consider the service to be ‘Down‘ and if any other value — ‘Up‘, or vice versa.

We have also introduced matching by regular expression. In addition, a range can combine several intervals: for a service state, as in my case, you may specify that a value between 0 and 31 or between 64 and 127 is mapped to ‘Up’ and any other value to ‘Down’.

Value mapping by range
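To make the difference between exact, range, and regular-expression matching concrete, here is a small Python emulation of a value map. The Mapping class and the first-match-wins order are assumptions of this sketch, not a description of the Zabbix implementation; ranges are limited to non-negative bounds written as low-high.

```python
import re
from dataclasses import dataclass


@dataclass
class Mapping:
    kind: str       # "equal", "range", "regexp" or "default" (simplified subset)
    value: str      # e.g. "1", "0-31,64-127", "^err", or "" for default
    newvalue: str


def in_ranges(number: float, spec: str) -> bool:
    """True if number falls in any inclusive 'low-high' interval (or single value) of spec."""
    for part in spec.split(","):
        low, sep, high = part.strip().partition("-")
        if not sep:
            high = low  # a single value such as "42"
        if float(low) <= number <= float(high):
            return True
    return False


def apply_value_map(raw: str, mappings: list) -> str:
    """Return the mapped text for raw; the first matching rule wins, else raw is returned."""
    for m in mappings:
        if m.kind == "equal" and raw == m.value:
            return m.newvalue
        if m.kind == "range":
            try:
                number = float(raw)
            except ValueError:
                continue  # non-numeric values cannot match a range
            if in_ranges(number, m.value):
                return m.newvalue
        if m.kind == "regexp" and re.search(m.value, raw):
            return m.newvalue
        if m.kind == "default":
            return m.newvalue
    return raw


SERVICE_STATE = [
    Mapping("range", "0-31,64-127", "Up"),
    Mapping("default", "", "Down"),
]
print(apply_value_map("20", SERVICE_STATE), apply_value_map("50", SERVICE_STATE))  # Up Down
```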

Value maps for templates and hosts

We have introduced value maps on the template level and on the host level. This means that global value maps are no longer supported.

Value maps were easy to maintain on the global level, but only for smaller installations. As soon as you have multiple templates and different teams working with a different set of templates, value maps become a nightmare to maintain on a global level. In addition, global value mapping hinders multi-tenancy support for any object that is linked to a value map.

Advantages of value maps on the template level:

  • Now we can deliver self-contained templates without any references to global objects. A template contains all information needed for monitoring: a set of items, a set of triggers, graphs, template-level dashboards, and value maps. Since templates no longer reference any global objects, they have become truly independent. You can go to zabbix.com/integrations, download a template, for instance, for a Cisco device, and be absolutely sure that it will work on your system, as it isn’t linked to any global elements.
  • This new feature enables better support for multi-tenant environments.
  • We have also introduced new mass actions — mass-update operations for easier management of value maps on a template level.

You can Add, Update, Rename, Remove, or Remove all value maps.

Mass-update operations

Usability improvements

  • In Zabbix 5.4, we have introduced a number of usability improvements. The most visible one is the third-level menu for better navigation. The features hidden in Administration > General (Autoregistration, Housekeeping, Images, Icon mapping, Regular expressions, etc.) were difficult to spot. The third-level menu makes these submenus more visible.

Third-level menu

  • One usability problem in Zabbix is that you sometimes have to jump from one page to another, for instance, from a graph to Latest data, from Latest data to a host, etc. We have been thinking about how to keep users focused on what they are doing, so we have introduced modal windows for mass-update and import forms. It is a small but important step toward better overall usability: when you select a list of hosts for a mass-update operation, you stay on this list and don’t have to go to a separate page.

Modal windows for mass-update and import forms

  • Another small but useful feature is negated filtering for tags. Now, in Monitoring > Problems, you can, for instance, display all problems that are not from your staging environment. You can define a condition such as Tags: ‘Env’ Does not equal ‘Staging’, and save it as a new view, ‘Excluding staging’.

Negated filtering by tags
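The same negated filter can be expressed in an API query. This sketch only builds problem.get parameters; the operator code 3 for "Not equal" is my recollection of the 5.4 problem.get reference and should be verified before use.

```python
TAG_OPERATOR_NOT_EQUAL = 3  # "Not equal" tag operator code, as recalled from the API docs


def problems_excluding_tag_value(tag: str, value: str) -> dict:
    """problem.get parameters returning problems whose tag value does not equal the given one."""
    return {
        "output": "extend",
        "tags": [{"tag": tag, "value": value, "operator": TAG_OPERATOR_NOT_EQUAL}],
    }


print(problems_excluding_tag_value("Env", "Staging"))
```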

Better support of XML preprocessing

Zabbix has supported XML XPath preprocessing for four years. Now we have introduced another conversion: a native conversion from XML to JSON.

Since most of the operations in Zabbix are JSON-based, it is a nice way to work with XML data — you convert your XML document to JSON as, for instance, the first preprocessing step, and then you work with this data as with any JSON document.

Better XML preprocessing
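The idea behind the conversion can be sketched in Python: attributes become keys prefixed with @, text next to attributes or child elements goes into #text, and repeated elements become arrays. These conventions loosely follow the Zabbix documentation for the XML to JSON step; the function below is an approximation for illustration only, and edge cases (such as empty elements, which become null here) may differ from Zabbix's output.

```python
import json
import xml.etree.ElementTree as ET


def element_to_json(elem):
    """Convert one XML element into JSON-compatible data."""
    node = {f"@{name}": value for name, value in elem.attrib.items()}
    for child in elem:
        converted = element_to_json(child)
        if child.tag not in node:
            node[child.tag] = converted
        elif isinstance(node[child.tag], list):
            node[child.tag].append(converted)  # third and later repeats
        else:
            node[child.tag] = [node[child.tag], converted]  # second occurrence: make an array
    text = (elem.text or "").strip()
    if not node:
        return text or None  # plain text element, or empty element -> null in this sketch
    if text:
        node["#text"] = text
    return node


def xml_to_json(document: str) -> str:
    root = ET.fromstring(document)
    return json.dumps({root.tag: element_to_json(root)})


xml = ('<services><service name="nginx"><state>1</state></service>'
       '<service name="db"><state>0</state></service></services>')
print(xml_to_json(xml))
```

After such a step, the data can be handled like any other JSON document, for instance with a JSONPath step such as $.services.service[1].state.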

More improvements

There are also security-related changes and other changes related to real-time export.

  • Security-related:
    — Support of all SNMPv3 encryption protocols.
    — Unified error messages in case of unsuccessful login.
    — Disabled autocomplete for password fields in Zabbix UI.
  • Real-time export:
    — Information about event severity is included in real-time export files.
    — More granular configuration of information exported in real-time.
  • Also:
    — Support of VMware cluster monitoring.
    — Support of filtering by the presence of LLD macro for low-level discovery.
    — Support of macro {ITEM.VALUETYPE} for notifications.
    — Support of service name lookup for Oracle for HA setups, so that Zabbix can switch from one node to another.
    — Support of NTLM authentication for JavaScript Webhooks.
    — Support of multiple JMX metrics having the same key on one host.
    — Increased size of memory available to JavaScript Webhooks and preprocessing.
    — CurlHttpRequest renamed to HttpRequest in Webhooks.
    — ‘Alias‘ renamed to ‘Username‘ in user configuration.
    — American English is the default language of the Zabbix UI and the Zabbix documentation.

New integrations and templates

  • Every new major Zabbix version brings new integrations and templates. Zabbix 5.4 is no exception: it comes with new integrations with iTOP, VictorOps, Rocket.Chat, Signal, Express.ms, and other solutions.
  • We have also introduced a new set of official templates for monitoring APC UPS hardware, Hikvision cameras, etcd, Hadoop, Zookeeper, Kafka, AMQ, HashiCorp Vault, MS SharePoint, MS Exchange, smartctl, GitLab, Jenkins, Apache Ignite, and more applications and services.

New integrations and templates

To find out what Zabbix is capable of monitoring and which integrations with third-party systems are available, you are welcome to visit zabbix.com/integrations: the solution you want to monitor or the system you’d like to integrate Zabbix with might already be supported or exist as a community-supported solution.

Official solutions for monitoring and alerting

Recently, we introduced device-specific templates on our integration page, where you can see what devices are currently supported. For instance, we started with APC devices. Here, if you click the required device, you will go to the page with the template for the specific device.

Templates for hardware vendor devices

We are planning to increase the number of out-of-the-box supported vendor devices, and we’ll have a look at the Cisco, Juniper, F5, and some other vendors very soon.

NOTE. Zabbix is a free and open-source solution. We don’t have any closed source components. We are just as free as Linux and use exactly the same GPLv2 license. You can download Zabbix from zabbix.com/download and deploy it anywhere you want.

Deploy Zabbix on-premise

We support a range of the most popular platforms, so you can deploy Zabbix on macOS, Windows, Docker containers, AWS, Azure, OpenStack, Digital Ocean, or Google Cloud. Recently, we have also added support for Linode.

Deploy in the cloud

To have your Zabbix instance running in a cloud, you don’t need to spend a lot. You can use Digital Ocean or Linode, and for about $5 per month, you may have Zabbix Server up and running with Zabbix UI, which will be capable of monitoring thousands of devices.

Breaking changes

  • Applications and screens are not supported and many related API methods are affected.
  • We don’t support global value maps anymore, so we have to switch mentally from supporting value maps globally to supporting value maps on a template level (preferred) or on a host level.
  • New syntax for trigger expressions and calculated metrics, which is easier to understand and use.
    — Aggregate metrics are merged into calculated metrics.

Integration with Grafana

Following the release of Zabbix 5.4, our users quickly realized that our integration with Grafana was broken because of the API changes mentioned above. Since applications are no longer supported, the related API methods changed, and these changes were documented just a few days before the official release of Zabbix 5.4.

Thanks to the swift reaction of Alexander Zobnin, maintainer of the Grafana plugin for Zabbix, this broken integration was fixed very quickly, and you now can easily and safely use Grafana with Zabbix 5.4.

Upgrade notes

  • Upgrading to Zabbix 5.4 from your existing 5.0, 5.2, 4.4, or 4.2 installation is as easy as usual. You need to install the new Zabbix Server and Zabbix Proxy binaries, upgrade the Zabbix UI, and start the Zabbix Server, which will upgrade your database structure automatically.
  • All trigger expressions, calculated and aggregate metrics will be automatically converted to the new syntax.
  • All applications will be automatically converted to tags. For instance, “CPU” will be converted to the tag “Application:CPU”.
  • Global value maps will be moved to the template and host level.
  • All screens will be automatically converted to Dashboards.

NOTE. The next LTS release, Zabbix 6.0, is expected by the end of 2021. Our development team is working on High Availability for the Zabbix Server, which will be available out of the box, so that you can install the Zabbix Server and run it in high-availability cluster mode.

We are also investing a lot of time in improving Business Service Monitoring (BSM) in Zabbix, which covers the service tree, dependencies between services, SLA monitoring of business services, and SLA reporting.

Zabbix roadmap

More information about the expected features is available at zabbix.com/roadmap. There, you can click the version you’re interested in, for instance, 6.0, 6.2, 6.4, or 7.0 LTS. We now keep a progress dashboard, so you will immediately see any progress along our 2.5-year plan.

Zabbix roadmap

Now, we maintain a dynamic dashboard for our roadmap displaying the progress of development of any feature. For instance, ‘Support of multi-tenancy for Services’ is marked with ‘In dev’, as it is under development and has been added to the roadmap recently.

In addition, this feature has a reference to the ZBXNEXT-59 issue, where more detailed information on the feature is available. The ‘Top voted’ mark means that many Zabbix users and community members have voted for the feature.

Questions & Answers

Question. Is migration from screens to dashboards automatic? How does that work?

Answer. All screens, both global and template-level, will be automatically converted to global and template-level dashboards. You don’t need to worry about that; all information will be preserved.

Question. Now dashboards can be delivered as reports. Do you plan to support sending these dashboards natively to third-party systems, for instance, embedding them in a frame on a company’s website?

Answer. If we have such plans, they will be on the roadmap. Right now, you can do it with some sort of frame-based embedding, though it is an ugly way to integrate one solution with another. I am looking forward to adding the ability to include a widget or a dashboard in a third-party HTML page, and I think it will be implemented in Zabbix sooner or later.

Question. Do you have any plans to have reports based on some other sections of Zabbix, such as inventory reports, or availability reports, and so on?

Answer. This functionality can evolve as we improve the visualization capabilities of Zabbix dashboards, more specifically by introducing additional widgets for different purposes, such as geographical maps and data table widgets, which are already planned for Zabbix 6.0, capacity planning, and widgets for all possible use cases. So, as soon as we have a rich set of dashboard widgets, their data will automatically be available in PDF reports.

Some changes planned for Zabbix 6.0 are related to business service monitoring and as part of this development, we will also introduce new widgets made specifically for SLA reporting. So, we are developing in this direction — we are going to have more widgets, more widget flexibility, and maybe more visually appealing widgets in the future.

Question. Why did we implement value maps the way we did? Why didn’t we implement them on three levels (global, template, host), just like user macros?

Answer. If we implemented it in three levels, the global value maps would still be there, and it would be very hard to manage value maps in this case. Suppose you have a global map defined, such as a service state. What should Zabbix do after you import a new template with the same value map service state, which is defined on a different level? Should it keep the value map service state on a template level or should it upgrade the global service state without creating the service state on the template level? This would introduce a new set of different problems and confusion in the end. So, we really need to keep templates independent to prevent those hidden dependencies on global objects such as value maps, especially for larger Zabbix deployments. Even one hundred templates in your setup might become a huge problem from the maintenance point of view.

Question. Why do we release so many intermediate versions? Why don’t we support versions 5.2 and 5.4 for two or three years?

Answer. The reason is very simple. We maintain backward and forward compatibility within one major release, and we guarantee that the database structure remains intact. So, if you install 5.4.0, all 5.4.x releases (5.4.0, 5.4.2, 5.4.10, etc.) remain backward and forward compatible, and the database structure stays the same. If we started introducing new features in minor releases, we would have to modify and extend the database structure, and minor versions of a major release would no longer be compatible. I don’t think it is a good approach, and you would have no way to downgrade if a newer version of Zabbix didn’t work the way you want. This approach is described in the release cycle document on our website and has proved its feasibility over time.

We support 5.2 and 5.4 for only several months so as not to put additional load on our support team. We have two different types of releases: LTS releases with five-year support and non-LTS releases. If we supported everything, at any given moment our team would be supporting 10 major Zabbix versions. If a customer reports a problem to our support team, we have to fix it, for instance, create a patch, follow the QA procedure, test the solution very carefully, etc. Even Microsoft doesn’t maintain 10 major versions. So, we support just a few LTS releases at a time.

Question. Will you continue support of Red Hat 7 or CentOS 7 when Zabbix 6.0 is released? What about RHEL 7?

Answer. We dropped support for CentOS 7 for a good reason. We discovered that some of the software we depend on, for instance, for TLS encryption, is outdated in CentOS 7. There are also other unsupported dependencies, such as PHP. In addition, Zabbix 6.0 will be supported for five years, and we just won’t be able to support CentOS 7 for an extra five years starting from the end of 2021. So, we could either drop support for CentOS 7 starting from Zabbix 5.2, or keep it in Zabbix 5.2 and Zabbix 5.4 and drop it starting from Zabbix 6.0. We decided to drop support for CentOS 7 for the Zabbix Server, Zabbix Proxy, and Zabbix UI immediately, starting from Zabbix 5.2. Otherwise, we would have had to rely on a third-party repository to get the particular versions of software dependencies that Zabbix requires.